Software Transactional Memory: Why Is It Only a Research Toy?


practice

doi:10.1145/1400214.1400228
The promise of STM may well be undermined by its overheads and its limited applicability to real workloads.

by Călin Cașcaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, and Siddhartha Chatterjee

Transactional memory (TM)13 is a concurrency control paradigm that provides atomic and isolated execution for regions of code. TM is considered by many researchers to be one of the most promising solutions to the problem of programming multicore processors. Its most appealing feature is that most programmers only need to reason locally about shared data accesses, mark the code region to be executed transactionally, and let the underlying system ensure correct concurrent execution. This model promises to provide the scalability of fine-grained locking while avoiding common pitfalls of lock composition such as deadlock. In this article, we explore the performance of a highly optimized STM and observe that the overall performance of TM is much worse at low levels of parallelism, which is likely to limit the adoption of this programming paradigm.

Different implementations of transactional memory systems make trade-offs that affect both performance and programmability. Larus and Rajwar16 present an overview of design trade-offs for implementations of transactional memory systems. We summarize some of the design choices here:
• Software-only (STM)7, 10, 12, 14, 18, 23, 25 is the focus here. While offering flexibility and no hardware cost, it leads to overhead in excess of most users' tolerance.
• Hardware-only (HTM)2, 4, 9, 13, 19, 20, 35 suffers from two major impediments: high implementation and verification costs lead to design risks too large to justify for a niche programming model; and hardware capacity constraints lead to significant performance degradation when overflow occurs, while proposals for managing overflows (for example, signatures5) incur false positives that add complexity to the programming model. Therefore, from an industrial perspective, HTM designs have to provide more benefit for the cost, on a more diverse set of workloads (with varying transactional characteristics), before hardware designers will consider implementing them.a
• Hybrid1, 6, 24, 28 is the most likely platform for the eventual adoption of TM by a wide audience, although the exact mix of hardware and software support remains unclear. A special case of hybrid systems is the hardware-accelerated STM. In this scenario, the transactional semantics are provided by the STM, and hardware primitives are used only to speed up critical performance bottlenecks in the STM. Such systems could offer an attractive solution if the cost of the hardware primitives is modest and can be further amortized by other uses in the system.

a Reuse of hardware for other purposes can also justify its inclusion, as may be the case for Sun's implementation of Scout Threading in the Rock processor.32

40 Communications of the ACM | November 2008 | Vol. 51 | No. 11

Independent of these implementation decisions, there are transactional semantics issues that break the ideal transactional programming model for which the community had hoped. TM introduces a variety of programming issues that are not present in lock-based mutual exclusion. For example, semantics are muddled by:
• Interaction with non-transactional code, including access to shared data from outside a transaction (tolerating weak atomicity) and the use of locks inside a transaction (breaking isolation to make locking operations visible outside transactions);
• Exceptions and serializability: how to handle exceptions and propagate consistent exception information from within a transactional context, and how to guarantee that transactional execution respects a correct ordering of operations;
• Interaction with code that cannot be transactionalized, due either to communication with other threads or to a requirement barring speculation;
• Livelock, that is, whether the system guarantees that all transactions make progress even in the presence of conflicts.

In addition to these intrinsic semantic issues, there are also implementation-specific optimizations motivated by high transactional overheads, such as programmer annotations for excluding private data. Furthermore, the nondeterminism introduced by aborting transactions complicates debugging: transactional code may be executed and aborted on conflicts, which makes it difficult for the programmer to find deterministic paths with repeatable behavior. Both of these dilute the productivity argument for transactions, especially for software-only TM implementations.

Given all these issues, we conclude that TM has not yet matured to the point where it presents a compelling value proposition that will trigger its widespread adoption. While TM can be a useful tool in the parallel programmer's portfolio, it is our view that it is not going to solve the parallel programming dilemma by itself. There is evidence that it helps with building certain concurrent data structures, such as hash tables and binary trees. In addition, there are anecdotal claims that it helps with workloads; however, despite several years of active research and publication in the area, we are disappointed to find no mention in the research literature of large-scale applications that make use of TM. The STAMP30 and Lonestar17 benchmark suites are promising starts, but have a long way to go to be representative of full applications.

We base these conclusions on our work over the past two years building a state-of-the-art STM runtime system and compiler framework, the freely available IBM STM.31 Here, we describe this experience, starting with a discussion of STM algorithms and design decisions. We then compare the performance of this STM with two other state-of-the-art implementations (the Intel STM14 and the Sun TL2 STM7), dissect the operations executed by the IBM STM, and provide a detailed analysis of the performance hotspots of the STM.

Software Transactional Memory
STM implements all the transactional semantics in software. That includes conflict detection, guaranteeing the consistency of transactional reads, preservation of atomicity and isolation (preventing other threads from observing speculative writes before the transaction succeeds), and conflict resolution (transaction arbitration). The pseudo-code for the main operations executed by a typical STM is illustrated in Figure 1. We show two STM algorithms, one that performs full validation and one that uses a global version number (the additional statements marked with the gv# comment).

The advantage of an STM for system programmers is that it offers flexibility in implementing different mechanisms and policies for these operations.

Figure 1: STM operations.

(a) Pseudo-code for STM begin
STM_BEGIN()
  read global version number                /* gv# */

(b) Pseudo-code for STM validate
STM_VALIDATE()
  read global version number                /* gv# */
  if global version number changed          /* gv# */
    for each read set entry
      if metadata changed return FALSE
  return TRUE

(c) Pseudo-code for STM read
STM_READ(A)
  if already written goto written path
  read metadata of A
  if metadata is locked goto conflict path
  log A and its metadata in the read set
  val = read value at A
  if !STM_VALIDATE() goto conflict path
  return val

(d) Pseudo-code for STM end
STM_END()
  lock metadata for write set
  if already locked goto conflict path
  if !STM_VALIDATE() goto conflict path
  /* Success guaranteed */
  increment global version number           /* gv# */
  execute writes
  update/unlock metadata for write set
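To make Figure 1 concrete, here is a minimal Python sketch of the gv# variant. It is an illustration under stated assumptions, not the IBM STM: the names Heap, Tx, Conflict, and atomic are hypothetical; a single threading.Lock stands in for the atomic compare-and-swap a real runtime would use on metadata; and because Figure 1 does not show a write operation, stores are assumed to be buffered in the write set until STM_END executes them. Deleting the gv# early return in validate() turns it into the fv variant, which walks the read set on every validation.

```python
import threading


class Conflict(Exception):
    """The 'conflict path' of Figure 1: the transaction must abort and retry."""


class Heap:
    """Hypothetical shared heap: a value plus metadata (version, lock owner) per address."""

    def __init__(self):
        self.value = {}                 # address -> committed value
        self.version = {}               # address -> per-location version
        self.owner = {}                 # address -> Tx holding its write lock
        self.gv = 0                     # global version number (gv#)
        self.cas = threading.Lock()     # stands in for an atomic CAS (assumption)


class Tx:
    def __init__(self, heap):           # STM_BEGIN
        self.heap = heap
        self.read_set = {}              # address -> version seen at first read
        self.write_set = {}             # address -> buffered new value
        self.start_gv = heap.gv         # read global version number (gv#)

    def validate(self):                 # STM_VALIDATE
        h = self.heap
        if h.gv == self.start_gv:       # gv#: nothing committed since begin
            return True
        for addr, ver in self.read_set.items():
            owner = h.owner.get(addr)
            if h.version.get(addr, 0) != ver or owner not in (None, self):
                return False            # metadata changed or locked by another tx
        return True

    def read(self, addr):               # STM_READ
        if addr in self.write_set:      # already written: take the written path
            return self.write_set[addr]
        h = self.heap
        if h.owner.get(addr) is not None:
            raise Conflict(addr)        # metadata is locked by a committer
        self.read_set.setdefault(addr, h.version.get(addr, 0))
        val = h.value.get(addr, 0)
        if not self.validate():
            raise Conflict(addr)
        return val

    def write(self, addr, val):         # assumed STM_WRITE: buffer until commit
        self.write_set[addr] = val

    def commit(self):                   # STM_END
        h = self.heap
        locked = []
        try:
            for addr in self.write_set:          # lock metadata for write set
                with h.cas:
                    if h.owner.get(addr) is not None:
                        raise Conflict(addr)     # already locked
                    h.owner[addr] = self
                locked.append(addr)
            if not self.validate():
                raise Conflict("validation failed")
            with h.cas:                          # success guaranteed from here
                h.gv += 1                        # gv#
            for addr, val in self.write_set.items():  # execute writes
                h.value[addr] = val
                h.version[addr] = h.version.get(addr, 0) + 1
        finally:
            for addr in locked:                  # unlock on commit or abort
                del h.owner[addr]


def atomic(heap, body):
    """Run body(tx) transactionally, retrying after each conflict."""
    while True:
        tx = Tx(heap)
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue


heap = Heap()


def deposit(tx):
    tx.write("balance", tx.read("balance") + 10)


atomic(heap, deposit)
atomic(heap, deposit)
print(heap.value["balance"], heap.gv)  # 20 2
```

Note that the global version number is incremented before the writes are executed, so a concurrent reader that observes a partially written location also observes a changed gv# and falls back to full validation.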


For end users, the advantage of an STM is that it offers an environment in which to transactionalize their applications (that is, port them to TM) without incurring extra hardware cost or waiting for such hardware to be developed.

Conversely, an STM entails nontrivial drawbacks with respect to performance and programming semantics:
• Overheads: In general, STM results in higher sequential overheads than traditional shared-memory programming or HTM. This is the result of the software expansion of loads and stores to shared mutable locations inside transactions into tens of additional instructions that constitute the STM implementation (for example, the STM_READ code in Figure 1c). Depending on the transactional characteristics of a workload, these overheads can become a high hurdle to STM performance. The sequential overheads (that is, conflict-free overheads that are incurred regardless of the actions of other concurrent threads) must be overcome by the concurrency-enabling characteristics of transactional memory.
• Semantics: In order to avoid incurring high STM overheads, non-transactional accesses (such as loads and stores occurring outside transactions) are typically not expanded. This has the effect of weakening, and hence complicating, the semantics of transactions, which may require the programmer to be more careful than when strong transactional semantics are supported. The following are some of the weakened guarantees that are usually associated with such STMs:
• Weak atomicity: Typically the STM runtime libraries cannot detect conflicts between transactions and non-transactional accesses. Thus, the semantics of atomicity are weakened to allow undetected conflicts with non-transactional accesses (referred to as weak atomicity3), or, equivalently, the burden is put on the programmer to guarantee that no such conflicts can possibly take place.
• Privatization: Some STM designs prohibit the seamless privatization of memory locations, that is, the transition from being accessed transactionally to being accessed privately or, more generally, non-transactionally (for example, under locks). For some STM designs, once a location is accessed transactionally, it must continue to be accessed transactionally. With other STM designs, the programmer can ease the transition by guaranteeing that the first access to the privatized location (for example, after the location is no longer accessible by other threads) is transactional.
• Memory reclamation: Some STM designs prohibit the seamless reclamation of memory locations accessed transactionally for arbitrary reuse, such as by using malloc and free. With such STM designs, memory allocation and deallocation for locations accessed transactionally are handled differently from other locations.
• Legacy binaries: STM needs to observe all memory activities of the transactional regions to ensure atomicity and isolation. STMs that achieve this observation by code instrumentation generally cannot support transactions calling legacy code that is not instrumented (for example, third-party libraries) without seriously limiting concurrency, such as by serializing transactions.

Evaluation
Here we use the following set of benchmarks:
• b+tree is an implementation of database indexing operations on a b-tree data structure in which the data is stored only in the tree leaves. This implementation uses coarse-grain transactions for every tree operation. Each b+tree operation starts from the tree root and descends to the leaves. A leaf update may trigger a structural modification to rebalance the tree. A rebalancing operation often involves recursive ascent over the child-parent edges. In the worst case, the rebalancing operation modifies the entire tree. Our workload inserts 2,048 items in a b+tree of order 20. For this code we have only a transactional version that is not manually instrumented; therefore, experimental results are presented only in configurations where we can use our compiler to provide instrumentation.
• delaunay implements the Delaunay Mesh Refinement algorithm described in Kulkarni et al.15 The code produces a guaranteed-quality Delaunay mesh: a Delaunay triangulation with the additional constraint that no angle in the mesh be less than 30 degrees. The benchmark takes as input an unrefined Delaunay triangulation and produces a new triangulation that satisfies this constraint. In the TM implementation of the algorithm, multiple threads choose their elements from a work-queue and refine the cavities as separate transactions.
• genome, kmeans, and vacation are part of the STAMP benchmark suite19 version 0.9.4. For a detailed description of these benchmarks see STAMP.30

Figure 4: Single-threaded overhead of the STM algorithms. (Bars for fv and gv# on b+tree, delaunay, kmeans, genome, and vacation; y-axis: runtime normalized to sequential, 0 to 8; three off-scale bars are labeled 118.1, 43.8, and 49.2.)

Baseline Performance. In Figure 2 we present a performance comparison of three STMs: the IBM,31, 34 Intel,14 and Sun TL27 STMs. The runs are on a quad-core, two-way hyperthreaded 2.3GHz Intel Xeon machine running Linux Fedora Core 6. In these runs, we used manually instrumented versions of the codes that aggressively minimize the number of barriers for the IBM and TL2 STMs. Since we do not have access to low-level APIs for the Intel STM, the Intel STM curves come from codes instrumented by its compiler, which incur additional barrier overheads due to compiler instrumentation.36 The graphs are scalability curves with respect to the serial, non-transactionalized version; a value of 1 on the y-axis therefore represents performance equal to the serial version. The performance of these STMs is mostly on par, with the IBM STM showing better scalability on delaunay and TL2 obtaining better scalability on genome. However, the overall performance obtained is very low: on kmeans the IBM STM barely attains single-thread performance at 4 threads, while on vacation none of the STMs overcomes the overhead of transactional memory even with 8 threads.

Figure 2: Scalability results for three STM runtimes on a quad-core Intel Xeon server: IBM, Intel STM v2, and Sun TL2. (Panels: delaunay, kmeans, vacation, genome; x-axis: threads; y-axis: scalability normalized to sequential; series: Intel, IBM, Sun TL2.)

Compiler Instrumentation. The compiler is a necessary component of any STM-based programming environment that is to be adopted by mainstream programmers. Its basic role is to eliminate the need for programmers to manually instrument memory references with STM read- and write-barriers. While offering convenience, compiler instrumentation adds another layer of overhead to the STM system by introducing redundant barriers, often due to the conservativeness of compiler analysis, as also observed by Yoo et al.36

Figure 3 provides another baseline: the overhead of compiler instrumentation. The performance is measured on a 16-way POWER5 running AIX 5.3. For the STMXLC curve, we use the uninstrumented versions of the codes and annotate transactional regions and functions using the language extensions provided by the compiler.31

Figure 3: Scalability results for manual and compiler instrumented benchmarks on AIX PowerPC with IBM XLCSTM compiler. (Panels: Genome, Vacation; x-axis: number of threads, 1 to 8; y-axis: speedup; series: STM manual, STMXLC.)
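The redundant barriers that compiler instrumentation introduces can be illustrated with a deliberately tiny Python model of one transactional region. The Ref record and both functions are hypothetical and are not part of the XL C/C++ STM compiler: naive_barriers mirrors an instrumentation that guards every non-stack reference, and filtered_barriers additionally drops a load barrier when the same address was already loaded in the transaction with no intervening store, reusing the earlier value. This is one simple form of redundant-barrier elimination, shown only under those assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """One memory reference inside a transactional region (hypothetical IR)."""
    kind: str               # "load" or "store"
    addr: str               # symbolic address, e.g. "leaf.count"
    on_stack: bool = False  # stack slots need no STM barrier


def naive_barriers(region):
    """Instrument every non-stack reference, as a conservative compiler might."""
    return [r for r in region if not r.on_stack]


def filtered_barriers(region):
    """Also skip a load barrier for an address already loaded in this
    transaction with no store to it since; the earlier value is reused.
    Assumes distinct symbolic addresses never alias."""
    kept, loaded = [], set()
    for r in region:
        if r.on_stack:
            continue
        if r.kind == "load":
            if r.addr in loaded:
                continue                  # redundant read barrier
            loaded.add(r.addr)
        else:
            loaded.discard(r.addr)        # the next load of addr keeps its barrier
        kept.append(r)
    return kept


region = [
    Ref("load", "tree.root"),
    Ref("load", "i", on_stack=True),
    Ref("load", "tree.root"),             # same address again: redundant
    Ref("load", "leaf.count"),
    Ref("store", "leaf.count"),
    Ref("load", "leaf.count"),            # after a store: barrier kept
    Ref("store", "i", on_stack=True),
]
print(len(naive_barriers(region)), len(filtered_barriers(region)))  # 5 4
```

The sketch treats two differently named addresses as distinct; a real compiler must first prove that they cannot alias, which is exactly where conservative analysis forces it to keep barriers.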


Compiler over-instrumentation is more pronounced in traditional, unmanaged languages such as C and C++, where compiler instrumentation without interprocedural analysis may end up instrumenting every memory reference in the transactional region (except for stack accesses). Indeed, our compiler instrumentation more than doubled the number of dynamic read barriers in delaunay, genome, and kmeans. Interprocedural analysis can help tighten compiler instrumentation in some cases, but is generally limited by the accuracy of global analysis.

STM Operations Performance. Given this baseline, we now analyze in detail which operations in the STM cause the overhead. For this purpose, we use a cycle-accurate simulator of the PowerPC architecture that provides hooks for instrumentation. The STM operations and suboperations are instrumented with these simulator hooks. We use this environment because we want to capture the overheads at instruction level and eliminate any nondeterminism introduced by real hardware. The simulator eliminates all bookkeeping operations introduced by the instrumentation itself and provides an accurate breakdown of the STM overheads.

We study the performance of two STM algorithms: one that fully validates ("fv") the read set after each transactional read, and one that uses a global version number ("gv#") to avoid full validation while maintaining the correctness of the operations. The fv algorithm provides more concurrency at a much higher price. The gv# algorithm is considered one of the best trade-offs for STM implementations.

Figure 4 presents the single-threaded overhead of these algorithms over sequential runs, illustrating again the substantial slowdowns that the algorithms induce. Figure 5 breaks down these overheads into the various STM components. For both algorithms, the overhead of transactional reads dominates because read operations are far more frequent than all other operations. The effectiveness of the global version number in reducing overheads is shown by the lower read overhead of "gv#."

Figure 5: Percentage of time spent in different STM operations. (Stacked bars for fv and gv# on b+tree, delaunay, kmeans, genome, and vacation; y-axis: 0 to 100; components: other, end, malloc, begin, desc, read, free, write, stack_range, kernel.)

Figure 6 gives a fine-grained breakdown of the overheads of the transactional read operation. As expected, the overhead of validating the read set dominates transactional read time in the "fv" configuration. For both algorithms, the isync operations (necessary for ordering the metadata read and the data read, as well as the data read and validation) form a substantial component. In applications that perform writes before reads in the same transaction (delaunay, kmeans), the time spent checking whether a location has been written by prior writes in the same transaction forms a significant component of the total time. Interestingly, reading the data itself takes a negligible amount of the total time, indicating the hurdles that must be overcome for the performance of these algorithms to be compelling.

Figure 6: Percentage of time spent in STM read sub-operations. (Stacked bars for fv and gv# on b+tree, delaunay, kmeans, genome, and vacation; y-axis: % of cycles normalized to fv, 0 to 100; components: return, add metadata to read set, check read after write, validate, check if metadata is locked, setup, sync, read metadata, call, read data, calculate metadata, other.)

Figure 7 gives a similar breakdown of the transactional commit operation. As before, the "fv" configuration suffers from having to validate the read set. The other dominant overheads for both configurations are the cost of acquiring the metadata for the write set (which involves a sequence of load-linked/store-conditional operations) and the sync operations that are necessary for ordering the metadata acquires, data writes, and metadata releases. Once again, the data writes themselves form a small component of the total time.
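The gap between the fv and gv# read paths can be approximated by counting the read-set entries that STM_VALIDATE examines during the per-read validations of Figure 1c. The function below is a hypothetical back-of-the-envelope model, not a measurement: it assumes every read touches a new address, ignores the validation inside STM_END, and models a concurrent commit simply as a change in gv# right after a given read.

```python
def validation_checks(n_reads, use_gv=True, commit_after=None):
    """Read-set entries examined by STM_VALIDATE across n_reads STM_READ calls.

    use_gv=False models fv: every validation walks the whole read set.
    commit_after=k models another transaction committing (bumping gv#)
    right after this transaction's k-th read; None means no commits.
    """
    checks = 0
    for i in range(1, n_reads + 1):      # the read set holds i entries after read i
        gv_changed = commit_after is not None and i > commit_after
        if not use_gv or gv_changed:
            checks += i                  # full walk of the read set
    return checks


print(validation_checks(100, use_gv=False))     # fv: 5050
print(validation_checks(100))                   # gv#, no commits: 0
print(validation_checks(100, commit_after=90))  # gv#, late commit: 955
```

In this model fv's validation work grows quadratically with the read-set size, while gv# pays only one version-number comparison per read until another transaction commits. Because Figure 1b compares against the gv# read at begin, every validation after that first commit is again a full walk.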


Overhead Optimizations. There have been many proposals for reducing STM overheads through compiler or runtime techniques, most of which are complementary to STM hardware acceleration.
• Redundant barrier elimination. One technique is to eliminate barriers to thread-local objects through escape analysis. Such analysis is typically quite effective at identifying thread-local accesses that are close to the object allocation site. It can eliminate both read- and write-barriers, but is often more effective on write-barriers. For example, we observe that an intraprocedural escape analysis can eliminate 40%–50% of write barriers in vacation, genome, and b+tree. However, its impact on performance is more limited: from negligible to 12%. To target redundant read-barriers, a whole-program analysis called Not-Accessed-In-Transaction analysis27 eliminates some barriers to read-only objects in transactions;
• Barrier strength reduction. These optimizations do not eliminate barriers, but identify at runtime special locations that require only lightweight barrier processing, such as dynamic tracking of thread-local objects11, 27 and runtime filtering of stack references and duplicate references;11
• Code generation optimizations. One common technique is to inline the fast path of barriers. This has the potential benefit of reducing function call overhead, increasing ILP, and exposing reuse of common sub-barrier operations. In our experiments, compiler inlining achieved less than 2% overall improvement across our benchmark suite;
• Commit sequence optimizations. Eliminating unnecessary global version number updates37 improves the overall performance of several micro-benchmarks by up to 14%.

Such optimizations have a positive impact on STM performance. However, the results presented here indicate how much further innovation is needed for the performance of STMs to become generally appealing to users.

Figure 7: Percentage of time spent in STM end sub-operations. (Stacked bars for fv and gv# on b+tree, delaunay, kmeans, genome, and vacation; y-axis: % of cycles normalized to fv, 0 to 100; components: return, write data, check for read-only, cleanup transactional state, validate, setup, release metadata, sync, call, increment gv#, acquire metadata, other.)

Related Work
The first STM system was proposed by Shavit and Touitou26 and is based on object ownership. The protocol is static, which is a significant shortcoming that has been overcome by subsequently proposed STM systems.7 The static nature does, however, simplify conflict detection significantly, because conflicts can already be ruled out when ownership records are acquired (at transaction start).

DSTM12 is the first dynamic STM system; its design follows a per-object runtime organization (the locator object). Variables (objects) in the application heap refer to a locator object. Unlike in a design with ownership records (for example, Harris and Fraser10), the locator does not store a version number but refers to the most recently committed version of the object. A particularity of the DSTM design is that objects must be explicitly "opened" (in read-only or read-write mode) before transactional access; DSTM also allows for early release. The authors argue that both mechanisms help reduce conflicts.

The design principles of the RSTM18 system are similar to those of DSTM in that it associates transactional metadata with objects. Unlike DSTM, however, the system does not require the dynamic allocation of transactional data but co-locates it with the non-transactional data. This scheme has two benefits. First, it facilitates spatial access locality and hence improves execution performance and transaction throughput. Second, dynamic memory management of transactional data (usually done through a garbage collector) is not necessary, and hence this scheme is amenable for use in environments where memory management is explicit.

Recent work explored algorithmic optimizations and/or alternative implementations of the basic STM algorithms described here. Riegel et al. propose the use of real-time clocks to enhance the scalability of STMs that use a global version number.22 JudoSTM21 and RingSTM29 reduce the number of atomic operations that must be performed when committing a transaction, at the cost of serializing commit and/or incurring spurious aborts due to imprecise conflict detection. Several proposals have been made for STMs that operate via dynamic binary rewriting in order to allow the use of STM on legacy binaries.8, 21, 33

Yoo et al.36 analyze the overhead in the execution of Intel's STM.14, 23 They identify four major sources of overhead: over-instrumentation, false sharing, amortization costs, and privatization-safety costs. False sharing, privatization safety, and over-instrumentation are implementation artifacts that can be eliminated by using finer-granularity bookkeeping, more refined analysis, or user annotations. Amortization costs are inherent overheads in an STM that, as we demonstrated here, are not likely to be eliminated.

A large amount of research effort has been spent on analyzing the operations in TM systems. Recent software optimizations have managed to accelerate STM performance by 2%–15%. We believe such analysis is a good practice that should be extended to every piece of system software, especially open source. However, these gains make only a minor dent in the overheads we observed, indicating the challenge that lies before the community in making STM performance compelling.

Conclusion
Based on our results, we believe that the road ahead for STM is quite challenging. Lowering the overheads of STM to a point where it is generally appealing is a difficult task, and significantly better results have to be demonstrated. If we could stress a single direction for further research, it would be the elimination of dynamically unnecessary read and write barriers, possibly the single most powerful lever toward further reduction of STM overheads. However, given the difficulty of similar problems explored by the research community, such as alias analysis and escape analysis, this may be an uphill battle. And because the argument for TM hinges upon its simplicity and productivity benefits, we are deeply skeptical of any proposed solutions to performance problems that require extra work by the programmer.

We observed that the TM programming model itself, whether implemented in hardware or software, introduces complexities that limit the expected productivity gains, thus reducing both the current incentive for migration to transactional programming and the justification, at present, for anything more than a small amount of hardware support.

References
… Symposium on Computer Architecture. ACM, NY, 2007.
3. Blundell, C., Lewis, C., and Martin, M.M.K. Subtleties of transactional memory atomicity semantics. IEEE TCCA Computer Architecture Letters 5, 2 (Nov. 2006).
4. Bobba, J., Goyal, N., Hill, M.D., Swift, M.M., and Wood, D.A. TokenTM: Efficient execution of large transactions with hardware transactional memory. In Proceedings of the 35th International Symposium on Computer Architecture. IEEE Computer Society, Washington, D.C., 2008, 127–138.
5. Ceze, L., Tuck, J., Cascaval, C., and Torrellas, J. Bulk disambiguation of speculative threads in multiprocessors. In Proceedings of the 34th Annual International Symposium on Computer Architecture. ACM, NY, 2006, 237–238.
6. Damron, P., Fedorova, A., Lev, Y., Luchangco, V., Moir, M., and Nussbaum, D. Hybrid transactional memory. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems. Oct. 2006.
7. Dice, D., Shalev, O., and Shavit, N. Transactional Locking II. DISC, Sept. 2006, 194–208.
8. Felber, P., Fetzer, C., Mueller, U., Riegel, T., Suesskraut, M., and Sturzrehm, H. Transactifying applications using an open compiler framework. In Proceedings of the ACM SIGPLAN Workshop on Transactional Computing. Aug. 2007.
9. Hammond, L., Wong, V., Chen, M., Carlstrom, B.D., Davis, J.D., Hertzberg, B., Prabhu, M.K., Wijaya, H., Kozyrakis, C., and Olukotun, K. Transactional memory coherence and consistency. In Proceedings of the 31st Annual International Symposium on Computer Architecture. IEEE Computer Society, June 2004, 102.
10. Harris, T. and Fraser, K. Language support for lightweight transactions. In Proceedings of Object-Oriented Programming, Systems, Languages, and Applications. Oct. 2003, 388–402.
11. Harris, T., Plesko, M., Shinnar, A., and Tarditi, D. Optimizing memory transactions. In Proceedings of the Programming Language Design and Implementation Conference. 2003, 388–402.
12. Herlihy, M., Luchangco, V., Moir, M., and Scherer III, W.N. Software transactional memory for dynamic-sized data structures. In Proceedings of the 22nd ACM Symposium on Principles of Distributed Computing. July 2003, 92–101.
13. Herlihy, M. and Moss, J.E.B. Transactional memory: Architectural support for lock-free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture. May 1993.
14. Intel C++ STM compiler, prototype edition 2.0; http://softwarecommunity.intel.com/articles/eng/1460.htm/ (2008).
15. Kulkarni, M., Pingali, K., Walter, B., Ramanarayanan, G., Bala, K., and Chew, P.L. Optimistic parallelism requires abstractions. In Proceedings of PLDI 2007. ACM, NY, 2007, 211–222.
16. Larus, J.R. and Rajwar, R. Transactional Memory. Morgan Claypool, 2006.
17. The Lonestar benchmark suite; http://iss.ices.utexas.edu/lonestar/ (2008).
18. Marathe, V.J., Spear, M.F., Heriot, C., Acharya, A., Eisenstat, D., Scherer III, W.N., and Scott, M.L. Lowering the overhead of software transactional memory. Technical Report TR 893, Computer Science Department, University of Rochester, Mar. 2006. Condensed version submitted for publication.
19. Minh, C.C., Trautmann, M., Chung, J., McDonald, A.,
… Symposium on Principles and Practice of Parallel Programming. Mar. 2006, ACM, NY, 187–197.
24. Saha, B., Adl-Tabatabai, A.R., and Jacobson, Q. Architectural support for software transactional memory. In Proceedings of the 39th Annual International Symposium on Microarchitecture. Dec. 2006, 185–196.
25. Shavit, N. and Touitou, D. Software Transactional Memory. In Proceedings of the ACM Symposium on Principles of Distributed Computing. ACM, 1995.
26. Shavit, N. and Touitou, D. Software transactional memory. In Proceedings of the 14th ACM Symposium on Principles of Distributed Computing. ACM, NY, 1995.
27. Shpeisman, T., Menon, V., Adl-Tabatabai, A-R., Balensiefer, S., Grossman, D., Hudson, R., Moore, K.F., and Saha, B. Enforcing isolation and ordering in STM. In Proceedings of the Programming Language Design and Implementation Conference. ACM, 2007, 78–88.
28. Shriraman, A., Spear, M.F., Hossain, H., Marathe, V.J., Dwarkadas, S., and Scott, M.L. An integrated hardware-software approach to flexible transactional memory. In Proceedings of the 34th Annual International Symposium on Computer Architecture. ACM, NY, 2007, 104–115.
29. Spear, M.F., Michael, M.M., and von Praun, C. RingSTM: Scalable transactions with a single atomic instruction. In Proceedings of the 20th ACM Symposium on Parallelism in Algorithms and Architectures. ACM, NY, 275–284.
30. STAMP benchmark; http://stamp.stanford.edu/ (2007).
31. (IBM) XL C/C++ for Transactional Memory for AIX; www.alphaworks.ibm.com/tech/xlcstm/ (2008).
32. Tremblay, M. and Chaudhry, S. A third generation 65nm 16-core 32-thread plus 32-scout-thread CMT. In Proceedings of the IEEE International Solid-State Circuits Conference. Feb. 2008.
33. Wang, C., Chen, W-Y., Wu, Y., Saha, B., and Adl-Tabatabai, A.R. Code generation and optimization for transactional memory constructs in an unmanaged language. In Proceedings of the International Symposium on Code Generation and Optimization. 2007, 34–48.
34. Wu, P., Michael, M.M., von Praun, C., Nakaike, T., Bordawekar, R., Cain, H.W., Cascaval, C., Chatterjee, S., Chiras, S., Hou, R., Mergen, M., Shen, X., Spear, M.F., Wang, H.Y., and Wang, K. Compiler and runtime techniques for software transactional memory optimization. To appear in Concurrency and Computation: Practice and Experience, 2008.
35. Yen, L., Bobba, J., Marty, M.M., Moore, K.E., Volos, H., Hill, M.D., Swift, M.M., and Wood, D.A. LogTM-SE: Decoupling hardware transactional memory from caches. In Proceedings of the 13th International Symposium on High-Performance Computer Architecture. Feb. 2007.
36. Yoo, R.M., Ni, Y., Welc, A., Saha, B., Adl-Tabatabai, A-R., and Lee, H-H.S. Kicking the tires of software transactional memory: Why the going gets tough. In Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2008.
37. Zhang, R., Budimlić, Z., and Scherer III, W.N. Commit phase in timestamp-based STM. In Proceedings of the 20th Annual Symposium on Parallelism in Algorithms and Architectures. ACM, NY, 326–335.

Călin Cașcaval is a Research Staff Member and Manager of Programming Models and
Acknowledgments Bronson, N., Casper, J., Kozyrakis, C., and Olukotun, K. Tools for Scalable Systems at IBM TJ Watson Research
An effective hybrid transactional memory system with Center, Yorktown Heights, NY.
We would like to thank Pratap Pattnaik strong isolation guarantees. In Proceedings of the
for his continuous support, Christoph 34th Annual International Symposium on Computer Colin Blundell is a member of the Architecture
Architecture. ACM, NY, 2007, 69–80. and Compilers Group, Department of Computer and
von Praun for numerous discussions, 20. Moore, K.E., Bobba, J., Moravan, M.J., Hill, M.D., and Information Science, University of Pennsylvania.
Wood, D.A. LogTM: Log-based transactional memory.
work on benchmarks and runtimes, In Proceedings of the 12th Annual International Maged Michael is a Research Staff Research Member at
and Rajesh Bordawekar for the B+tree Symposium on High Performance Computer IBM TJ Watson Research Center, Yorktown Heights, NY.
Architecture, Feb 2006.
code implementation. 21. Olszewski, M., Cutler, J., Steffan, J.G. Judostm: A Trey Cain is a Research Staff Member at IBM TJ Watson
dynamic binary-rewriting approach to software Research Center, Yorktown Heights, NY.
transactional memory. In Proceedings of the 16th
References International Conference on Parallel Architecture Peng Wu is a Research Staff Member at IBM TJ Watson
1. Baugh, L., Neelakantam, N., and Zilles, C. Using and Compilation Techniques. 2007. IEEE Computer Research Center, Yorktown Heights, NY.
hardware memory protection to build a high- Society, Washington D.C., 365-375.
performance, strongly-atomic hybrid transactional 22. Riegel, T., Fetzer, C., and Felber, P. Time-based Stefanie Chiras is a manager in IBM's Systems and
memory. In Proceedings of the 35th International transactional memory with scalable time bases. Technology Group.
Symposium on Computer Architecture. IEEE In Proceedings of the 19th ACM Symposium on
Computer Society, Washington, DC, 2008, 115–126. Parallelism in Algorithms and Architectures, 2007. Siddhartha Chatterjee is director of the Austin Research
2. Blundell, C., Devietti, J., Lewis, E.L., Martin, M.M.K. 23. Saha, B., Adl-Tabatabai, A.R., Hudson, R.L., Minh, C.C., Laboratory, IBM Research, Austin, TX.
Making the fast case common and the uncommon and Hertzberg, B. Mcrt-stm: A high performance
case simple in unbounded transactional memory. software transactional memory system for a
In Proceedings of the 34th Annual International multi-core runtime. In Proceedings of the 11th ACM © 2008 ACM 0001-0782/08/1100 $5.00

Communications of the ACM | November 2008 | Vol. 51 | No. 11


Figure 1: STM operations (STM_BEGIN(), STM_READ(A), STM_VALIDATE(), STM_END()).
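Figure 1 names the basic STM operations. A minimal sketch of how they can fit together follows, assuming a word-based STM with lazy versioning (writes are buffered until commit) and a global version clock, in the style of TL2. The class `ToySTM`, its methods, and the `Conflict` exception are hypothetical illustrations, not the API of STMXLC or of any other runtime discussed in the article. The sketch is single-threaded: it interleaves transactions by hand rather than on real threads, and it omits the per-location locking a real STM needs during commit.

```python
class Conflict(Exception):
    """Raised when a transaction must abort; the caller would retry it."""


class ToySTM:
    """Hypothetical word-based STM: lazy versioning plus a global version clock.

    Shared memory maps a location name to a (value, version) pair.
    Single-threaded illustration only; no locking is performed.
    """

    def __init__(self, initial):
        self.memory = {loc: (val, 0) for loc, val in initial.items()}
        self.clock = 0  # global version clock

    def begin(self):
        # STM_BEGIN: snapshot the clock; start with empty read and write sets.
        return {"start": self.clock, "reads": {}, "writes": {}}

    def read(self, tx, loc):
        # STM_READ (read barrier): serve read-after-write from the private
        # write buffer, otherwise log the version seen in the read set.
        if loc in tx["writes"]:
            return tx["writes"][loc]
        value, version = self.memory[loc]
        if version > tx["start"]:
            raise Conflict(loc)  # location changed after this tx began
        tx["reads"][loc] = version
        return value

    def write(self, tx, loc, value):
        # Write barrier: buffer the new value until commit.
        tx["writes"][loc] = value

    def validate(self, tx):
        # STM_VALIDATE: every location read must still carry the version seen.
        return all(self.memory[loc][1] == ver
                   for loc, ver in tx["reads"].items())

    def end(self, tx):
        # STM_END: validate, then publish buffered writes under a new version.
        if not self.validate(tx):
            raise Conflict("read set invalidated")
        if tx["writes"]:
            self.clock += 1
            for loc, value in tx["writes"].items():
                self.memory[loc] = (value, self.clock)


stm = ToySTM({"A": 10, "B": 0})

# t1 moves 5 units from A to B and commits.
t1 = stm.begin()
stm.write(t1, "A", stm.read(t1, "A") - 5)
stm.write(t1, "B", stm.read(t1, "B") + 5)
stm.end(t1)

# t2 reads A, but t3 commits a write to A before t2 finishes.
t2 = stm.begin()
stm.read(t2, "A")
t3 = stm.begin()
stm.write(t3, "A", 0)
stm.end(t3)
print(stm.validate(t2))  # False: t2 must abort and retry
```

Every `read` and `write` call above is a barrier that checks or logs per-location metadata. That per-access bookkeeping is one source of the overhead that eliminating dynamically unnecessary barriers would remove, for example when a compiler can prove a location is private to the transaction.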
