0% found this document useful (0 votes)
206 views51 pages

TCC Thesis BDC Defense

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1/ 51

Programming with Transactional Memory

Brian D. Carlstrom

Computer Systems Laboratory


Stanford University
http://tcc.stanford.edu
The Problem: “The free lunch is over”
Chip manufacturers have switched from making faster uniprocessors to
adding more processor cores per chip
 Software developers can no longer just hope that the next
generation of processor will make their program faster
10000

??%/year?
Uniprocessor
1000 Performance
Performance (vs. VAX-11/780)

Trends
(SPECint) 52%/year

100

10
25%/year

1
1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006

From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, Sept. 15, 2006
Programming with Transactional Memory 2
Parallel Programming for the Masses?
Every programmer is now a parallel programmer
 The black arts now need to be taught to undergraduates
Year Microprocessor Proc/chip Thread/proc Thread/chip
• IBM and Sun 2004 IBM POWER5 2 2 4
went multi- 2005 Azul Vega 1 24 1 24
core first on 2005 Sun Niagara 1 8 4 32
the server 2005 AMD Opteron 2 1 2
side 2006 Intel Woodcrest 2 2 4

• AMD/Intel 2006 Intel Barcelona 4 1 4


2006 Azul Vega 2 48 1 48
now in core
2007 AMD Barcelona 4 1 4
count race for
2007 Sun Niagara 2 8 8 64
laptops, 2008 Intel 4 2 8
desktops, and 2009 AMD 8 1 8
servers 2009 Intel 8 2 16

Programming with Transactional Memory 3


What Makes Parallel Programming Hard?
Typical parallel program
 Single memory shared by multiple program threads
 Need to coordinate access to memory shared b/w threads
 Locks allow temporary exclusive access to shared data

Lock granularity tradeoff


 Coarse grained locks - contention, lack of scaling, …
 Fine grained locks - excessive overhead, deadlock,…

Apparent tradeoff between correctness and performance


 Easier to reason about only a few locks…
 … but only a few locks can lead to contention

Programming with Transactional Memory 4


Transactional Memory to the Rescue?
Transactional Memory
 Replaces waiting for locks with concurrency
 Allows non-conflicting updates to shared data
 Shown to improve scalability of short critical regions

Promise of Transactional Memory


 Program with coarse transactions
 Performance like fine grained lock

Focus on correctness, tune for performance


 Easier to reason about only a few transactions…
 … only focus on areas with true contention

Programming with Transactional Memory 5


Thesis and Contributions
Thesis:
If transactional memory is to make parallel programming
easier, rather than just more scalable, the programming
interface requires more than simple atomic transactions

To support this thesis I will:


• Show why lock based programs cannot be simply
translated to a transactional memory model
• Present the design of Atomos, a parallel programming
language designed for transactional memory
• Show how Atomos can support semantic concurrency
control, allowing programs with coarse transactions to
perform competitively with fine-grained transactions.

Programming with Transactional Memory 6


Overview
Motivation and Thesis
 How to make parallel programming of chip multiprocessors
easier using transactional memory
Transactional Memory
 Concepts, implementation, environment
JavaT [SCP 2006]
 Executing Java programs with Transactional Memory
Atomos [PLDI 2006]
 A transactional programming language
Semantic concurrency control [PPoPP 2007]
 Improving scalability of applications with long transactions

Programming with Transactional Memory 7


Locks versus Transactions
Lock Transaction
... ...
synchronized (lock) { atomic {
x = x + y; x = x + y;
} }
... ...

Mapping from lock to protected data Transaction protects all data


 lock protects x  No need to worry if another lock
is necessary to protect y

Programming with Transactional Memory 8


Transactional Memory at Runtime
What if transactions modify the same data?
 First commit causes other transactions to abort & restart
 Can provide programmer with useful feedback!

Transaction A

LOAD X Transaction B

Time STORE X Violation! LOAD X Original Code:


STORE X ... = X + Y;
Commit X
X = ...
LOAD X
Re-execute
with new
data STORE X

Programming with Transactional Memory 9


Transactional Memory Related Work
Transactional Memory
 Transactional Memory: Architectural Support for Lock-Free Data
Structures [Herlihy & Moss 1993]
 Software Transactional Memory [Shavit & Touitou 1995]
Database
 Transaction Processing [Gray & Reuter 1993]
4.7) Nested transactions [Moss 1981]
4.9) Multi-level transactions [Weikum & Schek 1984]
4.10) Open nesting [Gray 1981]
16.7.3) Commit and abort handlers [Eppinger et al. 1991]
Recent Transactional Memory
 Language support for lightweight txs [Harris & Fraser 2003]
 Exceptions and side-effects in atomic blocks [Harris 2004]
 Open nesting in STM [Ni et al. 2007]

Programming with Transactional Memory 10


Hardware Environment

Chip Multiprocessor
 up to 32 CPUs
Bus Arbiters
 write-back L1
 shared L2 CPU 1 CPU 2 CPU N
 x86 ISA
...
L1 L1 L1

Lock evaluation Bus & Snoop Control Bus & Snoop Control Bus & Snoop Control

 MESI protocol
Commit Bus

TM evaluation Refill Bus


 L1 buffers speculative data On-chip L2 Cache
 Bus snooping detects data
dependency violations Changes for TM support

Programming with Transactional Memory 11


Software Environment
Virtual Machine
 IBM’s Jikes RVM (Research Virtual Machine) 2.4.2+CVS
 GNU Classpath 0.19

HTM extensions
 VM_Magic methods converted by JIT to HTM primitives

Polyglot
 Translate language extensions to VM_Magic calls

Programming with Transactional Memory 12


Overview
Motivation and Thesis
 How to make parallel programming of chip multiprocessors
easier using transactional memory
Transactional Memory
 Concepts, implementation, environment
JavaT [SCP 2006]
 Executing Java programs with Transactional Memory
Atomos [PLDI 2006]
 A transactional programming language
Semantic concurrency control [PPoPP 2007]
 Improving scalability of applications with long transactions

Programming with Transactional Memory 13


JavaT: Transactional Execution of Java Programs

Goals
 Run existing Java programs using transactional memory
 Require no new language constructs
 Require minimal changes to program source
 Compare performance of locks and transactions
Non-Goals
 Create a new programming language
 Add new transactional extensions
 Run all Java programs correctly without modification

Programming with Transactional Memory 14


JavaT: Rules for Translating Java to TM
Three rules create transactions in Java programs
 synchronized defines a transaction
 volatile references define transactions
 Object.wait performs a transaction commit
Allows supports execution of a variety of programs:
 Histogram based on our ASPLOS 2004 paper
 STM benchmarks from Harris & Fraser, OOPSLA 2003
 SPECjbb2000 benchmark
 All of Java Grande (5 kernels and 3 applications)
Performance comparable or better in almost all cases

Many developers already believe that synchronized


means atomic, as opposed to mutual exclusion!

Programming with Transactional Memory 15


JavaT: Defining transactions with synchronized

synchronized blocks define transactions


public static void main (String args[]){
a(); a(); // non-transactional
synchronized (x){ BeginNestedTX();
b(); b(); // transactional
} EndNestedTX();
c(); c(); // non-transactional
}

We use closed nesting for nested synchronized blocks


public static void main (String args[]){
a(); a(); // non-transactional
synchronized (x){ BeginNestedTX();
b1(); b1(); // transaction at level 1
synchronized (y) { BeginNestedTX();
b2(); b2(); // transaction at level 2
} EndNestedTX();
b3(); b3(); // transaction at level 1
} EndNestedTX();
c(); c(); // non-transactional
}

Programming with Transactional Memory 16


JavaT: Alternative to rollback on wait
JavaT rules say that Object.wait commits transaction
 Other proposals rollback on wait (or prohibit side effects)
• C.A.R. Hoare’s Conditional Critical Regions (CCRs)
• Harris’s retry keyword
• Welc et al.’s Transactional Monitors

Rollback handles one common pattern of condition variables


sychronized (lock) {
while (!condition)
wait();
...
}

Programming with Transactional Memory 17


JavaT: Commiting on wait
• So why does JavaT commit on wait?
• Motivating example: A simple barrier implementation
synchronized (lock) {
count++;
if (count != thread_count) {
lock.wait();
} else {
count = 0;
lock.notifyAll();
}
}
Code like this is found in Sun Java Tutorial, SPECjbb2000, and Java Grande
• With commit, barrier works as intended
• With rollback, all threads think they are first to barrier

Programming with Transactional Memory 18


JavaT: Commit on wait tradeoff
Major positive of commit on wait
 Allows transactional execution of existing Java code
Major negative of commit on wait
 Nested transaction problem
 We don’t want to commit value of “a” when we wait:
synchronized (x) {
a = true;
synchronized (y) {
while (!b)
y.wait();
c = true;}}
 With locks, wait releases specific lock
 With transactions, wait commits all outstanding transactions
 In practice, nesting examples are very rare
• It is bad to wait while holding a lock
• wait and notify are usually used for unnested top level coordination

Programming with Transactional Memory 19


JavaT: Keeping Scalable Code Simple
TestCompound benchmark from Harris & Fraser, OOPSLA 2003
Atomic swap of Map elements
Java HashMap,
Java Hashtable,
ConcurrentHashMap
 Simple lock around
swap does not scale
ConcurrentHM Fine
 Use ordered key
locks to avoid
deadlock
JavaT HashMap
 Use simplest code of
Java HM, performs
best of all!

Programming with Transactional Memory 20


SPECjbb2000 Overview

Client Tier Transaction Server Tier Database Tier

Driver Threads Warehouse order


(B-Tree)
nextID
newOrder
Transaction YTD (B-Tree)
Manager
history
Driver Threads (B-Tree)

Warehouse order
• Java Business Benchmark (B-Tree)
 3-tier Java benchmark modeled on TPC-C newOrder
 5 ops: order, payment, status, delivery, stock level (B-Tree)
• Most updates local to single warehouse history
 1% case of inter-warehouse transactions (B-Tree)
Programming with Transactional Memory 21
JavaT: SPECjbb2000 Results
SPECjbb2000
• Close to linear scaling for transactions and locks up to 32 CPUs
 32 CPU scale limited by bus in simulated CMP configuration

Programming with Transactional Memory 22


JavaT: Transactional Execution of Java Programs

Goals (revisited)
 Run existing Java programs using transactional memory
• Can run a wide variety of existing benchmarks
 Require no new language constructs
• Used existing synchronized, volatile, and Object.wait
 Require minimal changes to program source
• No changes required for these programs
 Compare performance of locks and transactions
• Generally better performance from transactions

Problem
 Conditional waiting semantics not right for all programs
 What can we do if we can change the language?

Programming with Transactional Memory 23


Overview
Motivation and Thesis
 How to make parallel programming of chip multiprocessors
easier using transactional memory
Transactional Memory
 Concepts, implementation, environment
JavaT [SCP 2006]
 Executing Java programs with Transactional Memory
Atomos [PLDI 2006]
 A transactional programming language
Semantic concurrency control [PPoPP 2007]
 Improving scalability of applications with long transactions

Programming with Transactional Memory 24


The Atomos Programming Language
Atomos derived from Java
 atomic replaces synchronized
 retry replaces wait/notify/notifyAll
Atomos design features
 Open nested transactions
• open blocks committing nested child transaction before parent
• Useful for language implementation but also available for applications
 Commit and Abort handlers
• Allow code to run dependant on transaction outcome
 Watch Sets
• Extension to retry for efficient conditional waiting on HTM systems

Programming with Transactional Memory 25


Atomos: The counter problem
Application JIT Compiler
atomic { // method prolog
... ...
id = nextId(); invocationCounter++;
... ...
} // method body
static long nextId() { ...
atomic { // method epilogue
nextID++; ...
}}

• Lower-level updates to global data can lead to violations


• General problem not confined to counters:
 Application level caching
 Cooperative scheduling in virtual machine
Programming with Transactional Memory 26
Atomos: Open nested counter solution
Solution Benefits
 Wrap counter update in  Violation of counter just replays open
open nested transaction nested transaction
 Open nested commit discards child’s
atomic {
read-set preventing later violations
... Issues
id = nextId();  What happens if parent rolls back
... after child commits?
}  Okay for statistical counters and UID
 Not okay for SPECjbb2000 YTD
static long nextID () { (year-to-date) payment counters
open { • Need to some way to coordinate with
parent transaction
nextID++;
}
}

Programming with Transactional Memory 27


Atomos: Commit and Abort Handlers
Programs can specify callbacks at end of transaction
 Separate interfaces for commit and abort outcomes
public interface CommitHandler { boolean onCommit();}
public interface AbortHandler { boolean onAbort ();}
Historical uses for commit and abort handlers
 DB technique for delaying non-transactional operations
 Harris brought the technique to STM for solving I/O problem
• See Exceptions and side-effects in atomic blocks.
• Buffer output until commit, rewind input on abort
Atomos applications
 EITHER Delay updates to shared data until parent commits
• Update YTD field only when parent is committing
 OR Provide compensation action to open nesting
• Undo YTD update when parent is aborted

Programming with Transactional Memory 28


Atomos: SPECjbb2000 Results
SPECjbb2000
 Difference between JavaT and Atomos result is handler overhead
 Overhead has negligible impact, Atomos still outperforms Java

Programming with Transactional Memory 29


Atomos Summary
Atomos similarities to other proposals
 atomic, retry, and commit/abort handlers
Atomos differences
 Open nested transactions for reduced isolation
 watch allows for scalable HTM retry implementation

Open nested transactions controversial


 Some uses straight forward
 More sophisticated uses require proper handlers
Can we give programmers the benefits of open nesting
without expecting them to use it directly?

Programming with Transactional Memory 30


Overview
Motivation and Thesis
 How to make parallel programming of chip multiprocessors
easier using transactional memory
Transactional Memory
 Concepts, implementation, environment
JavaT [SCP 2006]
 Executing Java programs with Transactional Memory
Atomos [PLDI 2006]
 A transactional programming language
Semantic concurrency control [PPoPP 2007]
 Improving scalability of applications with long transactions

Programming with Transactional Memory 31


What happens to SPECjbb with long transactions?

Old: SPECjbb could scale


 Open nesting High-contention SPECjbb Results
addresses counters
 Only 1% of operations
touch other warehouse
data structures

New: high-contention SPECjbb


 All threads in 1
warehouse
 All transactions touch
some shared Map

Open nested results not much


better than Baseline

Programming with Transactional Memory 32


Violations in logically independent operations

Map
TX #1 starting TX #2 starting

size=2
size=3

put(3,…) {1 => …, put(4,…)


closed-nested 2 => …,
…} closed-nested
transaction transaction
3 => …}

TX #1 commit TX #2 abort

Programming with Transactional Memory 33


Unwanted data dependencies limit scaling
Data structure bookkeeping causing serialization
 Frequent HashMap and TreeMap violations updating size
and modification counts

With short transactions


 Enough parallelism from operations that do not conflict to
make up for the ones that do conflict

With long transactions


 Too much lost work from conflicting operations

How can we eliminate unwanted dependencies?

Programming with Transactional Memory 34


Reducing unwanted dependencies
Custom hash table
 Don’t need size or modCount? Build stripped down Map
 Disadvantage: Do not want to custom build data structures
Open-nested transactions
 Allows a child transaction to commit before parent
 Disadvantage: Lose transactional atomicity
Segmented hash tables
 Use ConcurrentHashMap (or similar approaches)
• Compiler and Runtime Support for Efficient STM, Intel, PLDI 2006
 Disadvantage:
Reduces, but does not eliminate, unnecessary violations
Is this reduction of violations good enough?

Programming with Transactional Memory 35


Semantic Concurrency Control
Database concept of multi-level transactions
 Release low-level locks on data after acquiring higher-level
locks on semantic concepts such as keys and size
Example
 Before releasing lock on B-tree node containing key 7
record dependency on key 7 in lock table
 B-tree locks prevent races – lock table provides isolation
4
TX# Key Mode
… … …
2 6
#2317 7 Read
1 3 5 7 … … …

Programming with Transactional Memory 36


Semantic Concurrency Control
Applying Semantic Concurrency Control to TM
 Avoid retaining memory level dependencies
 Replace with semantic dependencies
 Add conflict detection on semantic properties

Transactional Collection Classes


 Avoid memory level dependencies on size field, …
 Replace with semantic dependencies on keys, size, …
 Only detect semantic conflicts that are necessary

No more memory conflicts on implementation details

Programming with Transactional Memory 37


Benefits of Transactional Collection Classes
Programmer just uses the usual collection interfaces
 Code change as simple as replacing
Map map = new HashMap();
 with
Map map = new TransactionalMap();
Similar interface coverage to util.concurrent
 Maps: TransactionalMap, TransactionalSortedMap
 Sets: TransactionalSet, TransactionalSortedSet
 Queue: TransactionalQueue

Only library writers deal directly with open nesting


 Similar to java.util.concurrent.atomic

Programming with Transactional Memory 38


Implementing Transactional Collection Classes

General Approach Simplified Map example


Read operations get(key) add dependencies on key
Acquire semantic dependency returns value from underlying map
Open nesting reads underlying state
Write operations put(key,value) writes to thread
Buffer changes until commit local buffer
On commit Apply buffer to underlying map,
Apply buffer to underlying state violate transactions that depended on
Check for semantic conflicts the keys we are writing

On commit and abort Remove key dependencies


Release semantic
dependencies

Programming with Transactional Memory 39


Example of non-conflicting put operations

Underlying
TX #1 starting TX #2 starting
Map
size=4
size=2
size=3

put(c,23) {a => 50, put(d,42)


open-nested b => 17,
17} open-nested
transaction transaction
23,
c => 23}
d => 42}
TX #1 commit TX #2 commit
and handler and handler
execution Depend- execution
encies
{c
{d =>
{}[1]}
[1],
[2]}
Write Buffer Write Buffer
d => [2]}
{} 23}
{c => {d =>
{} 42}

Programming with Transactional Memory 40


Example of conflicting put and get operations

Underlying
TX #1 starting TX #2 starting
Map
size=3
size=2

put(c,23) {a => 50, get(c) open-


open-nested 17,
b => 17} nested
transaction transaction
c => 23}

TX #1 commit TX #2 abort
and handler and handler
execution Depend- execution
encies
{c{c
=>{}
=>[1]}
Write Buffer [1,2]} Write Buffer

{} 23}
{c => {}

Programming with Transactional Memory 41


Benefits of Semantic Concurrency Approach
Transactional Collection Class works with abstract type
 Can work with any conforming implementation
 HashMap, TreeMap, …

Avoids implementation specific violations


 Not just size and mod count
 HashTable resizing does not abort parent transactions
 TreeMap rotations invisible as well

Programming with Transactional Memory 42


High-contention SPECjbb2000 results
Java Locks
Short critical sections
Atomos Baseline
Full protection of logical ops
Atomos Open
Use simple open-nesting for
UID generation
Atomos Transactional
Change to Transactional
Collection Classes

Performance Limit?
Semantic violations from
calls to
SortedMap.firstKey()
Programming with Transactional Memory 43
High-contention SPECjbb2000 results
SortedMap dependency
SortedMap use
overloaded
1. Lookup by ID
2. Get oldest ID
for deletion

Replace with Map and


Queue
1. Use Map for
lookup by ID
2. Use Queue to
find oldest

Programming with Transactional Memory 44


High-contention SPECjbb2000 results
What else could we do?
 Split larger
transactions into
smaller ones
 In the limit, we can
end up with
transactions
matching the short
critical regions of
Java

Return on investment
 Coarse grained
transactional version
is giving almost 8x on
16 processors
 Coarse grained lock Focus on correctness
version would not
have scaled at all tune for performance

Programming with Transactional Memory 45


SPECjbb2000 Return on Investment
Version Speedup on Effort Atomos 14 changes 7.8x
16 CPUs Java 272 changes 13x
Baseline 1.6 1 atomic statement

Open 2.7 4 open statements

Transactional 4.1 2 Transactional Map, 1 TxnSortedMap


2 transactional counters
Queue 7.8 Change TxnSortedMap to TxnMap/TxnQueue
(2 new calls: Queue.add & Queue.remove)
Short 12.5 272 atomic statements

Java 13.0 272 synchronized statements

Programming with Transactional Memory 46


Semantic Concurrency Control Summary
Transactional memory promises to ease parallelization
 Need to support coarse grained transactions

Need to access shared data from within transactions


 While composing operations atomically
 While avoiding unnecessary data dependency violations
 While still having reasonable performance!

Transactional Collection Classes


 Provides needed scalability through familiar library
interfaces of Map, SortedMap, Set, SortedSet, and Queue
 Removes need for direct use of open nested transactions

Programming with Transactional Memory 47


Overview
Motivation and Thesis
 How to make parallel programming of chip multiprocessors
easier using transactional memory
Transactional Memory
 Concepts, implementation, environment
JavaT [SCP 2006]
 Executing Java programs with Transactional Memory
Atomos [PLDI 2006]
 A transactional programming language
Semantic concurrency control [PPoPP 2007]
 Improving scalability of applications with long transactions

Programming with Transactional Memory 48


Summary
Thesis:
If transactional memory is to make parallel programming
easier, rather than just more scalable, the programming
interface requires more than simple atomic transactions

JavaT
 Transactions alone cannot run all existing Java programs
due to incompatibility of monitor conditional waiting
Atomos Programming Language
 Features to support reduced isolation and integration non-
transactional operations through handlers
Transactional Collection Classes
 Using semantic concurrency control to improve scalability of
applications using long transactions

Programming with Transactional Memory 49


Future Work
Transaction-aware I/O libraries
 Semantic concurrency control for structured files such as b-trees
 Support for automatically buffering OutputStreams and Writers
 Support for application logging within transactions

Integrating with other transactional systems (distributed transactions)


 Treat TM as resource manager like DB or transactional file system

Programming Language
 Language support for loop based parallelism
 Task-based, rather than thread-based, models

Virtual Machines
 Garbage Collector

Programming with Transactional Memory 50


Acknowledgements
• My wife Jennifer and kids Michael, Daniel, and Bethany
• My parents David and Elizabeth
• My advisors Kunle Olukotun and Christos Kozyrakis
• My committee Dawson Engler, Margot Gerritsen, John Mitchell
• Jared Casper, Hassan Chafi, JaeWoong Chung, Austen McDonald
and the rest of TCC group for the simulator and everything else
• Andrew Selle and Jacob Leverich for all those cycles
• Normans Adams, Marc Brown, and John Ellis for encouraging me to
go back to school
• Everyone at Ariba that made it possible to go back to school
• Olin Shivers and Tom Knight and the MIT UROP program for inspiring
me to do research as an undergraduate
• Intel for my PhD fellowship
• DARPA, not just for supporting me for the last five years, but for
employing my father for my first five years…

Programming with Transactional Memory 51

You might also like