Transactional Memory in Practice - Brett Hall - CppCon 2015
Brett Hall
Principal Software Engineer
“May You Live in
Interesting Times”
• atomics
• mutexes
Atomics
• Too fiddly for programming in the large
- Mutexes work OK for small systems where you can get them to play nicely with each other and avoid deadlock
- But deadlock destroys the ability to compose solutions
- you can try to come up with all sorts of ways to avoid it
- but eventually there’s a decent chance that you’ll run into something like…
Have to hope that the documentation tells you whether a lock is taken or not, and hope that the documentation is correct
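The composition problem can be sketched with two accounts, each guarded by its own mutex. This is an illustration rather than code from the talk; `Account`, `transfer_naive`, and `transfer` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical example type: each account carries its own lock.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Deadlock-prone: locks `from`, then `to`. If another thread runs the
// opposite transfer at the same time, each thread can end up holding one
// lock while waiting forever for the other.
void transfer_naive(Account& from, Account& to, long amount) {
    std::lock_guard<std::mutex> l1(from.m);
    std::lock_guard<std::mutex> l2(to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Safer: std::lock acquires both mutexes using a deadlock-avoidance
// algorithm, regardless of the order in which callers name the accounts.
void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);  // locking the same std::mutex twice is undefined
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> l1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> l2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}
```

The second version only works because `transfer` can see every lock it needs. A caller that already holds some other lock when it calls `transfer` is back to reasoning about global lock order, which is the composition problem described above.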
• Transactional Memory
- Start here since interview candidates give blank stares when asked about TM
Transactions
- fixing the threading issues led to a cascading series of changes that was going to require large parts of the program to be rewritten
- competitive pressures for features caused us to shelve fixing this for a few years
- In the interim I was playing with Haskell STM
- wrote my own STM system in C++ for kicks
- it worked and TM seemed like a good fit for parts of Dynamics
- originally only going to use it in a limited scope so that we could easily replace it if there were problems
- ended up all over the place, more on that later
- The higher ups were more comfortable with using my TM system than with Haskell
- Before we get too much further, I should tell you a bit about Dynamics
- 800K ~ 900K lines of code
- not huge, but not trivial either
- files are broken down into measurements that are further broken down into acquisitions
- calculations on acquisitions are independent
- measurements depend on their acquisitions, but not on other measurements
- lots of easy parallelism to exploit
Our System
Starting a transaction
- atomically creates the transaction object and passes it to the function that the caller passes in
- transaction is not copyable and only atomically can create one
- atomically handles validating and committing or rolling back
- more on the “args” in a bit
- transaction has some methods, but we don’t need to worry about that for now
- If the transaction function throws an exception then the transaction is rolled back
- kind of a boring transaction…
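The control flow described in these notes can be sketched roughly as follows. This is a minimal stand-in, not the Wyatt API: the names `Atomically`, `Transaction`, `Write`, and `commit_mutex` are assumptions, the "args" are omitted, and the validation and retry that a real system performs at commit time are left out.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

namespace tm_sketch {

// Hypothetical: one global lock serializes commits in this sketch.
inline std::mutex& commit_mutex() {
    static std::mutex m;
    return m;
}

// The transaction object buffers writes until commit.
class Transaction {
public:
    Transaction(const Transaction&) = delete;             // not copyable
    Transaction& operator=(const Transaction&) = delete;

    // Record a write that is applied only if the transaction commits.
    void Write(int& target, int value) {
        pending_.push_back([&target, value] { target = value; });
    }

private:
    Transaction() = default;  // only Atomically can create one

    void Commit() {
        std::lock_guard<std::mutex> lock(commit_mutex());
        for (auto& apply : pending_) apply();
    }

    std::vector<std::function<void()>> pending_;

    template <typename F>
    friend void Atomically(F&& f);
};

// Creates the transaction, hands it to the caller's function, then commits.
template <typename F>
void Atomically(F&& f) {
    Transaction t;
    std::forward<F>(f)(t);  // if this throws, the buffered writes are dropped
    t.Commit();
}

}  // namespace tm_sketch
```

Because writes are buffered inside the transaction object, rolling back on an exception needs no extra work in this sketch: the exception unwinds past `Commit` and the pending writes are destroyed with the transaction.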
Transactional Variables
The Canonical
Synchronization Example
Bank Accounts
Simple Transfer
- Note: no chance of deadlock if another thread does the same transfer in the opposite order
- No other thread can observe either operation without observing the other
- can’t see debit of account 1 without seeing credit of account 2
- No race: If either account balance changes while we’re in our transaction then we start over and try again until we can do it without any conflicts
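The "start over and try again" behavior can be illustrated with a toy STM. Everything here is an assumption made for illustration (`Var`, `Transaction`, `Atomically`, `Transfer`, a single global lock, integer-only variables); it is not the Wyatt or SG5 interface, and it makes no attempt at performance.

```cpp
#include <map>
#include <mutex>
#include <utility>

namespace toy_stm {

inline std::mutex& global_mutex() {
    static std::mutex m;
    return m;
}

class Transaction;

// A transactional integer: a value plus a version bumped on every commit.
class Var {
public:
    explicit Var(long v) : value_(v) {}
    long Get(Transaction& t);
    void Set(long v, Transaction& t);
    long Peek() {  // non-transactional read, for inspection only
        std::lock_guard<std::mutex> lock(global_mutex());
        return value_;
    }
private:
    friend class Transaction;
    long value_;
    unsigned long version_ = 0;
};

class Transaction {
public:
    Transaction() = default;
    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;

    long Read(Var& v) {
        auto w = writes_.find(&v);
        if (w != writes_.end()) return w->second;  // see our own write
        std::lock_guard<std::mutex> lock(global_mutex());
        reads_.insert(std::make_pair(&v, v.version_));  // keeps first version seen
        return v.value_;
    }
    void Write(Var& v, long value) { writes_[&v] = value; }  // buffered

    // Validate the read set, then publish the write set, under one lock.
    bool Commit() {
        std::lock_guard<std::mutex> lock(global_mutex());
        for (auto& r : reads_)
            if (r.first->version_ != r.second) return false;  // conflict
        for (auto& w : writes_) {
            w.first->value_ = w.second;
            ++w.first->version_;
        }
        return true;
    }
private:
    std::map<Var*, unsigned long> reads_;
    std::map<Var*, long> writes_;
};

inline long Var::Get(Transaction& t) { return t.Read(*this); }
inline void Var::Set(long v, Transaction& t) { t.Write(*this, v); }

// Run f until it commits without conflicts.
template <typename F>
void Atomically(F f) {
    for (;;) {
        Transaction t;
        f(t);
        if (t.Commit()) return;  // otherwise drop the buffered writes and retry
    }
}

// Both balance updates become visible together, or not at all.
inline void Transfer(Var& from, Var& to, long amount) {
    Atomically([&](Transaction& t) {
        from.Set(from.Get(t) - amount, t);
        to.Set(to.Get(t) + amount, t);
    });
}

}  // namespace toy_stm
```

No lock is held while the transaction function runs, so two threads transferring in opposite directions cannot deadlock; at worst one of them fails validation and reruns.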
Transfer with notification
- Going to be sticking with the Wyatt system from now on, SG5 doesn’t have some of these features yet
- “After” queues an action to take when the TOP-LEVEL commit finishes
- multiple actions can be queued
- needed to interface with non-transactional code
- can’t just wait for the transaction commit because we might be in a nested transaction
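A rough sketch of how queued "after" actions might interact with nesting. The names (`Atomically`, `Transaction::After`, `Current`) and the choice to flatten a nested transaction into its parent are assumptions for illustration; validation, retry, and the real commit step are omitted.

```cpp
#include <functional>
#include <utility>
#include <vector>

namespace after_sketch {

class Transaction {
public:
    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;

    // Queue an action to run once the top-level transaction has committed.
    void After(std::function<void()> action) {
        after_.push_back(std::move(action));
    }

private:
    Transaction() = default;
    std::vector<std::function<void()>> after_;

    template <typename F>
    friend void Atomically(F&& f);
};

// Hypothetical: per-thread pointer to the transaction currently running.
inline Transaction*& Current() {
    thread_local Transaction* current = nullptr;
    return current;
}

template <typename F>
void Atomically(F&& f) {
    if (Transaction* outer = Current()) {
        f(*outer);  // nested: join the enclosing transaction, commit nothing yet
        return;
    }
    Transaction t;
    Current() = &t;
    try {
        f(t);
    } catch (...) {
        Current() = nullptr;  // rollback: queued actions are dropped with t
        throw;
    }
    Current() = nullptr;
    // A real system would validate and commit here, retrying on conflict.
    std::vector<std::function<void()>> actions = std::move(t.after_);
    for (auto& action : actions) action();  // runs outside any transaction
}

}  // namespace after_sketch
```

An action queued inside a nested call is held until the outermost `Atomically` finishes, which is why code that needs to touch non-transactional state queues an action instead of acting as soon as its own block ends.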
Pitfalls
Unfortunately it’s not all unicorns and rainbows
- had put this in preemptively in some spots, but had never seen the code fire
- of course it had a bug when we finally needed it (weird edge case we didn’t have a unit test for)
- heavy-handed response, but works for us given how rare this is
- other ways to handle, but much more complicated to implement
Retry Deadlock
Fixes things, but what if some bozo comes along and does this…
Nested, split transactions
- broken again
- modified from example given by Hans Boehm to the SG5 mailing list
- SG5 is wrestling with this now in discussions of retry in TS
- Fixed by applying NO_ATOMIC to the set_…. functions
- relying on commit happening can be considered a side-effect
- Can be more subtle, the second transaction could be buried in library code
Inconsistent Values
-… as a policy we required stuff stored in transactional variables to be either immutable or “internally transacted”
- “internally transacted” = all mutable state is in transactional variables
- enforced through code review
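One way to satisfy the "immutable" half of that policy, sketched under assumptions: the value held by a transactional variable is a `std::shared_ptr<const T>`, and a change is made by building a modified copy and storing the new pointer. `Settings`, `SettingsPtr`, and `WithGain` are hypothetical names, and the transactional variable itself is left out.

```cpp
#include <memory>
#include <string>

// Hypothetical payload type.
struct Settings {
    std::string name;
    int gain;
};

// What the transactional variable would hold: a pointer to a value that
// nobody can mutate through this handle.
using SettingsPtr = std::shared_ptr<const Settings>;

// Build a modified copy instead of changing the shared object in place.
// Readers still holding the old pointer keep seeing a consistent old value.
inline SettingsPtr WithGain(const SettingsPtr& old, int gain) {
    auto copy = std::make_shared<Settings>(*old);
    copy->gain = gain;
    return copy;  // converts to shared_ptr<const Settings>
}
```

With this shape, the only mutable thing is the pointer stored in the variable, so every change goes through a transactional write and can be detected as a conflict.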
Invasive
- original plan was to only use TM in data store and maybe experiment parameters
- it sort of crept like the kudzu above to take over a lot more
- this was due to TM usefulness and complication of interfacing transactional and non-transactional code
- Used TM for everything except low-level instrument communication (data reception)
- GUI isn’t transactional (MFC), but calls transactional code
No one’s heard of it
- When I ask about TM in job interviews usually just get blank stares in response
- Hasn’t really mattered, people come up to speed quickly
- But there are also not many places to turn for “best practices”
- lots of academic papers on implementing TM systems
- but you’re mostly on your own when it comes to figuring out how to apply the system
- that’s it for pitfalls…
Performance?
- Much like this car: not the fastest, but fits our purposes well
Performance
• Dynamics is not a “low latency” application (for the most part)
- Calculations were the main thing we wanted to push into background threads
- Calculations are structured such that only a few transactional reads need to be done at the start and then some transactional writes at the end
- the transactional overhead is negligible compared to what Calculate result is doing
Performance
• More overhead
- Only one thread can call use_a_and_b at a time when using mutexes
- Any number of threads can use the transactional version
- transactional version might have more overhead, but the contention win might weigh in its favor as long as writes are RARE
- the mutex version can probably be restructured to have less contention (or use a shared_mutex)
- that has to be done by hand
- in the transactional case it’s automatic
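The hand restructuring mentioned above might look like the following, assuming C++17 for `std::shared_mutex`. The slide code is not reproduced here: `MutexPair` and `SharedPair` are hypothetical classes, and `use_a_and_b` is reduced to a sum purely for illustration.

```cpp
#include <mutex>
#include <shared_mutex>

// Plain mutex: readers and writers all take turns, one at a time.
class MutexPair {
public:
    int use_a_and_b() {
        std::lock_guard<std::mutex> lock(m_);
        return a_ + b_;
    }
    void set(int a, int b) {
        std::lock_guard<std::mutex> lock(m_);
        a_ = a;
        b_ = b;
    }
private:
    std::mutex m_;
    int a_ = 0;
    int b_ = 0;
};

// Reader/writer lock: any number of readers may hold the shared lock at
// once; a writer takes the exclusive lock and waits for readers to drain.
class SharedPair {
public:
    int use_a_and_b() const {
        std::shared_lock<std::shared_mutex> lock(m_);
        return a_ + b_;
    }
    void set(int a, int b) {
        std::unique_lock<std::shared_mutex> lock(m_);
        a_ = a;
        b_ = b;
    }
private:
    mutable std::shared_mutex m_;
    int a_ = 0;
    int b_ = 0;
};
```

The reader/writer split has to be chosen and maintained by hand for each piece of shared state, whereas a transactional read set gives concurrent readers without the author deciding in advance which operations are read-only.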
Hardware Transactional
Memory
• Exists today: Intel TSX
• Transacts cache-lines
• Implicit
• Bounded
- Had two new people join the team since we started using it
- one junior, one senior
- Neither had TM experience
- learning curve is short
- Usually stop making obvious mistakes in less than a month
- mutexes set the bar pretty low
- TM has lower cognitive load for the most part
Open Source?
Soon!
- Got the go ahead from the higher ups at Wyatt to do it a few weeks ago
- Haven’t had time to clean it up
- some embarrassing code that we've been living with that needs to be fixed
- open sourcing it is already improving our code quality
- Other goodies along with TM system
- DeferredResult, Channel, etc.
• Wyatt: http://www.wyatt.com
• SG5 papers:
• N4514: standard-ese