
Foundations of the C++ Concurrency Memory Model

John Mellor-Crummey and Karthik Murthy


Department of Computer Science
Rice University

[email protected]

COMP 522, 27 September 2016


Before C++ Memory Model

• Prior practice
— threaded programming within a shared address space, e.g., Pthreads
— implementations prohibit reordering memory operations with respect to synchronization operations
– treat synchronization operations as opaque procedure calls

• Problem
— C and C++ are single-threaded languages used with thread libraries
– the languages themselves are unaware of threads
— compilers are thread unaware
– they optimize programs for a single thread
– problem: they may perform optimizations that are valid for single-threaded programs but violate the intended meaning of multithreaded programs
— prior informal specifications don't precisely define
– data races
– the semantics of a program without data races

A Familiar Example

Without a precise memory model, T1 and T2 can respectively speculate the values of X and Y as 1, validating each other's speculation.
Does this program (sketched below) have a data race? Yes, for the aforementioned execution.

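The program the slide refers to is not reproduced in this extraction; a minimal sketch of the classic example it describes, with the names X and Y taken from the slide and everything else assumed:

    // Both shared variables start at 0.
    int X = 0, Y = 0;

    void t1() {                 // Thread T1
        if (X == 1) Y = 1;
    }

    void t2() {                 // Thread T2
        if (Y == 1) X = 1;
    }

    // In every sequentially consistent execution neither store runs, so the
    // result is X == 0 and Y == 0.  If the compiler or hardware speculates the
    // stores, each thread can observe the other's speculative write, and the
    // program ends with X == 1 and Y == 1 -- the execution described above.
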
Structure Fields and Races

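The slide's example is not included in this extraction. A sketch of the kind of structure-field update it likely refers to (the field names and types are assumptions):

    struct S { char a; char b; } x;   // two adjacent one-byte fields

    void t1() { x.a = 1; }            // Thread T1 updates only x.a
    void t2() { x.b = 1; }            // Thread T2 updates only x.b

    // An implementation that compiles "x.a = 1" as
    //   load the word containing x, update the byte for a, store the word back
    // introduces a read and a write of x.b that the programmer never wrote, so
    // T2's update can be silently lost.  The C++ memory model gives a and b
    // distinct memory locations, outlawing this rewrite; only adjacent
    // bitfields may still share a location (see the Summary slide).
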
Register Promotion

Is the register promotion in the optimized code a problem?
Yes: it introduces reads and writes of x outside the critical section (see the sketch below).

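The original and optimized code from the slide are not reproduced here; a sketch in the spirit of the usual register-promotion example (the loop body, the flag mt, and the mutex m are assumptions):

    #include <pthread.h>

    int x = 0;                       // shared; protected by m when mt != 0
    int mt = 1;                      // "running multithreaded" flag
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void count(int n) {              // source code: x touched only under the lock
        for (int i = 0; i < n; ++i) {
            if (mt) pthread_mutex_lock(&m);
            ++x;
            if (mt) pthread_mutex_unlock(&m);
        }
    }

    // A register-promotion pass that treats the lock calls as opaque may
    // rewrite the function roughly as follows:
    void count_promoted(int n) {
        int r = x;                   // read of x with no lock held
        for (int i = 0; i < n; ++i) {
            if (mt) { x = r; pthread_mutex_lock(&m); r = x; }
            ++r;
            if (mt) { x = r; pthread_mutex_unlock(&m); r = x; }
        }
        x = r;                       // write of x with no lock held
    }
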
Why a C++ Memory Model

• Problem: it is hard for programmers to reason about correctness
• Without precise semantics, it is hard to reason about whether the compiler will violate the intended semantics
• Compiler transformations could introduce data races without violating the language specification
— e.g., the previous register promotion and field update examples

Trylock and Ordering

Problem: undesirably strong semantics for programs that use non-blocking calls to acquire a lock
— e.g., pthread_mutex_trylock()

What's odd about this code (sketched below)?
T2 waits for T1 to acquire the lock instead of waiting for the lock to be released.
Why would we want the assert to succeed?

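The code from the slide is not reproduced here; a sketch of the standard trylock example it appears to describe (the variable names are assumptions):

    #include <pthread.h>
    #include <cassert>

    int x = 0;
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void t1() {
        x = 42;
        pthread_mutex_lock(&m);    // T1 acquires the lock *after* writing x
        // ... critical section ...
    }

    void t2() {
        // Spin until trylock fails, i.e., until T1 is holding the lock.
        while (pthread_mutex_trylock(&m) == 0)
            pthread_mutex_unlock(&m);
        assert(x == 42);           // "T1 holds the lock, so x = 42 already ran"
    }

T2 uses a failed trylock as evidence that T1 has already executed x = 42, which holds only if T1's two statements are not reordered.
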
The Problem with Trylock

• Problem: in a sequentially consistent execution, the example is data-race free by the current semantics, so the assertion can't fail
• The assertion can fail if the compiler or hardware reorders the statements executed by T1
• Prohibiting such reordering requires a memory fence before the lock
— the fence doubles the cost of lock acquisition
• For well-structured uses of locks and unlocks, reordering T1's statements is safe and no fence is necessary

C++ Memory Model Goals

• Sequential consistency for race-free programs
• No semantics for programs with data races
• Weakened semantics for trylock

Why Undefined Data Race Semantics?

• There are no benign data races
— effectively the status quo
— little to gain by allowing races, other than allowing code to be obfuscated
• Giving Java-like semantics to data races may greatly increase the cost of some C++ constructs
• Compilers often assume that objects do not change unless there is an intervening assignment through a potential alias (illustrated below)

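For illustration, the kind of transformation that assumption licenses (a sketch; the variable name is mine):

    int done = 0;                    // ordinary variable, written by another thread

    void wait_for_done() {
        // Seeing no intervening assignment through a potential alias, the
        // compiler may hoist the load of done out of the loop, effectively
        // producing "if (!done) for (;;) {}" -- valid for a single thread,
        // an infinite loop in a racy multithreaded program.
        while (!done) { /* spin */ }
    }
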
Possible Memory Models

• Sequential consistency
— intuitive, but restricts optimizations
• Relaxed memory models
— allow hardware optimizations
— specified at a low level, which makes it hard for programmers to reason about correctness
— can limit compiler optimizations
– e.g., at least one relaxed model disallows global analysis or redundant read elimination (RRE)
• Data-race-free model
— properties
– guarantees sequential consistency for data-race-free programs
– no guarantees for programs with races
— a simple model with high performance

Why Data Race Free Models?

• Simple programmability of sequential consistency
— any program without data races is guaranteed to execute with sequential consistency
• Implementation flexibility of relaxed models
• Different data-race-free models define the notion of a race so as to provide increasing flexibility
— data-race-free-0: no concurrent conflicting accesses (as for Java)

Definitions - I

• Memory location
— each scalar value occupies a separate memory location
– except bitfields inside the same innermost struct or class (illustrated below)

• A memory action consists of
— the type of action
– data operation: load, store
– synchronization operation (for communication): lock/unlock/trylock, atomic load/store, atomic RMW
— a label identifying the program point
— the values to be read and written

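A small illustration of the memory-location rule (the struct and field names are mine):

    struct S {
        char c;        // its own memory location
        int  a : 4;    // a and b are adjacent bitfields inside the same
        int  b : 4;    //   innermost struct, so they share one memory location
        int  n;        // its own memory location
    } s;

    // Unsynchronized concurrent updates to s.c and s.n by different threads
    // do not conflict (separate locations); unsynchronized concurrent updates
    // to s.a and s.b do, because storing to one bitfield may rewrite the other.
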
Definitions - II

• Thread execution
— a set of memory actions
— a partial order corresponding to the sequenced-before ordering
– sequenced before applies to memory operations of the same thread

• Sequentially consistent execution
— a set of thread executions
— a total order, <T, on all the memory actions, satisfying these constraints:
– each thread is internally consistent
– <T is consistent with the sequenced-before orders
– each load, lock, and read-modify-write operation reads the value of the last preceding write to the same location according to <T
— effectively requires a total order that is an interleaving of the individual threads' actions

Definitions - III

• Two memory operations conflict if
— they access the same memory location, and
— at least one of them is a
– store
– atomic store
– atomic read-modify-write

• Type 1 data race
— in a sequentially consistent execution, two memory operations from different threads form a type 1 data race if
– they conflict
– at least one is a data operation
– they are adjacent in <T

C++ Memory Model

• If a program (on a given input) has a sequentially consistent execution with a type 1 data race, its behavior is undefined
• Otherwise, the program behaves according to one of its sequentially consistent executions

Legal Reorderings

Hardware and compilers may freely reorder a memory operation M1 sequenced before a memory operation M2 if the reordering is allowed by intra-thread semantics and
1. M1 is a data operation and M2 is a read synchronization operation, or
2. M1 is a write synchronization operation and M2 is a data operation, or
3. M1 and M2 are both data operations with no synchronization operation sequence-ordered between them

Legal Lock Optimizations

• When lock & unlock are used in “well-structured” ways,


following reorderings between M1 sequenced-before M2
are safe
— M1 is data and M2 is the write of a lock operation
— M1 is unlock and M2 is either a read or write of a lock
• Note: data writes and writes from well-structured locks and
unlocks can be executed non-atomically

18
C++ Memory Model Solution for Trylock

• Modify the specification of trylock so that it is not guaranteed to succeed if the lock is available
• A failed trylock doesn't tell you anything reliable
— you can't infer that another thread holds the lock
• New semantics
— a successful trylock is treated as lock()
— an unsuccessful trylock() is treated by the memory model as a no-op
• Why is this useful?
— it promises sequential consistency with an intuitive race definition, even for programs that use trylock

Sequential Consistency vs. Write Atomicity

• Independent read, independent write does not guarantee sequential consistency if writes don't execute atomically
• T1 and T2 are on the same node with a shared write-through cache
• T3 and T4 are on separate nodes
• Execution (sketched below)
— T2 reads T1's value x=1 early, executes a fence, and reads the old y
— T3 writes y=1, executes a fence, writes x=2; all writes commit
— T4 reads x=2, executes a fence; only now does T1's write x=1 commit
• Violation of sequential consistency: the non-atomic write x=1

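A sketch consistent with the execution described above (the register names r1–r4 and the exact outcome are my reading of the slide):

    // All shared variables start at 0; "fence" stands for a full memory barrier.
    int x = 0, y = 0;

    void t1() { x = 1; }

    void t2() { int r1 = x; /* fence */ int r2 = y; }   // observes r1 == 1, r2 == 0

    void t3() { y = 1;      /* fence */ x = 2;      }

    void t4() { int r3 = x; /* fence */ int r4 = x; }   // observes r3 == 2, r4 == 1

    // Under sequential consistency, r1 == 1 and r2 == 0 place x=1 before y=1,
    // y=1 precedes x=2 in T3, and r3 == 2, r4 == 1 place x=1 after x=2 -- a
    // cycle, so no interleaving produces this outcome.  It can occur if T1's
    // write of x becomes visible to T2 (via the shared cache) before it is
    // made visible to T4, i.e., if the write is not atomic.
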
Making C++ Memory Model Usable

• Problem: a departure from sequential consistency for synchronization operations can lead to non-intuitive behavior
• Approach: retain sequential consistency for default atomics and synchronization operations

C++ Atomics

• A C++ qualifier denotes variables with atomic operations
— the operations need to be executed atomically by the hardware
• Sequentially consistent atomics: data-race-free models require that all operations on C++ atomics appear sequentially consistent (see the sketch below)
• Initially opposed by many hardware and software developers
— sufficiently expensive on some processors that an “experts only” alternative was deemed necessary
— existing code (e.g., the Linux kernel) assumes weak ordering; it is easier to migrate such code with primitives closer to the assumed semantics
• Why was it resolved this way?
— the models were too hard to formalize and use otherwise

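A small illustration of the default, sequentially consistent atomics (the variable names are mine):

    #include <atomic>

    std::atomic<int> x{0}, y{0};   // std::atomic<T> marks variables whose
                                   // operations must be atomic and race-free

    void t1() { x.store(1); int r1 = y.load(); }   // default ordering is
    void t2() { y.store(1); int r2 = x.load(); }   // memory_order_seq_cst

    // With the default ordering, the outcome r1 == 0 && r2 == 0 is impossible:
    // all four operations appear in one total order consistent with each
    // thread's program order, so at least one load sees the other thread's store.
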
Some Problems

• Significant restrictions on synchronization operations
— synchronization operations must appear sequentially consistent with respect to each other
• Performance problem in practice
— only the Intel Itanium processor distinguished between data and synchronization operations
— other processors enforce ordering through fence or memory-barrier instructions
• Atomic operations must execute in sequenced-before order, and atomic writes must execute atomically
— a read can't return a new value for a location until all older copies of the location are invalid
— with caches, atomic writes are easier with invalidation protocols

Implications for Current Processors

Guaranteeing sequentially consistent atomics
• Atomic writes need to be mapped to xchg on AMD64 and Intel64 (see the sketch below)
• Hardware now only needs to make sure that xchg writes are atomic
• xchg implicitly ensures the semantics of a store|load fence
• Why is this OK?
— better to pay a penalty on stores than on loads
– loads are more frequent than stores
— a store|load fence replaced by a read-modify-write is just as expensive on many processors today

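For illustration, the mapping commonly used on x86-64, shown as comments (the exact instruction selection varies by compiler):

    #include <atomic>

    std::atomic<int> a{0};

    void writer() {
        a.store(1);       // typically compiled to "xchg" (or "mov" + "mfence"):
                          // the write is atomic and acts as a store|load fence
    }

    int reader() {
        return a.load();  // typically compiled to a plain "mov" load, so the
                          // cost of sequential consistency falls on the rarer
                          // stores rather than on the frequent loads
    }
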
Problematic Examples

• It is common to use counters that are frequently incremented but only read after all threads complete
— problem: this requires that all memory updates performed prior to a counter update become visible after any later counter update in another thread
• An atomic store requires two fences
— one before and one after the store
— one can imagine examples where memory updates before or after the store could otherwise be reordered

Case for Low-Level Atomics

• The current model provides too much safety for some use cases
• Expert programmers need a way to relax the model when code does not require as much safety, to maximize performance
• We don't want to make the memory model hard to reason about for the rest of us

Low-Level Atomics

• Why?
— to enable expert programmers to maximize performance
• What?
— an operation on an atomic variable can be explicitly parameterized with its memory ordering constraints (see the sketch below)
– e.g., x.load(memory_order_relaxed)
allows the load to be reordered with other memory operations
such a load is never an acquire operation, and hence does not contribute to the synchronizes-with ordering
— for read-modify-write operations, the programmer can specify whether an operation acts as an acquire, a release, neither, or both

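A sketch of the event-counter use case from the "Problematic Examples" slide, using a relaxed atomic (the names are mine):

    #include <atomic>

    std::atomic<long> hits{0};

    void worker() {
        // Hot path: no ordering with surrounding memory operations is needed,
        // so the fences implied by a sequentially consistent increment are avoided.
        hits.fetch_add(1, std::memory_order_relaxed);
    }

    long report_after_join() {
        // Called only after all worker threads have been joined; the joins
        // provide the happens-before ordering that makes this read race-free.
        return hits.load(std::memory_order_relaxed);
    }
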
Additional Definitions

• Happens-before (HB)
— if a is sequenced before b, then a happens before b
— if a synchronizes with b, then a happens before b
— if a happens before b and b happens before c, then a happens before c
• Type 2 data race
— two conflicting data accesses to the same memory location that are unordered by happens-before (both notions are illustrated below)

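For illustration, a release store that synchronizes with an acquire load, establishing happens-before (the variable names are mine):

    #include <atomic>

    int data = 0;                      // ordinary, non-atomic variable
    std::atomic<bool> ready{false};

    void producer() {
        data = 42;                                     // (1) sequenced before (2)
        ready.store(true, std::memory_order_release);  // (2)
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) { }  // (3) synchronizes with (2)
        int r = data;   // (4): (1) happens before (4), so r == 42 and the two
                        // accesses to data do not form a type 2 data race
    }
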
C++ Memory Model for Low Level Atomics

• If a program (on a given input) has a consistent execution with a type 2 data race, then its behavior is undefined
• Otherwise, the program (on the same input) behaves according to one of its sequentially consistent executions

A theorem shows that this is equivalent to the previous model phrased in terms of type 1 data races.

Java/C++ Comparison

• The primary goal of Java is safety and security
— it makes a large effort to ensure well-defined semantics for all code
• The focus of C++ is performance
— it has no such safety concerns
– it ignores the semantics of code with data races
— low-level atomics enable fine-tuning for performance

Summary

• We need multithreaded programs to harness the power of modern multicore architectures
• A semantics for multithreaded execution is essential
• Previously, the C++ memory model was ambiguous
• The C++ memory model now provides
— well-defined semantics for data-race-free programs
— high performance, by leaving programs with data races undefined
— low-level atomics for expert programmers to squeeze out maximum performance
— compilers may assume that ordinary variables don't change asynchronously
• Remaining problems: adjacent bitfields
