ISO/IEC JTC1 SC22 WG21 N2973 = 09-0163 - 2009-09-27
Lawrence Crowl, [email protected], [email protected]
This paper is a revision of N2889 = 09-0079 - 2009-06-21.
Problem Description
Solution Domain
Thread Resources
Solution Value
Related Work
Proposed Solution
Acknowledgements
The async Function
Thread Joining
Execution Policies
Eager and Lazy Evaluation
Direct Execution
Modified Future Types
Proposed Wording
30.3.1 Class thread [thread.thread.class]
30.3.1.2 thread constructors [thread.thread.constr]
30.6.1 Overview [futures.overview]
30.6.4 Class template unique_future [futures.unique_future]
30.6.5 Class template shared_future [futures.shared_future]
30.6.? Function template async [futures.async]
Problem Description

One of the simplest methods for exploiting parallelism is to call one subroutine in parallel with another. However, with the current threading facilities, doing so is a difficult task.
There have been repeated requests for a simpler mechanism, all of which were rejected by the committee as not being within the spirit of the Kona compromise. However, there are now national body comments requesting such a facility.
UK-182 30.3.3.2.2
Future, promise and packaged_task provide a framework for creating future values, but a simple function to tie all three components together is missing. Note that we only need a simple facility for C++0x. Advanced thread pools are to be left for TR2.

async( F&& f, Args && ... );

Semantics are similar to creating a thread object with a packaged_task invoking f with forward<Args>(args...), but details are left unspecified to allow different scheduling and thread spawning implementations. It is unspecified whether a task submitted to async is run on its own thread or a thread previously used for another async task. If a call to async succeeds, it shall be safe to wait for it from any thread. The state of thread_local variables shall be preserved during async calls. No two incomplete async tasks shall see the same value of this_thread::get_id(). [Note: this effectively forces new tasks to be run on a new thread, or a fixed-size pool with no queue. If the library is unable to spawn a new thread or there are no free worker threads then the async call should fail.]
Solution Domain

Concurrency and parallelism represent a broad domain of problems and solutions. Mechanisms are generally appropriate to a limited portion of that domain. So, mechanisms should explicitly state the domain in which they are intended to be useful.
One anticipated domain for the following async solution is extracting a limited amount of concurrency from existing sequential programs. That is, some function calls will be made asynchronous where appropriate to extract high-level concurrency from program structure, and not from its data structures. The facility is not intended to compete with OpenMP or automatic parallelizers that extract loop-level parallelism.
To be concrete, the async facility would be appropriate to the recursive calls in quicksort, but not to the iteration within a partition.
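As an illustrative sketch of that division, the recursive calls of a quicksort might go through async while the partition itself remains serial. Here partition_range is a hypothetical helper, and the async interface is the one proposed later in this paper.

template <class Iter>
Iter partition_range(Iter first, Iter last);     // hypothetical serial partition

template <class Iter>
void quick_sort(Iter first, Iter last)           // Iter assumed random access
{
    if ( last - first < 2 )
        return;
    Iter mid = partition_range(first, last);     // the iteration stays serial
    // one recursive half may run concurrently with the other
    auto handle = std::async( [=]{ quick_sort(first, mid); } );
    quick_sort(mid, last);
    handle.get();                                // wait for the other half
}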
In this domain, nested asynchronous calls are not only supported, but desired, as they provide the implementation the opportunity to reuse threads for many potentially, but not actually, asynchronous calls.
Another anticipated domain is offloading work from threads that must remain responsive, such as GUI threads. In this environment, programmers often must insist on concurrency.
Thread Resources

The central technical problem in providing an asynchronous execution facility is to provide it in a manner that does not require the use of thread pools, while at the same time avoiding problems synchronizing with the destructors for any thread-local variables used by any threads created to perform the asynchronous work. See N2880: C++ object lifetime interactions with the threads API, Hans-J. Boehm, Lawrence Crowl, ISO/IEC JTC1 WG21 N2880, 2009-05-01.
While not explicit, the essential lesson of N2880 is as follows.
Threads have variables in the form of thread-local variables, parameters, and automatic variables. To ensure that the resources held by those variables are released, one must join with the thread so that those variables are destroyed. To ensure that destructors of those variables are well-defined, one must join with the thread before its referenced environment is destroyed.
This observation has several consequences for the design of any asynchronous execution facility.
Solution Value

In addition to the technical details, the committee must consider the value in any solution that meets the procedural bounds of the Kona compromise and the technical bounds embodied in N2880. In particular, external facilities like Cilk, the Threading Building Blocks, and the Parallel Patterns Library are known to be better able to handle fine-grained parallelism. So, is the solution space of sufficient value, relative to these external facilities, for standardization in C++0x?
The value in a solution is relative not only to external facilities, but also to facilities in the current standard. Our concurrency primitive, std::thread, does not return values, and getting a value out through std::packaged_task and std::unique_future may take more training than many programmers are willing to accept. So, is the solution space of sufficient value, relative to these internal facilities, for standardization in C++0x?
In this paper, we presume that the value in the solution comes from its improvement over existing internal facilities. The wording of the UK national body comment implies the same conclusion. On that basis, we propose the following solution.
Related Work

Oliver Kowalke is implementing boost.task (formerly known as boost.threadpool). In this library, launch_in_thread() reuses existing threads. The function returns a handle object for both the thread and the return value. This library also allows task interruption. It is available at the Boost Vault (http://www.boostpro.com/vault/ — section 'Concurrent Programming') or from the Boost sandbox (svn — https://svn.boost.org/svn/boost/sandbox/task/).
Herb Sutter has proposed an alternate solution in N2901, which should appear in updated form as N2970.
Proposed Solution

The proposed solution consists of a set of async functions to launch asynchronous work and a future to manage the function result.
Acknowledgements

This solution derives from an extensive discussion on the C++ threads standardisation <[email protected]> mailing list. That discussion has not yet reached consensus. We highlight points of disagreement below. Note that the presentation in this paper is substantially expanded from earlier drafts, clarifying several issues, so the disagreements may be weaker than they were in discussion.
Thanks to the following contributors to the discussion on this topic: Hans Boehm, Beman Dawes, Peter Dimov, Pablo Halpern, Howard Hinnant, Oliver Kowalke, Doug Lea, Arch Robison, Bjarne Stroustrup, Alexander Terekhov, and Anthony Williams. In particular, we are extremely grateful to Herb Sutter for forcing a thorough analysis into the issues.
The async Function

The async functions use the standard techniques for deferring function execution. The function may be wrapped in a lambda expression to encapsulate arguments to the function as well.
For example, consider computing the sum of a very large array. The first task is to avoid computing asynchronously when the overhead would be significant relative to the work, or when processor resources might be fully engaged. The second task is to split the work into two pieces, one executed by the host thread and one executed asynchronously.
int parallel_sum(int* data, int size)
{
    int sum = 0;
    if ( size < 1000 ) {
        // small arrays: compute serially to avoid the async overhead
        for ( int i = 0; i < size; ++i )
            sum += data[i];
    } else {
        // compute the upper half asynchronously, the lower half here
        auto handle = std::async(
            [=]{ return parallel_sum( data + size/2, size - size/2 ); } );
        sum += parallel_sum( data, size/2 );
        sum += handle.get();
    }
    return sum;
}
Thread Joining

Because the Kona compromise prohibits thread pools and because we must join with any thread created, any asynchronous execution facility must ensure, at the very least, that any thread created is joined before the resulting handle is destroyed. (And, of course, the programmer must destroy the handle, not abandon it to free store.)
A consequence of the joining is that std::threads cannot be reused. Otherwise, some section of the program would lose control of the resources accreted in the std::thread being reused. Note, though, that it is possible to reuse operating system threads in the implementation of std::thread.
Given that the thread must join, there are two implementation strategies: intrusively implement async, or keep the std::thread within the future for later joining. In the intrusive async, the implementation within the thread must call the set_value or set_exception function of the promise corresponding to the future. That is, the promise effectively joins the thread before the future becomes ready.
When storing the std::thread within the future, the implementation of async is a straightforward composition of std::thread, packaged_task, and a modified unique_future. One consequence of storing the std::thread within the future is that either unique_future must be substantially modified or a new future type must be introduced.
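For illustration, a minimal sketch of that composition written by hand, using this paper's names; work is a hypothetical function, and the point is that the caller, not the future, must remember to join.

extern int work(int value);                      // hypothetical

int demo(int value)
{
    std::packaged_task<int()> task( [=]{ return work(value); } );
    std::unique_future<int> result = task.get_future(); // tied to the task's promise
    std::thread worker( std::move(task) );       // execute the task on a new thread
    // ... other work could proceed here ...
    int r = result.get();                        // wait for the value
    worker.join();                               // must join before worker is destroyed
    return r;
}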
This paper proposes a solution that does not choose between these implementation strategies. The solution avoids this choice by simply requiring the appropriate synchronization, as if the thread were joined at the critical times.
Unlike in prior proposals, we now require that the last future to own associated state from an async call also "join with" any thread created by the async call. This new behavior is necessary to prevent undefined behavior when the future is simply abandoned.
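A hedged sketch of the consequence: even when the returned future is abandoned without a get(), its destructor must behave as if it joined the thread. The function names here are illustrative.

extern void log_progress();                      // hypothetical

void fire_and_forget()
{
    auto handle = std::async( []{ log_progress(); }, std::async_threaded );
    // no get(): handle is abandoned at the end of this scope, but its
    // destructor behaves as if associated-thread.join(), so the created
    // thread completes before fire_and_forget returns
}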
Execution Policies

The async functions have a policy parameter. Three policies are defined in this paper.

The intent of this proposal is to closely follow the parameters and overloads of the std::thread constructors. Consensus has emerged to remove the variadic constructors from std::thread, both to maintain consistency and to simplify the form of std::async. Programmers can encapsulate function calls with parameters in a lambda expression to obtain the desired behavior, as the sketch below shows.
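For example, a call that would have used the variadic form can bind its arguments in a copy-capturing lambda instead; scale and its arguments are hypothetical.

extern int scale(int x, int factor);             // hypothetical

int demo(int value)
{
    // instead of a variadic async(scale, value, 10),
    // bind the arguments by copy in a lambda:
    auto handle = std::async( [=]{ return scale( value, 10 ); } );
    return handle.get();
}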
Eager and Lazy Evaluation

When the work is invoked serially, we propose to do so at the point of value request, rather than at the point of initiation. That is, work is invoked lazily rather than eagerly. This approach may seem surprising, but there are reasons to prefer invocation-on-request.
With eager evaluation, the async functions have already committed to an eager serial execution at the point of the call, and that commitment might introduce deadlock. In contrast, executing the work serially at the call to get cannot introduce any deadlock that was not already present, because the calling thread is necessarily blocked.

Furthermore, lazy evaluation permits moving the call to async earlier. When there are sufficient processor resources, the function executes concurrently and speculatively. When there are not sufficient resources, the function will execute only when truly needed.
Eager semantics seem more natural when programmers think of "waiting to use the return value". On the other hand, lazy semantics seem more natural when programmers think of "moving the call earlier". Consider the following examples.
int original( int a, int b )
{
    int c = work1( a );
    int d = work2( b );
    return c + d;
}

int eager( int a, int b )
{
    // work1 runs asynchronously while work2 runs here
    auto handle = async( [=]{ return work1( a ); } );
    int d = work2( b );
    int c = handle.get();
    return c + d;
}

int lazy( int a, int b )
{
    // the call to work2 is "moved earlier" as an asynchronous request
    auto handle = async( [=]{ return work2( b ); } );
    int c = work1( a );
    int d = handle.get();
    return c + d;
}
Note also that in the proposed lazy semantics, any serial execution will be in the context of the thread that executes the get(). While we expect that this thread will nearly always be the same as the thread that executes async(), it need not be, because a future can be moved.
There are consequences to lazy evaluation. In particular, unique_future must be modified to carry a std::function in its associated state to represent the computation needed.
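A minimal sketch of that modification, assuming the associated state simply pairs the std::function with a ready flag; the names are illustrative, not proposed wording, and the sketch requires R to be default-constructible.

template <class R>
struct deferred_state                            // illustrative associated state
{
    std::function<R()> work;                     // the deferred computation
    bool ready;                                  // true once work has run
    R value;                                     // meaningful only when ready

    explicit deferred_state(std::function<R()> w)
        : work(w), ready(false) {}

    R& get()
    {
        if ( !ready ) {                          // first request runs the work
            value = work();
            ready = true;
        }
        return value;
    }
};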
Direct Execution

A desirable implementation in the case of synchronous execution is direct execution, in which the call to the std::function representing the work returns its result or exception directly to the caller. In lazy evaluation, direct execution is straightforward; the implementation of a synchronous get() simply calls the std::function and returns its result. Any exception is simply propagated as in a normal function call. However, the rvalue-reference return type of get effectively eliminates direct execution as an implementation possibility.
Modified Future Types

To enable sequential evaluation, the following modifications to the existing futures are proposed.

- Add an is_deferred query to unique_future to determine whether it holds a deferred function.
- Execute any deferred function within get() or any other member function of unique_future that extracts information about the state of the future. Such member functions may modify the future and may execute for some time.
- Require the construction of a shared_future from a unique_future to execute any deferred function.
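A short usage sketch of these modifications, with compute hypothetical: a serial policy defers the work, is_deferred() reports it, and get() runs the work on the calling thread.

extern int compute();                            // hypothetical

int demo()
{
    auto handle = std::async( []{ return compute(); }, std::async_serial );
    if ( handle.is_deferred() ) {
        // the work has not yet run; get() will run it on this thread
    }
    return handle.get();                         // runs the deferred function here
}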
Proposed Wording

The proposed wording is as follows. It consists primarily of two new subsections.
30.3.1 Class thread [thread.thread.class]

Within paragraph 1, edit the synopsis as follows, replacing the variadic constructor with a constructor taking a single callable.

namespace std {
  class thread {
  public:
    // types:
    class id;
    typedef implementation-defined native_handle_type; // See 30.2.3

    // construct/copy/destroy:
    thread();
    template <class F> explicit thread(F f);
    ~thread();
    thread(const thread&) = delete;
    thread(thread&&);
    thread& operator=(const thread&) = delete;
    thread& operator=(thread&&);

    // members:
    void swap(thread&&);
    bool joinable() const;
    void join();
    void detach();
    id get_id() const;
    native_handle_type native_handle(); // See 30.2.3

    // static members:
    static unsigned hardware_concurrency();
  };
}
30.3.1.2 thread constructors [thread.thread.constr]

Edit the constructor prototype as follows, replacing the variadic form.

template <class F> explicit thread(F f);
Edit paragraph 4 as follows, removing the requirements on the argument pack.

Requires: F shall be CopyConstructible if an lvalue and otherwise MoveConstructible. f() shall be a valid expression.
Edit the Effects paragraph as follows, replacing the INVOKE expression.

Effects: Constructs an object of type thread and executes f() in a new thread of execution. Any return value from f is ignored. If f terminates with an uncaught exception, std::terminate() shall be called.
30.6.1 Overview [futures.overview]

Add to the synopsis the appropriate entries from the following sections.
30.6.4 Class template unique_future [futures.unique_future]

Edit the synopsis as follows. The edit adds is_deferred and removes the const qualification from the member functions is_ready, wait, wait_for, and wait_until.
namespace std {
  template <class R>
  class unique_future {
  public:
    unique_future(unique_future&&);
    unique_future(const unique_future& rhs) = delete;
    ~unique_future();
    unique_future& operator=(const unique_future& rhs) = delete;

    // retrieving the value
    see below get();

    // functions to check state and wait for ready
    bool is_deferred() const;
    bool is_ready();
    bool has_exception() const;
    bool has_value() const;
    void wait();
    template <class Rep, class Period>
      bool wait_for(const chrono::duration<Rep, Period>& rel_time);
    template <class Clock, class Duration>
      bool wait_until(const chrono::time_point<Clock, Duration>& abs_time);
  };
}
After paragraph 4, add a new paragraph as follows. This paragraph is a requirement on the destructor.
Synchronization: If no other thread refers to the associated state, and that state is associated with a thread created by an async call ([futures.async]), as if associated-thread.join().
After paragraph 5, add a new paragraph as follows.
This paragraph is a requirement on get().

Effects: As if is_ready().
Delete paragraph 6.
This synchronization now happens as a consequence of the "as if is_ready()" effects.

Synchronization: if *this is associated with a promise object, the completion of set_value() or set_exception() to that promise happens before (1.10) get() returns.
After paragraph 9, add a new prototype.
bool is_deferred() const;
After that prototype, add a new paragraph.
Returns: true if and only if the associated state has a deferred function and that function has not executed.
Edit the is_ready() prototype as follows, removing the const qualification.

bool is_ready();
After that prototype add a new paragraph.
Effects: if is_deferred(), then execute the deferred function. Otherwise, no effect.
After that paragraph add a new paragraph.
Postcondition: is_deferred() == false.
After that paragraph add a new paragraph.
Synchronization: if *this is associated with a promise object, the completion of set_value() or set_exception() to that promise happens before (1.10) is_ready() returns. If the future is associated with a thread created by an async call ([futures.async]), as if associated-thread.join().
Edit the wait() prototype as follows, removing the const qualification.

void wait();
Edit paragraph 13 as follows.
Effects: As if is_ready(). Blocks until *this is ready.
Delete paragraph 14.
This synchronization now happens as a consequence of the "as if is_ready()" effects.

Synchronization: if *this is associated with a promise object, the completion of set_value() or set_exception() to that promise happens before (1.10) get() returns.
Edit the wait_for() prototype as follows, correcting the Period parameter and removing the const qualification.

template <class Rep, class Period> bool wait_for(const chrono::duration<Rep, Period>& rel_time);
Edit paragraph 16 as follows.
Effects: As if is_ready(). Blocks until *this is ready or until rel_time has elapsed.
Edit the wait_until() prototype as follows, removing the const qualification.

template <class Clock, class Duration> bool wait_until(const chrono::time_point<Clock, Duration>& abs_time);
30.6.5 Class template shared_future [futures.shared_future]
Edit paragraph 3 as follows. That paragraph is a requirement on the construction from a unique_future.

Effects: As if rhs.is_ready(). Move constructs a shared_future object whose associated state is the same as the state of rhs after the is_ready() effects.
After paragraph 5, add a new paragraph as follows. This paragraph is a requirement on the destructor.
Synchronization: If no other thread refers to the associated state, and that state is associated with a thread created by an async call ([futures.async]), as if associated-thread.join().
After paragraph 10, add a new paragraph. This paragraph describes is_ready().

Synchronization: if *this is associated with a promise object, the completion of set_value() or set_exception() to that promise happens before (1.10) is_ready() returns. If the future is associated with a thread created by an async call ([futures.async]), as if associated-thread.join().
Edit paragraph 13 as follows. This paragraph describes wait().

Effects: As if is_ready(). Blocks until *this is ready.
Delete paragraph 14. The synchronization is handled by the "as if is_ready()" effects.

Synchronization: if *this is associated with a promise object, the completion of set_value() or set_exception() to that promise happens before (1.10) get() returns.
Edit paragraph 16 as follows. This paragraph describes wait_for().

Effects: As if is_ready(). Blocks until *this is ready or until rel_time has elapsed.
30.6.? Function template async [futures.async]

Add the following section.
enum async_policy { async_threaded, async_serial, async_discretion };

template <class F>
unique_future<typename F::result_type>
async(F f, async_policy policy = async_discretion);

Requires: F shall be CopyConstructible if an lvalue and otherwise MoveConstructible. The expression f() shall be a valid expression.

Effects: Constructs an object of type unique_future<typename F::result_type> ([futures.unique_future]). If policy is async_threaded, creates an object of type thread and executes f() in a new thread of execution. Any return value is captured by the unique_future. Any exception not caught by f is captured by the unique_future. The thread is associated with the unique_future, and affects the behavior of the unique_future. If policy is async_serial, then f is associated with the returned unique_future. The invocation is said to be deferred. If policy is async_discretion, the implementation may choose either policy above at any call to async. [Note: Implementations should defer invocations when no more concurrency can be effectively exploited. —end note]

Synchronization: The invocation of async happens before (1.10 [intro.multithread]) the invocation of f. [Note: This statement applies even when the corresponding unique_future is moved to another thread. —end note]

Throws: std::system_error if policy is async_threaded and the implementation is unable to start a new thread.

Error conditions:
— resource_unavailable_try_again — if policy is async_threaded and either the system lacked the necessary resources to create another thread, or the system-imposed limit on the number of threads in a process would be exceeded.

[Example: The two items of work below may be executed concurrently.

extern int work1(int value);
extern int work2(int value);

int work(int value)
{
    auto handle = std::async( [=]{ return work2(value); } );
    int tmp = work1(value);
    return tmp + handle.get();
}

—end example] [Note: The statement

return work1(value) + handle.get();

might not result in concurrency, because get() may be evaluated before work1(), thus forcing work2 to be evaluated before work1(). —end note]