Parallel Programming with Skeletons
By Joel Falcou
Parallel programming is bound to become the main concern of software developers in the coming decades.
Various models aim to resolve the tension between efficiency and abstraction, trading one for the other, but how about
getting both?
Copublished by the IEEE CS and the AIP. 1521-9615/09/$25.00 © 2009 IEEE. Computing in Science & Engineering.
As an example, consider an image-processing application for detecting edges in a video stream. We would build this application from four functions:

• load, which retrieves an image from the video stream;
• thresh, which applies a binary threshold to an image;
• edge, which extracts edges as lists of lines; and
• save, which saves the result to a file on disk.

The sequential version of our application is defined as the sequential composition of these four functions:

A_s = sequence [load, thresh, edge, save],

where sequence represents the sequential composition skeleton.

We can introduce the first level of parallelism by noticing that we can run the four functions in parallel if we apply them to different elements of the data stream. Thus, while load is loading the i-th image, we can apply thresh to the (i − 1)-th image. In general, we can apply the thresh function on images i_t, i_{t-1}, i_{t-2}, and i_{t-3} in parallel. This parallelization scheme, a pipeline, is a classic yet useful form of parallelism that we can choose as a skeleton. A first parallel version of A is thus

A_1 = pipeline [load, thresh, edge, save].

We can express another level of parallelism by noticing that we can run the thresh function on different image slices in parallel. So, if we define two functions for slicing and merging images (slice and merge, respectively), we can express A_s by using a new parallel construction, map, which slices an image, distributes the slices over a pool of slave processors, and merges the results of a function's parallel application (which is thresh in our example):

A_2 = pipeline [map [slice, thresh, merge], edge, save].

We thus define the final parallel application in terms of skeleton nesting (map being nested inside a pipeline) and the list of sequential functions. As far as the developer is concerned, the parallelization is done because the skeleton implementation will handle all the low-level communication and marshaling details. Building parallel applications is simplified because the developer only needs to know the skeletons' operational semantics. Another advantage is that developers can reuse existing sequential functions directly because they don't need to know about the parallelization process.

The Quaff Library
Quaff is a skeleton-based parallel programming library whose main task is to rely on C++ template metaprogramming to reduce the overhead traditionally associated with object-oriented implementations of such libraries. The basic idea is to use the C++ template mechanism so that skeleton-based programs expand at compile time and generate a new C++ MPI code to be compiled and executed at runtime. This code generation totally removes the overhead associated with runtime polymorphism and function forwarding. To do this, developers transform the skeleton structure extracted from the application definition into a list of executable instructions for a given parallel architecture, as Figure 1 shows.

Figure 1. Quaff code generation process. The compile-time system analyzes C++ code using skeleton constructors to build a process network. This network is then used to generate the message-passing interface (MPI) code for compilation.

Other research describes this process and its associated specific language [5]. Whereas languages such as MetaOCaml or Template Haskell natively support such constructions, C++ requires the use (and abuse) of template metaprogramming and operator overloading.

Let's examine the Quaff interface and which skeletons it supports.

The Quaff Programming Model
Figure 2 presents a simple algorithm parallelized with Quaff. In this example, we want to apply the function comp to a vector in a parallel way.

The actual code is split into four parts, starting with the definition of the user functions. The only limitation is the argument ordering (input first, output last), which is a requirement to enable Quaff to determine how data should be transferred between processes. The library provides communication support for all standard C and C++ types and some standard template library containers, such as vector or list, thus limiting the marshaling code we must write to support custom types.

Next, the user initializes the parallel execution environment at line 10 via the initialize function. From this point on, we can evaluate and run skeleton expressions on the underlying MPI-enabled parallel machine.

The application is a combination of skeleton constructors on lines 12 through 14.
Scientific Programming, May/June 2009
#define NPROC 16
typedef std::vector<particles> data_t;

int main(int argc, const char* argv[])
{
    initialize(argc, argv);

    run( seq(gui)
       & ( map<NPROC>( ( seq(generate)
                       , seq(measure)
                       , seq(sample)
                       )
                     )
         , seq(estimate)
         , seq(update_gui)
         )
       );

    finalize();
}

• generate builds the particle distribution from the last iteration results,
• measure extracts features from the video stream to evaluate each particle's interest score by using an image descriptor,
• sample resamples the particle set by replicating particles with large weight and trimming particles with small weight, and
• estimate computes the particle set's average to get the current frame estimation.
... main idea is to generate a process topology from the con...

... model can solve many of the common issues associated with parallel programming, such as handling communications and synchronization. Parallel skeletons won't be the universal solution for everyone, but when they're applicable, they provide a convenient way to describe computational problems and solve them efficiently.
References
1. H. Sutter and J. Larus, "The Free Lunch Is Over: A Fundamental Turn toward Concurrency in Software," Dr. Dobb's J., vol. 30, no. 3, 2005; www.ddj.com/web-development/184405990.
2. M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computation, MIT Press, 1989.