Mahmood Browne
1 Introduction
This paper presents a language (P-COM2)1 and a compiler that composes parallel and
distributed programs from independently written components and illustrates their application.
P-COM2 is an interface definition language which incorporates information on behaviors and
implementations of components to enable qualification of components for effectiveness in
specific application instances and execution environments. The general strategy is somewhat
similar to composition of programs in the Web Services paradigm but the goals are quite
different. A component is a serial program which is encapsulated by an associative interface
[8,11] which specifies the properties of the component. The composition implemented by the
compiler is based on matching of associative interfaces and generates as final output either an
MPI program or multi-threaded code for a shared-memory multiprocessor. The CODE [26]
parallel programming system is used as an intermediate language and is the immediate target
language of the compositional compiler.
Component-oriented software development is one of the most active and significant threads
of research in software engineering [1,10,15,29]. There are many motivations for raising the
level of abstraction of program composition from individual statements to components with
substantial semantics. It is often the case that there is a family of applications which can be
generated from a modest number of appropriately-defined components. Optimization and
adaptation for different execution environments are readily accomplished by creating and
maintaining multiple versions of components rather than by direct modifications of complete
programs.

² CORBA, Web Services, etc., which are very much component-oriented development systems, are not
commonly used for development of parallel or high-performance applications.
2 Definition of Terms and Concepts
Domain Analysis: Domain analysis [5] identifies the components from which a family of
programs in the domain can be constructed and identifies a set of attributes in which the
properties and behaviors of the components can be defined. It is usually the case that
applications require components from multiple domains.
Component: A component is one or more sequential computations, an interface which
specifies the information used for selection and matching of components and a state machine
which manages the interface, the interactions with other peers and the invocation of the
sequential computations. An interaction, which may be initiated as an incoming message (or
set of messages) or as an invocation of a transaction, will trigger an action which is associated
with some state of the state machine. The action may include execution of a sequential
computation.
Sequential Computation: A computation is a unit of work that implements some atomic
functionality. A computation is a sequential program which refers only to its own local
variables and its input variables.
Associative Interface: An associative interface [8] encapsulates a component. It describes the
behavior and functionality of a component. One of the most important properties of associative
interfaces is that they enable differentiation among alternative implementations of the same
component. These interfaces are called "associative" because selection and matching is similar
to operations on content-addressable memories. An associative interface consists of an accepts
specification and a requests specification.
Accepts Specification: An accepts interface specifies the set of interactions in which a
component is willing to participate. The accepts interface for a component is a set of three-
tuples (profile, transaction, protocol).
• A profile is a set of attribute/value pairs. Components have a priori agreement on the set of
attributes and values which can appear on the accepts and requests interface of a
component.
• A transaction specification incorporates one or more function signatures including the data
types, functionality and parameters of the unit of work to be executed and a state machine
which manages the order of execution of the units of work. The state machine is defined in
the form of conditional expressions over states and function signatures. A transaction can be
enabled or disabled based on its current state and its current state can be used in runtime
binding of the components. Multiple transactions controlled by the state machine can be
used to represent complex interactions such as precedence of transactions, "and"
relationships among transactions acting as a barrier and "or" relationships between
transactions representing alternative ways of executing the component.
• A protocol defines a sequence of simple interactions necessary to complete the interaction
specified by the profile. The most basic protocol is data-flow (continuations), which is
defined as executing the functionality of a component and transmitting the output to a
successor defined by the selectors at that component without returning to the invoking
component. More complex interaction protocols such as call-return and persistent
transactions are planned but not yet implemented.
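The transaction state machine described in the accepts specification above can be pictured in ordinary code. The Python fragment below is an illustrative sketch, not P-COM2 syntax; the transaction names get_a and get_b and the integer state encoding are assumptions. A precedence relationship is modelled by a guard that enables get_b only after get_a has completed; "and" and "or" relationships would be encoded as conjunctions or disjunctions in the same guards.

```python
# Illustrative sketch of a transaction state machine (not P-COM2 syntax).
# get_a and get_b are hypothetical transactions; get_b carries a
# precedence constraint: it is disabled until get_a has completed.
class TransactionStateMachine:
    def __init__(self):
        self.state = 1  # current state of the component

    def enabled(self, name):
        # Conditional expressions over states and function signatures.
        guards = {"get_a": self.state == 1, "get_b": self.state == 2}
        return guards.get(name, False)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"{name} is disabled in state {self.state}")
        self.state += 1  # completing a transaction advances the state

sm = TransactionStateMachine()
print(sm.enabled("get_b"))  # False: get_a must complete first
sm.fire("get_a")
print(sm.enabled("get_b"))  # True
```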
Requests Specification: A requests interface specifies the set of interactions which a
component must initiate if it is to complete the interactions it has agreed to accept. The
requests interface is a set of three-tuples (selector, transaction, protocol). A component can
have multiple tuples in its requests interface to implement its required functionality.
• A selector is a conditional expression over the attributes of all the components in the
domain.
• Transaction specifications are similar to those for accepts specifications.
• Protocol specifications are as given for accepts specifications.
Start Component: A start component is a component that has at least one requests interface
and no accepts interface. Every program requires a start component. There can be only one
start component in a program; it provides the starting point for the program.
Stop Component: A stop component is a component that has at least one accepts interface and
no requests interface. A stop component is also a requirement for termination of a program.
There can be more than one stop component of a program denoting multiple ending points for
the program.
3 Programming Model
The domain-based, component-oriented programming model targets development of a family
of programs rather than a single program. The programming model has two phases:
development of families of components and specification of instances from the family of
programs which can be instantiated from the sets of components.
5 Compilation Process
The conditional expression of a selector is a template which has slots for attribute names and
values. The names and values are specified in the profiles of other components of the domain.
Each attribute name in the selector expression of a component behaves as a variable. The
attribute variables in a selector are instantiated with the values defined in the profile of another
component. The profile and the selector are said to match when the instantiated conditional
expression evaluates to true.
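Profile/selector matching can be pictured concretely. The sketch below, in Python, reduces a selector to a conjunction of equality tests (real P-COM2 selectors are general conditional expressions); the attribute names are those of the matrix domain used later in the example.

```python
# A profile binds attribute names to values; a selector's attribute
# names act as variables instantiated from another component's profile.
# A selector is simplified here to a conjunction of equality tests.
profile = {"domain": "matrix", "function": "distribute",
           "element_type": "complex", "distribute_by_row": True}

def matches(selector, profile):
    # The instantiated conditional expression must evaluate to true;
    # an attribute absent from the profile fails the match.
    return all(name in profile and profile[name] == value
               for name, value in selector.items())

print(matches({"function": "distribute", "distribute_by_row": True}, profile))  # True
print(matches({"function": "gather"}, profile))                                 # False
```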
The source program for the compilation process is a start component with a sequential
computation which implements initialization for the program and a requests interface which
specifies the components implementing the first steps of the computation and one or more
libraries to search for components. The libraries should include the components needed to
compose a family of applications specified by a domain analysis. The set of components
which is composed to form a program is primarily dependent on the requests interface of the
start component.
The target language for the compilation process is a generalized data flow graph as defined
in [26]. A node in this data flow graph consists of an initialization, a firing rule, a sequential
computation and a routing rule for distribution of the outputs of the computation. There are
two special node types, a start node and a stop node. Acceptable data flow graphs must begin
with a start node and terminate on a stop node.
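The node structure can be pictured as follows. This is an illustrative Python encoding, not the CODE representation itself, and the doubling node is a made-up example.

```python
# A node bundles an initialization, a firing rule over its inputs, a
# sequential computation, and a routing rule for its outputs.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class DataflowNode:
    init: Callable[[], dict]                  # set up node-local state
    firing_rule: Callable[[list], bool]       # ready to fire on these inputs?
    computation: Callable[[dict, list], Any]  # the sequential computation
    route: Callable[[Any], dict]              # distribute outputs to successor arcs

# A trivial node that doubles its single input token.
double = DataflowNode(
    init=lambda: {"factor": 2},
    firing_rule=lambda inputs: len(inputs) == 1,
    computation=lambda state, inputs: state["factor"] * inputs[0],
    route=lambda out: {"successor": out},
)
state = double.init()
if double.firing_rule([21]):
    print(double.route(double.computation(state, [21])))  # {'successor': 42}
```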
The compilation process starts by parsing the associative interface of the start component.
The compiler then searches a specified list of libraries for components whose accepts interface
matches the requests interface of the start component. The matching process is essentially a
sophisticated form of type matching. If the matching between the selector of one
component and the profile of another component is successful, the compiler tries to match the
corresponding transactions of the requests and accepts interfaces. The transactions are said to
match when all of the following conditions are true: 1) the names of the two transactions are the
same; 2) the numbers of arguments of the two transactions are the same; 3) the data type
of each argument in the requests transaction is the same as that of the corresponding argument
in the accepts transaction; and 4) the sequencing constraint given by the conditional expression in
the accepts transaction specification (the state machine) is satisfied. Finally, the protocol
specifications must be consistent.
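The four conditions can be summarized in code. The sketch below assumes a plausible encoding (a transaction as a name plus a list of argument type names); it is not the compiler's internal representation, and protocol consistency is left as a separate check.

```python
# Checks 1-4 for matching a requests transaction against an accepts
# transaction; a transaction is encoded as a name and a list of
# argument type names (an assumed encoding, for illustration only).
def transactions_match(request, accept, state_constraint_ok=True):
    return (request["name"] == accept["name"]                # 1) same name
            and len(request["args"]) == len(accept["args"])  # 2) same arity
            and all(r == a for r, a in                       # 3) same types
                    zip(request["args"], accept["args"]))
            and state_constraint_ok)                         # 4) state machine satisfied

req = {"name": "get_matrix", "args": ["mat2", "mat2", "int", "int", "int"]}
acc = {"name": "get_matrix", "args": ["mat2", "mat2", "int", "int", "int"]}
print(transactions_match(req, acc))  # True
```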
When compilation of the start component is completed, it is converted into a start node [26]
for the data flow graph which will represent the parallel program. Each match of a requests
interface to an accepts interface results in the addition of a node to the data flow graph being
incrementally constructed by the compilation process, together with an arc connecting this new
node to the node which is currently being processed by the compiler. If there is a replication
clause in a transaction specification, then at runtime the specified number of replicas of the
matched component are instantiated and linked with data flow arcs. This searching and
matching process for the requests interface is applied recursively to each of the components
that are in the matched set. The composition process stops when no more matching of
interfaces is possible, which will always occur at a Stop component since a Stop component
has no requests interface. Compilation of a P-COM2 stop component results in generation of a
stop node for the data flow graph. The compiler will signal an error if a requests interface
cannot be matched with an accepts interface of a desired component. The data flow graph
which has been generated is then compiled to a parallel program for a specific architecture by
compilation processes implemented in the CODE [26] parallel programming system.
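The searching-and-matching process can be sketched as a worklist algorithm. Everything below is illustrative: components are stand-ins whose requests are plain labels matched against accepts sets, in place of the selector/profile and transaction matching described above, and the component names anticipate the FFT example of the next section.

```python
# Worklist sketch of the composition process: starting from the start
# component, match each requests interface against the accepts
# interfaces of library components, adding nodes and arcs until no
# unmatched requests remain (a stop component has none).
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    accepts: frozenset = frozenset()
    requests: tuple = ()

def compose(start, library):
    nodes, arcs, worklist = [start], [], [start]
    while worklist:
        current = worklist.pop()
        for request in current.requests:
            matched = [c for c in library if request in c.accepts]
            if not matched:
                raise RuntimeError(f"no component matches request {request!r}")
            for comp in matched:
                if comp not in nodes:
                    nodes.append(comp)
                    worklist.append(comp)  # recurse into the new component
                arcs.append((current.name, comp.name))
    return nodes, arcs

start = Component("initialize", requests=("distribute",))
library = [Component("distribute", accepts=frozenset({"distribute"}), requests=("fft",)),
           Component("fft_row", accepts=frozenset({"fft"}), requests=("gather",)),
           Component("print", accepts=frozenset({"gather"}))]  # stop: no requests
nodes, arcs = compose(start, library)
print([n.name for n in nodes])  # ['initialize', 'distribute', 'fft_row', 'print']
```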
6 Example Program
This section presents an example program showing the complete process of developing a
parallel program for the fast Fourier transform (FFT) of a matrix in two dimensions from
simple components. The algorithm presented is an adaptation of Swarztrauber's
multiprocessor FFT algorithm [32]. This problem is simple enough to cover in detail and
illustrates many of the important concepts such as stateful components and precedence
constraints. Given an N x M matrix of complex numbers where both N and M are powers of 2,
we want to compute the 2D FFT of the complex matrix. This 2D FFT can be described in terms
of 1D FFTs, which helps in parallelizing the algorithm. Let us assume that there are P available
processors where P is also a power of 2. In this case the domain analysis is straightforward and
is an analysis of the algorithm itself. The steps of the algorithm are as follows:
a) Partitioning the matrix row wise (horizontally) into P submatrices, one for each
processor.
b) Sending these submatrices to each of the P processors for computation. The size of
each submatrix is N/P x M.
c) Each processor performs a 1D FFT on every row of the submatrix that it received.
d) Collecting these 1D FFT’s and then transposing the N x M matrix. The resulting
matrix is of size M x N.
e) Splitting the M x N matrix row wise into P submatrices. The size of each
submatrix is M/P x N.
f) Sending these submatrices to each of the P processors for computation.
g) Again each processor performs a 1-D FFT on every row of the submatrix that it
received.
h) Collecting all the submatrices from the P processors and transposing the M x N
matrix to get an N x M matrix. The resulting N x M matrix is the 2D FFT of the
original matrix.
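Ignoring the P-way partitioning (each row is transformed independently in any case), steps (c)-(h) amount to row FFTs and transposes. The serial Python sketch below makes this concrete using a textbook radix-2 Cooley-Tukey FFT; it is an illustration of the algorithm, not the composed parallel program.

```python
import cmath

def fft1d(row):
    # Recursive radix-2 Cooley-Tukey FFT; len(row) must be a power of 2.
    n = len(row)
    if n == 1:
        return list(row)
    even, odd = fft1d(row[0::2]), fft1d(row[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])

def transpose(m):
    return [list(col) for col in zip(*m)]

def fft2d(matrix):
    # Steps (c)-(d): 1D FFT of every row, then transpose;
    # steps (g)-(h): repeat, restoring the N x M shape.
    half = transpose([fft1d(row) for row in matrix])
    return transpose([fft1d(row) for row in half])

# The 2D FFT of a unit impulse is a matrix of ones.
result = fft2d([[1, 0], [0, 0]])
```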
This simple analysis suggests that all of the instances of this algorithm can be created from
composing instances of three components: a one-dimensional FFT component, a component
which partitions and distributes matrices and a component which merges rows or columns to
recover a matrix and which may optionally transpose the recovered matrix. Let us name the
components as fft_row, distribute, and gather_transpose respectively. One could as well
formulate the algorithm with separate components for merge and transpose but that could
introduce additional communication. Alternatively, the algorithm can use any 1D FFT algorithm to
calculate the 2D FFT of the matrix. Additionally, the choice of implementation for
transposition of an array may vary with the execution environment. Note that each of the
components above is reused, as each of them is used twice in the algorithm. These
components could reasonably be expected to be available as "off the shelf" components
reused from linear algebra and FFT libraries. In addition to the above three
components, we need a component that will read/initialize the matrix and one component to
print out the final result. Let us name these components initialize and print. The component to
read/initialize the array may be the Start component and the print component may be the Stop
component. The Start component will be written to specify the set of component instances
which will be composed for a given data set and target execution environment.
[Figure: initialize → distribute → fft_row (replicated) → gather_transpose → print]
Fig. 1. Data Flow Graph of 2D FFT Computation
The dependence graph of the program in terms of these components is shown in Figure 1.
This data flow graph suggests an optimization: creating a new component which combines
the functions of distribute and gather_transpose. This, depending on the mapping of nodes to
processors, could eliminate two transmissions of the large matrix. As shown in Figure 1,
parallelism can be achieved through the use of multiple fft_row components. Note that the
gather_transpose component has to keep track of its state as it sends data to the distribute
component on its first execution and to the print component after its second execution.
Once we have identified the components, the next step is to complete the domain analysis
by defining a list of attributes through which we can describe the functions, behaviors and
implementations of a component and their instantiations. When some service is required it is
described in terms of the attributes in the format of accepts and requests interfaces.
The two domains from which this computation is composed are the matrix and fft domains.
There is a generic attribute "Domain" which is required for multi-domain problems. The
matrix domain has these distinct attributes:
a) Function: an attribute of type string. Describes the function of the component.
b) Element_type: an attribute of type string. Describes the type information of the input
matrix.
c) Distribute_by_row: an attribute of type boolean. Describes whether the component
partitions the matrix by row or by column.
The fft domain has these attributes:
a) Input: an attribute of type string. Describes the input structure.
b) Element_type: an attribute of type string. Describes the type information of the input.
c) Algorithm: an attribute of type string. Describes the FFT algorithm used.
d) Apply_per_row: an attribute of type boolean. Describes whether to apply the FFT
function per row or per column.
The completed domain analysis for the components is shown in Figure 2. Once the domain
analysis is done, we encapsulate the components in associative interfaces using the attributes
and transactions.
As shown in Figure 3, the requests interface of the initialize component specifies that it
needs a component that can distribute a matrix row-wise. The interface passes real and
imaginary parts of the matrix, the dimension of the matrix and the total number of processors
to the distribute component using the transaction specification. The data type mat2 is defined
as a two dimensional array data type.
Fft_row Gather_transpose
a) Domain: fft a) Domain: matrix
b) Input: matrix b) Function: gather
c) Element_type: complex c) Element_type: complex
d) Algorithm: 1d-fft d) Combine_by_row: true
e) Apply_per_row: true e) Transpose: true
Distribute Print
a) Domain: matrix a) Domain: print
b) Function: distribute b) Input: matrix
c) Element_type: complex c) Element_type: complex
d) Distribute_by_row: true
Fig. 2. Domain Analysis of the Components
Figure 4a shows the accepts interface of the distribute component. This distribute
component assumes that the matrix which it partitions and distributes will be merged. This is
specified in Figure 4b. The first selector interfaces to the gather_transpose component
providing the size of each of the submatrices, the total number of submatrices to collect at the
gather_transpose component and also state information which is needed in the gather_transpose
component. The second selector in Figure 4b specifies that it needs p instances of the fft_row
component and distributes the submatrices to each of the replicated components along with
their size. The construct "index [p]" is used to specify that multiple copies of the fft_row
component are needed. The construct "[]" with the transaction argument is used to transmit
different data to different copies of the component. For different transmission patterns, different
constructs may be used in the language of the interface. Note that the number of instances of
the fft_row component is determined at runtime.
selector:
string domain == "matrix";
string function == "distribute";
string element_type == "complex";
bool distribute_by_row == true;
transaction:
int get_matrix(out mat2 grid_re,out mat2 grid_im, out
int n, out int m, out int p);
protocol: dataflow;
Fig. 3. Requests Interface of Initialize Component
profile:
string domain = "matrix";
string function = "distribute";
string element_type = "complex";
bool distribute_by_row = true;
transaction:
int get_matrix(in mat2 grid_re,in mat2 grid_im, in int n,
in int m, in int p);
protocol: dataflow;
Fig. 4a. Accepts Interface of distribute component
Figure 5a specifies that this implementation of fft_row component uses the "Cooley-Tukey"
algorithm [13]. The fft_row component requires no knowledge of how many copies of itself are
being used. From Figure 5b, we can see that the instance number of the fft_row component is
passed to the gather_transpose component using the variable "me".
Figure 6a illustrates the use of the ">" operator between the transactions to describe the
precedence relationship between the transactions. The second transaction cannot execute until
the first transaction is completed. The gather_transpose component collects the submatrices
one by one through the second transaction in the interface. P-COM2 incorporates precedence
ordering operations sufficient to express simple state machines for management of interactions
among components.
selector:
string domain == "matrix";
string function == "gather";
string element_type == "complex";
bool combine_by_row == true;
bool transpose == true;
transaction:
int get_p(out int n/p, out int m, out int p,
out int state);
protocol: dataflow;
{selector:
string domain == "fft";
string input == "matrix";
string element_type == "complex";
string algorithm == "Cooley-Tukey";
bool apply_per_row == true;
transaction:
int get_grid_n_m(out mat2 out_grid_re[],out mat2
out_grid_im[], out int n/p, out int m);
protocol: dataflow;
}index [ p ]
Fig. 4b. Requests Interface of distribute component
As shown in Figure 6b, the first requests interface of the gather_transpose component is
used to connect to the distribute component. The second interface connects to the print
component. The variable “state” is used to enable one of the transactions based on the current
state of the gather_transpose component.
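The stateful behavior of gather_transpose can be sketched as a counter-driven dispatch. The Python fragment below is an illustrative model of the guards %{ state == 1, gathered == p }% and %{ state == 2, gathered == p }% in Figure 6b, not generated code.

```python
# After collecting p submatrices, the component's state selects the
# successor: distribute on the first pass, print on the second.
class GatherTranspose:
    def __init__(self, p):
        self.p, self.gathered, self.state = p, 0, 1

    def accept_submatrix(self):
        self.gathered += 1
        if self.gathered == self.p:  # guard: gathered == p
            target = "distribute" if self.state == 1 else "print"
            self.state += 1          # advance to the second pass
            self.gathered = 0
            return target
        return None                  # still collecting

g = GatherTranspose(p=2)
print(g.accept_submatrix())  # None: only one submatrix so far
print(g.accept_submatrix())  # distribute
```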
profile:
string domain = "fft";
string input = "matrix";
string element_type = "complex";
string algorithm = "Cooley-Tukey";
bool apply_per_row = true;
transaction :
int get_grid_n_m(in mat2 grid_re,in mat2 grid_im,in int n,
in int m);
protocol: dataflow;
Fig. 5a. Accepts Interface of Fft_row Component
selector:
string domain == "matrix";
string function == "gather";
string element_type == "complex";
bool combine_by_row == true;
bool transpose == true;
transaction:
int get_grid_n_m_inst(out mat2 out_grid_re,out mat2
out_grid_im, out int me);
protocol: dataflow;
Fig. 5b. Requests Interface of Fft_row Component
profile:
string domain = "matrix";
string function = "gather";
string element_type = "complex";
bool combine_by_row = true;
bool transpose = true;
transaction:
int get_p(in int n, in int m, in int p,in int state);
>
int get_grid_n_m_inst(in mat2 grid_re,in mat2 grid_im,
in int inst);
protocol: dataflow;
Fig. 6a. Accepts Interface of Gather_transpose Component
selector:
string domain == "matrix";
string function == "distribute";
string element_type == "complex";
bool distribute_by_row == true;
transaction:
%{ state == 1, gathered == p }%
int get_matrix(out mat2 out_grid_re,out mat2 out_grid_im,
out int m, out int n*p, out int p);
protocol: dataflow;
selector:
string domain == "print";
string input == "matrix";
string element_type == "complex";
transaction:
%{ state == 2, gathered == p }%
int get_grid_n_m(out mat2 out_grid_re,out mat2
out_grid_im, out int m,out int n*p);
protocol: dataflow;
Fig. 6b. Requests Interface of Gather_transpose Component
8 Related Research
There has been relatively little research on component-based programming in the context of
parallel and distributed programs. Darwin [25] is a composition and configuration language for
parallel and distributed programs. Darwin uses a configuration script to compose programs
from components; this composition process is effectively manual. In our approach, the
composition information is encapsulated with the components themselves; as a result, the
compiler can choose the required components automatically.
The component-based software development environment [23,28] of the SciRun project
features powerful graphical composition of data flow graphs of components which are compiled
to parallel programs. H2O [31] is a component-oriented framework for composition of
distributed programs based on web services. Triana [33] is a graphical development
environment for composing distributed programs from components targeting peer to peer
execution environments. G2 [24] composes distributed parallel programs from web
services through Microsoft .NET. Armada [27] composes distributed/parallel programs
specialized to data movement and filtering.
The Common Component Architecture (CCA) project [6] is a major research and
development project focused on composition of parallel programs from components. One
primary goal of CCA is to enable composition of programs from components written in
multiple languages. CCA has developed interface standards. The implementations of the CCA
interface specifications are object-oriented. There are several tools, XCAT, [19] Ccaffeine [14]
and BABEL [7,9] implementing the CCA interface specification system. Component
composition is either graphical or through scripts and makefiles. CCA components interact
through two types of ports. The first is the provides port, an interface that a component
provides to other components. The second is the uses port, an interface through which a
component connects with other components that it requires. These port types exhibit some
similarities to the accepts and requests transaction
specifications. However, the details and implementations are quite different as we have focused
on incorporation of the information necessary to enable composition by compilation.
ArchJava [4] annotates ports with provides and requires methods, which helps the
programmer to better understand the dependency relations among components by exposing them
explicitly. The accepts and requests interfaces of a P-COM2 component incorporate
signatures as do ArchJava provides and requires. The accepts and requests interfaces also
include profiles and precedence specification carrying semantic information and enabling
automatic program composition. The attribute name/value pairs in profiles are used for both
selecting and matching components thereby providing a semantics-based matching in addition
to type checking of the matching interfaces.
The use of associative interfaces has been reported earlier in the literature. Associative
interfaces are used in a broadcast-based coordination model [12]. That model uses run-time
composition, whereas this paper presents compile-time composition. Associative interfaces
have also been applied to the composition of performance models [11].
Acknowledgements
This research was supported by the National Science Foundation under grant number
0103725 “Performance-Driven Adaptive Software Design and Control” and grant number
0205181, “A Computational Infrastructure for Reliable Computer Simulations.” The
experiments were run on the facilities of the Texas Advanced Computation Center.
References
1. Achermann F., Lumpe M., et al., Piccola - a Small Composition Language, in Formal
Methods for Distributed Processing - A Survey of Object-Oriented Approaches, pp. 403-
426, Cambridge University Press, 2001.
2. Adve V. S., Bagrodia R., et al., POEMS: End-to-end Performance Design of Large
Parallel Adaptive Computational Systems, in IEEE Transactions on Software
Engineering, vol. 26(11): pp. 1027-1049, November 2000.
3. Adve V., Akinsanmi A., et al., Model-Based Control of Adaptive Applications: an
Overview, in Proceedings of the 16th International Parallel and Distributed Processing
Symposium (IPDPS 2002), April 2002.
4. Aldrich J., Chambers C., et al., ArchJava: connecting software architecture to
implementation, in Proceedings of the 22nd International Conference on Software
Engineering, pp. 187-197, May 2002.
5. Arango G., Domain Analysis: From Art Form to Engineering Discipline, in Proceedings
of the Fifth International Workshop on Software Specification and Design, pp. 152-159,
1989.
6. Armstrong R., Gannon D., et al., Toward a Common Component Architecture for High-
performance Scientific Computing, in Proceedings of the 8th IEEE International
Symposium on High Performance Distributed Computation, pp. 115-124, August 1999.
7. Babel: Components @ LLNL,
http://www.llnl.gov/CASC/components/babel.html.
8. Bayerdorffer B., Associative Broadcast and the Communication Semantics of Naming in
Concurrent Systems, Ph.D. Dissertation, Dept. of Computer Sciences, University of
Texas at Austin, December 1993.
9. Bernholdt D. E., Elwasif W. R., et al., A Component Architecture for High-Performance
Computing, in Proceedings of the Workshop on Performance Optimization via High-Level
Languages and Libraries (POHLL-02), June 2002.
10. Birngruber D., Coml: Yet another, but simple component composition language, in
Workshop on Composition Languages, WCL’01, pp. 1–13, September 2001.
11. Browne J. C. and Dube A., Compositional Development of Performance Models in
POEMS, in International Journal of High-Performance Computing Applications, vol.
14(4), Winter 2000.
12. Browne J. C., Kane K., et al., An Associative Broadcast Based Coordination Model for
Distributed Processes, in Proceedings of COORDINATION 2002, LNCS 2315, pp. 96-
110, 2002.
13. Cooley J. W. and Tukey J. W., An Algorithm for the Machine Calculation of Complex
Fourier Series, in Mathematics of Computation, vol. 19, pp. 297-301, April 1965.
14. Ccaffeine - a CCA component framework for parallel computing, http://www.cca-forum.org/ccafe/
15. Czarnecki K. and Eisenecker U. W., Components and Generative Programming, in
Proceedings of the Joint European Software Engineering Conference and ACM SIGSOFT
International Symposium on the Foundations of Software Engineering, Springer-Verlag
LNCS 1687, pp. 2-19, 1999.
16. Deng G., New approaches for FMM implementation, Masters Thesis, Dept. of
Manufacturing Systems Engineering, University of Texas at Austin, August 2002.
17. Fu Y., Klimkowski K. J., et al., A fast solution method for three-dimensional many-
particle problems of linear elasticity, in International Journal for Numerical Methods in
Engineering, vol. 42(7): pp. 1215-1229, 1998.
18. Fu Y. and Rodin G. J., Fast solution method for three dimensional Stokesian many-
particle problems, in Commun. Numer. Meth. Engng, vol. 16(2): pp. 145-149, 2000.
19. Govindaraju M., Krishnan S., et al., Merging the CCA Component Model with the OGSI
Framework, in Proceedings of the 3rd International Symposium on Cluster Computing
and the Grid (CCGrid2003), pp. 182-189, May 2003.
20. Greengard L. and Rokhlin V., A fast algorithm for particle simulations, in Journal of
Computational Physics, vol. 73(2): pp. 325-348, 1987.
21. Greengard L. and Rokhlin V., A new version of the fast multipole method for the Laplace
equation on three dimensions, in Acta Numerica, vol. 6: pp. 229-270, 1997.
22. Guyer S. and Lin C., An Annotation Language for Optimizing Software Libraries, in
Proceedings of the Second Conference on Domain Specific Languages, pp. 39-53,
October 1999.
23. Johnson C.R., Parker S., et al., Component-Based Problem Solving Environments for
Large-Scale Scientific Computing, in Journal on Concurrency and Computation: Practice
and Experience, vol. 14, pp. 1337-1349, 2002.
24. Kelly W., Roe P., et al., An Enhanced Programming Model for Internet Based Cycle
Stealing, in Proceedings of the 2003 International Conference on Parallel and Distributed
Processing Techniques and Applications, pp. 1649-1655, June 2003.
25. Magee J., Dulay N., et al., Structuring parallel and distributed programs, in Software
Engineering Journal, vol. 8(2): pp. 73-82, March 1993.
26. Newton P. and Browne J. C., The CODE 2.0 Graphical Parallel Programming Language,
in Proceedings of the ACM International Conference on Supercomputing, July 1992.
27. Oldfield R. and Kotz D., Armada: a parallel I/O framework for computational grids,
Future Generation Computer Systems, vol. 18(4), pp. 501-523, 2002.
28. Parker S. G., A Component-based Architecture for Parallel Multi-Physics PDE
Simulation, in Proceedings of the International Conference on Computational Science,
Springer-Verlag LNCS 2331, pp. 719-734, April 2002.
29. Seiter L., Mezini M., et al., Dynamic component gluing, in OOPSLA Workshop on Multi-
Dimensional Separation of Concerns in Object-Oriented Systems, November 1999.
30. Sun X. and Pitsianis N., A Matrix Version of the Fast Multipole Method, in SIAM Review,
vol. 43(2): pp. 289-300, 2001.
31. Sunderam V. and Kurzyniec D., Lightweight Self-Organizing Frameworks for
Metacomputing, in Proceedings of the 11th IEEE International Symposium on High
Performance Distributed Computing HPDC-11 (HPDC'02), pp. 113-124, July 2002.
32. Swarztrauber P. N., Multiprocessor FFTs, in Journal of Parallel Computing, vol. 5: pp.
197-210, 1987.
33. Taylor I., Shields M., et al., Distributed P2P Computing within Triana: A Galaxy
Visualization Test Case, in Proceedings of International Parallel and Distributed
Processing Symposium (IPDPS 2003), April 2003.