
CONCURRENCY: PRACTICE AND EXPERIENCE, VOL. 9(6), 621–631 (JUNE 1997)

SPMD programming in Java


SUSAN FLYNN HUMMEL∗, TON NGO AND HARINI SRINIVASAN
IBM T. J. Watson Research Center, P.O. Box 217, Yorktown Heights, NY 10598, U.S.A.
∗ Also with Polytechnic University and Cornell Theory Centre.

SUMMARY
We consider the suitability of the Java concurrent constructs for writing high-performance
SPMD code for parallel machines. More specifically, we investigate implementing a financial
application in Java on a distributed-memory parallel machine. Despite the fact that Java was
not expressly targeted to such applications and architectures per se, we conclude that efficient
implementations are feasible. Finally, we propose a library of Java methods to facilitate SPMD
programming. 1997 by John Wiley & Sons, Ltd.

1. MOTIVATION
Although Java was not specifically designed as a high-performance parallel-computing
language, it does include concurrent objects (threads), and its widespread acceptance
makes it an attractive candidate for writing portable computationally-intensive parallel
applications. In particular, Java has become a popular choice for numerical financial codes,
an example of which is arbitrage – detecting when the buying and selling of securities
is temporarily profitable. These applications involve sophisticated modeling techniques
such as successive over-relaxation (SOR) and Monte Carlo methods[1]. Other numerical
financial applications include data mining (pattern discovery) and cryptography (secure
transactions).
In this paper, we use an SOR code for evaluating American options (see Figure 1)[1], to
explore the suitability of using Java as a high-performance parallel-computing language.
This work is being conducted in the context of a research effort to implement a Java run-
time system (RTS) for the IBM POWERparallel System SP machine[2], which is designed
to effectively scale to large numbers of processors. The RTS is being written in C with
calls to MPI (message passing interface)[3] routines. Plans are to move to a Java plus MPI
version when one becomes available.
The typical programming idiom for highly parallel machines is called data-parallel or
single-program multiple-data (SPMD), where the data provide the parallel dimension.
Parallelism is conceptually specified as a loop whose iterates operate on elements of a,
perhaps multidimensional, array. Data dependences between parallel-loop iterates lead
to a producer–consumer type of sharing, wherein one iterate writes variables that are
later read by another, or collective communication, wherein all iterates participate. The
communication pattern between iterates is often very regular, for example a bidirectional
flow of variables between consecutive iterates (as in the code in Figure 1).
This paper explores the suitability of the Java concurrency constructs for writing SPMD
programs. In particular, the paper:
1. identifies the differences between the parallelism supported by Java and data parallelism


2. discusses compiler optimizations of Java programs, and the limits imposed on such
optimizations because of the memory-consistency model defined at the Java language
level
3. discusses the key features of data-parallel programming, and how these features can
be implemented using the Java concurrent and synchronization constructs
4. identifies a set of library methods that facilitate data parallel programming in Java.

public void SORloop()
{
    int i;
    double y;
    for (int t = 0; t < Psor.Convtime; t++) {
        error = 0.0;
        for (i = 0; i < Psor.N; i++) {
            // average the neighboring values, handling the two boundary points
            if (i == 0)
                y = values[i+1]/2;
            else if (i == Psor.N-1)
                y = values[i-1]/2;
            else
                y = (values[i-1] + values[i+1])/2;
            // over-relaxed update, constrained from below by the obstacle (early-exercise) value
            y = max(obstacle[i], values[i] + Psor.omega*(y - values[i]));
            values[i] = y;
        }
    }
}

Figure 1. Java SOR code for evaluating American options

1.1. Organization of the paper


In Section 2, we elaborate on the differences between data parallelism and the type of
parallelism that Java was designed to support, which we call control parallelism. In the
subsequent Sections, we discuss the concurrent programming features of Java, and their
suitability for writing data-parallel programs. Section 3 summarizes the Java parallel pro-
gramming model, and its impact on performance optimizations in multi-threaded Java
programs. In Section 4, we consider data-parallel programming using the concurrent and
synchronization constructs supported by Java; we illustrate data parallel programming in
Java using a Successive Over Relaxation code for evaluating American options. Finally, in
Section 5, we propose a library of methods that would facilitate SPMD-programming in
Java on the SP.

2. CONTROL VS. DATA PARALLELISM


Although there is a continuum in the types of parallelism found in systems and numeric
applications, it is instructive to compare the two extremes of what we call control and data
parallelism (respectively):

1. Parallelism in systems: Parallel constructs were originally incorporated into programming
languages to express the inherent concurrency in uniprocessor operating sys-
tems[4]. These constructs were later effectively used to program multiple-processor
and distributed systems[5–9]. This type of parallelism typically involves heteroge-
neous, ‘heavy weight’ processes, i.e. each process executes different code sequences
and demarcates a protection domain. The spawning of processes usually does not
follow a regular pattern during program execution, and processes compete for re-
sources, for example, a printer. It is thus important to enforce mutually exclusive
access to resources; however, it is often not necessary to impose an order on accesses
to resources – the system functions correctly as long as only one process accesses a
resource at a time.
2. Parallelism in numerical applications: The goal of parallel numeric applications
is performance, and the type of parallelism differs significantly from that found in
parallel and distributed systems. Data parallelism[10] is homogeneous, ‘light weight’,
and regular – for example, the iterates of a forAll loop operating on an array. Parallel
iterates are co-operating to solve a problem, and hence, it is important to ensure that
events, such as the writing and reading of variables, occur in the correct order. Thus,
producer–consumer access to shared data must be enforced. Iterates may also need
to be synchronized collectively, for example to enforce a barrier between phases of
execution or to reduce the elements of an array into a single value using a binary
operation (such as addition or maximum). A barrier can sometimes collectively
subsume individual producer–consumer synchronization between iterates.

As discussed in the next Section, the Java parallel constructs were designed for interactive
and Internet (distributed) programming, and lie somewhere between the above two ex-
tremes. However, with some minor modifications SPMD programming could be expressed
in Java more naturally.

3. THE JAVA PARALLEL PROGRAMMING MODEL


Based on the classification described in the previous Section, the parallel programming
model provided in Java fits best under control parallelism. This Section presents a high-
level view of the support in Java for parallelism; for further details, the reader is referred
to the Java language specification[11].
Java has built-in classes and methods for writing multi-threaded programs. Threads are
created and managed explicitly as objects; they perform the conventional thread operations
such as start, run, suspend, resume, stop, etc. The ThreadGroup class allows a user to
name and manage a collection of threads. The grouping can be nested but cannot overlap;
in other words, a nested thread group forms a hierarchical tree. A ThreadGroup can be
used to collectively suspend, resume, stop and destroy all thread members; however, this
class does not provide methods to create or start all thread members simultaneously.
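As a brief illustration (the Worker class and variable names below are ours, not part of any
standard library), threads can be placed in a ThreadGroup for collective suspension or
termination, but each member must still be started one at a time:

    ThreadGroup workers = new ThreadGroup("spmd-workers");
    Thread[] member = new Thread[numThreads];
    for (int i = 0; i < numThreads; i++) {
        member[i] = new Thread(workers, new Worker(i)); // Worker implements Runnable (assumed)
        member[i].start();                              // no collective start is available
    }
    workers.suspend();   // collective operations on the whole group are possible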
Java adopts a shared-memory model, allowing a public object to be directly accessi-
ble by any thread. The shared-memory model is easily mapped onto uniprocessor and
shared-memory multiprocessor machines. It can also be implemented on disjoint-memory
multicomputers, such as the SP, using message passing to implement a virtual shared memory.
For distributed machines, Java’s Remote Method Invocation provides an RPC-like
interface that is appropriate for coarse grained parallelism.
Java threads synchronize through statements or methods that have a synchronized
attribute. These statements and methods function as monitors: only one thread is allowed to
execute synchronized code that accesses the same variables at a time. Locks are not provided
explicitly in the language, but are employed implicitly in the implementation of monitors
(they are embedded in the bytecode instructions monitorenter and monitorexit[12]).
Conceptually, each object has a lock and a wait queue associated with it; consequently, a
synchronized block of code is associated with the objects it accesses.
When two threads compete for entry to synchronized codes by attempting to lock the
same object, the thread that successfully acquires the lock continues execution, while the
other thread is placed on the wait queue of the object. When executing synchronized code,
a thread can explicitly transfer control by using notify, notifyAll and wait methods.
notify moves one randomly chosen thread from the wait queue of the associated object to
the run queue, while notifyAll moves all threads from the wait queue to the run queue.
If the thread owning the monitor executes a wait, it relinquishes the lock and places itself
on its wait queue, allowing another thread to enter the monitor.
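The following minimal sketch (the class and field names are ours) illustrates the monitor
behaviour just described: the synchronized methods lock the Cell object, wait releases the
lock and joins the wait queue, and notifyAll moves the waiting threads back to the run queue:

class Cell {
    private boolean full = false;
    private double datum;

    public synchronized void put(double d) throws InterruptedException {
        while (full) wait();          // relinquish the lock and join the wait queue
        datum = d;
        full = true;
        notifyAll();                  // move waiting threads to the run queue
    }

    public synchronized double get() throws InterruptedException {
        while (!full) wait();
        full = false;
        notifyAll();
        return datum;
    }
}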
The Java shared-memory model is not the sequentially consistent model[13], which
is commonly used for writing multi-threaded programs. Instead, Java employs a form of
weak consistency[14], to allow for shared-data optimization opportunities. In the sequential
consistency model, any update to a shared variable must be visible to all other threads. Java
allows a weaker memory model wherein an update to a shared variable is only guaranteed
to be visible to other threads when they execute synchronized code. For instance, a value
assigned by a thread to a variable outside a synchronized statement may not be immediately
visible to other threads because the implementation may elect to cache the variable until
the next synchronization point (with some restrictions). The following subsection discusses
the impact of the Java shared variable rules on compiler optimizations.
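As a sketch of the consequences (our own example, not taken from the specification), the
spinning reader below may never observe the writer's update, because the implementation may
keep done in the reader's working memory; accessing the flag through synchronized methods,
or declaring it volatile, restores visibility:

class Flag {
    boolean done = false;                          // shared, but neither volatile nor synchronized

    void writer() { done = true; }                 // the update may remain in working memory
    void reader() { while (!done) { } }            // may spin forever on a stale cached copy

    synchronized void setDone() { done = true; }   // visibility is guaranteed only at
    synchronized boolean isDone() { return done; } // synchronization points such as these
}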

3.1. Optimizing memory accesses in multi-threaded programs


The goal of SPMD programming is high performance, and exploiting data locality is
arguably one of the most important optimizations on parallel machines. The ability to per-
form data locality optimizations, such as caching and prefetching, depends on the under-
lying language memory model. Communication optimizations[15,16] require an analysis
of memory-access patterns in an SPMD program, and therefore, the ability to perform
communication optimizations also depends on the language memory model.
The Java memory model[11] is illustrated in Figure 2. Every Java thread has, in addition to
a local stack, a working memory in which it can temporarily keep copies of the variables that
it accesses. The main memory contains the master copy of all variables. In the terminology
of Java, the interactions between the stack and the working memory are assign, use, lock
and unlock; the interactions between the working memory and the main memory are store
and load, and those within the main memory are write and read.
The Java specification[11] states that an ‘. . . implementation is free to update main
memory in an order that may be surprising’. The intention is to allow a Java implementation
to maintain the value of a variable in the working memory of a thread; a user must use
the synchronized keyword to ensure that data updates are visible to other threads, i.e.
the main shared memory is made consistent. Alternatively, a variable can be declared
volatile so that it will not be cached by any thread. Any update to a volatile variable
by a thread is immediately visible to all other threads; however, this disables some code
optimizations.1

Figure 2. Java memory abstraction: each thread has a local stack and a working memory; the stack
and working memory interact through assign and use actions, the working memory and the main
memory through store and load actions, and the actions within the main memory are write and read
Java defines the required ordering and coupling of the interactions between the memories
(the complete set of rules is complex and is not repeated here). For instance, the read/write
actions to be performed by the main memory for a variable must be executed in the order
of arrivals. Likewise, the rules call for the main memory to be updated at synchronization
points by invalidating all copies in the working memory of a thread on entry to a monitor
(monitorenter) and by writing to main memory all newly generated values on exit from
a monitor (monitorexit).
Although the Java shared data model is weakly consistent, it is not weak enough to allow
the oblivious caching and out of order access of variables between synchronization points.
For example, Java requires that (a) the updates by one thread be totally ordered and (b) the
updates to any given variable/lock be totally ordered. Other languages, such as Ada[17],
only require that updates to shared variables be effective at points where threads (tasks
in Ada) explicitly synchronize, and therefore, an RTS can make better use of memory
hierarchies[18]. A Java implementation does not have enough freedom to fully exploit
optimization opportunities provided by caches and local memories. A stronger memory
consistency model at the language level forces an implementation to adopt at least the
same, if not a stronger, memory consistency model. A strong consistency model at the
implementation level increases the complexity of code optimizations – the compiler/RTS
has to consider interactions between threads at all points where shared variables are updated,
not just at synchronization points[19].
1 Note that the term volatile adopted by Java designates a variable as noncacheable, whereas in parallel pro-
gramming this term has traditionally meant that a variable is cacheable but may be updated by other processors[17].
If whole program information is available, then a compiler/RTS can determine which


variables are shared, and which shared variables are not updated in synchronized code.
Such information is necessary to perform many standard compiler optimizations, e.g. code
motion[20], in the presence of threads and synchronization. With the separate compilation
of packages, where whole program information is not always available, it is difficult to
distinguish between shared and private variables. In such situations, therefore, several
common code optimizations are effectively disabled.
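A sketch of the difficulty (the names below are ours; Shared is a hypothetical class with a
double field limit): hoisting the loop-invariant read of s.limit is a routine code-motion
optimization, but it is legal only if the compiler can prove that no other thread updates
limit in synchronized code elsewhere, a proof that is rarely possible under separate compilation:

    double loopSum(double[] a, Shared s) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++)
            sum += a[i] * s.limit;          // re-read on every iteration unless hoisted
        return sum;
    }

    double loopSumHoisted(double[] a, Shared s) {
        double t = s.limit;                 // code motion: legal only if s.limit cannot be
        double sum = 0.0;                   //  updated concurrently by another thread
        for (int i = 0; i < a.length; i++)
            sum += a[i] * t;
        return sum;
    }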

4. DATA PARALLELISM IN JAVA


Data parallelism arises from simultaneous operations on disjoint partitions of data by mul-
tiple processors. Conceptually, each thread executes on a different virtual processor and
accesses its own partition of the data. Synchronization is needed to enforce the ordering
necessary for correctness. Communication is necessary when a thread has to access a data
partition owned by a different thread, and, as described earlier, the producer/consumer rela-
tionship conveniently encapsulates both of these requirements. Determining the producer
(the thread that generates the data) and the consumer (the thread that accesses the data) is
straightforward since the relationship is specified by the algorithm. The following features
are idiomatic in data-parallel programming:

1. creating and starting multiple threads simultaneously


2. producer–consumer operation for synchronization and data communication between
threads
3. collective communication/synchronization between a set of threads.

Although Java provides language constructs for parallelism, there are several factors that
make expressing data parallelism in Java awkward. First, as noted in the previous Section,
there is no mechanism for creating and starting threads in a ThreadGroup simultaneously.
A forAll must be implemented as a serial loop in Java:

SPMDThread spmdthread[] = new SPMDThread[NumT];

for (int i = 0; i < NumT; i++) {
    spmdthread[i] = new SPMDThread();   // threads are created ...
    spmdthread[i].start();              // ... and started one at a time
}

A simple high-level construct for creating multiple threads simultaneously is more de-
sirable. Furthermore, if thread creation is expensive and thread startup is not well synchro-
nized across the parallel machine, the serialization will result in idle time that degrades the
speedup of a program.
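For completeness, the snippet above assumes an SPMDThread class along the following lines
(a sketch; the body of run is application specific):

class SPMDThread extends Thread {
    public void run() {
        // the SPMD work of one thread, e.g. SORloop() on its local data partition
    }
}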
Secondly, implementing the producer–consumer data sharing using the Java
synchronized statements, wait, notify and notifyAll, is problematic for the fol-
lowing reasons. Because notifys are not queued, it is complicated to enforce a specific
ordering. Consider the producer–consumer code in Figure 3: since there is no guarantee
that the Produce method will be invoked after the Consume method, an early notify
will be lost. A programmer must therefore add mutual exclusion and auxiliary variables to
enforce the correct ordering, resulting in a low-level style of programming. It is not clear
that a compiler can always generate efficient code for these producer–consumer methods.
public void Producer() {
    ...
    synchronized (spmdthread[my_id+1]) {
        spmdthread[my_id+1].notify();     // lost if the consumer has not yet called wait()
    }
    ...
}
public void Consumer() throws InterruptedException {
    ...
    synchronized (spmdthread[my_id]) {
        spmdthread[my_id].wait();         // blocks forever if the notify arrived first
    }
    ...
}

Figure 3. Incorrect Java producer–consumer code

Another problem is that the implementation of notify selects a random thread on
the associated wait queue of an object; however, producer–consumer semantics requires
that the notify selects a specific thread from the wait queue. Therefore, to implement
producer–consumer synchronization, we use a separate object for each producer–consumer
pair to ensure that only one thread can appear on the wait queue.

public void Produce(Semaphore mySem) throws InterruptedException {
    // mySem is a simple helper object (one per producer–consumer pair) with an int counter field
    synchronized (mySem) {
        if (mySem.counter == 1) {          // the consumer is already waiting
            mySem.counter = 0;
            mySem.notify();
        } else {                           // the producer arrived first
            mySem.wait();                  // wait for the consumer to announce itself
            mySem.counter = 0;
            mySem.notify();                // release the waiting consumer
        }
    }
}
public void Consume(Semaphore mySem) throws InterruptedException {
    synchronized (mySem) {
        mySem.counter = 1;                 // announce that the consumer is waiting
        mySem.notify();                    // wake the producer if it arrived first
        mySem.wait();                      // wait for the producer's signal
    }
}
Figure 4. Java producer–consumer methods
The language specification[11] claims that wait and notify are ‘. . . especially ap-
propriate in situations where threads have a producer–consumer relationship’; however,
this mainly applies to bounded buffer applications where it is not necessary to associate
a producer with a specific consumer. Producer–consumer Java methods are given in [21],
for example, that use synchronized statements and methods to implement a linked list. A
somewhat simpler Java implementation of producer–consumer methods that adheres to our
semantics is given in Figure 4.
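A hypothetical usage sketch (the rightSem and leftSem fields are ours): each pair of
neighboring threads shares a single Semaphore object, so that at most one consumer can
ever appear on its wait queue:

    // in thread i, after writing its right boundary element:
    Produce(rightSem);     // rightSem is shared with thread i+1 only

    // in thread i+1, before reading that element:
    Consume(leftSem);      // leftSem of thread i+1 is the same object as rightSem of thread i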
Collective communication in Java must also employ mutual exclusion and auxiliary
variables. An example barrier code is given in Figure 5.

public void Barrier(Semaphore barrier) throws InterruptedException {
    synchronized (barrier) {
        barrier.counter -= 1;
        if (barrier.counter > 0) {
            barrier.wait();                      // wait for the last thread to arrive
        } else {
            barrier.counter = num_threads();     // last arrival resets the barrier
            barrier.notifyAll();                 // and releases all waiting threads
        }
    }
}
Figure 5. Java barrier code

Note that the RTS can capitalize on the MPI library to implement the Java producer–
consumer data sharing and collective communications. In this case, the message passing
routines are C functions with Java wrappers and can be invoked as native methods in a
data-parallel Java program.
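A sketch of what such wrappers might look like (the class, method and library names here are
ours, not an existing interface); the native bodies would be C functions that call the
corresponding MPI routines:

class MPIRuntime {
    static { System.loadLibrary("spmdrts"); }       // hypothetical native RTS library

    native void barrier();                          // would wrap MPI_Barrier
    native void send(int dest, double[] buf);       // would wrap MPI_Send
    native void recv(int source, double[] buf);     // would wrap MPI_Recv
    native double reduceAdd(double local);          // would wrap MPI_Allreduce with MPI_SUM
}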
A data-parallel version of the main loop of the SOR code from Figure 1 is given in
Figure 6. The main loop is executed by each thread on its local partition of the valuesOld
and valuesNew arrays. Each thread computes valuesNew using its valuesOld and the
boundary valuesOld elements of its neighbors. The thread then copies valuesNew into
valuesOld for the next outer-loop iterate. The individual producer–consumer sharing is
subsumed by barriers. The code in Figure 6 could be made more data parallel using the
native library methods, e.g. forAll, that we propose in the next Section.

5. SPMD JAVA LIBRARY INTERFACE


For efficiency, we opted to localize shared data structures in our SPMD-Java library; thus,
each thread has a local partition of each shared array. Localizing shared data structures is a
natural extension of the per-thread working-memory model supported by Java. This design
choice necessitates that threads explicitly communicate. As discussed in Section 3.1, to
further enable optimizations, it is desirable to designate shared variables as such. Library
methods for creating thread arrays and for establishing communication links between
them according to various topologies are provided. The library also includes methods for
producer–consumer data sharing and for collective communication. The proposed library
methods are given in Tables 1 and 2 and are described in more detail below.
public void SPMD_SORloop() throws InterruptedException
{
    int i, t, N;
    double y, z;
    N = SPMDsor.N / spmd_num_threads();               // size of the local partition
    for (t = 0; t < SPMDsor.ConvTime; t++) {
        for (i = 0; i < N; i++) {
            if (i == 0) {
                // left end of the partition: the left neighbor lives on the previous thread
                if (thread_id() != 0) {
                    z = SPMDsor.threads[thread_id()-1].valuesOld[N-1];
                    y = (z + valuesOld[i+1])/2;
                } else
                    y = valuesOld[i+1]/2;              // global left boundary
            } else if (i == N-1) {
                // right end of the partition: the right neighbor lives on the next thread
                if (thread_id() != spmd_num_threads()-1) {
                    z = SPMDsor.threads[thread_id()+1].valuesOld[0];
                    y = (z + valuesOld[i-1])/2;
                } else
                    y = valuesOld[i-1]/2;              // global right boundary
            } else
                y = (valuesOld[i-1] + valuesOld[i+1])/2;
            y = max(obstacle[i], valuesOld[i] + SPMDsor.omega*(y - valuesOld[i]));
            valuesNew[i] = y;
        }
        Barrier(SPMDsor.barrier);                      // all new values computed
        for (i = 0; i < N; i++) {
            valuesOld[i] = valuesNew[i];
        }
        Barrier(SPMDsor.barrier);                      // all copies complete
    }
}

Figure 6. SPMD Java SOR code for evaluating American options

The library methods for managing threads are given in Table 1. The start_all method
implements a forAll loop, and is parameterized with a keyword indicating how shared
data-structures are localized and, in particular, how initial values are assigned. Static distri-
butions are blocked – that is, each thread is given an equal-sized consecutive chunk of each
array dimension. Dynamic distributions are also blocked, but elements may be migrated
during execution by the RTS to improve load balancing. Our RTS supports fractiling[22]
dynamic scheduling, which allocates iterates in decreasing size chunks.
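A hypothetical calling sequence (the exact signatures are not fixed by this proposal; the
methods are written here as if they were static methods of the library class):

    create_spmd_groups();            // build the SPMD thread group
    start_all(STATIC);               // create and start all threads with a blocked distribution
                                     //  (DYNAMIC would enable fractiling load balancing)
    int me = thread_id();            // logical identity of the calling thread
    int p  = num_threads();          // total number of SPMD threads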
The spmd_setProducerMap and spmd_setConsumerMap methods create thread com-
munication channels for an arbitrary topology. We also provide library methods to spec-
ify several common communication channel topologies, such as one-, two- and three-
dimensional grids or a binary tree (spmd_thread_1dgrid, spmd_thread_2dgrid,
spmd_thread_3dgrid, spmd_thread_tree).
The library methods for thread communication are given in Table 2. The spmd_produce
and spmd_consume methods are parameterized by a keyword specifying which thread the
caller is being synchronized with. The parameter can be a specific thread or a logical thread
based on the ThreadGroup topology. There are methods for broadcasting and multicasting
variables to threads in a ThreadGroup. A barrier method is provided to synchronize
phases of a parallel algorithm. The library includes methods for common reduction and
scan operations, which are the cornerstones of many parallel algorithms[23]. A reduction
applies a binary operation to reduce the elements of an array into a single element (for example,
an add-reduction of [1, 2, 3, 4] yields 10), while a scan applies the operation to successive
prefixes of the array (an add-scan of the same array yields [1, 3, 6, 10]).

Table 1. Thread management methods

Name                             Function
create_spmd_groups               create an SPMD thread group
start_all (static, dynamic)      create threads and begin their execution
thread_id                        return thread ID
num_threads                      return number of threads
spmd_thread_1dgrid               define a linear array of producer–consumer channels
spmd_thread_2dgrid               define a 2D grid of producer–consumer channels
spmd_thread_3dgrid               define a 3D grid of producer–consumer channels
spmd_thread_tree                 define a tree of producer–consumer channels
spmd_setProducerMap,             define an arbitrary producer–consumer topology
spmd_setConsumerMap

Table 2. Thread communication methods

Name                                       Function
spmd_produce                               signal a specific consumer
spmd_consume                               wait for a signal from a specific producer
spmd_barrier                               wait for all threads to arrive
spmd_gather                                compact a sparse array into local arrays
spmd_scatter                               uncompact a sparse array into local arrays
spmd_permute                               permute the local portions of an array
spmd_reduce
  (add, multiply, or, and, max, min)       global reduction
spmd_scan
  (add, multiply, or, and, max, min)       parallel prefix
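As an illustration of how these methods might compose (a sketch only; relax and the exact
reduction spelling are assumptions), the convergence test of the SOR code could combine a
barrier with a global maximum reduction of the per-thread error:

    double localError = 0.0;
    for (int i = 0; i < N; i++) {
        double y = relax(i);                             // hypothetical per-point SOR update
        localError = Math.max(localError, Math.abs(y - valuesOld[i]));
        valuesNew[i] = y;
    }
    spmd_barrier();                                      // all new values computed
    double globalError = spmd_reduce(MAX, localError);   // proposed global max reduction
    if (globalError < epsilon) { /* converged */ }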

6. CONCLUSION
The concurrent constructs of Java were selected to facilitate the writing of interactive
and Internet applications. In this paper, we explored writing SPMD programs in Java,
and demonstrated that it is possible, if somewhat awkward, with the current language
specification. We identified several features that would make expressing SPMD parallelism
in Java more natural, for example, by including keywords to specify that threads are to be
started simultaneously, and by requiring that notifys be queued. In lieu of the addition of
these features, we propose that a standard library of SPMD routines be adopted. This paper
is an initial step towards this end.
Since the goal of parallel computing is primarily efficiency, there are other changes to the
language that are desirable for SPMD computing. One of the most important of these is the
explicit declaration of shared variables and the relaxing of the Java memory consistency
model, so that more code optimizations can be performed.
REFERENCES
1. J. Dewynne, P. Wilmott and S. Howison, Option Pricing Mathematical Models and Computation,
Oxford Financial Press, 1993.
2. T. Agerwala, J. L. Martin, J. H. Mizra, D. C. Sadler, D. M. Dias and M. Snir, ‘SP2 system
architecture’, IBM Syst. J., 34(2), 152–184 (1995).
3. MPI Forum, ‘Document for a standard message passing interface’, Technical Report CS-93-214,
University of Tennessee, November 1993.
4. P. Brinch Hansen, ‘The programming language Concurrent Pascal’, IEEE Trans. Softw. Eng.,
1(2), 199–206 (1975).
5. Parallel Computing Forum, ‘PCF Parallel FORTRAN extensions’, Special issue, FORTRAN
Forum, 10(3), (1991).
6. IBM, Parallel Fortran Language and Library Reference, March 1988. Pub. No. SC23-0431-0.
7. S. Flynn Hummel and R. Kelly, ‘A rationale for massively parallel programming with sets’,
J. Program. Lang., 1, (1993).
8. I. Foster, R. Olsen and S. Tuecke, ‘Programming in Fortran M, version 1.0’, Technical Report,
Argonne National Laboratory, October 1993.
9. H. F. Jordan, M. S. Benten, G. Alaghband and R. Jakob, ‘The force: A highly portable parallel
programming language’, in E. C. Plachy and Peter M. Kogge (Eds.), Proc. 1989 International
Conf. on Parallel Processing, vol. II, St. Charles, IL, August 1989, pp. II-112–II-117.
10. High Performance Fortran Forum, ‘High Performance Fortran language specification, version
1.0’, Technical Report CRPC-TR92225, Rice University, May 1993.
11. B. Joy, J. Gosling and G. Steele, The Java Language Specification, Addison-Wesley, 1996.
12. T. Lindholm and F. Yellin, The Java Virtual Machine Specification, Addison-Wesley, 1997.
13. Leslie Lamport, ‘How to make a multiprocessor computer that correctly executes multiprocess
programs’, IEEE Trans. Comput., C-28(9), 690–691 (1979).
14. Michel Dubois, Christoph Scheurich and Faye Briggs, ‘Memory access buffering in multipro-
cessors’, in Conf. Proc. 13th Annual International Symp. on Computer Architecture, Tokyo,
June 1986, pp. 434–442.
15. Manish Gupta, Edith Schonberg and Harini Srinivasan, ‘A unified data-flow framework for
optimizing communication’, IEEE Trans. Parallel Distrib. Syst., 7 (7), (1996).
16. S. P. Amarasinghe and M. S. Lam, ‘Communication optimization and code generation for
distributed memory machines’, in Proc. ACM SIGPLAN ’93 Conference on Programming
Language Design and Implementation, Albuquerque, NM, June 1993.
17. Ada 95 Rationale, Intermetrics Inc, 1995.
18. S. Flynn Hummel, R. B. K. Dewar and E. Schonberg, ‘A storage model for Ada on hierarchical-
memory multiprocessors’, in A. Alverez (Ed.), Proc. of the Ada-Europe Int. Conf., Cambridge
University Press, 1989.
19. Harini Srinivasan and Michael Wolfe, ‘Analyzing programs with explicit parallelism’, in Utpal
Banerjee, David Gelernter, Alexandru Nicolau and David A. Padua (Eds.), Languages and
Compilers for Parallel Computing, Springer-Verlag, 1992, pp. 405–419.
20. Robert Strom and Kenneth Zadeck, personal communication, March 1996.
21. D. Lea, Concurrent Programming in Java, Addison-Wesley, 1997.
22. S. Flynn Hummel, I. Banicescu, C. Wang and J. Wein, ‘Load balancing and data locality
via fractiling: An experimental study’, in Boleslaw K. Szymanski and Balaram Sinharoy (Eds.),
Proc. Third Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers,
Kluwer Academic Publishers, Boston, MA, 1995, pp. 85–89.
23. G. Blelloch, ‘Scan primitives and parallel vector models’, Ph.D. Dissertation MIT/LCS/TR-463,
MIT, October 1989.
