SUMMARY
We consider the suitability of the Java concurrent constructs for writing high-performance
SPMD code for parallel machines. More specifically, we investigate implementing a financial
application in Java on a distributed-memory parallel machine. Although Java was not
expressly targeted to such applications and architectures, we conclude that efficient
implementations are feasible. Finally, we propose a library of Java methods to facilitate SPMD
programming. © 1997 by John Wiley & Sons, Ltd.
1. MOTIVATION
Although Java was not specifically designed as a high-performance parallel-computing
language, it does include concurrent objects (threads), and its widespread acceptance
makes it an attractive candidate for writing portable computationally-intensive parallel
applications. In particular, Java has become a popular choice for numerical financial codes,
an example of which is arbitrage – detecting when the buying and selling of securities
is temporarily profitable. These applications involve sophisticated modeling techniques
such as successive over-relaxation (SOR) and Monte Carlo methods[1]. Other numerical
financial applications include data mining (pattern discovery) and cryptography (secure
transactions).
In this paper, we use an SOR code for evaluating American options (see Figure 1)[1] to
explore the suitability of using Java as a high-performance parallel-computing language.
This work is being conducted in the context of a research effort to implement a Java run-
time system (RTS) for the IBM POWERparallel System SP machine[2], which is designed
to effectively scale to large numbers of processors. The RTS is being written in C with
calls to MPI (message passing interface)[3] routines. Plans are to move to a Java plus MPI
version when one becomes available.
The typical programming idiom for highly parallel machines is called data-parallel or
single-program multiple-data (SPMD), where the data provide the parallel dimension.
Parallelism is conceptually specified as a loop whose iterates operate on elements of a,
perhaps multidimensional, array. Data dependences between parallel-loop iterates lead
to a producer–consumer type of sharing, wherein one iterate writes variables that are
later read by another, or collective communication, wherein all iterates participate. The
communication pattern between iterates is often very regular, for example a bidirectional
flow of variables between consecutive iterates (as in the code in Figure 1).
This paper explores the suitability of the Java concurrency constructs for writing SPMD
programs. In particular, the paper:
1. identifies the differences between the parallelism supported by Java and data paral-
lelism
2. discusses compiler optimizations of Java programs, and the limits imposed on such
optimizations because of the memory-consistency model defined at the Java language
level
3. discusses the key features of data-parallel programming, and how these features can
be implemented using the Java concurrent and synchronization constructs
4. identifies a set of library methods that facilitate data-parallel programming in Java.
As discussed in the next Section, the Java parallel constructs were designed for interactive
and Internet (distributed) programming, and lie somewhere between the above two ex-
tremes. However, with some minor modifications, SPMD programming could be expressed
in Java more naturally.
Java threads communicate through shared memory. For distributed machines, Java’s Remote
Method Invocation provides an RPC-like interface that is appropriate for coarse-grained parallelism.
Java threads synchronize through statements or methods that have a synchronized
attribute. These statements and methods function as monitors: only one thread at a time is
allowed to execute synchronized code that accesses the same variables. Locks are not provided
explicitly in the language, but are employed implicitly in the implementation of monitors
(they are embedded in the bytecode instructions monitorenter and monitorexit[12]).
Conceptually, each object has a lock and a wait queue associated with it; consequently, a
synchronized block of code is associated with the objects it accesses.
When two threads compete for entry to synchronized code by attempting to lock the
same object, the thread that successfully acquires the lock continues execution, while the
other thread is placed on the wait queue of the object. When executing synchronized code,
a thread can explicitly transfer control by using notify, notifyAll and wait methods.
notify moves one randomly chosen thread from the wait queue of the associated object to
the run queue, while notifyAll moves all threads from the wait queue to the run queue.
If the thread owning the monitor executes a wait, it relinquishes the lock and places itself
on the object’s wait queue, allowing another thread to enter the monitor.
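As a concrete illustration of these constructs (the class below is ours, not from the paper), a
one-slot mailbox can be written as a monitor whose synchronized methods use wait and
notifyAll:

    // A minimal monitor: a one-slot mailbox guarded by the object's lock.
    // Class and method names are illustrative.
    public class Mailbox {
        private Object item = null;

        // Only one thread at a time may execute either synchronized method.
        public synchronized void put(Object value) throws InterruptedException {
            while (item != null) {
                wait();              // release the lock and join the object's wait queue
            }
            item = value;
            notifyAll();             // move waiting threads back to the run queue
        }

        public synchronized Object get() throws InterruptedException {
            while (item == null) {
                wait();
            }
            Object value = item;
            item = null;
            notifyAll();
            return value;
        }
    }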
The Java shared-memory model is not the sequentially consistent model[13], which
is commonly used for writing multi-threaded programs. Instead, Java employs a form of
weak consistency[14], to allow for shared-data optimization opportunities. In the sequential
consistency model, any update to a shared variable must be immediately visible to all other threads. Java
allows a weaker memory model wherein an update to a shared variable is only guaranteed
to be visible to other threads when they execute synchronized code. For instance, a value
assigned by a thread to a variable outside a synchronized statement may not be immediately
visible to other threads because the implementation may elect to cache the variable until
the next synchronization point (with some restrictions). The following subsection discusses
the impact of the Java shared variable rules on compiler optimizations.
A shared variable can instead be declared volatile so that it will not be cached by any thread. Any update to a volatile variable
by a thread is immediately visible to all other threads; however, this disables some code
optimizations.1
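As an illustration of these rules (the class below is ours), a reader that polls a non-volatile flag
outside synchronized code may never observe the writer's update:

    // The reader may spin forever on a cached copy of 'done'; declaring the field
    // volatile (or synchronizing both accesses) guarantees the update becomes visible.
    public class FlagExample {
        private boolean done = false;   // shared; neither volatile nor accessed under a lock

        public void reader() {
            while (!done) {
                // busy-wait: the implementation may keep reading a cached value
            }
        }

        public void writer() {
            done = true;                // may remain in the writer's working memory
        }
    }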
Java defines the required ordering and coupling of the interactions between the memories
(the complete set of rules is complex and is not repeated here). For instance, the read/write
actions to be performed by the main memory for a variable must be executed in their order
of arrival. Likewise, the rules call for the main memory to be updated at synchronization
points by invalidating all copies in the working memory of a thread on entry to a monitor
(monitorenter) and by writing to main memory all newly generated values on exit from
a monitor (monitorexit).
Although the Java shared-data model is weakly consistent, it is not weak enough to allow
oblivious caching and out-of-order access of variables between synchronization points.
For example, Java requires that (a) the updates by one thread be totally ordered and (b) the
updates to any given variable/lock be totally ordered. Other languages, such as Ada[17],
only require that updates to shared variables be effective at points where threads (tasks
in Ada) explicitly synchronize, and therefore, an RTS can make better use of memory
hierarchies[18]. A Java implementation does not have enough freedom to fully exploit
optimization opportunities provided by caches and local memories. A stronger memory
consistency model at the language level forces an implementation to adopt at least the
same, if not a stronger, memory consistency model. A strong consistency model at the
implementation level increases the complexity of code optimizations – the compiler/RTS
has to consider interactions between threads at all points where shared variables are updated,
not just at synchronization points[19].
1 Note that the term volatile adopted by Java designates a variable as noncacheable, whereas in parallel pro-
gramming this term has traditionally meant that a variable is cacheable but may be updated by other processors[17].
Although Java provides language constructs for parallelism, there are several factors that
make expressing data parallelism in Java awkward. First, as noted in the previous Section,
there is no mechanism for creating and starting threads in a ThreadGroup simultaneously.
A forAll must be implemented as a serial loop in Java.
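A minimal sketch of such a loop (ours; the class and variable names are illustrative) is:

    // Threads are created and started one at a time, so startup is serialized.
    public class ForAllSketch {
        public static Thread[] forAll(int numThreads) {
            Thread[] workers = new Thread[numThreads];
            for (int i = 0; i < numThreads; i++) {
                final int id = i;                // iterate index owned by this thread
                workers[i] = new Thread() {
                    public void run() {
                        // body of the parallel loop for iterate 'id'
                    }
                };
                workers[i].start();
            }
            return workers;                      // callers typically join these threads later
        }
    }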
A simple high-level construct for creating multiple threads simultaneously is more de-
sirable. Furthermore, if thread creation is expensive and thread startup is not well synchro-
nized across the parallel machine, the serialization will result in idle time that degrades the
speedup of a program.
Secondly, implementing the producer–consumer data sharing using the Java
synchronized statements, wait, notify and notifyAll, is problematic for the fol-
lowing reasons. Because notifys are not queued, it is complicated to enforce a specific
ordering. Consider the producer–consumer code in Figure 3: since there is no guarantee
that the Produce method will be invoked after the Consume method, an early notify
will be lost. A programmer must therefore add mutual exclusion and auxiliary variables to
enforce the correct ordering, resulting in a low-level style of programming. It is not clear
that a compiler can always generate efficient code for these producer–consumer methods.
Another problem is that the implementation of notify selects a random thread on
the associated wait queue of an object; however, producer–consumer semantics requires
that the notify selects a specific thread from the wait queue. Therefore, to implement
producer–consumer synchronization, we use a separate object for each producer–consumer
pair to ensure that only one thread can appear on the wait queue.
The language specification[11] claims that wait and notify are ‘. . . especially ap-
propriate in situations where threads have a producer–consumer relationship’; however,
this mainly applies to bounded buffer applications where it is not necessary to associate
a producer with a specific consumer. Producer–consumer Java methods that use synchronized
statements and methods to implement a linked list are given in [21], for example. A
somewhat simpler Java implementation of producer–consumer methods that adheres to our
semantics is given in Figure 4.
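A minimal sketch along these lines (class and method names are ours) uses one channel object
per producer–consumer pair, plus an auxiliary flag so that an early notify is not lost:

    // One Channel object is shared by exactly one producer and one consumer, so at
    // most one thread can ever appear on its wait queue.
    public class Channel {
        private double value;
        private boolean full = false;   // auxiliary variable: records a notify that arrived early

        public synchronized void produce(double v) {
            value = v;
            full = true;
            notify();                   // at most the single consumer can be waiting here
        }

        public synchronized double consume() throws InterruptedException {
            while (!full) {
                wait();                 // an early notify is not lost: 'full' already records it
            }
            full = false;
            return value;
        }
    }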
Collective communication in Java must also employ mutual exclusion and auxiliary
variables. An example barrier code is given in Figure 5.
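A counting barrier in this style (ours; the code in the figure may differ) can be written with a
synchronized method, a counter and a phase variable:

    // All threads call await(); the last arrival releases the others, and the barrier is reusable.
    public class Barrier {
        private final int parties;      // number of participating threads
        private int waiting = 0;
        private int phase = 0;          // distinguishes successive uses of the barrier

        public Barrier(int parties) {
            this.parties = parties;
        }

        public synchronized void await() throws InterruptedException {
            int arrivalPhase = phase;
            waiting++;
            if (waiting == parties) {
                waiting = 0;
                phase++;
                notifyAll();            // release every thread waiting on this phase
            } else {
                while (phase == arrivalPhase) {
                    wait();
                }
            }
        }
    }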
Note that the RTS can capitalize on the MPI library to implement the Java producer–
consumer data sharing and collective communications. In this case, the message passing
routines are C functions with Java wrappers and can be invoked as native methods in a
data-parallel Java program.
A data-parallel version of the main loop of the SOR code from Figure 1 is given in
Figure 6. The main loop is executed by each thread on its local partition of the valuesOld
and valuesNew arrays. Each thread computes valuesNew using its valuesOld and the
boundary valuesOld elements of its neighbors. The thread then copies valuesNew into
valuesOld for the next outer-loop iterate. The individual producer–consumer sharing is
subsumed by barriers. The code in Figure 6 could be made more data parallel using the
native library methods, e.g. forAll, that we propose in the next Section.
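A sketch of the per-thread main loop described above (ours; the relaxation formula and
partition bounds are placeholders, and Barrier refers to the class sketched earlier):

    // Each worker owns the partition [lo, hi) of the shared arrays; indices 0 and
    // length-1 hold fixed boundary values, so lo >= 1 and hi <= length-1.
    public class SorWorker implements Runnable {
        private final double[] valuesOld, valuesNew;
        private final int lo, hi, iterations;
        private final Barrier barrier;

        public SorWorker(double[] valuesOld, double[] valuesNew, int lo, int hi,
                         int iterations, Barrier barrier) {
            this.valuesOld = valuesOld; this.valuesNew = valuesNew;
            this.lo = lo; this.hi = hi;
            this.iterations = iterations; this.barrier = barrier;
        }

        public void run() {
            try {
                for (int iter = 0; iter < iterations; iter++) {
                    // compute valuesNew on the local partition; the elements at lo-1
                    // and hi are the boundary valuesOld elements owned by the neighbours
                    for (int i = lo; i < hi; i++) {
                        valuesNew[i] = 0.5 * (valuesOld[i - 1] + valuesOld[i + 1]);  // placeholder formula
                    }
                    barrier.await();    // all partitions computed; synchronization also flushes updates
                    System.arraycopy(valuesNew, lo, valuesOld, lo, hi - lo);
                    barrier.await();    // all copies complete before the next sweep
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }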
The library methods for managing threads are given in Table 1. The start_all method
implements a forAll loop, and is parameterized with a keyword indicating how shared
data structures are localized and, in particular, how initial values are assigned. Static
distributions are blocked – that is, each thread is given an equal-sized consecutive chunk of
each array dimension. Dynamic distributions are also blocked, but elements may be migrated
during execution by the RTS to improve load balancing. Our RTS supports fractiling[22]
dynamic scheduling, which allocates iterates in decreasing-size chunks.
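As an illustration of the blocked distribution (the helper below is ours, not one of the proposed
library methods), each thread can compute the bounds of its chunk as follows:

    // Returns {lo, hi} (hi exclusive) for thread 'id' of 'p' threads over n elements,
    // giving each thread a consecutive chunk whose sizes differ by at most one.
    public class BlockDistribution {
        public static int[] bounds(int n, int p, int id) {
            int chunk = n / p;
            int rem = n % p;
            int lo = id * chunk + Math.min(id, rem);
            int hi = lo + chunk + (id < rem ? 1 : 0);
            return new int[] { lo, hi };
        }
    }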
The spmd_setProducerMap and spmd_setConsumerMap methods create thread communication
channels for an arbitrary topology. We also provide library methods to specify several
common communication-channel topologies, such as one-, two- and three-dimensional grids
or a binary tree (spmd_thread_1dgrid, spmd_thread_2dgrid, spmd_thread_3dgrid,
spmd_thread_tree).
The library methods for thread communication are given in Table 2. The spmd_produce
and spmd_consume methods are parameterized by a keyword specifying which thread the
caller is being synchronized with. The parameter can be a specific thread or a logical thread
based on the ThreadGroup topology. There are methods for broadcasting and multicasting
variables to threads in a ThreadGroup. A barrier method is provided to synchronize
phases of a parallel algorithm. The library includes methods for common reduction and
scan operations, which are the cornerstones of many parallel algorithms[23]. A reduction
applies a binary operation to reduce the elements of an array into a single element, while a
scan applies a binary operation to reduce subsequences of the elements of an array.
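To make the distinction concrete, sequential definitions of a sum reduction and an inclusive
scan (ours, not the library methods themselves) are:

    // reduce collapses the whole array to one element; scan produces one element per
    // prefix subsequence a[0..i], here using addition as the binary operation.
    public class ReduceScan {
        public static double reduce(double[] a) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                sum += a[i];
            }
            return sum;
        }

        public static double[] scan(double[] a) {
            double[] prefix = new double[a.length];
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                sum += a[i];
                prefix[i] = sum;
            }
            return prefix;
        }
    }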
[Table 1. Library methods for managing threads: Name, Function]
[Table 2. Library methods for thread communication: Name, Function]
6. CONCLUSION
The concurrent constructs of Java were selected to facilitate the writing of interactive
and Internet applications. In this paper, we explored writing SPMD programs in Java,
and demonstrated that it is possible, if somewhat awkward, with the current language
specification. We identified several features that would make expressing SPMD parallelism
in Java more natural, for example, by including keywords to specify that threads are to be
started simultaneously, and by requiring that notifys be queued. In lieu of the addition of
these features, we propose that a standard library of SPMD routines be adopted. This paper
is an initial step towards this end.
Since the goal of parallel computing is primarily efficiency, there are other changes to the
language that are desirable for SPMD computing. One of the most important of these is the
explicit declaration of shared variables and the relaxing of the Java memory consistency
model, so that more code optimizations can be performed.
REFERENCES
1. J. Dewynne, P. Wilmott and S. Howison, Option Pricing Mathematical Models and Computation,
Oxford Financial Press, 1993.
2. T. Agerwala, J. L. Martin, J. H. Mizra, D. C. Sadler, D. M. Dias and M. Snir, ‘SP2 system
architecture’, IBM Syst. J., 34(2), 152–184 (1995).
3. MPI Forum, ‘Document for a standard message passing interface’, Technical Report CS-93-214,
University of Tennessee, November 1993.
4. P. Brinch Hansen, ‘The programming language Concurrent Pascal’, IEEE Trans. Softw. Eng.,
1(2), 199–206 (1975).
5. Parallel Computing Forum, ‘PCF Parallel FORTRAN extensions’, Special issue, FORTRAN
Forum, 10(3), (1991).
6. IBM, Parallel Fortran Language and Library Reference, March 1988. Pub. No. SC23-0431-0.
7. S. Flynn Hummel and R. Kelly, ‘A rationale for massively parallel programming with sets’,
J. Program. Lang., 1, (1993).
8. I. Foster, R. Olsen and S. Tuecke, ‘Programming in Fortran M, version 1.0’, Technical Report,
Argonne National Laboratory, October 1993.
9. H. F. Jordan, M. S. Benten, G. Alaghband and R. Jakob, ‘The force: A highly portable parallel
programming language’, in E. C. Plachy and Peter M. Kogge (Eds.), Proc. 1989 International
Conf. on Parallel Processing, vol. II, St. Charles, IL, August 1989, pp. II-112–II-117.
10. High Performance Fortran Forum, ‘High Performance Fortran language specification, version
1.0’, Technical Report CRPC-TR92225, Rice University, May 1993.
11. B. Joy, J. Gosling and G. Steele, The Java Language Specification, Addison-Wesley, 1996.
12. T. Lindholm and F. Yellin, The Java Virtual Machine Specification, Addison-Wesley, 1997.
13. Leslie Lamport, ‘How to make a multiprocessor computer that correctly executes multiprocess
programs’, IEEE Trans. Comput., C-28(9), 690–691 (1979).
14. Michel Dubois, Christoph Scheurich and Faye Briggs, ‘Memory access buffering in multipro-
cessors’, in Conf. Proc. 13th Annual International Symp. on Computer Architecture, Tokyo,
June 1986, pp. 434–442.
15. Manish Gupta, Edith Schonberg and Harini Srinivasan, ‘A unified data-flow framework for
optimizing communication’, IEEE Trans. Parallel Distrib. Syst., 7 (7), (1996).
16. S. P. Amarasinghe and M. S. Lam, ‘Communication optimization and code generation for
distributed memory machines’, in Proc. ACM SIGPLAN ’93 Conference on Programming
Language Design and Implementation, Albuquerque, NM, June 1993.
17. Ada 95 Rationale, Intermetrics Inc, 1995.
18. S. Flynn Hummel, R. B. K. Dewar and E. Schonberg, ‘A storage model for Ada on hierarchical-
memory multiprocessors’, in A. Alverez (Ed.), Proc. of the Ada-Europe Int. Conf., Cambridge
University Press, 1989.
19. Harini Srinivasan and Michael Wolfe, ‘Analyzing programs with explicit parallelism’, in Utpal
Banerjee, David Gelernter, Alexandru Nicolau and David A. Padua (Eds.), Languages and
Compilers for Parallel Computing, Springer-Verlag, 1992, pp. 405–419.
20. Robert Strom and Kenneth Zadeck, personal communication, March 1996.
21. D. Lea, Concurrent Programming in Java, Addison-Wesley, 1997.
22. S. Flynn Hummel, I. Banicescu, C. Wang and J. Wein, ‘Load balancing and data locality
via fractiling: An experimental study’, in Boleslaw K. Szymanski and Balaram Sinharoy (Eds.),
Proc. Third Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers,
Kluwer Academic Publishers, Boston, MA, 1995, pp. 85–89.
23. G. Blelloch, ‘Scan primitives and parallel vector models’, Ph.D. Dissertation MIT/LCS/TR-463,
MIT, October 1989.