Debugging Concurrent Programs

CHARLES E. MCDOWELL and DAVID P. HELMBOLD


Board of Studies in Computer and Information Sciences, University of California at Santa Cruz, Santa Cruz,
California 95064

The main problems associated with debugging concurrent programs are increased
complexity, the “probe effect,” nonrepeatability, and the lack of a synchronized global
clock. The probe effect refers to the fact that any attempt to observe the behavior of a
distributed system may change the behavior of that system. For some parallel programs,
different executions with the same data will result in different results even without any
attempt to observe the behavior. Even when the behavior can be observed, in many
systems the lack of a synchronized global clock makes the results of the observation
difficult to interpret. This paper discusses these and other problems related to debugging
concurrent programs and presents a survey of current techniques used in debugging
concurrent programs. Systems using three general techniques are described: traditional or
breakpoint style debuggers, event monitoring systems, and static analysis systems. In
addition, techniques for limiting, organizing, and displaying a large amount of data
produced by the debugging systems are discussed.

Categories and Subject Descriptors: A.1 [General Literature]: Introductory and Survey;
D.1.3 [Programming Techniques]: Concurrent Programming; D.2.4 [Software
Engineering]: Program Verification-assertion checkers; D.2.5 [Software
Engineering]: Testing and Debugging-debugging aids; diagnostics; monitors; symbolic
execution; tracing
Additional Key Words and Phrases: Distributed computing, event history,
nondeterminism, parallel processing, probe-effect, program replay, program visualization,
static analysis

This work was supported in part by IBM grants SL87033 and SL88096.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its
date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To
copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0360-0300/89/1200-0593 $01.50

ACM Computing Surveys, Vol. 21, No. 4, December 1989

INTRODUCTION

The interest in parallel programming has grown dramatically in recent years. New languages,
such as Ada¹ and Modula II, have built-in features for concurrency. Older languages, such as
C and FORTRAN, have been extended in a variety of ways in order to support parallel
programming [Gehani and Roome 1985; Karp 1987]. The added complexity of expressing
concurrency has made debugging parallel programs even harder than debugging sequential
programs. In the remainder of this section we will justify this claim and outline the basic
approaches currently used for debugging parallel programs. In Sections 1-4 we discuss each of
these approaches in detail. We conclude with Section 5 and an appendix with tables that
summarize the features of 35 systems designed for debugging parallel programs.

¹ Ada is a registered trademark of the U.S. Government (Ada Joint Program Office).

Difficulty Debugging Concurrent Programs

The classic approach to debugging sequential programs involves repeatedly stopping
the program during execution, examining the state, and then either continuing or reexecuting
in order to stop at an earlier point in the execution. This style of debugging is called
cyclical debugging. Unfortunately, parallel programs do not always have reproducible behavior.
Even when they are run with the same inputs, their results can be radically different. These
differences are caused by races, which occur whenever two activities are allowed to progress
in parallel. For example, one process may attempt to write a memory location while a second
process is reading from that memory cell. The second process’s behavior may differ radically,
depending on whether it reads the new or old value.

The cyclical debugging approach often fails for parallel programs because the undesirable
behavior may not appear when the program is reexecuted. If the undesirable behavior occurs
with very low probability, the programmer may never be able to recreate the error situation.
In fact, any attempt to gain more information about the program may contribute to the
difficulty of reproducing the erroneous behavior. This has been referred to as the
“Heisenberg Uncertainty” principle applied to software [LeDoux and Parker 1985] or the
“Probe Effect” [Gait 1985]. For programs that contain races, any additional print or
debugging statements may modify a crucial race, lowering the probability that the interesting
behavior occurs. This interference can be disastrous when attempting to diagnose an error in
a parallel program.

The nondeterminism arising from races is particularly difficult to deal with because the
programmer often has little or no control over it. The resolution of a race may depend on
each CPU’s load, the amount of network traffic, and nondeterminism in the communication
medium (e.g., exponential backoff protocols [Tannenbaum 1981, pp. 292-295]). It is this
nondeterministic behavior that tends to make understanding, writing, and debugging parallel
programs more difficult than their sequential counterparts.

An additional problem found in distributed systems is that the concept of “global state” can
be misleading or even nonexistent [Lamport 1978]. Without a synchronized global clock, it may
be difficult to determine the precise order of events occurring in distinct, concurrently
executing processors.

Basic Approaches

Some researchers distinguish between monitoring and traditional debugging [Joyce et al.
1987]. Monitoring is the process of gathering information about a program’s execution.
Debugging, as defined in the current ANSI/IEEE standard glossary of software engineering
terms, is “the process of locating, analyzing, and correcting suspected faults,” where a fault
is defined to be an accidental condition that causes a program to fail to perform its required
function. Since monitoring is often an effective procedure for locating incorrect behavior,
it should be considered a debugging tool.

For the purposes of this survey, techniques for debugging concurrent systems have been
organized into four groups:

(1) Traditional debugging techniques can be applied with some success to parallel programs.
    These are discussed in Section 1.
(2) Event-based debuggers view the execution of a parallel program as a sequence (or several
    parallel sequences) of events. The generation and analysis of these sequences or event
    histories is the subject of Section 2.
(3) Techniques for displaying the flow of control and distributed data associated with
    parallel programs are presented in Section 3.
(4) Static analysis techniques based on dataflow analysis of parallel programs are presented
    in Section 4. These techniques allow some program errors to be detected without executing
    the program.

This survey covers a large number of research and commercial projects designed to help
produce error-free concurrent software. It focuses primarily on systems that are directed
toward isolating program errors. A large body of work in formal program verification and in
program testing
has been explicitly excluded from this survey. Most of the systems surveyed fall into one of
two general categories, traditional parallel debuggers (or what are sometimes called
“breakpoint” debuggers) and event-based debuggers. Of course, some systems contain aspects of
both classes. All of the systems (or in some cases proposed systems) in these two general
categories are listed in the tables in Appendix A.

In addition to traditional parallel debuggers and event-based debuggers, some static analysis
systems are included. The static analysis systems surveyed fall somewhere between debugging
and testing. The static analysis systems are distinguished from testing by not requiring
program execution and by generally checking for structural faults instead of functional
faults. That is, the analysis tools have no knowledge of the intended function of the program
and simply identify program structures that are generally indicative of an error. These
systems do not appear in the comparison table in Appendix A but are discussed in Section 4.

Each of the three types of systems surveyed takes a different approach to the debugging
problem. The traditional parallel debuggers are the easiest to build and therefore provide an
immediate partial solution. They provide some control over program execution and provide
state examination. They are also severely limited by the probe effect.

Event-based debuggers provide better abstraction than that provided by traditional style
debuggers. They also address the probe effect by permitting deterministic replay of
nondeterministic programs. If it is not possible to record event histories continuously,
however, the probe effect will still be a problem. Also, event-based debuggers are generally
research prototypes, applicable only to systems without shared memory. A notable exception is
instant replay [LeBlanc and Mellor-Crummey 1987], which supports event tracing and replay on
the shared memory BBN Butterfly provided OS protocol routines are used for all shared memory
accesses.

Static analysis tools avoid the probe effect entirely by not executing the programs. They
have the potential of identifying a large class of program errors that are particularly
difficult to find using current dynamic techniques. These techniques have been applied mostly
to parallel versions of FORTRAN that do not support recursion. As with the event-based
debuggers, static analysis systems are still in the prototype stage. The primary problem with
most static analysis algorithms is that their worst-case computational complexity is often
exponential.

All three types of debugging systems have made some progress in presenting the complex
concurrent program state and the accompanying massive amounts of data to the user. Multiple
windows is a useful mechanism for interfacing with traditional style debuggers for parallel
systems. The abstraction capabilities of event-based debuggers (see Section 2) have been used
to present interesting and potentially useful views of system states graphically [Hough and
Cuny 1987].

1. EXTENDING TRADITIONAL DEBUGGING TO PARALLEL PROGRAMS

The simplest type of debugger to implement for parallel systems is (or behaves like) a
collection of sequential debuggers, one per parallel process. To date, all commercially
available debuggers for parallel programs fit this description. The primary differences lie
in how the output from the several sequential debuggers is displayed and how the separate
sequential debuggers are controlled. We will call these collections of sequential debuggers
traditional parallel debuggers.

The probe effect, discussed in the Introduction, has gone mostly unaddressed by traditional
parallel debuggers. This makes traditional parallel debuggers ineffective against timing
dependent errors. The probe effect, however, does not always rear its ugly head, allowing
many program errors to be isolated using traditional cyclic debugging techniques. This can be
attributed to two factors. First, those errors in parallel programs that are not timing
dependent would never be masked by the probe effect. Second, even for timing related errors,
the
effect of the probe may not disturb the outcome of the critical races.

Another criticism of traditional parallel debuggers is that they operate at too low a level.
For programs consisting of many concurrently executing processes, the major difficulty may be
in understanding what is happening at the interprocess level. Traditional debugging
techniques work well for viewing the behavior at the instruction level or at the procedure
level. In Section 2.4 some recent developments for viewing program behavior at a more
abstract level are presented.

1.1 Coordinating Several Sequential Debuggers

In addition to the sequential capabilities of standard sequential debuggers, traditional
parallel debuggers should be able to do the following:

1. direct any sequential debugger command to a specific task,
2. direct any sequential debugger command to an arbitrary set of tasks,
3. differentiate the terminal output from the different tasks.

The most primitive debugger for parallel programs would be nothing more than a sequential
debugger capable of attaching to any single process in a parallel program. All that would
then be necessary is to provide the user with multiple real or virtual terminals from which
to execute the multiple copies of the debugger. Today’s multiple window workstations make
this more practical than it might have been a few years ago. With a window manager [Scheifler
and Gettys 1986; Sun Microsystems 1986] points 1 and 3 could be satisfied by selecting the
desired window. Satisfying point 2 could be achieved simply by repeating the desired command
in each of the desired windows. This approach, however, would become fairly unwieldy for more
than a few processes. Furthermore, the time lapse between sending the command to the first
and last processes in the set could aggravate the probe effect. This is particularly true for
commands such as “stop” and “continue.”

The Sun Microsystems’ dbxtool is an example of applying a set of sequential debuggers to
concurrent programs without any explicit coordination. It is capable of attaching to an
existing UNIX² process, making it possible to debug a system of communicating UNIX processes
by attaching a separate copy of dbxtool to each process. (The UNIX process may not contain
process creation calls such as “fork,” and the executable image being debugged cannot be
shared.)

² UNIX is a trademark of AT&T Bell Laboratories.

An alternative to relying on a window manager to direct commands to the proper sequential
debugger is to control all of the debuggers from a single terminal or window [Sequent Corp.
1986]. Commands are then directed at a specific process using a command parameter or by
defaulting to a specific “current” process. For example, “continue P1” would continue process
P1, and “continue” without a parameter would continue the “current” process. The “current”
process can be changed at any time.

The use of a single control window also permits the commands to be sent to all processes. For
example, “continue all” would continue all currently suspended processes. In general, all
processes will not receive the command at the same instant. The commands will, however,
arrive at times that differ by an amount approximating the communication delay in the system.
If all processes could be instantaneously stopped (and started) then, in the absence of
timeouts, “stop all” breakpoints would not cause any probe effect. This is, of course,
impossible, but anything that can be done to minimize the time difference for receipt of stop
signals should reduce the probe effect. In addition to reducing the probe effect,
broadcasting a single command to a set of processes is a useful feature.

The “current” task notion is generalized in Griffin [1987] and Intel Corp. [1987] to a
current set of processes that all receive any process-related commands. In Griffin [1987],
processes can be added to or removed from the set simply by pointing to a symbol for the
process in a special window.
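
The routing rules described in Section 1.1 (a command aimed at a named process, at the
“current” set, or at all processes) can be illustrated with a minimal Python sketch. It is
not taken from any of the surveyed debuggers: the class names ControlWindow and
SequentialDebugger, the method names, and the command syntax are all hypothetical, and the
per-process debugger merely logs the commands it receives instead of controlling a process.

```python
from dataclasses import dataclass, field


@dataclass
class SequentialDebugger:
    """Hypothetical stand-in for one per-process sequential debugger."""
    name: str
    log: list = field(default_factory=list)

    def execute(self, command: str) -> None:
        # A real debugger would act on its process; this sketch only records.
        self.log.append(command)


class ControlWindow:
    """Routes a command to a named process, the current set, or all processes."""

    def __init__(self, names):
        self.debuggers = {n: SequentialDebugger(n) for n in names}
        self.current = {names[0]}          # the "current" set of processes

    def set_current(self, *names):
        """Replace the current set (assumed syntax, not from any real tool)."""
        self.current = set(names)

    def dispatch(self, line: str):
        """Handle 'cmd', 'cmd P1 P2 ...' or 'cmd all'; return the targets."""
        command, *args = line.split()
        if not args:
            targets = sorted(self.current)      # default to the current set
        elif args == ["all"]:
            targets = sorted(self.debuggers)    # broadcast to every process
        else:
            targets = args                      # explicitly named processes
        for name in targets:
            self.debuggers[name].execute(command)
        return targets


window = ControlWindow(["P1", "P2", "P3"])
window.dispatch("continue P2")      # one named process
window.set_current("P1", "P3")
window.dispatch("stop")             # every process in the current set
window.dispatch("continue all")     # every process
for dbg in window.debuggers.values():
    print(dbg.name, dbg.log)
```

In this sketch the loop in dispatch delivers the command to the targets one after another,
so the targets still receive it at slightly different times; as the text notes, a broadcast
shortens that interval compared with retyping the command in each window but cannot remove it.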


This could be generalized to permit arbitrary groupings of processes. For instance, it might
be desirable to alternate commands between two disjoint sets of processes. With only a single
“current” set and no overlap between the desired sets, this would require as many commands
from the user as would be required with no support for process grouping. It would, however,
still reduce the probe effect. It appears that the macro capability of Intel Corp. [1987]
combined with their “context” command for specifying the current set of processes would
support this toggling back and forth between disjoint sets of processes.

1.2 Breakpoints

The ability to set breakpoints is possibly the most important feature of a sequential
debugger. (Since tracing is equivalent to setting a breakpoint that, when encountered, prints
some information and automatically continues, the discussion in this section will refer only
to breakpoints.) Traditional parallel debuggers generally support the same types of
breakpoints as those found in sequential debuggers. These breakpoints include stop at a
source statement, stop on the occurrence of an exception or some user detectable event, stop
when a specific variable is accessed, and stop when some conditional expression is satisfied
[Seidner and Tindall 1983]. Unlike sequential debuggers, there are two possible actions to
take when a breakpoint is encountered. Either all of the processes in the parallel program
can be stopped immediately or only the process encountering the breakpoint can be stopped.
The former can be difficult to achieve within a sufficiently small interval of time, and the
latter can have a serious impact on systems that contain such things as timeouts. Assuming
message passing is the communication mechanism, an algorithm to stop all processes in a
consistent state is presented in Miller and Choi [1988a].

Using breakpoints to debug systems with explicit time-dependent operations (such as timeouts)
can be especially difficult. Some systems have attempted to deal with such explicit race
conditions by supporting a notion of logical time that stops when any process reaches a
breakpoint [Cooper 1987; DiMaio et al. 1985]. In systems that suspend only the selected
process, other processes will continue to execute until they encounter some explicitly
time-dependent operation. In that case, the logical clock is the one used for time in the
time-dependent operation. For example, if a breakpoint is encountered, no timeouts will
expire until the suspended process is continued. In systems that stop all processes upon
encountering a breakpoint, logical time is stopped so that all of the suspended processes can
be continued with minimal impact. This will certainly not eliminate the probe effect, but it
can permit some traditional style debugging in the presence of such explicitly time-dependent
operations.

The domain of expressions or predicates used to describe a breakpoint is larger for parallel
programs than for sequential programs. These predicate expressions may involve both process
state and events. An event can be loosely defined as any atomic action visible beyond the
scope of a single process.

Predicates involving global state in an executing parallel program can be a problem. This
results from the lack of global clock in most systems. For example, an expression such as
“process A never modifies variable X while process B is modifying variable X” may appear to
be true due to the delay in communicating this information to the debugger, when in fact
concurrent modification has occurred. The use of events and some notion of consistent global
time can be used to address this (see Section 2). Possibly more important is that it may not
be possible to stop the desired processes after detecting that the predicate is satisfied yet
before the state has changed.

The distinction between events and global states is admittedly vague in general but is
usually well defined for any particular system. For example, an event-based predicate might
be “process A sends a message,” and a global state-based predicate might be “the message
buffer contains a message from process A.” This could even be represented as a collection of
program counter-based breakpoints, one immediately following each send statement in process
A. In
systems that deal with event-based predicates, there must be some language for describing
events. The language may be as simple as naming one of a finite set of events such as
“taskstart” or “sendmessage”; or it may support relatively powerful abstractions such as
those described in Section 2. Table A.4 summarizes the breakpoint capabilities of the systems
surveyed.

1.3 OS Support for Parallel Debuggers

Parallel debuggers that support global state-based breakpoints and event-based breakpoints
place greater demands on the operating system than sequential debuggers. This is just one
more step in the evolution of debuggers. Early debuggers only needed the ability to examine
core memory and the saved values of CPU registers after the program terminated. Next was
added the ability to set breakpoints. The operating system provided a means of modifying the
executable image and of passing control to the debugger when the breakpoint instruction was
encountered. Most operating systems also pass control to the debugger for most program
exceptions. Some hardware architectures now also include a special trace mode to facilitate
single stepping. The final feature that is provided is a mechanism for passing control to the
debugger when a specific memory location is accessed. To summarize, a state-of-the-art
sequential debugger may need the following capabilities to be provided by the operating
system or hardware:

• the ability to read or write a register or memory location,
• the ability to set and trap breakpoints,
• the ability to trap program exceptions,
• the ability to single step a program,
• the ability to trap memory accesses.

Traditional parallel debuggers that go beyond being simply a collection of sequential
debuggers acting together may require more of the underlying operating system. Debuggers that
fit the traditional cyclic debugging paradigm, using breakpoints and tracing, require one or
more of the following facilities:

• the ability to trap on any interprocess communication (IPC),
• the ability to modify/insert/delete IPC messages,
• the ability to control the clock used for timeouts.

Several approaches have been taken to provide these capabilities. Some debugging systems
modify the program source in order to provide the necessary hooks for the debugger at run
time. This avoids the need for modifying the operating system at the expense of slower
performance and restricted capability. A second approach is to provide an alternative set of
system routines. This permits the debugger to intervene in any interaction between the user
and system. The final approach is actually to modify the operating system to provide the
necessary hooks. At some point it may become cost effective to implement more of the
debugging hooks directly in the machine architecture. The interface entry in Table A.2
summarizes where the hooks were placed for the systems surveyed.

2. EVENT HISTORIES

Since the various debuggers surveyed were designed for different environments, it is only
natural that they do not agree on the definition of “event.” For example, in the DISDEB
system events are memory accesses, in Radar each message send or receive is an event, in
Instant Replay an event is the access of an object, and in YODA events represent Ada tasking
activity. In some systems, such as TSL and EDL, events can be defined by the programmer. In
systems with explicit interprocess communication, events can be divided into two classes:
those representing interprocess communication activity and those representing activity
internal to a single process. This distinction does not seem to hold for shared memory
systems, since each memory access is a potential interprocess communication.

Some systems merely display the events as they occur (see Appendix A). A more
powerful approach is to record an event history containing all of the events generated by the
program. The history can then be examined by the user after the program has completed. Since
the event history is often very large, some debuggers provide facilities to browse or query
the history. Event histories can also be used to guide the program’s execution, allowing the
reproduction of erroneous computations. If the history is complete enough, a single process
can be debugged in isolation with the history providing the needed communication. Finally,
some systems can automatically check the history for suspicious behavior or transform the
lower level history of events into more meaningful high-level events.

2.1 Recording Event Histories

A common approach is for the debugger to do as little as possible, mainly recording
information, at run time. By limiting the debugger’s activity, the probe effect should be
reduced. The recorded information can then be analyzed following the program’s execution.

2.1.1 Which Information to Record

The amount of information that must be recorded for each event depends upon how the event
history is going to be used. Three general levels of use that require increasing amounts of
detail to be recorded for each event are the following:

(1) Browsing: The event history is examined possibly through the use of specialized tools.
    Examination methods range from text editors to “movies” showing the state changes caused
    by events [Hough and Cuny 1987; LeBlanc and Robbins 1985].
(2) Replay: The debugger uses the event history to control a reexecution of the program. This
    permits the use of conventional debugging techniques, such as breakpoints, state
    examination, and single stepping, without changing the behavior of the program.
(3) Simulation: The event history can be used to simulate the environment of any single
    process. This permits the use of a sequential debugger on a process without reexecuting
    the entire program.

Browsing requires only minimal information about each event. Simply recording the kinds of
events executed by a process can help isolate an error. Of course, if more information is
recorded, then more information will be available to the programmer. One problem with
browsing event histories is that the histories frequently contain enormous numbers of events,
making it difficult to locate the events of interest. Some systems allow selective recording
of information, and others include powerful mechanisms for examining the event history (see
Section 2.2).

To replay an execution requires enough information so that the next event in which each
process participates can be determined. LeBlanc and Mellor-Crummey [1987] describe a method
that reduces the amount of information needed for replay compared with previous methods that
recorded the complete contents of all messages. Their ideas work because the program
generates the contents of the messages during the reexecution.

Simulating the rest of the program so that a single process can be debugged in isolation
requires that all events visible to the process be recorded. This includes both the contents
of messages and the values written to shared memory. Note that reexecuting a single process
requires more information than reexecuting the entire system.

If the interesting portion of the execution can be identified, then the amount of information
required for replay can be considerably reduced. Instead of recording the entire history, the
debugger can take a snapshot of the program’s state and keep only that part of the history
that follows the snapshot. It may, however, be difficult to obtain accurate snapshots in
distributed systems efficiently (see Chandy and Lamport [1985] for one method). This
technique may work best for simulating a single process, since only that process’s state
needs to be recorded.
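
The observation that replay needs only the order of events, because the program regenerates
the data itself, can be illustrated with a small Python sketch. It is a deliberately
simplified illustration and not the protocol of LeBlanc and Mellor-Crummey [1987]: the class
OrderedCell, the function run, and the two worker updates are hypothetical names and values
chosen for this example. During recording the cell logs only which thread accessed it; during
replay each thread waits until the log says it is its turn.

```python
import threading


class OrderedCell:
    """Shared cell that records, or replays, the order of accesses to it."""

    def __init__(self, value=0, replay_order=None):
        self.value = value
        self.history = []        # ids of the accessors, in order; no data values
        self._replay = list(replay_order) if replay_order is not None else None
        self._cond = threading.Condition()

    def update(self, who, fn):
        """Apply fn to the value; in replay mode, first wait for who's turn."""
        with self._cond:
            if self._replay is not None:
                # Block until the recorded history names this thread next.
                self._cond.wait_for(
                    lambda: self._replay[len(self.history)] == who)
            self.value = fn(self.value)
            self.history.append(who)
            self._cond.notify_all()


def run(replay_order=None, steps=3):
    """Run two racing workers; return (final value, access order)."""
    cell = OrderedCell(1, replay_order)

    def worker(who, fn):
        for _ in range(steps):
            cell.update(who, fn)

    threads = [
        threading.Thread(target=worker, args=("A", lambda v: v + 3)),
        threading.Thread(target=worker, args=("B", lambda v: v * 2)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.value, cell.history


value, order = run()                               # record one racy execution
assert run(replay_order=order) == (value, order)   # replay reproduces it
```

The final value depends on how the additions and doublings interleave, so unconstrained runs
may differ, while a replayed run is forced through the recorded interleaving. The sketch
assumes the replayed program performs exactly the accesses that were recorded; a run that
made more accesses than the log contains would fail on the index lookup.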

2.1.2 How the History Gets Recorded arate history tape for each process. Al-
though each history tape is a linear stream,
In addition to the amount of information
together they (with the program) represent
recorded in an event history, some atten-
the partial ordering of the events in
tion must be given to how the recording is
the computation. The Traveler debugger
done and the resulting impact on perfor-
[Manning 19871 for Acore (a LISP-like lan-
mance and the probe effect. In the systems guage) keeps a history tape (“lifeline” in
surveyed, the methods used to generate the their terminology) for each object in the
event history varied from inserting appro-
program. In addition, the Traveler system
priate statements in the original source records the partial order by explicitly link-
program to monitoring the buses passively. ing each action to the child actions it causes
An intermediate method of recording event (or parent action it allows to continue).
histories is to provide modified system rou- A general technique for obtaining the
tines. These modified routines record the partial order involves associating each
history in addition to performing their nor-
event with a vector of “logical timestamps”
mal system functions. In some cases such as LeBlanc and Mellor-Crummey [1987] it is the user's responsibility to insert system calls that record the event information. In others it is only necessary to link the program using special monitoring versions (see the interface entry in Table A.2).

With the exception of the bus monitoring, all other recording methods resulted in possible changes in timing and hence are potentially susceptible to the probe effect. In some papers (see the probe effect in Table A.2) it was argued that the performance impact of monitoring was sufficiently small to justify leaving the recording permanently enabled. In other papers it was argued that the perturbations caused by the monitoring software were sufficiently small to avoid the probe effect in most cases.

2.1.3 Linear Versus Partially Ordered Event Histories

An issue that arises in distributed systems is whether the event history should be partially or linearly (i.e., totally) ordered. On the one hand, a linearly ordered history is simpler to understand and can be easier to work with. On the other hand, a linear stream is often misleading since it implies an ordering between every pair of events, even when the events are completely unrelated. A partial ordering on the events is necessary to reflect the behavior of a distributed system accurately.

One way to represent the partial order (used by Instant Replay [LeBlanc and Mellor-Crummey 1987]) is to record a sep-

[Fidge 1988; Haban and Weisel 1988]. The order (or absence thereof) between two events can be easily determined by comparing the vectors of timestamps associated with the events.

2.2 Browsing Event Histories

Many systems recognize the need for facilities that help interpret massive event histories. Graphical techniques such as time-process diagrams and animation are discussed in Section 3. A simple feature found in many systems is filtering, whereby the events that the programmer feels are unimportant are automatically discarded, usually based on process or kind of event (see Table A.@). Two systems go further, storing the event history in a database for easy examination.

The YODA system for Ada tasking programs [LeDoux and Parker 1985] stores the event history as Prolog facts. Prolog predicates define the common temporal relationships such as during, before, and after. The user is able to define and store additional predicates (either temporal or nontemporal). Existential queries, such as “which tasks updated variable X before task T,” are used to retrieve information from the history. Note that the YODA system stores when (using a global event counter) and to what values variables are updated, as well as the explicit intertask communications.

In [Snodgrass 1984], the events are captured in a relational database. In addition to the event relations, the database

ACM Computing Surveys, Vol. 21, No. 4, December 1989


contains interval relations indicating behavior with duration. Time is kept in microseconds, presumably using a global clock. TQuel, a version of Quel augmented with temporal constructs, is used to build new relations from those in the event history. The user queries the history by building and printing the appropriate relation.

2.3 Replaying Event Histories

Several of the debugging systems allow a program to be reexecuted under the control of an event history (see Table A.5). We call this capability replay. If the history correctly reflects the interprocess communication of the original execution, then the replay produces the same results. Although replay helps solve the reproducibility problems, it is useful only if additional information is gained.

One way to gain information is to replay the program in “debug mode,” with a traditional sequential debugger attached to each process. This allows the internal state of the processes to be examined, giving the programmer significantly more information than a stream of IPC events. Some systems allow a single suspect process to be replayed in isolation, with the remainder of the program simulated by the event history. This approach reduces the parallel debugging problem to that of debugging a single sequential process, once the faulty process is identified.

Another way of gaining information is to execute a modified program under the control of an event history. The event history can control the new program as long as its behavior is compatible with the old [Curtis and Wittie 1982; LeBlanc and Mellor-Crummey 1987]. This facility allows the programmer to add additional debugging statements or experiment with modified algorithms. Another advantage of this feature is that corrected programs can be tested against the same input and history that caused the previous version to fail.

The added synchronization involved during replay can dramatically slow down a parallel program. The Instant Replay system [LeBlanc and Mellor-Crummey 1987] reported some of the best results in this regard. On one example (Gaussian elimination of a 400 × 400 matrix on up to 64 processors) they report that (with tuned monitoring) the overhead involved in collecting the event history amounts to less than 1 percent. Furthermore, there was no additional decrease in performance when the execution was replayed.

Replaying real-time systems has several additional problems. The external I/O and interrupts must be recorded in addition to the communications between the processes. Since real-time systems generally have a strong time dependency, it may be important to simulate time during the replay. If behavior due to the absence of communication, such as timeouts, takes place, then faithfully reproducing such behavior requires that additional events appear in the history.

2.4 Checking and Transforming the Event History

Several debugging systems compare the event history generated by the program with a set of predicates. These systems are more than browsers since the analysis is done at run time. In addition, the violation of a predicate can trigger additional debugging action. Because the analysis is done at run time, all of the predicates to be tested must be written before the program starts executing. This disadvantage can be reduced if the event history is recorded so that the execution can be replayed.

The DISDEB system [Lazzerini and Prete 1986] relies on programmable debugging aids that eavesdrop on bus traffic, avoiding the probe effect. Although this system contains several interesting features, it has a very low-level interface. An event definition is of the form “process P with certain permissions accesses memory location X reading/writing value V.” The memory location is mandatory, and the value may be a range. If part of the event definition is omitted, then that part is treated as “don’t care.” DISDEB allows state to be stored in two ways. First, counters and timers can be defined and used; second, event definitions can be enabled and disabled. Once a suitable set of events has been defined, it can be used to


trigger debugging actions; for example, “when E1 occurs, display counter C and stop process N.” Other potential actions include starting and stopping traces of memory locations and manipulating timers.

A similar approach is taken by the HARD system for Ada tasking programs [Di Maio et al. 1985]. There the predicates and debugging actions are encoded in special Ada tasks called D tasks. Manually inserted calls to the D tasks enable them to obtain information about the program’s execution. Based on this information, they can call routines that display or modify the program state. All of the Ada facilities can be used inside of a D task, so the programmer can use a familiar high-level language to control the debugging process.

Rather than using the stream of events to control debugging activity, the following systems automatically check specifications for the program. Although this requires that the programmer learn an additional language, it can complement a formal specification/verification approach to program development. Most of these systems have their own way of specifying complex event formulas, usually based on the sequential and parallel composition of events.

The IDD system [Harter et al. 1985] uses an interval logic specification of the program. This specification is checked against the program’s behavior. When a specification is violated, the program is stopped for inspection. Temporal logic views the computation as a sequence of states. The main operators are “always” and “eventually,” meaning that the following predicate on the state is either always true or eventually becomes true. Interval logic adds expressibility by restricting the temporal operators to portions of the computation.

The ECSP debugger [Baiardi et al. 1986] can check behavior specifications that completely describe the allowable communication behavior of the processes. The specifications can refer to various communication activity and can contain assertions on the process’s state. One of the constructs in their language causes control to be returned to the user (presumably so that the process can be examined). The ordering of events and checking of specifications in this system is simplified by the restriction that each specification can only refer to events from a single process.

The TSL system [Helmbold and Luckham 1985a] automatically checks specifications against the events generated by an Ada tasking program. Each TSL specification is of the form “when this occurs then that occurs before something else,” where each of the three parts is an event formula. TSL contains placeholders allowing a single specification to constrain multiple tasks. Additional abstraction is gained by using macros for event subformulas. An important contribution of the TSL system is its use of Ada semantics to guarantee that, even in distributed systems, certain pairs of events appear in the history in the correct order.

The Event Description Language (EDL) takes a slightly different approach [Bates and Wileden 1983]. Instead of checking specifications against the event history, it provides a method for defining multiple levels of abstract events from the primitive events generated by the program. Each high-level event is defined by an event formula over lower level events. There is one clause that constrains the values associated with the lower level events and another that determines the values associated with the higher level event. The Belvedere system uses EDL to help control its display (see Section 3.3).

All of these specification methods have simplifying restrictions. In EDL, an accurate global clock is assumed, the event recognizer is a potential bottleneck, and some ambiguity arises when a low-level event can be used in multiple higher level events (see, however, Bates [1988]). The TSL specification checker requires a linearly ordered stream of events and is also a bottleneck in the current application. In the IDD system, events are restricted to broadcasts on a shared medium (such as an Ethernet). A tree structure method for evaluating the IDD interval logic expressions is briefly described. The ECSP assertion checker is for a hierarchical fork-join method of parallelism. Its main disadvantages are that each specification deals only with the activity of one process and all processes must be completely specified.
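As an illustration of checking a specification of the form “when this occurs then that occurs before something else” against a linearly ordered event history, the following is a minimal Python sketch. It is not TSL syntax and not the TSL implementation: the names Event, check_when_then_before, and kind, and the event labels "request", "grant", and "release", are hypothetical, and each event formula is reduced to a simple predicate on a single event.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    process: str  # name of the task that produced the event
    kind: str     # hypothetical event label, e.g. "request"


Predicate = Callable[[Event], bool]


def check_when_then_before(history: List[Event], when: Predicate,
                           then: Predicate, before: Predicate) -> List[int]:
    """Return the positions in the history where `before` is seen
    while an activation by `when` is still waiting for `then`."""
    violations = []
    pending = False  # True between a `when` event and its `then` event
    for i, ev in enumerate(history):
        if pending and then(ev):
            pending = False          # obligation satisfied
        elif pending and before(ev):
            violations.append(i)     # `before` arrived first
            pending = False
        elif when(ev):
            pending = True           # specification becomes active
    return violations


def kind(name: str) -> Predicate:
    """Build a predicate matching events with the given label."""
    return lambda ev: ev.kind == name


good = [Event("T1", "request"), Event("T2", "grant"), Event("T1", "release")]
bad = [Event("T1", "request"), Event("T1", "release"), Event("T2", "grant")]

print(check_when_then_before(good, kind("request"), kind("grant"), kind("release")))
print(check_when_then_before(bad, kind("request"), kind("grant"), kind("release")))
```

The sketch consumes the events one at a time, so it could run alongside the program, but like the TSL checker described above it assumes a single linearly ordered stream.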


3. GRAPHICS

Sequential programs lend themselves reasonably well to being debugged with a single sequential output device. There is only one thread of execution that can accurately be displayed as sequential text. Also, the data are logically stored in one place and can be displayed when desired. Parallel programs are different in these two areas. In parallel programs, there are multiple threads of control and the data may be logically as well as physically distributed. An important goal of research into parallel debugging systems is to find ways of presenting the distributed data and control of parallel programs to users in a manner that aids in comprehension.

Four basic techniques for displaying debugging information are as follows:

(1) Textual presentation of the data, which may involve color, highlighting, or a display of control flow information. Time may not progress monotonically from the top of the screen to the bottom.

(2) Time-process diagrams that present the execution of the program in a two-dimensional display with time on one axis and individual processes on the other axis. The points in the display are labeled to indicate the activity of the specified process at the specified time.

(3) Animation of the program execution, whereby both dimensions of the display are spatial dimensions. The display corresponds to a single instant in time, or snapshot of the program state. These snapshots can be displayed one after another, animating the program’s execution. The actual format of a single frame can take many forms as discussed below.

(4) Multiple windows, whose use permits several simultaneous views of the program being debugged. This frequently involves using one window per process.

Each of these approaches will be discussed below with examples taken from the surveyed papers.

    proc Simple = reply(request(A) + request(B) + request(C));
    proc A = reply(request(X) + request(Y));
    proc B = reply(b);
    proc C = reply(c);
    proc X = reply(x);
    proc Y = reply(y);

Figure 1. A stylized transaction program.

3.1 Text Windows

A simple text presentation of the debugging information is the most common type of display. All of the systems make use of some simple text displays. For a traditional parallel debugger, this may be the only type of data display (see Section 1.1). In an event-based system, a sequential display of the events as seen by a particular process can be useful. The Traveler [Manning 1987], which is an object-based system, can display a “lifeline” that is a sequential list of processes in the order that they accessed a particular shared object.

It is not always necessary for time to progress monotonically from the top of a sequential text display to the bottom. In Traveler events are displayed in their causal order instead of in their temporal order. Traveler is used to debug message passing programs. In their model, all messages come in pairs, with a request and a reply. All requests block until a reply is received, and the programming model is such that making one request may result in many more nested requests before a reply is sent. Also, one process may send several requests concurrently. The Traveler display presents these nested request-reply pairs, which they call transactions, in such a way as to emphasize the nested structure. Requests sent concurrently will be displayed paired with their responses and nested within the send-receive pair that caused them to occur. Figure 1 gives a stylized example program using requests and replies. “Request(A)” sends a message to request a response from process A. “Reply(x)” sends the reply “x”. Figure 2 gives a possible sequential ordering of the events for a partial execution of the request “request(Simple)”. Figure 3 gives the nested transaction display for the same partial execution.
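The nested transaction display described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Traveler's implementation: the Transaction class, the render function, the closed parameter, and the two-space indentation are all assumptions introduced here. A transaction is a request, its optional reply, and the subtransactions it caused.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transaction:
    target: str                   # process the request was sent to
    reply: Optional[str] = None   # reply value, or None if still outstanding
    children: List["Transaction"] = field(default_factory=list)


def render(tx, closed=frozenset(), depth=0):
    """Return the display lines for a transaction and its subtransactions.
    Subtransactions of any target named in `closed` are hidden."""
    pad = "  " * depth
    lines = [f"{pad}request({tx.target})"]
    if tx.target not in closed:
        for child in tx.children:
            lines.extend(render(child, closed, depth + 1))
    response = f"reply({tx.reply})" if tx.reply is not None else "[no response]"
    lines.append(f"{pad}{response}")
    return lines


# A partial execution of request(Simple): only Y and C have replied so far.
simple = Transaction("Simple", children=[
    Transaction("A", children=[Transaction("X"), Transaction("Y", reply="y")]),
    Transaction("B"),
    Transaction("C", reply="c"),
])

print("\n".join(render(simple)))
print("\n".join(render(simple, closed={"A"})))
```

Closing the transaction for request(A) in the second call hides request(X), request(Y), and their response lines, while the response line for request(A) itself remains visible.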

    request(Simple)
    request(A)
    request(B)
    request(X)
    request(C)
    request(Y)
    reply(y)
    reply(c)

Figure 2. A sequential display.

    request(Simple)
      request(A)
        request(X)
        [no response]
        request(Y)
        reply(y)
      [no response]
      request(B)
      [no response]
      request(C)
      reply(c)
    [no response]

Figure 3. A nested display.

For large programs, all transactions and their subtransactions might not fit on the display simultaneously. To handle this, the user may selectively open and close transactions. When a transaction is closed, all of its subtransactions are hidden. For example, in Figure 3, if the transaction for “request(A)” were closed, then “request(X)”, “request(Y)”, and their corresponding response lines would not be shown. As the computation advances, the various “[no response]” entries will be filled in.

3.2 Time-Process Diagrams

A time-process diagram is a two-dimensional representation of the state of a parallel system over time. One axis represents time, and the other axis represents the processes. Each point in the display gives some information regarding the state of the corresponding process at the corresponding time. For example, a simple time-process diagram for a hypothetical system is shown in Figure 4. Each row describes the events in which a process engages, and each column describes the state of the system at a particular instant in time. At time 1, process 1 engages in event A; at time 2, process 2 engages in event B and process 3 engages in event C; and so on.

Figure 4. A simple time-process diagram.

In Griffin [1987] time-process diagrams are used without enhancement to display the activity of this shared memory-based system. It has the advantage that it can be presented on a simple text screen (see Figure 5). This system is actually a uniprocessor simulation of a multiprocessor, and this is evident in the display. The unit of time is the occurrence of an event. A single character is used to represent each of the possible events in which a process may engage. Because this is a uniprocessor, each column will have one event character and all other active processors will have a “.” in that column. The last character other than “.” in a row corresponds to the last event in which the process engaged. This can be thought of as the process’s current state. The display can be scrolled forward or backward in time. The last non “.” in each row that was scrolled off to the left is maintained in the leftmost column. The display contains additional text that provides additional information about the current (rightmost column) state. This includes such things as which signal a process is waiting for and which signals have been posted. (To avoid confusion with our broader use of the word event, we use signal here where the mtdbx system [Griffin 1987] uses event.)

In [Harter et al. 1985] the message-passing system, Idd, there is heavier use of graphics. Instead of placing one character at each point in the display, two points in the display are connected by a line to indicate the transmission and receipt of a message (see Figure 6). To aid in comprehension of a potentially very cluttered display, the user can magnify or scroll to see only a selected portion of the display. The display option allows the user to rearrange the rows so that related processes can be placed close together. The user may also select various filters to be used in displaying


subsets of the messages that fall within the time-process space currently being displayed. Similar displays are included in PPUTT [Fowler et al. 1988] in which the emphasis is on the programmer noticing irregularities in the patterns of communication.

Figure 5. The mtdbx time-process diagram.

Figure 6. The Idd time-process diagram. Used with permission [Harter 1985] © 1985 IEEE.

For many of the variations on time-process diagrams a global clock is required. At least one system [Stone 1988], however, uses a type of time-process display without needing a global clock. This display is called a concurrency map. Instead of displaying exactly when events occurred based on a global clock, events are arranged to show only the order in which they occurred. This order is derived from the time dependencies in the program. For example, the receipt of a message must follow the sending of a message. Figure 7 shows process histories for three processes and the corresponding concurrency map. The horizontal lines (partially obscured by the boxes) correspond to logical time divisions. An event may have occurred during any time division touched by the box containing the event.

Time-process displays appear to have a definite place in viewing the activity of parallel systems. They do have their limitations. As the number of processes becomes large, the display may become too cluttered with information to be useful. This can be addressed to a degree with filtering and such features as the display and magnify options described above for Idd. In the next section, we present an alternative display that gives a much different view of the system.
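The character-per-event layout described for the mtdbx display can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not mtdbx code: the function names time_process_rows and current_state and the event characters "R" and "S" are hypothetical, and every process that is not the one engaging in the event gets a “.” in that column (the sketch does not track which processes are active).

```python
def time_process_rows(processes, events):
    """Build one text row per process. Each event adds one column:
    the event character in the acting process's row, '.' elsewhere."""
    rows = {p: [] for p in processes}
    for proc, symbol in events:
        for p in processes:
            rows[p].append(symbol if p == proc else ".")
    return {p: "".join(cells) for p, cells in rows.items()}


def current_state(row):
    """Return the last character other than '.' in a row, or None."""
    for ch in reversed(row):
        if ch != ".":
            return ch
    return None


# Hypothetical event characters; one event per unit of time.
events = [("MAIN", "R"), ("TASK1", "R"), ("TASK2", "R"), ("TASK1", "S")]
rows = time_process_rows(["MAIN", "TASK1", "TASK2"], events)
for name, row in rows.items():
    print(f"{name:6} {row}  state={current_state(row)}")
```

Because the unit of time is the occurrence of an event, the number of columns equals the number of events, which is why scrolling and filtering become necessary for long histories.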



Figure 7. Process histories and a concurrency map.
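An ordering of events that needs no global clock can be derived from per-process histories alone. The Python sketch below does this with vector timestamps, the representation mentioned in Section 2.1.3; it is offered as one way to obtain such an order and is not claimed to be how the concurrency map system computes its display. The function names, the ("send", m) / ("recv", m) / ("compute", None) event encoding, and the sample histories are assumptions made for the illustration, and the histories are not those of Figure 7.

```python
def vector_timestamps(histories):
    """histories maps a process name to its ordered list of events, each
    ("compute", None), ("send", msg) or ("recv", msg).
    Returns {(process, position): vector timestamp}."""
    procs = list(histories)
    index = {p: k for k, p in enumerate(procs)}
    clock = {p: [0] * len(procs) for p in procs}
    pos = {p: 0 for p in procs}
    sent = {}    # message name -> timestamp of its send event
    stamps = {}
    progressed = True
    while progressed:
        progressed = False
        for p in procs:
            while pos[p] < len(histories[p]):
                kind, msg = histories[p][pos[p]]
                if kind == "recv" and msg not in sent:
                    break  # the matching send has not been stamped yet
                if kind == "recv":
                    clock[p] = [max(a, b) for a, b in zip(clock[p], sent[msg])]
                clock[p][index[p]] += 1
                if kind == "send":
                    sent[msg] = list(clock[p])
                stamps[(p, pos[p])] = tuple(clock[p])
                pos[p] += 1
                progressed = True
    return stamps


def happened_before(u, v):
    """True if timestamp u is componentwise <= v and differs from v."""
    return u != v and all(a <= b for a, b in zip(u, v))


def concurrent(u, v):
    """True if neither event is ordered before the other."""
    return not happened_before(u, v) and not happened_before(v, u)


histories = {
    "A": [("send", "M1"), ("compute", None)],
    "B": [("recv", "M1"), ("compute", None)],
    "C": [("compute", None)],
}
ts = vector_timestamps(histories)
print(happened_before(ts[("A", 0)], ts[("B", 0)]))  # send M1 vs. receive M1
print(concurrent(ts[("A", 1)], ts[("B", 0)]))       # unrelated events
```

Events that compare as concurrent are exactly those a display may place in the same logical time division without misrepresenting the execution.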


3.3 Animation

An alternative to time-process displays is to place each process (or selected portions of distributed data) at a different point in a two-dimensional display and have the entire display represent the system at a single instant in time. As time advances, the display changes and these changes can be played in sequence to give a type of animated movie. This movie displays the evolution of the state of the system. The placement of the processes (or data) in the two-dimensional display could be arbitrary, be under user control, or correspond to the underlying structure of the program (or data) being represented.

In Belvedere [Hough and Cuny 1987], the placement of processes is very important and is specified by the user. The basic animation elements are depicted in Figure 8. This system animates primitive events and user defined events specified using EDL. To further help organize the display, the user may request that events be displayed from different perspectives: that of a processor, a channel or a data item. For example, when viewed from the perspective of a single processor, the events will be displayed in the order that they appeared to that processor. Examples of this can be found in Hough and Cuny [1987] and in Figure 8. Figure 8 is a snapshot of message traffic animation during a traveling salesman program. Activity is depicted by highlighting the appropriate ports (small boxes) and channels (lines). A port is highlighted during a receive and a channel during a send. Arrowheads indicate direction for sends, with multiple arrowheads indicating more than one message in the buffer.

Figure 8. Animation using Belvedere. Used with permission of Hough and Cuny [1988].

The Radar [LeBlanc and Robbins 1985] system also uses animation of messages. In Radar, the user can control how long each time frame is displayed. Also, at any time, the user can have the contents of any message displayed.

Voyeur [Socha 1988] is a prototype system for the construction of application specific “views” of parallel programs. Its input is a sequence of events generated from user-inserted instrumentation code. The user can use existing views or build new ones. Four views have been constructed and are described in Socha et al. [1988]: icon view, vector view, trace view, and linked-list view.

In the icon view, the events indicate where in a two-dimensional picture a particular icon should be drawn and when to start a new animation frame. By observing the position of the icons within a single frame and their change in position from frame to frame, several errors have been detected. The vector view is a variation of this where instead of drawing icons, vectors are drawn.

The trace view provides a small box (window) for each process. Connections to other processors are shown as lines to other boxes. Values local to the process are then displayed inside the box. The linked-list view is a variation of the trace view that displays a linked-list data structure from the program. The nodes are drawn as boxes, with the actual data values displayed. The boxes are then connected to show the actual link structure.

The emphasis in Voyeur is the ability to provide an easy method for programmers to animate their parallel programs for the purpose of uncovering errors.
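The way an icon-style view turns a stream of events into animation frames can be sketched as follows. This is a hypothetical illustration, not Voyeur's event format or code: the event tuples ("icon", name, x, y) and ("new_frame",), the function names build_frames and moved, and the choice that icons keep their position until moved are all assumptions made here.

```python
def build_frames(events):
    """Turn a list of events into a list of frames.
    ("icon", name, x, y) places or moves an icon; ("new_frame",) closes
    the frame being built. Each frame maps icon name -> (x, y)."""
    frames = []
    current = {}
    dirty = False  # True if icons changed since the last closed frame
    for ev in events:
        if ev[0] == "icon":
            _, name, x, y = ev
            current[name] = (x, y)
            dirty = True
        elif ev[0] == "new_frame":
            frames.append(dict(current))
            dirty = False
    if dirty:
        frames.append(dict(current))  # keep a trailing, unclosed frame
    return frames


def moved(prev, curr):
    """Names of icons that are new or changed position between frames."""
    return sorted(n for n in curr if prev.get(n) != curr[n])


events = [
    ("icon", "a", 0, 0), ("icon", "b", 1, 0), ("new_frame",),
    ("icon", "a", 0, 1), ("new_frame",),
]
frames = build_frames(events)
print(frames)
print(moved(frames[0], frames[1]))
```

Comparing consecutive frames, as moved does, mirrors the observation above that errors are noticed both from positions within one frame and from changes between frames.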

3.4 Animation Versus Time Process

It should come as no surprise that neither animation nor time-process diagrams alone is sufficient to detect all of the errors in parallel programs easily. Animation is good for observing the instantaneous state of the system. By only displaying a single instant of time, more state information can be displayed simultaneously. This may mean more information per process or more processes. Patterns of concurrent behavior can also be viewed (e.g., all processes except one are sending messages to their neighbor). Animation, however, does not clearly show patterns of behavior that occur across time. This is addressed to some degree in Belvedere by the use of high-level abstract events that may encompass an interval of time.

Time-process diagrams can display patterns of behavior over time. This can be especially helpful in finding performance bugs. The trade-off is that only a very small amount of information can be displayed for each process at any point in time. In mtdbx [Griffin 1987] only a single character is displayed for the entire state of a process. In PPUTT a process is either waiting, running, sending, or receiving.

It appears that the ideal debugger for complex concurrent systems would support both animation and time-process displays. Two pieces of evidence to support this claim are the following:

• The animation systems find some errors by noticing changes from one frame to the next (the passage of time), and

• The time-process displays generally provide some mechanism for displaying detailed system state for a particular instant in time.

One approach to combining the two would be simultaneously to present a time-process diagram in one window of a graphic workstation and an animation in another. The time-process display would guide the programmer to the important point in time, and the animation display would present detailed information about the state of the program in a comprehensible way. An alternative method of displaying both time and the detail found in animation frames would be to display several animation frames simultaneously. Using the abstraction of Belvedere, it might even be useful to display several different perspectives in different windows simultaneously. It may be that the flexibility of a system like Voyeur will be necessary because no single view is sufficient for the many different types of errors that must be addressed.

4. STATIC ANALYSIS FOR DEBUGGING PARALLEL PROGRAMS

When the probe effect renders the techniques in Sections 1 and 2 useless, what options are left to a programmer to debug a parallel program? Some researchers are pursuing static analysis techniques for detecting certain classes of anomalies in parallel programs. This is distinct from formal proof of correctness, because no attempt is made to prove conformance with a written specification. Instead, an attempt is made to give assurance that the program cannot enter certain predefined states that generally indicate errors.

Static analysis is being used to detect two classes of errors in parallel programs: synchronization errors and data-usage errors. Synchronization errors include such things as deadlock and wait forever. Data-usage errors include the usual sequential data-usage errors, such as reading an uninitialized variable, and parallel data-usage errors typified by two processes simultaneously updating a shared variable.

There appear to be two related but distinct areas being investigated. One is applying dataflow analysis techniques, similar to those used by optimizing and vectorizing compilers, to determine data-usage properties in parallel programs. The other is answering the question,

    Is it possible for two statements S1 and S2 in a parallel program to execute in parallel?

These two areas are discussed in the remainder of this section. Some closely related work on combining static analysis with dynamic debugging and a system designed to aid in the development of parallel algorithms are presented at the end of Section 4.


4.1 Dataflow Analysis of Parallel Programs

Probably the most frequently referenced work on dataflow analysis of parallel programs is that of Taylor and Osterweil [1980]. Their algorithms generate four data-usage sets for each node of a program flow graph: gen, kill, live, and avail. These correspond to the sets by the same names used in the global dataflow analysis of optimizing compilers [Fosdick and Osterweil 1976; Hecht and Ullman 1975]. The original algorithms used to compute live and avail have been extended to pass data-usage information across edges in the flow graph corresponding to synchronization operations. By reinterpreting the meaning of gen and kill, it is possible to use the modified data-usage sets to arrive at algorithms to detect anomalies in parallel programs.

The algorithms in Taylor and Osterweil [1980] assume a simple process synchronization model. One process may cause another process to begin execution with the statement “schedule X” and wait for the completion of another process with “wait X”. Their model does not permit a process to execute (be scheduled) in parallel with itself. This would correspond to a recursive process invocation. Recursion is also not allowed within any single process. They present algorithms based on their modified data-usage sets that operate on a representation of the program called a Process Augmented Flowgraph (PAF). This is constructed by taking the flowgraphs of the individual processes and connecting them with edges to indicate process synchronization constraints. For example, there would be an edge connecting the “schedule X” statement in one process with the initial statement in process X. Their algorithms can detect the following:

(1) a reference to an uninitialized variable,

(2) a variable that is referenced while being defined in parallel,

(3) a definition of a variable that is never referenced,

(4) a variable that may have an indeterminate value,

(5) a process waiting for the completion of an unscheduled process,

(6) a process waiting for the completion of another process that is guaranteed to have already completed, and

(7) a process that is scheduled to execute in parallel with itself.

In addition to not permitting recursion, the algorithm for item (2) assumes the existence of an algorithm for determining if two statements can execute in parallel. This is the subject of Section 4.2. Also, it is recognized that it is impossible to “create a fixed static procedure capable of constructing the PAF of any program written in a language which allows run-time determination of tasks to be scheduled and waited for.”

The difficulty of using dataflow to analyze parallel programs is clearly shown in Callahan and Subhlok [1988]. They present an algorithm for determining which data dependencies present in a sequential execution of a program are preserved in a parallel execution of the program. They then show that determining if all data dependencies are maintained is Co-NP-hard using only the information found in their version of the PAF which they call the synchronized control flow graph. They also present approximations that execute in polynomial time on programs written using a simple programming model. Two notable limitations of the model are that no synchronization operations are permitted within loops and all synchronization is done with event variables that cannot be cleared.

4.2 Parallel (i, j)

A Boolean function parallel (i, j), which returns true if it is possible for program points “i” and “j” to execute in parallel, can be used to detect parallel access errors. These occur when a variable is being read and written in parallel or when two processes can simultaneously write to the same variable. If the program can be represented as a Petri net [Peterson 1977], then this function can be implemented by examining the reachable states for the net. Unfortunately, the number of reachable states in a bounded Petri net grows exponentially with the number of places (nodes) in the net.
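The idea of implementing parallel (i, j) by examining reachable states can be made concrete with a small Python sketch. It is a brute-force illustration under several assumptions, not an algorithm from the surveyed papers: each program point is modeled as a place of a safe (1-bounded) Petri net, a marking is the set of marked places, a transition is a pair of input and output place sets, and the names reachable_markings, parallel, and the places m0 to m3, x0, x1 are hypothetical. The sample net models a main process that schedules X, executes one statement, and then waits for X.

```python
from collections import deque


def reachable_markings(initial, transitions):
    """Breadth-first search over markings of a safe Petri net.
    A transition (inputs, outputs) is enabled when all inputs are marked."""
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    while queue:
        marking = queue.popleft()
        for inputs, outputs in transitions:
            if inputs <= marking:
                nxt = (marking - inputs) | outputs
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def parallel(i, j, initial, transitions):
    """True if some reachable marking holds tokens at both i and j."""
    return any(i in m and j in m
               for m in reachable_markings(initial, transitions))


fs = frozenset
net = [
    (fs({"m0"}), fs({"m1", "x0"})),   # "schedule X": main continues, X starts
    (fs({"x0"}), fs({"x1"})),         # body of X runs to completion
    (fs({"m1"}), fs({"m2"})),         # a statement in the main process
    (fs({"m2", "x1"}), fs({"m3"})),   # "wait X": needs X to have finished
]

print(parallel("m1", "x0", {"m0"}, net))  # before the wait: can overlap
print(parallel("m3", "x0", {"m0"}, net))  # after the wait: cannot overlap

# Three independent two-state processes already yield 2**3 markings.
independent = [(fs({f"a{k}"}), fs({f"b{k}"})) for k in range(3)]
print(len(reachable_markings({"a0", "a1", "a2"}, independent)))
```

The last line shows the exponential growth noted above: every additional independent process doubles the number of reachable markings that the search must visit.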


An algorithm similar to computing the reachable states in a Petri net
applied to Ada programs is presented by Taylor [1983]. Like the Petri net
algorithm, the number of states generated by Taylor’s algorithm can
increase exponentially with the number of parallel tasks in the Ada
program.

An algorithm presented in McDowell [1989] computes parallel (i, j) for
programs written in FORTRAN with extensions to support explicit
parallelism. Whereas the simple language in Taylor and Osterweil [1980]
explicitly prohibits the execution of a process with itself, the algorithm
in McDowell [1989] uses the fact that many parallel numerical applications
are expressed as collections of identical tasks executing in parallel on
shared data. The result is that many fewer states are generated. This
algorithm is being used in a prototype debugging tool [Appelbe and
McDowell 1985].

A somewhat different approach to computing parallel (i, j) was taken in
Bristow et al. [1979a]. Their algorithms operate on the same PAF
representation of a program described in Section 4.1. They can build PAFs
for the real language HAL/S. Although more powerful than the simple
language used in Taylor and Osterweil [1980], HAL/S is still nonrecursive
and contains relatively simple synchronization operators.

Instead of computing a single function, parallel (i, j), they compute
eleven functions which they call execution sequence sets. Included are
three execution sequence sets: concurrent, always-concurrent, and
possibly-concurrent. The sets are computed for each node in the PAF. The
set concurrent for node N contains all nodes M such that on all execution
paths on which both M and N occur, they occur with no forced ordering. The
set always-concurrent is a subset of concurrent that satisfies the
additional restriction that all program execution paths containing N also
contain M. Node M is in the set possibly-concurrent for node N if there is
some execution path in which both M and N occur with no forced ordering
between the two. For use in the anomaly detection algorithms of Taylor and
Osterweil [1980], if a node M is in any of the above three execution sets
at node N, then the nodes N and M would be assumed to execute in parallel.
The result of this could be a potentially large number of extraneous
anomaly reports corresponding to infeasible paths.

The three functions are actually represented as execution sequence sets
attached to each node of the PAF. The algorithms for computing the
execution sequence sets are very similar to the dataflow analysis
algorithms mentioned in Section 4.1. The details of the algorithms can be
found in Bristow et al. [1979b]. For the nonrecursive language HAL/S, the
execution sequence sets can all be computed in polynomial time.

The result of applying the analysis mentioned above is an anomaly report.
It should be possible to provide the user with sufficient information to
determine what source statements are involved in the anomaly. There are,
however, two problems related to presenting the anomaly report. One problem
is that the anomaly report may contain many anomalies that are the result
of infeasible paths and do not correspond to a real error in the program.
Some progress in removing infeasible paths from static analysis of
sequential programs has been reported [Werner 1988]. There is, however,
nothing in the literature concerning the removal of infeasible paths from
static analysis of concurrent programs.

The second problem with presenting the anomaly report is presenting the
information in such a way that the user understands how the erroneous
concurrent state could arise. It may not be sufficient to report that
variable X is modified concurrently by a process executing line 100 and
another process executing line 200. If the user cannot understand how lines
100 and 200 can execute in parallel, then it may be difficult to determine
how to resolve the problem. Furthermore, the user may simply decide
(erroneously) that this situation could never arise and that the anomaly
report should be ignored.

The approach taken in Appelbe and McDowell [1988] allows the user to
examine not only the concurrency state causing
the anomaly report but also the concurrency states that led up to that
state. A multiwindow user interface is provided that displays an anomalous
concurrency state along with a description of the anomaly [McDowell 1988].
The concurrency state is represented by displaying a small portion of the
source for each concurrent task in a separate window. The user may then
display any previous or successor concurrency state to determine how the
situation arose. This is somewhat like performing a coarse forward or
backward simulation.

4.3 Combining Static Analysis with Dynamic Debugging

Taylor [1984] describes several ways in which static analysis could be
productively combined with dynamic analysis. One approach would be to use
the information from static analysis to help develop test data for use in
conjunction with a dynamic debugger. Conversely, information from dynamic
monitoring could be used to guide partial static analysis when complete
static analysis would generate too many states. If subparts of a program
could be shown to be free of errors using static analysis, then those
portions of the program would not need monitoring. This could reduce the
overhead associated with monitoring. If run-time assertion testing is
included in the program, then the static analyzer could assume that the
assertions are true, reducing the number of states that must be examined.
A related technique is the use of symbolic execution to reduce the state
space of a static analysis tool by eliminating infeasible paths [Young and
Taylor 1986].

A somewhat different combined use of static and dynamic techniques is
described in Allen and Padua [1987], Miller and Choi [1988b], and Stone
[1989]. Each of these systems applies static analysis to a dynamically
generated trace in order to identify parallel access anomalies that they
call races. If a particular trace can be shown to be free of races, then
the program is free of races for the given input. This does not mean that
the program is free of races in general. For example, a race condition
could be present in a conditionally executed block that is not executed
with the given input. The analysis performed to identify races can also be
used to help with breakpoint debugging. If the critical point in each
process involved in a race can be identified, then a breakpoint can be
placed just before that point in each process. This will stop the system in
the state necessary to induce the race and permit close examination. In
addition, by selectively continuing the processes, alternative race
outcomes can be explored.

4.4 Static Analysis in the Development Process

In addition to analyzing parallel programs statically, and debugging them
with run-time debuggers and monitors, there is the possibility of
eliminating the errors in programs before they occur. Here we would like to
present some current work that seeks to aid in the development of parallel
programs that are free of the kinds of errors outlined at the beginning of
Section 4.

Automatic vectorizing compilers are the predecessors of the work presented
here. They represent a very restricted form of parallel programs that are
free of parallel bugs, assuming, of course, that the compilers are correct.
This work has been extended most notably by Banerjee et al. [1979] to
permit parallel execution of a wide range of loops. Again, assuming that
both the sequential programs and the compilers are correct, the parallel
programs that result will also be correct.

In addition to the fully automatic techniques of Banerjee et al.,
researchers at Rice University are working on a system called PTOOL [Allen
et al. 1986]. PTOOL performs interprocess dataflow analysis to determine
when a loop can be parallelized. It does this only for loops selected by
the programmer. It interacts with the programmer for three reasons. First,
the amount of time to perform the analysis for all loops and all
combinations of loops is prohibitive. It is assumed that the programmer
understands the overall structure of the program and knows which sections
are most suitable for parallelization. Second, the programmer can make a
judgment about the typical
values of certain variables at run time that affect the decision of whether
to parallelize a particular loop. This is particularly important when the
overhead for parallel execution is relatively high. Finally, by interacting
with the programmer, PTOOL can provide information that might permit the
programmer to change the program slightly, thereby allowing an important
loop to be parallelized. In a fully automatic system the compiler would
have to reject the loop as a candidate, possibly missing an important
opportunity for parallel speedup.

By using automatic or semiautomatic techniques based on correctness
preserving transformations, it is possible to debug a sequential version of
a program using conventional debugging tools and then transform it into an
equivalent parallel version.

5. CONCLUSION

Having completed the survey, the question remains, What progress has been
made and where is more work needed? Because of the diversity of
applications, languages, and systems, no single approach can satisfy all
parallel debugging needs. The following paragraphs summarize what has been
achieved and speculate on possible research directions.

The deficiencies of both static and dynamic techniques have been discussed
in this paper. One promising approach that alleviates some of these
deficiencies is the creation of a toolkit that integrates both approaches
(see Section 4.3).

As the saying goes, “A picture is worth a thousand words.” With program
activities distributed across both space and time, simple sequential
displays of program activity are inadequate. The time-process diagrams (see
Section 3.2) give a compact view of the event history, whereas the
animation diagrams (see Section 3.3) give a more detailed view of a single
instant in time. Both representations are valuable, and the use of
multiwindow workstations makes it possible to have both.

The appropriateness of each of these diagrams needs further research. For
example, a major problem with the animation diagrams is the placement of
the symbols representing the processes. Hough and Cuny [1987] make it clear
that proper placement can be very important in comprehension of the display
(see Figure 8).

In addition to the problem of placement is the problem of too much
information, even for a picture. A possible solution is a language for
abstracting low-level events into higher level events for display. Event
description languages can also be used to filter out irrelevant events,
reducing the amount of information that must be displayed.

One prominent feature of several systems is modularity [Joyce et al. 1987;
Victor 1977]. By carefully designing a modular system, the addition or
modification of various features can be managed easily. An event-based
system might have modules for low-level event monitoring, filtering, and
recording of events (from the low-level event modules), display of recorded
events, analysis of recorded events, and controlled reexecution of the
program. In traditional parallel debuggers as described in Section 1, there
may be separate modules for interacting with the low-level machine and for
interacting with the user. “Plug compatible” modules are advantageous
because they allow experimentation with different debugger functions.

A well-defined interface (or hierarchy of interfaces) between the user and
the low-level machine isolates most components from changes in any one part
of the system. The user modules become machine independent; the low-level
machine modules become user interface and language independent; and
possibly some user interface modules may become language independent.

The probe effect is possibly the most significant difference between
debugging parallel programs and sequential programs. The most obvious
solution to the problem of the probe effect is to have the probes
permanently in place. This does not help with breakpoint debugging, but it
solves the problem for event-based debugging using monitoring and event
histories. The
problem with this solution is the performance penalty for having software
probes permanently enabled. The use of hardware assistance for high-level
debugging was proposed in Gentleman and Hoeksma [1983], and systems using
hardware monitoring for multiprocessors are described in Lazzerini and
Prete [1986] and Rubin et al. [1988].

Before hardware designers will dedicate precious silicon to “hooks” for
parallel debuggers, it will be necessary to identify just what hooks are
useful. Having specified the hooks, a cost-benefit analysis could determine
which low-level debugging elements should be implemented in hardware. Most
uniprocessors today have hardware hooks for breakpointing and single
stepping. It seems only natural that hooks for parallel debugging be added
to parallel systems.

Event histories may be the most natural abstraction of distributed systems.
A variety of tools and methods for examining them have been developed. For
small and simple parallel programs, it may suffice to print the events as
they occur. In larger systems, it may be preferable to save the event
history for later examination. An alternative to examining the event
history manually is to check the event history against a set of
specifications as it is generated. Although this approach incurs a large
overhead, it may be the only effective way to monitor large continually
executing systems. Ideally, specifications from the program’s design phase
would be used to detect errors. In current systems, the programmer must
write the specifications for checking the event history. This is usually
done as part of a testing phase and is not directly connected to any design
specification.

The event specification languages we have seen [Baiardi et al. 1986; Harter
et al. 1985; Helmbold and Luckham 1985b] can all express simple constraints
on the event stream. It is unreasonable, however, to expect the programmer
to specify completely the intended behavior of a program with these
languages. Much work needs to be done before we know what kinds of
assertions are most useful and the best languages for expressing them.
Furthermore, the issue of integrating design specifications with run-time
checking has not been addressed.

A variation of the specification approach is taken by EDL [Bates and
Wileden 1983]. They do not check the event history against specifications
but instead transform it into a higher level history. This approach may
help bridge the gap between low-level communication primitives and the more
abstract communication mechanisms used in the program. EDL has been
successfully used to control the presentation of graphic data in the
Belvedere system [Hough and Cuny 1987].

We have seen some attempts at software solutions to the probe effect. These
involve some mechanism for the debugger to manipulate the logical passage
of time. In none of the systems surveyed was this completely successful;
real-time events do not lend themselves well to manipulations of logical
time. Although appearing unsolvable in general, software solutions might be
attainable for some systems such as message-passing systems without
timeouts. It certainly appears that designers of new parallel languages and
synchronization constructs should keep the probe effect in mind.

Perhaps the best way to avoid the bugs associated with current parallel
constructs is to design languages for parallel machines that make such
errors impossible. Dataflow and functional languages are one example of
systems that attempt to “define away the problem.” Still other examples can
be found in declarative languages such as Prolog or higher level languages
as described in Goldberg [1986]. A somewhat less radical approach is the
use of tools for automatically detecting parallelism. These may be fully
automatic or require some user interaction. In either case, such systems
would ensure that the parallel program produces exactly the same results as
the sequential version.

ACKNOWLEDGMENTS

We would like to thank Anil Sahai who contributed to an early version of
this survey. We would also like to thank the referees for their comments.

APPENDIX A. SUMMARY TABLES

The tables in this appendix present brief descriptions of the systems
surveyed in a form that permits quick comparisons. To make it possible to
place as much information as possible in each summary table, we use one- or
two-word descriptors in the tables. Informal explanations of the
descriptors are given before the tables.

A.1 General Characteristics Part 1

O.S.            Operating system debugger runs under
Hardware        Hardware configuration debugger uses or requires
Status          Completeness of debugger implementation
  partial       Prototype missing major features
  production    Production version available
  prototype     Complete in-house system

n/s             Not specified

Table A.1. General Characteristics Part 1

System            O.S.           Hardware          Status
Agora [For88]     Agora          LAN               prototype
Amoeba [Els88]    Amoeba         n/s               prototype
belvedere [HC87]  simple sim     emulator          partial
BUGNET [CW82]     MICROS         MICRONET          partial
CBUG [Gai85]      UNIX           any UNIX          prototype
cdbg [Int87]      iPSC           iPSC              prototype
dbxtool [AM86]    UNIX(Sun)      Sun               production
defence [Web83]   n/s            uniprocessor      partial
DISDEB [LP86]     Mara           Mara              prototype
EDL [BW83]        VMT UMass      VMT UMass         partial
HARD [MCR85]      UNIX           any UNIX          prototype
IDD [HHK85]       UNIX           network(Sun)      partial
Instant [LM87]    Chrysalis      BBN butterfly     prototype
Jade [JLSU87]     Jipc           vax/Sun           prototype
MAD [RRZ88]       n/s            mimd sh. bus      prototype
Meglos [GK86]     UNIX           MC68000           production
mtdbx [Gri87]     UNIX/COS       Sun/Cray          prototype
Multibug [CP86]   n/s            multi-macro-proc  prototype
Parasight [AG88]  Mach/Umax      Multimax          prototype
pdbx [Seq86]      Dynix          Sequent           production
Pilgram [Coo87]   Mayflower      Cambridge DCS     partial
PPD [MC88b]       n/s            n/s               partial
RADAR [LR85]      n/s            PERQ              prototype
Recap [PL88]      n/s            n/s               proposed
Traveler [Man87]  Apiary         emulator          prototype
TSL [HL85b]       any            any               prototype
Voyeur [SBN88]    n/s            n/s               prototype
YODA [LP85]       n/s            n/s               prototype
[AP87]            Cedar          n/s               partial
[BDV86]           MuTEAM         MuTEAM            partial
[GB85]            PathPascal     n/s               partial
[GGK84]           n/s            n/s               prototype
[GKY88]           any            any               prototype
[MMS86]           UNIX           any UNIX          prototype
[Sno84]           Medusa/StarOS  Cm*               prototype

A.2 General Characteristics Part 2

Interface         How the debugger hooks into the program
  hardware        Additional hardware is used instead of a software interface
  manual          Calls to the debugger are manually inserted by the programmer
  object          Compiler modifies the object code
  oper sys        Debugger interacts with the normal operating system
  source          Automatic insertion of source code statements calling the
                  debugger
Probe Effect      How the debugger addresses the Probe Effect
  fast calls      Fast monitoring operations minimize the effect on program
                  timing
  leave in        Leave the debugger in the system
  logical time    Logical time hides the effects of debugging operations
Global Clock      Whether debugger requires a global clock
  assumed         Debugger assumes the existence of an accurate global clock
  self-timed      Debugger simulates its own global time
  uniprocessor    Debugger is for a single processor system or simulation
Languages         Languages the debugger supports
Model             The model of communication
  block-send      Message passing with blocking sends
  gmem            A bank of global memory equidistant from all processors
  hybrid          Has both shared memory and message passing
  lmem            The shared memory is located at the processors
  messages        Message passing
  rndzv           Ada rendezvous
  rpc             Remote procedure call

n/s               Not specified

Table A.2. General Characteristics Part 2

System            Interface   Probe Effect  Global Clock  Languages        Model
Agora [For88]     oper sys    leave in      self-timed    n/s              lmem
Amoeba [Els88]    object      n/s           none          n/s              messages
belvedere [HC87]  object      logical time  uniprocessor  Simple Simon     messages
BUGNET [CW82]     object      n/s           self-timed    Modula2          messages
CBUG [Gai85]      source      fast calls    none          C                23-m
cdbg [Int87]      oper sys    n/s           none          C, Ftn           messages
dbxtool [AM86]    oper sys    n/s           none          C, Pscl, Ftn     messages
defence [Web83]   object      n/s           uniprocessor  Conc. Euclid     monitors
DISDEB [LP86]     hardware    none (a)      none          any              gmem + lmem
EDL [BW83]        n/s         n/s           assumed       n/s              n/s
HARD [MCR85]      source      logical time  assumed       Ada              rndzv + gmem
IDD [HHK85]       object      n/s           none          C, Modula2       messages
Instant [LM87]    object      leave in      none          several          gmem
Jade [JLSU87]     object      n/s           none          several          block-send
MAD [RRZ88]       man + hard  leave in      assumed       PARC(C)          gmem
Meglos [GK86]     y-c;        n/s           none          C                messages
mtdbx [Gri87]                 logical time  uniprocessor  f77 + Cray       gmem
Multibug [CP86]   oper sys    n/s           none          low level        messages
Parasight [AG88]  oper sys    fast calls    none          C                gmem

(a) DISDEB uses additional hardware to eavesdrop on the network traffic.
This allows the DISDEB debuggers to run transparently, without disturbing
the program’s timing.
(continued)

Table A.2. (Continued)

System            Interface   Probe Effect  Global Clock  Languages        Model
pdbx [Seq86]      oper sys    n/s           none          C, P, F + dynix  gmem
Pilgram [Coo87]   object      logical time  none          Conc. Clu        rpc
PPD [MC88b]       object      n/s           none          C                gmem
RADAR [LR85]      object      n/s           none          Pronet           messages
Recap [PL88]      object      n/s           none          n/s              hybrid
Traveler [Man87]  oper sys    n/s           uniprocessor  Acore(lisp)      messages
TSL [HL85b]       source      n/s           none          Ada              rndzv
Voyeur [SBN88]    manual      n/s           assumed       several          hybrid
YODA [LP85]       source      n/s           assumed       Ada              rndzv + gmem
[AP87]            oper sys    n/s           none          FORTRAN          gmem
[BDV86]           object      logical time  none          ECSP             messages
[GB85]            oper sys    n/s           none          Path Pascal      gmem
[GGK84]           oper sys    none          none          n/s              messages
[GKY88]           source      fast calls    none          Occam, NIL       messages
[MMS86]           oper sys    fast calls    none          C                messages
[Sno84]           object      n/s           assumed       n/s              hybrid

A.3 User Interface

Exam/Mod State      Capabilities for examining/modifying the program’s state
  global, glb       Global state can be examined
  ipc               Communication state can be examined
  local             Local states can be examined
  +                 Modification of state is also possible
  sequent           (Plans to) interface with a sequential debugger
Rename Objects      Whether program objects are given special names during
                    debugging
  no                No objects can be given names
  procs             Only processes can be given names
  yes               Most objects can be given names
Graphics            Whether debugger uses graphics
  commun            Animated view of interprocess communications
  tp                Time-process diagrams can be displayed
  windows, win      Processes are displayed in separate windows
Exam Event History  How user examines a recorded event history
  browser           Using an editor/browser
  (language)        Using queries in the indicated language
  replay            Only examination is to replay the history
  scroll tp         Scrollable time-process diagrams
Event Lang          What language (if any) is used to express patterns of
                    events
Control Sched       Whether user can control scheduling of the program
  hist              History guides to scheduling
  select, sel       Can select which process to run next
  sus/cont, sc      Can suspend or continue individual processes

n/s                 Not specified

Table A.3. User Interface

                                                     Examine
                  Exam/Mod    Rename                 Event      Event        Control
System            State       Objects  Graphics      History    Lang         Sched
Agora [For88]     local       no       windows       replay     none         sus/cont
Amoeba [Els88]    local+      no       none          replay     reg. exp.    sus/cont
belvedere [HC87]  ipc         no       commun        replay     EDL          hist
BUGNET [CW82]     local, ipc  no       none          replay     none         sus/cont
CBUG [Gai85]      local, ipc  no       windows       none       none         sus/cont
cdbg [Int87]      local+      yes      none          none       none         sus/cont
dbxtool [AM86]    local+      no       windows       none       none         sus/cont
defence [Web83]   local+      no       none          none       none         sus/cont
DISDEB [LP86]     global+     no       none          none       (c)          sus/cont
EDL [BW83]        n/s         no       none          replay     EDL          none
HARD [MCR85]      glb+, ipc+  no       none          none       Ada (d)      sus/cont
IDD [HHK85]       local, ipc  no       tp            scroll tp  int. log.    sc, hist
Instant [LM87]    sequent     no       tp (a)        replay     none         hist
Jade [JLSU87]     sequent     yes      tp, commun    browser    none         sel, hist
MAD [RRZ88]       global      no       tp            browser    path rules   n/s
Meglos [GK86]     local+      no       none          none       none         sus/cont
mtdbx [Gri87]     global+     no       tp, win       scroll tp  none         sc, sel, hist
Multibug [CP86]   local+      yes      none          none       none         sus/cont
Parasight [AG88]  local+      no       none          none       none         sus/cont
pdbx [Seq86]      local+      no       windows       none       none         sus/cont
Pilgram [Coo87]   global+     no       none          none       none         sus/cont
PPD [MC88b]       local, ipc  no       (b)           replay     none         sus/cont
RADAR [LR85]      ipc         no       commun        replay     none         none
Recap [PL88]      sequent     no       none          replay     none         hist
Traveler [Man87]  n/s         no       windows       browser    none         select
TSL [HL85b]       sequent     yes      none          browser    TSL          select
Voyeur [SBN88]    global      yes      programmable  browser    none         none
YODA [LP85]       ipc         no       none          prolog     none         none
[AP87]            n/s         no       none          none       none         none
[BDV86]           sequent     no       none          none       BS           none
[GB85]            global+     no       win, commun   replay     yes, n/s     sus/cont
[GGK84]           local+      no       none          scroll tp  yes, n/s     sus/cont
[GKY88]           glb+, ipc+  no       windows       browser    temp. logic  sc, select
[MMS86]           n/s         no       none          browser    none         none
[Sno84]           local       no       none          TQuel      TQuel        none

(a) In Fowler et al. [1988] a toolkit by the same authors includes process
time diagrams that require a global clock. The process time diagrams are
not presented in the 1987 paper.
(b) PPD generates and displays dynamic dependence graphs.
(c) DISDEB allows complex events to be built out of very low-level,
machine-code-like events using a low-level language. For example, the
language can only refer to physical addresses rather than using identifiers
from the source code.
(d) There is a facility for calling the debugger from special tasks. These
tasks can be used to implement arbitrarily complex breakpoints.

A.4 Breakpoints

State Breakpoints    Types of state-based breakpoints supported
  global             Breakpoints can be set on global (and local) state
  local              Breakpoints can be set on local state
  stmt               Breakpoints can be set at a source statement
Event Breakpoints    Types of event-based breakpoints
  mult. (language)   Breakpoints on conjunction, disjunction, or repetition
                     of events
  seq. (language)    Breakpoints on complex sequence of events
  single             Breakpoints on the occurrence of single events
Modify Breakpoints   Whether breakpoints can be added/disabled during
                     program execution
Breakpoint Effect    What is halted when a breakpoint is reached
  either             Either one process or the entire program may be halted
  process            One process is halted
  program            The entire program is halted

n/a                  Not applicable
n/s                  Not specified
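
The “seq. (language)” descriptor can be made concrete with a small sketch.
It is an illustration only and does not reproduce the mechanism of any
surveyed debugger. It assumes that each monitored event is mapped to a
one-letter code and that the sequence of interest is written as a regular
expression over those codes; the event names and the class
SequenceBreakpoint are hypothetical.

```python
import re

# Hypothetical one-letter codes for the event types being monitored.
EVENT_CODES = {"send": "s", "recv": "r", "lock": "l", "unlock": "u"}


class SequenceBreakpoint:
    """Fires when the most recent events complete a given pattern."""

    def __init__(self, pattern):
        # Anchor at the end so only a match finishing at the newest
        # event triggers the breakpoint.
        self._regex = re.compile("(?:" + pattern + ")$")
        self._history = ""

    def observe(self, event):
        """Record one event; return True if the breakpoint fires now."""
        self._history += EVENT_CODES[event]
        return self._regex.search(self._history) is not None


# Break when a lock is followed by another lock with no unlock between.
double_lock = SequenceBreakpoint("l[^u]*l")
trace = ["lock", "send", "unlock", "lock", "recv", "lock"]
fired = [double_lock.observe(e) for e in trace]
print(fired)  # [False, False, False, False, False, True]
```

The sketch keeps the whole event history as a string, which is acceptable
for short traces; a long-running monitor would need to bound or checkpoint
that history.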

Table A.4. Breakpoints

                  State          Event            Modify       Breakpoint
System            Breakpoints    Breakpoints      Breakpoints  Effect
Agora [For88]     local          single           yes          -.
Amoeba [Els88]    local + stmt   seq(reg. expr.)  yes          either
belvedere [HC87]  no             none             no           n/a
BUGNET [CW82]     n/s            single           yes          either
CBUG [Gai85]      stmt           none             yes          process
cdbg [Int87]      local + stmt   single           yes          process
dbxtool [AM86]    local + stmt   none             yes          process
defence [Web83]   local + stmt   none             yes          program
DISDEB [LP86]     no             multiple         no           either
EDL [BW83]        no             none             n/a          n/a
HARD [MCR85]      local + stmt   multiple         remove       process
IDD [HHK85]       global         seq              n/s          program
Instant [LM87]    local + stmt   none             yes          program
Jade [JLSU87]     no             single           yes
MAD [RRZ88]       n/s            n/s              n/a          n/a
Meglos [GK86]     local          single           yes          either
mtdbx [Gri87]     local + stmt   none             yes          either
Multibug [CP86]   local + stmt   single           yes          process
Parasight [AG88]  stmt           none             yes          process
pdbx [Seq86]      local + stmt   none             yes          process
Pilgram [Coo87]   stmt           none             3-s          process
PPD [MC88b]       stmt + local   none             yes          program
RADAR [LR85]                     none             n/a          n/a
Recap [PL88]      gal + stmt     n/s              yes          process
Traveler [Man87]  no             no (planned)     n/a          n/a
TSL [HL85b]       no             seq(TSL)         no           process
Voyeur [SBN88]    n/s            n/s              n/s          n/s
YODA [LP85]       no             none             n/a          n/a
[AP87]            n/a            n/a              n/a          n/a
[BDV86]           local          seq(BS)          no           process
[GB85]            -2                              yes          process
[GGK84]           $a1 + global   seq              yes          either
[GKY88]           stmt + global  single           no           program
[MMS86]           no             none             n/a          n/a
[Sno84]           no             none             n/a          n/a

A.5 Event Monitoring

Event Type      What is an event
  ipc           Every (explicit) interprocess communication
  sh mem        Shared memory references
  stmt          Each statement execution
History         Kind of event history recorded
  buffer        Last n events are stored in a buffer
  chk pt        All events since the last checkpoint are saved
  complete      All events are recorded and preserved
  sparse        Some events are kept; others are not
Filtering       How the information recorded (or replayed) can be reduced
  event         Using predicates on single events
  global        Global state
  history       Using predicates on the history of events
  (language)    Specified language describes “interesting” events
  local         Local state
  process       Specifying important process(es)
Replay          How complete are the replay facilities
  commun        Communication state can be deduced
  complete      Entire state (including local vars) is available
Ordering        How are event histories ordered
  linear        All events are forced into a linear history
  partial       “Concurrent” events are not ordered

n/a             Not applicable
n/s             Not specified
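
Three of the descriptors above (a “buffer” history, “event” filtering, and
“partial” ordering) can be illustrated together in a short sketch. It is
not drawn from any surveyed system. The names Event, BufferedHistory,
happened_before, and concurrent are hypothetical, and the use of vector
timestamps to decide whether two events are ordered is an assumption made
for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    process: str
    kind: str     # e.g. "ipc", "sh mem", or "stmt"
    clock: tuple  # vector timestamp, one entry per process (assumed)


class BufferedHistory:
    """Keeps only the last `size` events that pass an event predicate."""

    def __init__(self, size, keep=lambda event: True):
        self._events = deque(maxlen=size)  # oldest entries fall off
        self._keep = keep

    def record(self, event):
        if self._keep(event):
            self._events.append(event)

    def events(self):
        return list(self._events)


def happened_before(a, b):
    """True if a is ordered before b by their vector timestamps."""
    return a.clock != b.clock and all(
        x <= y for x, y in zip(a.clock, b.clock))


def concurrent(a, b):
    """True if two distinct events are left unordered (partial order)."""
    return not happened_before(a, b) and not happened_before(b, a)


# Hypothetical trace of processes P and Q; e4 is Q receiving P's message.
e1 = Event("P", "ipc", (1, 0))
e2 = Event("P", "stmt", (2, 0))
e3 = Event("Q", "ipc", (0, 1))
e4 = Event("Q", "ipc", (1, 2))

history = BufferedHistory(size=2, keep=lambda e: e.kind == "ipc")
for e in (e1, e2, e3, e4):
    history.record(e)

print([e.clock for e in history.events()])  # [(0, 1), (1, 2)]
print(concurrent(e1, e3))                   # True
print(happened_before(e1, e4))              # True
```

A “linear” history would instead force e1 and e3 into some arbitrary total
order, which is exactly the information a partial ordering avoids
inventing.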

Table A.5. Event Monitoring

System            Event Type   History   Filtering    Replay     Ordering
Agora [For88]     sh mem       chk. pt.  n/s          complete   partial
Amoeba [Els88]    ipc          chk. pt.  history      complete   partial
belvedere [HC87]  ipc          complete  none         commun     partial
BUGNET [CW82]     ipc          chk. pt.  proc, event  complete   linear
CBUG [Gai85]      ipc          none      none         none       n/a
cdbg [Int87]      ipc          none      n/a          n/a        n/a
dbxtool [AM86]    stmt         none      none         none       n/a
defence [Web83]   stmt         none      n/a          n/a        n/a
DISDEB [LP86]     ipc, sh mem  none      (a)          none       linear
EDL [BW83]        n/s          complete  EDL          commun     linear
HARD [MCR85]      stmt         none      none         none       partial
IDD [HHK85]       ipc          buffer    proc, event  none       linear
Instant [LM87]    sh mem       complete  none         complete   partial
Jade [JLSU87]     ipc          complete  proc, event  complete   linear
MAD [RRZ88]       stmt         sparse    history      none       linear
Meglos [GK86]     sh mem       none      none         none       partial
mtdbx [Gri87]     ipc          complete  none         complete   linear
Multibug [CP86]   ipc          none      n/a          n/a        n/a
Parasight [AG88]  stmt         none      none         none       n/a
pdbx [Seq86]      stmt         none      none         none       n/a
Pilgram [Coo87]   stmt         none      none         none       n/a
PPD [MC88b]       sh mem       complete  none         none       partial
RADAR [LR85]      ipc          complete  none         complete   partial
Recap [PL88]      ipc, sh mem  chk. pt.  process      complete   partial

(a) DISDEB allows complex events to be built out of very low-level,
machine-code-like events using a low-level language. For example, the
language can only refer to physical addresses rather than using identifiers
from the source code.
(b) Filtering is done by transactions. The nested function calls can be
hidden, giving a clearer picture of the high-level activity (see
Section 3.1).
(continued)

Table A.5. (Continued)

System            Event Type   History   Filtering    Replay     Ordering
Traveler [Man87]  ipc          complete  (b)          none       partial
TSL [HL85b]       ipc          complete  TSL          (planned)  linear
Voyeur [SBN88]    stmt         complete  n/s          none       linear
YODA [LP85]       ipc, sh mem  complete  none         none       linear
[AP87]            ipc, sh mem  sparse    none         none       partial
[BDV86]           ipc          none      n/a          n/a        n/a
[GB85]            ipc          complete  n/s          commun     partial
[GGK84]           n/s          complete  suggested    complete   partial
[GKY88]           ipc, stmt    complete  history      complete   linear
[MMS86]           ipc          complete  event        none       linear
[Sno84]           stmt         sparse    event        none       linear

REFERENCES

[ABKP86] ALLEN, R., BAUMGARTNER, D., KENNEDY, K., AND PORTERFIELD, A. 1986. Ptool: A semiautomatic parallel programming assistant. In Proceedings of the International Conference on Parallel Processing. IEEE, pp. 164-170.
[AG88] ARAL, Z., AND GERTNER, I. 1988. High-level debugging in parasight. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. 151-162.
[AM85] APPELBE, W. F., AND MCDOWELL, C. E. 1985. Anomaly reporting: A tool for debugging and developing parallel numerical algorithms. In Proceedings of the 1st International Conference on Supercomputing Systems. IEEE, pp. 386-391.
[AM86] ADAMS, E., AND MUCHNICK, S. S. 1986. Dbxtool: A window-based symbolic debugger for sun workstations. Softw. Pract. Exper. 16, 7, 653-669.
[AM88] APPELBE, W. F., AND MCDOWELL, C. E. 1988. Developing multitasking applications programs. In Proceedings of Hawaii International Conference on System Sciences. IEEE, pp. 94-101.
[AP87] ALLEN, T. R., AND PADUA, D. A. 1987. Debugging FORTRAN on a shared memory machine. In Proceedings of the International Conference on Parallel Processing. Penn State University, pp. 721-727.
[Bat88] BATES, P. 1988. Debugging heterogeneous distributed systems using event-based models of behavior. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. 11-22.
[BCKT79] BANERJEE, U., CHEN, S., KUCK, D. J., AND TOWLE, R. A. 1979. Time and parallel processor bounds for fortran-like loops. IEEE Trans. Comput. 28, 9 (Sept.), 660-670.
[BDER79a] BRISTOW, G., DREY, C., EDWARDS, B., AND RIDDLE, W. 1979. Anomaly detection in concurrent programs. In Proceedings of the 4th International Conference on Software Engineering. IEEE.
[BDER79b] BRISTOW, G., DREY, C., EDWARDS, B., AND RIDDLE, W. 1979. Design of a system for anomaly detection in HAL/S programs. Tech. Rep. CU-CS-151-79. Univ. of Colorado at Boulder.
[BDV86] BAIARDI, F., DEFRANCESCO, N., AND VAGLINI, G. 1986. Development of a debugger for a concurrent language. IEEE Trans. Softw. Eng. SE-12, 4 (Apr.), 547-553.
[BW83] BATES, P. C., AND WILEDEN, J. C. 1983. High-level debugging of distributed systems: The behavioral abstraction approach. J. Syst. Softw. 3, 255-264. Also COINS Tech. Rep. #83-29.
[CL85] CHANDY, K. M., AND LAMPORT, L. 1985. Distributed snapshots: Determining global states of distributed systems. ACM Trans. Comput. Syst. 3, 1 (Feb.), 63-75.
[Coo87] COOPER, R. 1987. Pilgram: A debugger for distributed systems. In Proceedings of the 7th International Conference on Distributed Computing Systems. IEEE, pp. 458-465.
[CP86] CORSINI, P., AND PRETE, C. A. 1986. Multibug: Interactive debugging in distributed systems. IEEE Micro 6, 3, 26-33.
[CS88] CALLAHAN, D., AND SUBHLOK, J. 1988. Static analysis of low-level synchronization. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. 100-111.
[CW82] CURTIS, R. S., AND WITTIE, L. D. 1982. BugNet: A debugging system for parallel programming environments. In Proceedings of the 3rd International Conference on Distributed Computing Systems. ACM, pp. 394-399.
[Els88] ELSHOFF, I. J. P. 1988. A distributed debugger for amoeba. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. 1-10.
[Fid88] FIDGE, C. J. 1988. Partial orders for parallel debugging. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. 183-194.
[FLM88] FOWLER, R. J., LEBLANC, T. J., AND MELLOR-CRUMMEY, J. M. 1988. An integrated approach to parallel program debugging and performance analysis on large-scale multiprocessors. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. 163-173.

[FO76] FOSDICK, L. D., AND OSTERWEIL, L. J. 1976. Data flow analysis in software reliability. ACM Comput. Surv. 8 (Sept.), 305-330.
[For88] FORIN, A. 1988. Debugging of heterogeneous parallel systems. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. 130-140.
[Gai85] GAIT, J. 1985. A debugger for concurrent programs. Softw. Pract. Exper. 15, 6, 539-554.
[GB85] GARCIA, M. E., AND BERMAN, W. J. 1985. An approach to concurrent systems debugging. In Proceedings of the 5th International Conference on Distributed Computing Systems. IEEE, pp. 507-514.
[GGK84] GARCIA-MOLINA, H., GERMANO, F., JR., AND KOHLER, W. H. 1984. Debugging a distributed computing system. IEEE Trans. Softw. Eng. SE-10, 2 (Mar.), 210-219.
[GH83] GENTLEMAN, W. M., AND HOEKSMA, H. 1983. Hardware assisted high level debugging. SIGPLAN Notices 18, 8 (August), 140-144.
[GK86] GAGLIANELLO, R. D., AND KATSEFF, H. P. 1986. The meglos user interface. In Proceedings of Fall Joint Computer Conference. ACM, pp. 169-177.
[GKY88] GOLDSZMIDT, G., KATZ, S., AND YEMINI, S. 1988. Interactive blackbox debugging for concurrent languages. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. 271-282.
[Gol86] GOLDBERG, A. T. 1986. Knowledge-based programming: A survey of program design and construction techniques. IEEE Trans. Softw. Eng. SE-12, 4 (Apr.), 752-768.
[GR85] GEHANI, N. H., AND ROOME, W. D. 1985. Concurrent C. Tech. Rep., AT&T Bell Laboratories.
[Gri87] GRIFFIN, J. 1987. Parallel debugging system user's guide. Tech. Rep., Los Alamos National Laboratory.
[HC87] HOUGH, A. A., AND CUNY, J. 1987. Belvedere: Prototype of a pattern-oriented debugger for highly parallel computation. In Proceedings of the International Conference on Parallel Processing. Penn State University, pp. 735-738.
[HHK85] HARTER, P. K., JR., HEIMBIGNER, D. M., AND KING, R. 1985. IDD: An interactive distributed debugger. In Proceedings of the 5th International Conference on Distributed Computing Systems. IEEE, pp. 498-506.
[HL85a] HELMBOLD, D., AND LUCKHAM, D. 1985. Debugging ada tasking programs. IEEE Softw. 2, 2, 47-57.
[HL85b] HELMBOLD, D., AND LUCKHAM, D. 1985. TSL: Task Sequencing Language. In Ada In Use, Proceedings of the Ada International Conference. ACM, Cambridge University Press.
[HU75] HECHT, M. S., AND ULLMAN, J. D. 1975. A simple algorithm for global data flow analysis problems. SIAM J. Comput. 4, 519-532.
[HW88] HABAN, D., AND WEIGEL, W. 1988. Global events and global breakpoints in distributed systems. In Proceedings of Hawaii International Conference on System Sciences. IEEE, pp. 166-175.
[Int87] INTEL CORP. 1987. iPSC Concurrent Debugger Manual.
[JLSU87] JOYCE, J., LOMOW, G., SLIND, K., AND UNGER, B. 1987. Monitoring distributed systems. ACM Trans. Comput. Syst. 5, 2 (May), 121-150.
[Kar87] KARP, A. H. 1987. Programming for parallelism. Computer 20, 5, 43-57.
[Lam78] LAMPORT, L. 1978. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7, 558-565.
[LM87] LEBLANC, T. J., AND MELLOR-CRUMMEY, J. M. 1987. Debugging parallel programs with instant replay. IEEE Trans. Comput. C-36, 4 (Apr.), 471-482.
[LP85] LEDOUX, C. H., AND PARKER, D. S., JR. 1985. Saving traces for ada debugging. In Ada In Use, Proceedings of the Ada International Conference. ACM, Cambridge University Press, pp. 97-108.
[LP86] LAZZERINI, B., AND PRETE, C. A. 1986. Disdeb: An interactive high-level debugging system for a multi-microprocessor system. Microprocess. Microprogram. 18, 401-408.
[LR85] LEBLANC, R. J., AND ROBBINS, A. D. 1985. Event-driven monitoring of distributed programs. In Proceedings of the 5th International Conference on Distributed Computing Systems. IEEE, pp. 515-522.
[Man87] MANNING, C. R. 1987. Traveler: The apiary observatory. In Proceedings of European Conference on Object Oriented Programming. pp. 97-105.
[MC88a] MILLER, B. P., AND CHOI, J.-D. 1988. Breakpoints and halting in distributed systems. In Proceedings of International Conference on Distributed Computing Systems. IEEE.
[MC88b] MILLER, B. P., AND CHOI, J.-D. 1988. A mechanism for efficient debugging of parallel programs. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM, pp. 141-150.
[McD88] MCDOWELL, C. E. 1988. Viewing anomalous states in parallel programs. In Proceedings of the International Conference on Parallel Processing. Penn State University, pp. 54-57.
[McD89] MCDOWELL, C. E. 1989. A practical algorithm for static analysis of parallel programs. Journal of Parallel and Distributed Computing 6, 3 (June), 515-536.
[MCR85] DI MAIO, A., CERI, S., AND REGHIZZI, S. C. 1985. Execution monitoring and debugging tool for ada using relational algebra. In Ada In Use, Proceedings of the Ada International Conference. ACM, Cambridge University Press.
[MMS86] MILLER, B. D., MACRANDER, C., AND SECHREST, S. 1986. A distributed programs monitor for Berkeley UNIX. Softw. Pract. Exper. 16, 2, 183-200.

[Pet77] PETERSON, J. L. 1977. Petri nets. ACM Comput. Surv. 9, 3 (Sept.), 223-252.
[PL88] PAN, D. Z., AND LINTON, M. A. 1988. Supporting reverse execution of parallel programs. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM. Published as SIGPLAN Notices 24, 1 (January 1989). pp. 124-129.
[RRZ88] RUBIN, R. V., RUDOLPH, L., AND ZERNIK, D. 1988. Debugging parallel programs in parallel. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM. Published as SIGPLAN Notices 24, 1 (January 1989). pp. 216-225.
[SBN88] SOCHA, D., BAILEY, M. L., AND NOTKIN, D. 1988. Voyeur: Graphical views of parallel programs. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM. Published as SIGPLAN Notices 24, 1 (January 1989). pp. 206-215.
[Seq86] SEQUENT CORP. 1986. Dynix Pdbx Parallel Debugger User's Manual.
[SG86] SCHEIFLER, R. W., AND GETTYS, J. 1986. The X window system. ACM Trans. Graph. 5, 2 (Apr.).
[Sno84] SNODGRASS, R. 1984. Monitoring in a software development environment: a relational approach. In Proceedings of the Software Engineering Symposium on Practical Software Development Environments. SIGPLAN, ACM SIGSOFT.
[ST83] SEIDNER, R., AND TINDALL, N. 1983. Interactive debug requirements. SIGPLAN Notices 9-22.
[Sto88] STONE, J. M. 1988. A graphical representation of concurrent processes. In Proceedings of Workshop on Parallel and Distributed Debugging. ACM. Published as SIGPLAN Notices 24, 1 (January 1989). pp. 226-235.
[Sun86] SUN MICROSYSTEMS. 1986. NeWS Preliminary Technical Overview.
[Tan81] TANENBAUM, A. S. 1981. Computer Networks. Prentice-Hall, Englewood Cliffs, N.J.
[Tay83] TAYLOR, R. N. 1983. A general-purpose algorithm for analyzing concurrent programs. CACM 26, 5, 362-376.
[Tay84] TAYLOR, R. N. 1984. Debugging real-time software in a host-target environment. Tech. Rep. 212, Univ. of California at Irvine.
[TO80] TAYLOR, R. N., AND OSTERWEIL, L. J. 1980. Anomaly detection in concurrent software by static data flow analysis. IEEE Trans. Softw. Eng. SE-6, 3 (May), 265-278.
[Vic77] VICTOR, K. E. 1977. The design and implementation of DAD, a multiprocess, multimachine, multilanguage interactive debugger. In Proceedings of Hawaii International Conference on System Sciences. IEEE, pp. 196-199.
[Web83] WEBER, J. C. 1983. Interactive debugging of concurrent programs. SIGPLAN Notices 18, 8, 112-113.
[Wer88] WERNER, L. L. 1988. Fault detection in production programs by means of data usage analysis. Ph.D. dissertation, UCSD.
[YT86] YOUNG, M., AND TAYLOR, R. N. 1986. Combining static concurrency analysis with symbolic execution. In Proceedings of Workshop on Software Testing. pp. 10-178.

Received July 1988; final revision accepted January 1989.

