Debugging Concurrent Programs
The main problems associated with debugging concurrent programs are increased
complexity, the “probe effect,” nonrepeatability, and the lack of a synchronized global
clock. The probe effect refers to the fact that any attempt to observe the behavior of a
distributed system may change the behavior of that system. For some parallel programs,
different executions with the same data can produce different results even without any
attempt to observe the behavior. Even when the behavior can be observed, in many
systems the lack of a synchronized global clock makes the results of the observation
difficult to interpret. This paper discusses these and other problems related to debugging
concurrent programs and presents a survey of current techniques used in debugging
concurrent programs. Systems using three general techniques are described: traditional or
breakpoint style debuggers, event monitoring systems, and static analysis systems. In
addition, techniques for limiting, organizing, and displaying a large amount of data
produced by the debugging systems are discussed.
Categories and Subject Descriptors: A.1 [General Literature]: Introductory and Survey;
D.1.3 [Programming Techniques]: Concurrent Programming; D.2.4 [Software
Engineering]: Program Verification (assertion checkers); D.2.5 [Software
Engineering]: Testing and Debugging (debugging aids; diagnostics; monitors; symbolic
execution; tracing)
Additional Key Words and Phrases: Distributed computing, event history,
nondeterminism, parallel processing, probe-effect, program replay, program visualization,
static analysis
¹ Ada is a registered trademark of the U.S. Government (Ada Joint Program Office).
This work was supported in part by IBM grants SL87033 and SL88096.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0360-0300/89/1200-0593 $01.50
The classic approach to debugging sequential programs involves repeatedly stopping
the program during execution, examining the state, and then either continuing or reexecuting in order to stop at an earlier point in the execution. This style of debugging is called cyclical debugging. Unfortunately, parallel programs do not always have reproducible behavior. Even when they are run with the same inputs, their results can be radically different. These differences are caused by races, which occur whenever two activities are allowed to progress in parallel. For example, one process may attempt to write a memory location while a second process is reading from that memory cell. The second process’s behavior may differ radically, depending on whether it reads the new or the old value.
The cyclical debugging approach often fails for parallel programs because the undesirable behavior may not appear when the program is reexecuted. If the undesirable behavior occurs with very low probability, the programmer may never be able to recreate the error situation. In fact, any attempt to gain more information about the program may contribute to the difficulty of reproducing the erroneous behavior. This has been referred to as the “Heisenberg Uncertainty” principle applied to software [LeDoux and Parker 1985] or the “Probe Effect” [Gait 1985]. For programs that contain races, any additional print or debugging statements may modify a crucial race, lowering the probability that the interesting behavior occurs. This interference can be disastrous when attempting to diagnose an error in a parallel program.
The nondeterminism arising from races is particularly difficult to deal with because the programmer often has little or no control over it. The resolution of a race may depend on each CPU’s load, the amount of network traffic, and nondeterminism in the communication medium (e.g., exponential backoff protocols [Tannenbaum 1981, pp. 292-295]). It is this nondeterministic behavior that tends to make understanding, writing, and debugging parallel programs more difficult than understanding, writing, and debugging their sequential counterparts.
An additional problem found in distributed systems is that the concept of “global state” can be misleading or even nonexistent [Lamport 1978]. Without a synchronized global clock, it may be difficult to determine the precise order of events occurring in distinct, concurrently executing processors.
Basic Approaches
Some researchers distinguish between monitoring and traditional debugging [Joyce et al. 1987]. Monitoring is the process of gathering information about a program’s execution. Debugging, as defined in the current ANSI/IEEE standard glossary of software engineering terms, is “the process of locating, analyzing, and correcting suspected faults,” where a fault is defined to be an accidental condition that causes a program to fail to perform its required function. Since monitoring is often an effective procedure for locating incorrect behavior, it should be considered a debugging tool.
For the purposes of this survey, techniques for debugging concurrent systems have been organized into four groups:
(1) Traditional debugging techniques can be applied with some success to parallel programs. These are discussed in Section 1.
(2) Event-based debuggers view the execution of a parallel program as a sequence (or several parallel sequences) of events. The generation and analysis of these sequences or event histories is the subject of Section 2.
(3) Techniques for displaying the flow of control and distributed data associated with parallel programs are presented in Section 3.
(4) Static analysis techniques based on dataflow analysis of parallel programs are presented in Section 4. These techniques allow some program errors to be detected without executing the program.
This survey covers a large number of research and commercial projects designed to help produce error-free concurrent software. It focuses primarily on systems that are directed toward isolating program errors. A large body of work in formal program verification and in program testing
has been explicitly excluded from this survey. Most of the systems surveyed fall into one of two general categories: traditional parallel debuggers (sometimes called “breakpoint” debuggers) and event-based debuggers. Of course, some systems contain aspects of both classes. All of the systems (or in some cases proposed systems) in these two general categories are listed in the tables in Appendix A.
In addition to traditional parallel debuggers and event-based debuggers, some static analysis systems are included. The static analysis systems surveyed fall somewhere between debugging and testing. They are distinguished from testing by not requiring program execution and by generally checking for structural faults instead of functional faults. That is, the analysis tools have no knowledge of the intended function of the program and simply identify program structures that are generally indicative of an error. These systems do not appear in the comparison table in Appendix A but are discussed in Section 4.
Each of the three types of systems surveyed takes a different approach to the debugging problem. The traditional parallel debuggers are the easiest to build and therefore provide an immediate partial solution. They provide some control over program execution and provide state examination. They are also severely limited by the probe effect.
Event-based debuggers provide better abstraction than traditional style debuggers. They also address the probe effect by permitting deterministic replay of nondeterministic programs. If it is not possible to record event histories continuously, however, the probe effect will still be a problem. Also, event-based debuggers are generally research prototypes, applicable only to systems without shared memory. A notable exception is instant replay [LeBlanc and Mellor-Crummey 1987], which supports event tracing and replay on the shared-memory BBN Butterfly, provided OS protocol routines are used for all shared memory accesses.
Static analysis tools avoid the probe effect entirely by not executing the programs. They have the potential of identifying a large class of program errors that are particularly difficult to find using current dynamic techniques. These techniques have been applied mostly to parallel versions of FORTRAN that do not support recursion. As with the event-based debuggers, static analysis systems are still in the prototype stage. The primary problem with most static analysis algorithms is that their worst-case computational complexity is often exponential.
All three types of debugging systems have made some progress in presenting the complex concurrent program state and the accompanying massive amounts of data to the user. Multiple windows are a useful mechanism for interfacing with traditional style debuggers for parallel systems. The abstraction capabilities of event-based debuggers (see Section 2) have been used to present interesting and potentially useful views of system states graphically [Hough and Cuny 1987].
1. EXTENDING TRADITIONAL DEBUGGING TO PARALLEL PROGRAMS
The simplest type of debugger to implement for parallel systems is (or behaves like) a collection of sequential debuggers, one per parallel process. To date, all commercially available debuggers for parallel programs fit this description. The primary differences lie in how the output from the several sequential debuggers is displayed and how the separate sequential debuggers are controlled. We will call these collections of sequential debuggers traditional parallel debuggers.
The probe effect, discussed in the Introduction, has gone mostly unaddressed by traditional parallel debuggers. This makes traditional parallel debuggers ineffective against timing-dependent errors. The probe effect, however, does not always rear its ugly head, allowing many program errors to be isolated using traditional cyclic debugging techniques. This can be attributed to two factors. First, those errors in parallel programs that are not timing dependent would never be masked by the probe effect. Second, even for timing-related errors, the effect of the probe may not disturb the outcome of the critical races.
Another criticism of traditional parallel debuggers is that they operate at too low a level. For programs consisting of many concurrently executing processes, the major difficulty may be in understanding what is happening at the interprocess level. Traditional debugging techniques work well for viewing the behavior at the instruction level or at the procedure level. In Section 2.4 some recent developments for viewing program behavior at a more abstract level are presented.
1.1 Coordinating Several Sequential Debuggers
In addition to the sequential capabilities of standard sequential debuggers, traditional parallel debuggers should be able to do the following:
1. direct any sequential debugger command to a specific task,
2. direct any sequential debugger command to an arbitrary set of tasks,
3. differentiate the terminal output from the different tasks.
The most primitive debugger for parallel programs would be nothing more than a sequential debugger capable of attaching to any single process in a parallel program. All that would then be necessary is to provide the user with multiple real or virtual terminals from which to execute the multiple copies of the debugger. Today’s multiple-window workstations make this more practical than it might have been a few years ago. With a window manager [Scheifler and Gettys 1986; Sun Microsystems 1986], points 1 and 3 could be satisfied by selecting the desired window. Point 2 could be satisfied simply by repeating the desired command in each of the desired windows. This approach, however, would become fairly unwieldy for more than a few processes. Furthermore, the time lapse between sending the command to the first and last processes in the set could aggravate the probe effect. This is particularly true for commands such as “stop” and “continue.”
Sun Microsystems’ dbxtool is an example of applying a set of sequential debuggers to concurrent programs without any explicit coordination. It is capable of attaching to an existing UNIX² process, making it possible to debug a system of communicating UNIX processes by attaching a separate copy of dbxtool to each process. (The UNIX process may not contain process creation calls such as “fork,” and the executable image being debugged cannot be shared.)
² UNIX is a trademark of AT&T Bell Laboratories.
An alternative to relying on a window manager to direct commands to the proper sequential debugger is to control all of the debuggers from a single terminal or window [Sequent Corp. 1986]. Commands are then directed at a specific process using a command parameter or by defaulting to a specific “current” process. For example, “continue P1” would continue process P1, and “continue” without a parameter would continue the “current” process. The “current” process can be changed at any time.
The use of a single control window also permits the commands to be sent to all processes. For example, “continue all” would continue all currently suspended processes. In general, the processes will not all receive the command at the same instant; the commands will, however, arrive at times that differ by roughly the communication delay in the system. If all processes could be instantaneously stopped (and started) then, in the absence of timeouts, “stop all” breakpoints would not cause any probe effect. This is, of course, impossible, but anything that can be done to minimize the time difference for receipt of stop signals should reduce the probe effect. In addition to reducing the probe effect, broadcasting a single command to a set of processes is a useful feature.
The “current” task notion is generalized in Griffin [1987] and Intel Corp. [1987] to a current set of processes that all receive any process-related commands. In Griffin [1987], processes can be added to or removed from the set simply by pointing to a symbol for the process in a special window.
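The command routing described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the interface of dbxtool, the Sequent debugger, Griffin’s system, or Intel’s “context” command: the names FrontEnd, StubDebugger, define_set, and context are hypothetical, and each stub only records the commands it receives instead of controlling a real process.

```python
from dataclasses import dataclass, field


@dataclass
class StubDebugger:
    """Hypothetical stand-in for one sequential debugger attached to one process."""
    process: str
    received: list = field(default_factory=list)

    def send(self, command: str) -> None:
        # A real front end would forward the command over a pipe or socket.
        self.received.append(command)


class FrontEnd:
    """Hypothetical single control window routing commands to per-process debuggers."""

    def __init__(self, processes):
        self.debuggers = {p: StubDebugger(p) for p in processes}
        self.sets = {}                  # named groups of processes
        self.current = [processes[0]]   # the "current" set, initially one process

    def define_set(self, name, members):
        self.sets[name] = list(members)

    def context(self, name_or_process):
        # Make a named set, or a single process, the current set.
        if name_or_process in self.sets:
            self.current = list(self.sets[name_or_process])
        else:
            self.current = [name_or_process]

    def execute(self, line):
        """Send the command verb to its targets and return the list of targets."""
        verb, *args = line.split()
        if not args:
            targets = self.current            # "continue" -> current set
        elif args[0] == "all":
            targets = list(self.debuggers)    # "continue all" -> every process
        elif args[0] in self.sets:
            targets = self.sets[args[0]]      # "stop workers" -> a named set
        else:
            targets = args                    # "continue P1" -> explicit processes
        for p in targets:
            # Delivered one debugger at a time, so a broadcast is not instantaneous.
            self.debuggers[p].send(verb)
        return targets
```

For example, after `fe = FrontEnd(["P1", "P2", "P3"])`, `fe.define_set("workers", ["P2", "P3"])` and `fe.context("workers")`, a bare `fe.execute("stop")` reaches P2 and P3 only. Note that the loop in execute delivers a broadcast to one debugger after another; in a real system each send would cost a communication delay, which is the skew between processes that the text identifies as a source of probe effect for “stop all.”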
This could be generalized to permit arbitrary groupings of processes. For instance, it might be desirable to alternate commands between two disjoint sets of processes. With only a single “current” set and no overlap between the desired sets, this would require as many commands from the user as would be required with no support for process grouping. It would, however, still reduce the probe effect. It appears that the macro capability of Intel Corp. [1987], combined with their “context” command for specifying the current set of processes, would support this toggling back and forth between disjoint sets of processes.
1.2 Breakpoints
The ability to set breakpoints is possibly the most important feature of a sequential debugger. (Since tracing is equivalent to setting a breakpoint that, when encountered, prints some information and automatically continues, the discussion in this section will refer only to breakpoints.) Traditional parallel debuggers generally support the same types of breakpoints as those found in sequential debuggers. These breakpoints include stop at a source statement, stop on the occurrence of an exception or some user-detectable event, stop when a specific variable is accessed, and stop when some conditional expression is satisfied [Seidner and Tindall 1983]. Unlike in sequential debugging, there are two possible actions to take when a breakpoint is encountered. Either all of the processes in the parallel program can be stopped immediately or only the process encountering the breakpoint can be stopped. The former can be difficult to achieve within a sufficiently small interval of time, and the latter can have a serious impact on systems that contain such things as timeouts. Assuming message passing is the communication mechanism, an algorithm to stop all processes in a consistent state is presented in Miller and Choi [1988a].
Using breakpoints to debug systems with explicit time-dependent operations (such as timeouts) can be especially difficult. Some systems have attempted to deal with such explicit race conditions by supporting a notion of logical time that stops when any process reaches a breakpoint [Cooper 1987; DiMaio et al. 1985]. In systems that suspend only the selected process, other processes will continue to execute until they encounter some explicitly time-dependent operation. In that case, the logical clock is the one used for time in the time-dependent operation. For example, if a breakpoint is encountered, no timeouts will expire until the suspended process is continued. In systems that stop all processes upon encountering a breakpoint, logical time is stopped so that all of the suspended processes can be continued with minimal impact. This will certainly not eliminate the probe effect, but it can permit some traditional style debugging in the presence of such explicitly time-dependent operations.
The domain of expressions or predicates used to describe a breakpoint is larger for parallel programs than for sequential programs. These predicate expressions may involve both process state and events. An event can be loosely defined as any atomic action visible beyond the scope of a single process.
Predicates involving global state in an executing parallel program can be a problem. This results from the lack of a global clock in most systems. For example, an expression such as “process A never modifies variable X while process B is modifying variable X” may appear to be true due to the delay in communicating this information to the debugger, when in fact concurrent modification has occurred. Events and some notion of consistent global time can be used to address this (see Section 2). Possibly more important is that it may not be possible to stop the desired processes after detecting that the predicate is satisfied yet before the state has changed.
The distinction between events and global states is admittedly vague in general but is usually well defined for any particular system. For example, an event-based predicate might be “process A sends a message,” and a global state-based predicate might be “the message buffer contains a message from process A.” This could even be represented as a collection of program counter-based breakpoints, one immediately following each send statement in process A. In
ACM Computing Surveys, Vol. 21, No. 4, December 1989
598 • C. E. McDowell and D. P. Helmbold
powerful approach is to record an event history containing all of the events generated by the program. The history can then be examined by the user after the program has completed. Since the event history is often very large, some debuggers provide facilities to browse or query the history. Event histories can also be used to guide the program’s execution, allowing the reproduction of erroneous computations. If the history is complete enough, a single process can be debugged in isolation with the history providing the needed communication. Finally, some systems can automatically check the history for suspicious behavior or transform the lower level history of events into more meaningful high-level events.
2.1 Recording Event Histories
A common approach is for the debugger to do as little as possible at run time, mainly recording information. By limiting the debugger’s activity, the probe effect should be reduced. The recorded information can then be analyzed following the program’s execution.
2.1.1 Which Information to Record
The amount of information that must be recorded for each event depends upon how the event history is going to be used. Three general levels of use that require increasing amounts of detail to be recorded for each event are the following:
(1) Browsing: The event history is examined, possibly through the use of specialized tools. Examination methods range from text editors to “movies” showing the state changes caused by events [Hough and Cuny 1987; Le Blanc and Robbins 1985].
(2) Replay: The debugger uses the event history to control a reexecution of the program. This permits the use of conventional debugging techniques, such as breakpoints, state examination, and single stepping, without changing the behavior of the program.
(3) Simulation: The event history can be used to simulate the environment of any single process. This permits the use of a sequential debugger on a process without reexecuting the entire program.
Browsing requires only minimal information about each event. Simply recording the kinds of events executed by a process can help isolate an error. Of course, if more information is recorded, then more information will be available to the programmer.
One problem with browsing event histories is that the histories frequently contain enormous numbers of events, making it difficult to locate the events of interest. Some systems allow selective recording of information, and others include powerful mechanisms for examining the event history (see Section 2.2).
Replaying an execution requires enough information to determine the next event in which each process participates. LeBlanc and Mellor-Crummey [1987] describe a method that reduces the amount of information needed for replay compared with previous methods that recorded the complete contents of all messages. Their ideas work because the program regenerates the contents of the messages during the reexecution.
Simulating the rest of the program so that a single process can be debugged in isolation requires that all events visible to the process be recorded. This includes both the contents of messages and the values written to shared memory. Note that reexecuting a single process requires more information than reexecuting the entire system.
If the interesting portion of the execution can be identified, then the amount of information required for replay can be considerably reduced. Instead of recording the entire history, the debugger can take a snapshot of the program’s state and keep only that part of the history that follows the snapshot. It may, however, be difficult to obtain accurate snapshots in distributed systems efficiently (see Chandy and Lamport [1985] for one method). This technique may work best for simulating a single process, since only that process’s state needs to be recorded.
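The replay idea above, recording only enough to determine the next event rather than the message contents, can be illustrated with a small Python sketch. It assumes a single receiving process, deterministic senders, and that message arrival order is the only source of nondeterminism (simulated here with a seeded random choice). It is not the Instant Replay algorithm of LeBlanc and Mellor-Crummey [1987], which works on shared objects; the functions producer and run are hypothetical.

```python
import random
from collections import deque


def producer(name, count):
    # Deterministic sender: message contents are regenerated on every run,
    # so they never need to be stored in the log.
    return deque(f"{name}:{i}" for i in range(count))


def run(senders, log=None, seed=None):
    """Deliver every message from `senders` ({name: count}) to one receiver.

    Recording (log is None): arrival order is chosen at random, standing in
    for network nondeterminism, and only the sender of each receive is logged.
    Replay (log given): the logged sender order is enforced.
    Returns (messages received in order, sender log).
    """
    rng = random.Random(seed)
    queues = {name: producer(name, n) for name, n in senders.items()}
    received, recorded = [], []
    step = 0
    while any(queues.values()):
        if log is None:
            ready = [name for name, q in queues.items() if q]
            source = rng.choice(ready)
        else:
            source = log[step]
        received.append(queues[source].popleft())
        recorded.append(source)
        step += 1
    return received, recorded
```

A recording run such as `first, log = run({"A": 3, "B": 2}, seed=7)` yields a log of five sender names; `run({"A": 3, "B": 2}, log=log)` then reproduces exactly the same receive order. In a snapshot-based variant, only the part of this log that follows the snapshot would be kept.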
trigger debugging actions; for example, “when E1 occurs, display counter C and stop process N.” Other potential actions include starting and stopping traces of memory locations and manipulating timers.
A similar approach is taken by the HARD system for Ada tasking programs [Di Maio et al. 1985]. There the predicates and debugging actions are encoded in special Ada tasks called D tasks. Manually inserted calls to the D tasks enable them to obtain information about the program’s execution. Based on this information, they can call routines that display or modify the program state. All of the Ada facilities can be used inside of a D task, so the programmer can use a familiar high-level language to control the debugging process.
Rather than using the stream of events to control debugging activity, the following systems automatically check specifications for the program. Although this requires that the programmer learn an additional language, it can complement a formal specification/verification approach to program development. Most of these systems have their own way of specifying complex event formulas, usually based on the sequential and parallel composition of events.
The IDD system [Harter et al. 1985] uses an interval logic specification of the program. This specification is checked against the program’s behavior. When a specification is violated, the program is stopped for inspection. Temporal logic views the computation as a sequence of states. The main operators are “always” and “eventually,” meaning that the following predicate on the state is either always true or eventually becomes true. Interval logic adds expressibility by restricting the temporal operators to portions of the computation.
The ECSP debugger [Baiardi et al. 1986] can check behavior specifications that completely describe the allowable communication behavior of the processes. The specifications can refer to various communication activity and can contain assertions on the process’s state. One of the constructs in their language causes control to be returned to the user (presumably so that the process can be examined). The ordering of events and checking of specifications in this system is simplified by the restriction that each specification can only refer to events from a single process.
The TSL system [Helmbold and Luckham 1985a] automatically checks specifications against the events generated by an Ada tasking program. Each TSL specification is of the form “when this occurs then that occurs before something else,” where each of the three parts is an event formula. TSL contains placeholders allowing a single specification to constrain multiple tasks. Additional abstraction is gained by using macros for event subformulas. An important contribution of the TSL system is its use of Ada semantics to guarantee that, even in distributed systems, certain pairs of events appear in the history in the correct order.
The Event Description Language (EDL) takes a slightly different approach [Bates and Wileden 1983]. Instead of checking specifications against the event history, it provides a method for defining multiple levels of abstract events from the primitive events generated by the program. Each high-level event is defined by an event formula over lower level events. There is one clause that constrains the values associated with the lower level events and another that determines the values associated with the higher level event. The Belvedere system uses EDL to help control its display (see Section 3.3).
All of these specification methods have simplifying restrictions. In EDL, an accurate global clock is assumed, the event recognizer is a potential bottleneck, and some ambiguity arises when a low-level event can be used in multiple higher level events (see, however, Bates [1988]). The TSL specification checker requires a linearly ordered stream of events and is also a bottleneck in the current application. In the IDD system, events are restricted to broadcasts on a shared medium (such as an Ethernet). A tree structure method for evaluating the IDD interval logic expressions is briefly described. The ECSP assertion checker is for a hierarchical fork-join method of parallelism. Its main disadvantages are that each specification deals only with the activity of one process and all processes must be completely specified.
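To make the specification styles above concrete, here is a Python sketch of checking a finite, linearly ordered event history. The helpers always, eventually, intervals, and when_then_before are hypothetical. when_then_before encodes one plausible reading of the TSL form “when this occurs then that occurs before something else” (after each trigger, the first matching “then” event must precede the first matching “before” event), and intervals is a crude analogue of interval logic’s restriction of temporal operators to portions of the computation. Neither is the actual TSL or IDD semantics.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Event = Tuple[str, str]          # (action, process), e.g. ("send", "A")
Pred = Callable[[Event], bool]


def always(history: Sequence[Event], p: Pred) -> bool:
    # "always p": p holds at every position of the finite history.
    return all(p(e) for e in history)


def eventually(history: Sequence[Event], p: Pred) -> bool:
    # "eventually p": p holds at some position of the finite history.
    return any(p(e) for e in history)


def intervals(history: List[Event], start: Pred, end: Pred) -> Iterator[List[Event]]:
    """Yield each sub-history from a `start` event to the next `end` event.

    A `start` with no later `end` opens no interval.
    """
    i = 0
    while i < len(history):
        if start(history[i]):
            for j in range(i + 1, len(history)):
                if end(history[j]):
                    yield history[i:j + 1]
                    i = j            # resume scanning after the closing event
                    break
        i += 1


def when_then_before(history: Sequence[Event], trigger: Pred,
                     then: Pred, before: Pred) -> Optional[int]:
    """Check "when trigger occurs, then `then` occurs before `before`".

    Returns the index of the first violating trigger, or None if satisfied.
    An event matching both `then` and `before` counts as `then`.
    """
    for i, e in enumerate(history):
        if not trigger(e):
            continue
        for later in history[i + 1:]:
            if then(later):
                break                # obligation met for this trigger
            if before(later):
                return i             # `before` came first: violation
    return None
```

For a history of request, grant, and release events, `when_then_before(h, is_request, is_grant, is_release)` reports the first request whose release is not preceded by a grant, and `all(eventually(iv, is_grant) for iv in intervals(h, is_request, is_release))` applies “eventually” within each request-release interval rather than over the whole run (the predicate names here are illustrative).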
Figure 6. The Idd time-process diagram (one row per task: MAIN and TASK1 through TASK4; Display Filter control). Used with permission [Harter 1985] © 1985 IEEE.
subsets of the messages that fall within the time-process space currently being displayed. Similar displays are included in PPUTT [Fowler et al. 1988], in which the emphasis is on the programmer noticing irregularities in the patterns of communication.
For many of the variations on time-process diagrams a global clock is required. At least one system [Stone 1988], however, uses a type of time-process display without needing a global clock. This display is called a concurrency map. Instead of displaying exactly when events occurred based on a global clock, events are arranged to show only the order in which they occurred. This order is derived from the time dependencies in the program. For example, the receipt of a message must follow the sending of a message. Figure 7 shows process histories for three processes and the corresponding concurrency map. The horizontal lines (partially obscured by the boxes) correspond to logical time divisions. An event may have occurred during any time division touched by the box containing the event.
Time-process displays appear to have a definite place in viewing the activity of parallel systems. They do have their limitations. As the number of processes becomes large, the display may become too cluttered with information to be useful. This can be addressed to a degree with filtering and such features as the display and magnify options described above for Idd. In the next section, we present an alternative display that gives a much different view of the system.
[Figure 7: process histories (with “Compute” events) and the corresponding concurrency map.]
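Since a concurrency map orders events only by the dependencies in the program, one standard way to compute that order without a global clock is with vector timestamps. The Python sketch below is not claimed to be the algorithm used by Stone [1988]; it assumes each process history is a list of (kind, label) events in which a send and its matching receive share a message label, and the names vector_clocks, happened_before, and concurrent are hypothetical. Events that are concurrent under this order are the ones a concurrency map may place in overlapping time divisions.

```python
def vector_clocks(histories):
    """Assign a vector timestamp to every event.

    histories: {process: [(kind, label), ...]}, kind in {"local", "send", "recv"};
    a send and its matching recv share the same label (message id).
    Returns {(process, position): timestamp tuple}.
    """
    procs = sorted(histories)
    index = {p: k for k, p in enumerate(procs)}
    clock = {p: [0] * len(procs) for p in procs}
    pos = {p: 0 for p in procs}
    sent = {}      # message id -> timestamp carried by the message
    stamps = {}
    progress = True
    while progress:
        progress = False
        for p in procs:
            while pos[p] < len(histories[p]):
                kind, label = histories[p][pos[p]]
                if kind == "recv" and label not in sent:
                    break                   # wait until the matching send is processed
                c = clock[p]
                if kind == "recv":
                    # Merge the sender's knowledge into the receiver's clock.
                    c[:] = [max(a, b) for a, b in zip(c, sent[label])]
                c[index[p]] += 1            # every event advances the local entry
                if kind == "send":
                    sent[label] = list(c)
                stamps[(p, pos[p])] = tuple(c)
                pos[p] += 1
                progress = True
    if any(pos[p] < len(histories[p]) for p in procs):
        raise ValueError("a receive has no matching send")
    return stamps


def happened_before(u, v):
    # u precedes v iff u <= v componentwise and u != v.
    return all(a <= b for a, b in zip(u, v)) and u != v


def concurrent(u, v):
    return not happened_before(u, v) and not happened_before(v, u)
```

With three processes where P1 sends message m to P2 and P3 performs only a local event, the send in P1 happened before the receive in P2, while P3’s event is concurrent with both and could be drawn anywhere in the map relative to them.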
4.1 Dataflow Analysis of Parallel Programs (6) a process waiting for the completion of
another process that is guaranteed to
Probably the most frequently referenced have already completed, and
work on dataflow analysis of parallel pro-
grams is that of Taylor and Osterweil (7) a process that is scheduled to execute
[ 19801. Their algorithms generate four in parallel with itself.
data-usage sets for each node of a program In addition to not permitting recursion,
flow graph: gen, kill, live, and avail. These the algorithm for item (2) assumes the exis-
correspond to the sets by the same names tence of an algorithm for determining if
used in the global dataflow analysis of optimizing compilers [Fosdick and Osterweil 1976; Hecht and Ullman 1975]. The original algorithms used to compute live and avail have been extended to pass data-usage information across edges in the flow graph corresponding to synchronization operations. By reinterpreting the meaning of gen and kill, it is possible to use the modified data-usage sets to arrive at algorithms to detect anomalies in parallel programs.

The algorithms in Taylor and Osterweil [1980] assume a simple process synchronization model. One process may cause another process to begin execution with the statement “schedule X” and wait for the completion of another process with “wait X”. Their model does not permit a process to execute (be scheduled) in parallel with itself, which would correspond to a recursive process invocation. Recursion is also not allowed within any single process. They present algorithms based on their modified data-usage sets that operate on a representation of the program called a Process Augmented Flowgraph (PAF). The PAF is constructed by taking the flowgraphs of the individual processes and connecting them with edges to indicate process synchronization constraints. For example, there would be an edge connecting the “schedule X” statement in one process with the initial statement in process X. Their algorithms can detect the following:

(1) a reference to an uninitialized variable,
(2) a variable that is referenced while being defined in parallel,
(3) a definition of a variable that is never referenced,
(4) a variable that may have an indeterminate value,
(5) a process waiting for the completion of an unscheduled process,

two statements can execute in parallel. This is the subject of Section 4.2. Also, it is recognized that it is impossible to “create a fixed static procedure capable of constructing the PAF of any program written in a language which allows run-time determination of tasks to be scheduled and waited for.”

The difficulty of using dataflow to analyze parallel programs is clearly shown in Callahan and Subhlok [1988]. They present an algorithm for determining which data dependencies present in a sequential execution of a program are preserved in a parallel execution of the program. They then show that determining whether all data dependencies are maintained is Co-NP-hard when only the information in their version of the PAF, which they call the synchronized control flow graph, is available. They also present approximations that execute in polynomial time on programs written using a simple programming model. Two notable limitations of the model are that no synchronization operations are permitted within loops and all synchronization is done with event variables that cannot be cleared.

4.2 Parallel (i, j)

A Boolean function parallel (i, j), which returns true if it is possible for program points “i” and “j” to execute in parallel, can be used to detect parallel access errors. These occur when a variable is being read and written in parallel or when two processes can simultaneously write to the same variable. If the program can be represented as a Petri net [Peterson 1977], then this function can be implemented by examining the reachable states for the net. Unfortunately, the number of reachable states in a bounded Petri net grows exponentially with the number of places (nodes) in the net.
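As an illustration of the reachability approach, the following Python sketch computes parallel (i, j) for a small net. It assumes a 1-safe net whose places are program points, and it models “schedule X” as a transition that forks a token into X and “wait X” as a transition that joins on X's final point. This encoding and all names (reachable_markings, independent_processes, the place names m0 through x1) are hypothetical and are not taken from Taylor and Osterweil [1980] or Peterson [1977]. The last helper shows why exhaustive enumeration quickly becomes impractical: n unsynchronized one-step processes (2n places) already yield 2^n reachable markings.

```python
from collections import deque


def reachable_markings(initial, transitions):
    """Enumerate every marking reachable from `initial` by breadth-first search.

    A marking is a frozenset of marked places, so this sketch assumes a
    1-safe (bounded) net in which no place ever holds more than one token.
    Each transition is a pair (inputs, outputs) of frozensets of places.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for inputs, outputs in transitions:
            if inputs <= marking:  # every input place is marked: transition enabled
                successor = (marking - inputs) | outputs
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    return seen


def parallel(i, j, markings):
    """True if some reachable marking has tokens at program points i and j at once."""
    return any(i in m and j in m for m in markings)


# Hypothetical net: a main process does "schedule X", runs one statement,
# then does "wait X". Places m0..m3 are main's program points; x0 and x1
# are X's initial and final points.
SCHEDULE_WAIT_INITIAL = frozenset({"m0"})
SCHEDULE_WAIT_TRANSITIONS = [
    (frozenset({"m0"}), frozenset({"m1", "x0"})),  # schedule X: fork a token into X
    (frozenset({"m1"}), frozenset({"m2"})),        # main executes its next statement
    (frozenset({"x0"}), frozenset({"x1"})),        # X runs to its final point
    (frozenset({"m2", "x1"}), frozenset({"m3"})),  # wait X: join on X's completion
]


def independent_processes(n):
    """n unsynchronized one-step processes a_k -> b_k (illustrates state explosion)."""
    initial = frozenset(f"a{k}" for k in range(n))
    transitions = [(frozenset({f"a{k}"}), frozenset({f"b{k}"})) for k in range(n)]
    return initial, transitions


if __name__ == "__main__":
    states = reachable_markings(SCHEDULE_WAIT_INITIAL, SCHEDULE_WAIT_TRANSITIONS)
    print(len(states))                   # 6 reachable markings
    print(parallel("m1", "x0", states))  # True: both run right after "schedule X"
    print(parallel("m3", "x0", states))  # False: m3 is only reached after "wait X"
    for n in (4, 8, 12):
        init, ts = independent_processes(n)
        print(n, len(reachable_markings(init, ts)))  # 2**n markings for 2n places
```

Checking a candidate parallel access error then reduces to asking whether the two program points that touch the same variable can be marked together. In this sketch a write at m1 and a read at x0 would be flagged, while a read at m3 would not.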
the anomaly report but also the concurrency states that led up to that state. A multiwindow user interface is provided that displays an anomalous concurrency state along with a description of the anomaly [McDowell 1988]. The concurrency state is represented by displaying a small portion of the source for each concurrent task in a separate window. The user may then display any previous or successor concurrency state to determine how the situation arose. This is somewhat like performing a coarse forward or backward simulation.

4.3 Combining Static Analysis with Dynamic Debugging

Taylor [1984] describes several ways in which static analysis could be productively combined with dynamic analysis. One approach would be to use the information from static analysis to help develop test data for use in conjunction with a dynamic debugger. Conversely, information from dynamic monitoring could be used to guide partial static analysis when complete static analysis would generate too many states. If subparts of a program could be shown to be free of errors using static analysis, then those portions of the program would not need monitoring. This could reduce the overhead associated with monitoring. If run-time assertion testing is included in the program, then the static analyzer could assume that the assertions are true, reducing the number of states that must be examined. A related technique is the use of symbolic execution to reduce the state space of a static analysis tool by eliminating infeasible paths [Young and Taylor 1986].

A somewhat different combined use of static and dynamic techniques is described in Allen and Padua [1987], Miller and Choi [1988b], and Stone [1989]. Each of these systems applies static analysis to a dynamically generated trace in order to identify parallel access anomalies that they call races. If a particular trace can be shown to be free of races, then the program is free of races for the given input. This does not mean that the program is free of races in general. For example, a race condition could be present in a conditionally executed block that is not executed with the given input. The analysis performed to identify races can also be used to help with breakpoint debugging. If the critical point in each process involved in a race can be identified, then a breakpoint can be placed just before that point in each process. This will stop the system in the state necessary to induce the race and permit close examination. In addition, by selectively continuing the processes, alternative race outcomes can be explored.

4.4 Static Analysis in the Development Process

In addition to analyzing parallel programs statically and debugging them with run-time debuggers and monitors, there is the possibility of eliminating the errors in programs before they occur. Here we would like to present some current work that seeks to aid in the development of parallel programs that are free of the kinds of errors outlined at the beginning of Section 4.

Automatic vectorizing compilers are the predecessors of the work presented here. They represent a very restricted form of parallel programs that are free of parallel bugs, assuming, of course, that the compilers are correct. This work has been extended most notably by Banerjee et al. [1979] to permit parallel execution of a wide range of loops. Again, assuming that both the sequential programs and the compilers are correct, the parallel programs that result will also be correct.

In addition to the fully automatic techniques of Banerjee et al., researchers at Rice University are working on a system called PTOOL [Allen et al. 1986]. PTOOL performs interprocess dataflow analysis to determine when a loop can be parallelized. It does this only for loops selected by the programmer. It interacts with the programmer for three reasons. First, the amount of time to perform the analysis for all loops and all combinations of loops is prohibitive. It is assumed that the programmer understands the overall structure of the program and knows which sections are most suitable for parallelization. Second, the programmer can make a judgment about the typical
values of certain variables at run time that affect the decision of whether to parallelize a particular loop. This is particularly important when the overhead for parallel execution is relatively high. Finally, by interacting with the programmer, PTOOL can provide information that might permit the programmer to change the program slightly, thereby allowing an important loop to be parallelized. In a fully automatic system the compiler would have to reject the loop as a candidate, possibly missing an important opportunity for parallel speedup.

By using automatic or semiautomatic techniques based on correctness-preserving transformations, it is possible to debug a sequential version of a program using conventional debugging tools and then transform it into an equivalent parallel version.

5. CONCLUSION

Having completed the survey, the question remains: What progress has been made, and where is more work needed? Because of the diversity of applications, languages, and systems, no single approach can satisfy all parallel debugging needs. The following paragraphs summarize what has been achieved and speculate on possible research directions.

The deficiencies of both static and dynamic techniques have been discussed in this paper. One promising approach that alleviates some of these deficiencies is the creation of a toolkit that integrates both approaches (see Section 4.3).

As the saying goes, “A picture is worth a thousand words.” With program activities distributed across both space and time, simple sequential displays of program activity are inadequate. The time-process diagrams (see Section 3.2) give a compact view of the event history, whereas the animation diagrams (see Section 3.3) give a more detailed view of a single instant in time. Both representations are valuable, and the use of multiwindow workstations makes it possible to have both.

The appropriateness of each of these diagrams needs further research. For example, a major problem with the animation diagrams is the placement of the symbols representing the processes. Hough and Cuny [1987] make it clear that proper placement can be very important in comprehension of the display (see Figure 8).

In addition to the problem of placement is the problem of too much information, even for a picture. A possible solution is a language for abstracting low-level events into higher level events for display. Event description languages can also be used to filter out irrelevant events, reducing the amount of information that must be displayed.

One prominent feature of several systems is modularity [Joyce et al. 1987; Victor 1977]. By carefully designing a modular system, the addition or modification of various features can be managed easily. An event-based system might have modules for low-level event monitoring, filtering, and recording of events (from the low-level event modules), display of recorded events, analysis of recorded events, and controlled reexecution of the program. In traditional parallel debuggers as described in Section 1, there may be separate modules for interacting with the low-level machine and for interacting with the user. “Plug compatible” modules are advantageous because they allow experimentation with different debugger functions.

A well-defined interface (or hierarchy of interfaces) between the user and the low-level machine isolates most components from changes in any one part of the system. The user modules become machine independent; the low-level machine modules become user interface and language independent; and possibly some user interface modules may become language independent.

The probe effect is possibly the most significant difference between debugging parallel programs and sequential programs. The most obvious solution to the problem of the probe effect is to have the probes permanently in place. This does not help with breakpoint debugging, but it solves the problem for event-based debugging using monitoring and event histories. The
DISDEB uses additional hardware to eavesdrop on the network traffic. This allows the DISDEB debuggers to run transparently, without disturbing the program’s timing.
DISDEB allows complex events to be built out of very low-level, machine-code-like events using a low-level language. For example, the language can only refer to physical addresses rather than using identifiers from the source code.
Filtering is done by transactions. The nested function calls can be hidden, giving a clearer picture of the high-level activity (see Section 3.1).