Understanding Complex, Real-World Systems Through Asynchronous
www.elsevier.com/locate/jss
Abstract
Traditionally, the underlying decision-making algorithms for most real-world systems have been centralized. The term real-world refers to systems under computer control that relate to everyday life, are beneficial to society at large, and are generally large-scale in scope. Examples include AT&T's dynamic non-hierarchical routing (DNHR) for routing telephone calls, the North American advanced train control system (ATCS) for routing railways, the Swiss banking system (SIC), and inventory management algorithms. While centralized algorithms are simple and easy to conceive and implement, they execute sequentially on uniprocessors and are slow. In addition, by their very nature, centralized algorithms are highly susceptible to natural and artificial disasters. Synchronous distributed algorithms constitute a performance improvement over centralized algorithms, and have been used in fault simulation within the discipline of computer-aided design of digital systems and in matrix manipulations. However, their performance is limited by frequent, inherent synchronizations. This paper critically examines the nature of large-scale, real-world systems and observes that, fundamentally, most complex systems are composed of entities, i.e. concurrent, independent, and self-contained units of decision-making that interact with each other asynchronously. This paper presents a new class of algorithms, asynchronous, distributed, decision-making (ADDM) algorithms, to constitute the underlying control of such systems. While ADDM algorithms are closely related to autonomous decentralized systems (ADS) in the principal elements, their characteristics and boundaries are defined rigorously. While ADDM algorithms are difficult to conceive, design, and implement, they constitute the natural and logical choice for systems control, and hold the promise of extracting the maximal parallelism inherent in these systems. In addition, in principle, true asynchronous systems can be described accurately only by asynchronous, distributed algorithms, never by synchronous distributed algorithms. This paper reasons out the nature of most complex real-world systems from first principles and argues for its increasing importance in the design of future, large-scale systems. It then presents the underlying principle of ADDM algorithms, details their fundamental characteristics, enumerates a number of successful ADDM algorithms for problems from different disciplines, and briefly reviews the nature of three of them: (1) real-time, domestic payments processing system, (2) distributed scheduling in railway networks, and (3) distributed routing in ATM networks. © 2001 Elsevier Science Inc. All rights reserved.
Keywords: Systems; Physical processes; Asynchronous interactions; Distributed algorithms; Real-world systems
our desire to find a simple parallel approach that applies to all problems, quickly and uniformly.

According to the literature, the earliest underlying algorithms utilized to model and control physical systems were centralized in nature. In this paradigm, data from one or more sources are collected at a central site and a single processor utilizes them to compute the system-wide decisions through sequential execution. Centralized decision-making algorithms have been used to model battlefields, schedule trains in a railway network, perform inventory management, realize highway management, and manage distributed federated databases. While the use of the sequential, centralized scheduler, inherent in the centralized algorithms, to model the asynchronous, distributed physical system is unnatural, there are additional difficulties. With increasing system complexity, the computational burden on the central processor continues to increase, eventually leading to lower throughput and poor efficiency. For many systems such as real-time international payments processing, a centralized algorithm may not even be realizable. Centralized algorithms are also highly vulnerable to catastrophic failures.

In contrast, distributed algorithms, in general, promise higher throughput and efficiency through sharing the overall computational task among multiple, concurrent processors. Markas et al. (1990) report a distributed implementation of fault simulation of digital systems and note throughput improvements over the traditional, centralized approach. While distributed and parallel algorithms are generally viewed as synonyms, Tel (1990) reveals a subtle and accurate distinction between them. Parallel algorithms are used to perform a computational task and the parallelism contributes to a faster completion time, especially where the computational task is prohibitively large for a sequential processor. Parallel algorithms do not capture the basic definition of parallelism, i.e. events occurring simultaneously. In contrast, distributed algorithms are designed for tasks wherein the processes and the data are physically dispersed and they are meaningless within the context of a single sequential process. The distributed nature is an inherent and intrinsic property of a distributed algorithm.

Of the two major forms of distributed algorithms, the synchronous style has been most widely used in practice and reported in the literature. The synchronous distributed approach is characterized by the presence of a single control processor that schedules the executions of all of the remaining processors. The presence of the sequential control node, also termed scheduler, theoretically limits the performance advantage of the synchronous approach. As the number of processors is increased, the synchronization requirement imposed by the scheduler will effectively counteract the potential advantages of the multiple concurrent processors. Where the number of sub-systems is large and the problem does not intrinsically require all sub-systems to synchronize periodically, the use of the synchronous algorithm is neither natural nor logical. The single scheduler node will require significant time to complete its communication, corresponding to a specific iteration, with every sub-system, one at a time, before it can proceed to the next iteration cycle. As a result, the sub-systems will be forced to slow down unnecessarily. Fundamentally, however, the use of the synchronous paradigm to model a physical or natural process that is inherently asynchronous implies inaccuracy in modeling. The efforts reported in Tron et al. (1993), Ronngren et al. (1996), Gupta and Kumar (1993a,b), Brehm et al. (1995), Kushwaha (1993), Braddock et al. (1992), Lecuivre and Song (1995), Kumar et al. (1994), Arrouye (1996), Clement and Quinn (1994), Kremien (1995), Manwaring et al. (1994), Celenk (1994), Westphal and Popovic (1994), Ertel (1994), Barr and Hickman (1992), Wieland et al. (1992), Yan and Listgarten (1993), Lalgudi et al. (1994), Capon (1992), Culler et al. (1996), Bilardi et al. (1996) are generally applicable to data-parallel, i.e. SIMD, and synchronous-iterative distributed programs (Tsitsiklis and Stamoulis, 1995), both of which have a limited range of practical application in the real world. In addition, all of the above efforts are either based on purely theoretical assumptions or limited to small-scale implementations and they fail to address the unique characteristics of large-scale, asynchronous, distributed systems.

The literature on synthesizing distributed decision-making algorithms and evaluating the quality of distributed decision-making is sparse. Asynchronous, distributed systems feature dispersed processes and data, where the processes operate concurrently, and thus the underlying algorithms must necessarily be both distributed and parallel. Rotithor (1991) proposes to distribute the overall tasks of a distributed decision-making system, namely system state estimation and decision making, among independent decision-making processing elements and attempts to improve the overall performance of the system by optimizing the performance of the individual decision makers. Rotithor models the problem of load balancing in distributed computing and notes substantial performance improvements. Capon (1992) accurately notes that understanding the behavior of asynchronous parallel programs is extremely difficult and is chiefly responsible for the limited parallelism reported in the literature.

Tsitsiklis and Athans (1984) considered the distributed team decision problem wherein different agents obtain different stochastic measures related to an uncertain random vector from the environment and attempt to converge, asymptotically, on a common decision, and they derived the conditions for convergence. Tsitsiklis and Athans (1985) studied the
S. Ghosh / The Journal of Systems and Software 58 (2001) 153-167 155
complexity of basic decentralized decision problems that are variations of "team decision problems" and concluded that optimality may be an elusive goal. Bertsekas and Tsitsiklis (1988) defined distributed asynchronous algorithms and utilized them to execute iterative problems of the type x = f(x). In their definition, the iterative problem is decomposed and assigned to a number of processors and, during execution, each processor performs the computations asynchronously, utilizing the old, outdated values in the event that not all messages have been received from the previous iteration. While it is unclear that such iterative problems are naturally asynchronous, distributed systems, a difficulty with the proposed definition is that it aims to compensate for the lack of accurate input data through asynchronicity. The existence of asynchronicity between two or more units imparts to them the characteristics of independence and concurrency, but cannot intrinsically compensate for erroneous input data. Thus, the occurrence of the convergence problem is not unexpected. Tsitsiklis and Stamoulis (1995) proposed another definition of asynchronous, distributed algorithms. The simplest form for an asynchronous, distributed algorithm, according to them, is one where each processor stores in memory a local variable while estimates of the value of the local variable are maintained by each of its neighboring processors. The local variable is updated occasionally, based on some function of its previous value and other estimates, and this new value is sent to the neighbors that, in turn, update their estimates. This characterization is also limited since in any distributed system, no processor can ever hope to know immediately and precisely the exact state of a different processor due to latency. Furthermore, this definition of asynchronous, distributed algorithms is difficult to generalize since there is more than one example of an asynchronous, distributed system where a processor is not concerned with estimating the variable values of its neighbors. In summary, the problems studied in Tsitsiklis and Athans (1984), Tsitsiklis and Athans (1985), Bertsekas and Tsitsiklis (1988), Tsitsiklis and Stamoulis (1995) reflect limited applications and the proposed definitions are unable to capture the general behavior of asynchronous, distributed systems. A general asynchronous, distributed system is one where each entity derives its own unique decisions, asynchronously and independent of the decisions of other entities, with the goal of obtaining a solution to the overall problem.

Ihara and Mori (1984) were the first to introduce the notion of autonomous, decentralized computer control systems (ADS) and described an implementation for train control in Japan. A key characteristic of their approach is the absence of hierarchy, integrated control, and a centralized controller. Instead, all sub-systems are homogeneous, uniform, intelligent, equivalent in capacity and performance, and are free from any master-slave interactions. A sub-system does not function under instructions from any other sub-system and does not itself issue any such instructions. All information is transmitted by the generating point to all adjoining sub-systems and each sub-system, in turn, detects and picks up a message targeted for it. The basic approach applies to both control systems such as traffic and plant control and information systems such as telephone and banking applications. While partial loss of data is permitted in control applications, no data loss is tolerated in information systems. Ishida (1997) states two key characteristics of ADS. First, ADS consists of autonomous agents on which most of the decisions of the ADS ride. Second, the structure of ADS being dynamic, it may not be either designed or specified a priori. The difficulty with these characterizations is that they leave the behavior of the agents unspecified, unknown, and potentially arbitrary, which raises concerns of safety and reliability. Kremien et al. (1990) propose to use scalable, decentralized state-feedback algorithms for distributed systems resource management. They note that while formal analysis of correctness of such algorithms is difficult, a trial-and-error approach supported by distributed simulation is a superior alternative to prototype implementation. In addition to yielding scalability and performance, the execution of a decentralized algorithm on a distributed system allows a user to use the same algorithm that will be used by an implementation using all the primitives required for expressing distributed algorithms.

Upon careful examination, none of the following books on distributed systems (Coulouris and Dollimore, 1988; Cormen et al., 1990; Leighton, 1992) addresses the topic of "asynchronous distributed algorithms" while the recent book by Lynch (1996) allocates a single paragraph to the topic. In defining timed asynchronous systems as a model for current distributed systems such as a network of workstations, Christian and Fetzer (1999) require all services to be timed. That is, any message sent by a correct process to a correct destination is received at the destination within a known amount of time. They also argue that measurements of actual message and processing delays and hardware clock drifts confirm their theory. There are two difficulties with this definition. First, the key requirement that a message propagation be completed within a known time interval poses a fundamental contradiction with the definition of any asynchronous system (Kohavi, 1978). The only guarantee in any asynchronous system is that no constituent entity can a priori state with certainty the exact time interval necessary for the completion of any message communication between any two entities. Of course, message communications may be ensured by utilizing special mechanisms such as handshaking (Kohavi, 1978). Second, today's measurements of message and processing delays and clock drifts, which
may lie in the millisecond-to-microsecond range, are likely to be grossly superseded by future systems which may exhibit a nanosecond-to-femtosecond range. Thus, Christian and Fetzer's timed asynchronous distributed model, which relies on such fleeting parameters, raises concerns from the scientific perspective.

This paper argues that the real world is highly complex and that there is a fundamental, micro-cosmic design principle which extends from the astronomical-sized galaxies down to the minutest atom in the universe. At any level of abstraction, the sub-systems of a system inherently possess independence, autonomy, and their own sense of timing, and where they interact with each other, the nature of the interaction is asynchronous. The independence of each sub-system with respect to all others poses no paradox. It refers to the fact that the existence of a sub-system is independent of all others. While there may be data and information dependency between two or more sub-systems of a system, each sub-system possesses its own processing or decision-making engine. The interactions enable the sub-systems to learn, grow, and evolve. This paper presents asynchronous, distributed, decision-making (ADDM) algorithms as an encapsulation of the micro-cosmic design principle. When an ADDM algorithm is successfully designed for a given real-world problem, it reflects the high, meta-level purpose or intent of the system and it underlies the behavior of every sub-system of the problem. Fundamentally, the goal of ADDM algorithms is to encapsulate such intent for real-world problems, scientifically. This paper presents the concept of ADDM algorithms as a branch of knowledge within the discipline of distributed systems and claims that ADDM algorithms constitute the path to large-scale systems design in the future. Key differences between Ihara and Mori's approach and ADDM algorithms are that in the latter, the sub-systems are heterogeneous, information is exchanged between specific sub-systems as warranted by the original problem, control is integral though distributed, and, unless specific exceptions are incorporated, loss of data is generally not tolerated.

2. Principles and fundamental characteristics of ADDM algorithms

2.1. Principles

Most physical and natural systems may be classified as asynchronous, distributed decision-making systems, wherein, despite significant and unique diversity in the system details, the geographically dispersed sub-systems acquire local data, receive information from other sub-systems asynchronously, where necessary, and compute and execute decisions independently and concurrently. Following execution, the sub-system may transmit information to other remote sub-systems, where necessary. Thus, the decision for the entire system is reflected by the local decisions at all of the sub-systems and the local decisions, in turn, are derived utilizing appropriate information obtained from a wide geographical area. For many real-world problems, from different disciplines, the formulation as an asynchronous, distributed decision-making system is natural.

Although asynchronous, distributed systems exhibit a number of unique characteristics, which will be examined subsequently, a simple yet powerful distinguishing feature is that their sub-systems interact with each other and with the external world asynchronously. That is, each sub-system maintains its own unique sense of timing control, manifested through a clock perhaps, that is independent of every other sub-system and of the external world. A simple, everyday example is the man-made smart traffic light controller¹ and the automobiles passing through it. The controller sits at an intersection, driven by its own clock, and has absolutely no a priori knowledge of when an automobile will land on one of its magnetic loop sensors on the pavement and cause it to be activated. On the other hand, the driver of the automobile has no a priori knowledge of the phase of the clock of the controller at any given instant of time. Each unit is independent, autonomous, concurrent, and interacts with the other unit asynchronously, i.e. at irregular intervals of time that may not be predicted ahead of time with exact precision.

¹ We are not referring to the simplistic traffic light controllers that merely change the lights at a predetermined interval, regardless of the traffic.

Consider radioactive decay, a natural process, and assume that we have a system consisting of two separate lumps, L1 and L2, of an identical radioactive material. Radioactivity is spontaneous, which is defined as unpremeditated, triggered without any apparent external cause, and self-generated. Thus, the occurrences of alpha particle emission from L1 and L2 will be asynchronous, i.e. they occur irregularly in time. What is even more fascinating is that if L1 is further subdivided into two separate lumps, L11 and L12, the emission of particles from L11 and L12 will be asynchronous with respect to each other. Other examples of asynchronous natural processes include the ice ages that occur irregularly in time, the earth's rotation that is not exactly periodic, the reversal of the earth's magnetic polarity that has occurred irregularly in time, and the unpredictable, sudden, and apparently unprovoked attack of a living body by a virus that has otherwise been dormant within for a very long time. Examples of man-made physical processes that are asynchronous and distributed include the field of digital design, today's battlefield, computer networks, banking system, retail industry, etc.
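
As a rough illustration of the asynchrony in the traffic controller and radioactive decay examples above, the following Python sketch generates irregular event times for independent entities and samples the phase of a free-running controller clock at each automobile arrival. It is only a sketch under stated assumptions: the names `emission_times` and `controller_phase`, the exponential model for the gaps between events, and every rate, period, and offset value are hypothetical choices made here and do not come from the paper.

```python
import random


def emission_times(rate, horizon, rng):
    """Timestamps in [0, horizon) at which one entity acts spontaneously.

    Gaps are drawn from an exponential distribution; this is a modeling
    assumption of the sketch, not something the text prescribes.
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)


def controller_phase(t, period, offset):
    """Fraction in [0, 1) of the controller's own clock cycle at time t."""
    return ((t - offset) % period) / period


# Two lumps with separate random sources: neither can predict the other.
l1 = emission_times(rate=2.0, horizon=10.0, rng=random.Random(1))
l2 = emission_times(rate=2.0, horizon=10.0, rng=random.Random(2))

# Automobiles reach the sensor at irregular times; the controller clock
# (hypothetical 4 s period, 1.5 s offset) runs independently of them.
arrivals = emission_times(rate=0.5, horizon=60.0, rng=random.Random(3))
phases = [controller_phase(t, period=4.0, offset=1.5) for t in arrivals]

print("L1 emissions:", [round(t, 2) for t in l1[:5]])
print("L2 emissions:", [round(t, 2) for t in l2[:5]])
print("controller phase at each arrival:", [round(p, 2) for p in phases[:5]])
```

Each entity draws from its own random source, so no entity's timeline can be derived from another's. The phase observed at an arrival is simply whatever the controller's clock happens to read at that instant, which mirrors the absence of a priori knowledge on both sides.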
The arguments in favor of the asynchronous nature of the interactions between the sub-systems of a complex system are two-fold. First, careful analysis of physical and natural processes reveals that for every given system, the constituent sub-systems possess independence, autonomy, and their own sense of timing, and where they interact with each other, the interaction is asynchronous. The case of a system with no sub-systems is conceivable but uninteresting. This phenomenon appears to be manifest among the galaxies of the universe, among the star systems (solar systems) of each galaxy, among the planets of every star system, and down to the individual molecules of a gas enclosed in a given container. One possible reason may lie in the idea that, at any given level of abstraction, every sub-system embodies an entity, where an entity reflects a self-contained unit that is independent, autonomous, and concurrent relative to other units at that level of abstraction. Conceivably, the notion of entity constitutes the most fundamental, micro-cosmic design principle of this universe.

Second, it is known from physics, independently, that this universe is characterized by two key properties: the intervening space (or distance) and the finite speed of propagation of electromagnetic radiation. Thus, the propagation of any information between any two points in space requires a finite interval. To understand how these properties relate to asynchronicity and the significance of asynchronicity in the universe, consider one of the fundamental conservation laws of physics, the law of conservation of charge. The law, due to Faraday, originally stated that the net charge in the universe is conserved. Consider that a non-local system is devised, one where the distances between two extreme points may be non-trivial. For Faraday's law of conservation of charge to hold true in this system, if a charge at point A disappears, an identical charge must reappear at another point B. Feynman (1990) argues that one cannot know precisely whether a charge reappears at B in the system at the very instant it disappears at A. Simultaneity of events at two spatial points, according to relativistic physics, is subjective. This forces the law of conservation of charge to be altered to apply only locally. Consequently, the two properties of the universe, described earlier, preclude the occurrences at A and B from being synchronous. Thus, the universal properties of spatial distance and finite propagation of electromagnetic radiation corroborate the micro-cosmic design principle to imply asynchronous interaction as the general mechanism.

ADDM algorithms constitute the accurate, underlying control of asynchronous, distributed systems. The overall decision-making or control is distributed among all of the sub-systems. ADDM algorithms represent the highest level architecture and the meta intelligence underlying the design of ADDM systems. They are a natural choice and may find use in either modeling the systems in the computer for simulation, debugging, and performance estimation or in actual system implementations. ADDM algorithms constitute innovative solutions and hold the potential of exploiting the maximal parallelism inherent in the problem. Furthermore, local computations are maximized while communications between the entities are minimized, implying high throughput, robustness, and scalability. Unlike the data-parallel and synchronous-iterative algorithms, there is no forced synchronization in asynchronous algorithms. The key elements of an ADDM algorithm consist of the entities of the system, the exact behavior of the entities, the asynchronous though precise interactions between the entities, and the overall task of intelligently distributing the computational task among the entities. The nature of ADDM algorithms is reflected in the fundamental characteristics.

ADDM algorithms may be difficult to understand, even more difficult to implement and test on parallel processors, and most difficult to conceive and synthesize. They require a total and comprehensive grasp of the entire system, with all of its innumerable but finite possibilities, at once. Ironically, these difficulties eclipse the fact that ADDM algorithms hold the promise of delivering the maximal parallelism inherent in a system and, therefore, the highest efficiency. ADDM algorithms are ideally suited to describe most natural, i.e. occurring in nature, and physical processes, including man-made ones, that are generally distributed and asynchronous systems. ADDM algorithms hold enormous potential and, thus far, they have been successfully utilized to model problems from the disciplines of broadband-ISDN networks, computer-aided design of digital systems, railway scheduling, intelligent vehicle highway system, banking, and military command and control. For instance, a maximum of 45,000 asynchronous, intelligent, independent, concurrent processes have been executed successfully and synergistically on a total of 65 workstations configured as a loosely-coupled parallel processor, to solve a single problem (Utamaphethai and Ghosh, 1996) correctly and accurately, under algorithmic control.

2.2. Fundamental characteristics of ADDM algorithms

The character of ADDM algorithms is reflected through the eight fundamental characteristics: the definition of entities, the asynchronous behavior of the entities, concurrent execution, asynchronous interactions between the entities, proof of correctness of the algorithm including consistency, robustness, performance, and stability.

2.2.1. Entities in ADDM algorithms

Entities constitute the basic decision-making units of the ADDM algorithm and define the resolution of the
decision behavior of the asynchronous, distributed, 2.2.2. Asynchronous nature of the entities
system. Entities must be natural, i.e. they must corre- In most real-world systems, the nature of the sub-
spond to the actual elements of the system. Entities are systems are asynchronous. Thus, the behavior of one
self-contained, independent, and asynchronous. The entity is independent of others and, unless an entity, A,
choice of the granularity of the entities is a function of propagates its state explicitly to other entities, no one
the desired resolution of the decision-making, the degree has knowledge of the exact state of A. Even where two
of concurrency, and the complexity of the communica- or more entities relate to the same generic unit, i.e. they
tion between the entities. In principle, a natural or share a common description, during execution, each
physical system may possess a natural hierarchical or- entity would assume its own unique state, re¯ected by
ganization, although, for simplicity, many real-world the data value of the variables, etc. In the event that a
systems may be expressed as a collection of entities at a real-world system is designed to operate synchronously,
single level. Examples of systems organized hierarchi- the asynchronous interaction model is a general princi-
cally include digital systems and international payments ple and is ideally suited to represent both synchronous
processing. and asynchronous interactions in systems.
The concept of entity is fundamental in this universe Consider that a system is organized hierarchically,
and also to our understanding of the universe. Although with E1 and E2 as entities at a given level with {E11,
the universe and all knowledge about it may be, con- E12, . . .} and {E21, E22, . . .} constituting the lower level
ceivably, one continuous thread to the creator, our un- entities of E1 and E2, respectively. Under these cir-
derstanding is that the universe, at any level of cumstances, entities E11, E12; . . . and E21; E22; . . . all
abstraction, consists of entities. The word entity is de- interact asynchronously.
®ned as: ``The fact of existence, being. Something that Time constitutes an important component in the be-
exists independently, not relative to other things''. Thus, havior of the entities and their interactions. Although
an entity exists and its existence is guaranteed indepen- every entity, by virtue of its independent nature, may
dent of all other entities. Because it exists, an entity must possess its own unique notion of time, when a number of
be self-contained, i.e. its behavior, under every possible entities E1 , E2 ; . . . choose to interact with each other,
scenario, is completely de®ned within itself. Because the they must necessarily share a common notion of time,
entity exists independent of all other things, its behavior termed universal time, that enables meaningful interac-
is known only to itself. Unless the entity shares its be- tion. The universal time is derived from the lowest
havior with a dierent entity, no one has knowledge of common denominator of the dierent notions of time
its unique behavior. Conceivably, an entity may interact and re¯ects the ®nest resolution of time among all of the
with other entities. Under such conditions, its behavior interacting entities. However, the asynchronicity mani-
must include the scope and nature of the interactions between itself and other entities. The notion of entity is deeply rooted in our philosophy and culture.

For example, in the discipline of digital systems, hardware is organized through entities at any level of abstraction. Examples include an AND gate at the gate level, an ALU at the register-transfer level, and the instruction decode unit at the architecture level. For each entity, its behavior must include all that is relevant at that level of abstraction, such as the logic, timing, control, exceptions, etc. Since every entity is independent of others and no one entity has explicit knowledge of another entity's behavior, conceivably, each entity may possess its own notion of timing including clocks, timing constraints, delays, etc. Philosophically, asynchrony appears to be a manifestation of the concept of entity. However, one may argue that asynchrony is the more basic of the two and the point is therefore debatable.

Although we may desire the behavior of every entity to be self-consistent, it does not necessarily follow from the concept of entity. However, one may argue that the issue of existence, independent of time, provides the motivation for self-consistent behavior of the entities. We will assume that in hardware systems, every entity is characterized by a self-consistent behavior.

fests as follows. Where entities A and B interact, between their successive interactions, each of A and B proceeds independently and asynchronously. That is, for A, the rate of progress is irregular and uncoordinated and reflects lack of precise knowledge of that of B, and vice versa. At the points of synchronization, however, the time values of A and B must be identical. Where entities X and Y never interact, their progress with time is absolutely independent and uncoordinated with one another and the concept of universal time is irrelevant.

At a given level of abstraction, although each entity may have its own notion of time, both entities A and B must understand the universal time. Otherwise, A and B will fail to interact. Consider the following scenario. GG1 and GG2 are two great gods of time. While the length of the interval of GG1's opening and closing of the eye is 1 million years of our time, that for GG2 is 1 ns. Thus, the resolutions of the messages emanating from GG1 and GG2 are 1 million years and 1 ns, respectively. Clearly, during the million years that GG1 has the eye closed, all messages from GG2 to GG1, sent at intervals as low as 1 ns, will be ignored by GG1. In contrast, assuming that GG2 has a finite life span, when GG1 opens the eyes and sends a message to GG2, GG2 has long been dead. Even if GG2 is not dead, there is no
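
As a minimal sketch of the asynchrony described for entities A and B, the following Python fragment lets each entity advance its local clock in irregular, uncoordinated steps between interactions and requires the two time values to be identical at every synchronization point. The function names, the step sizes, and the list of synchronization times are illustrative assumptions and are not taken from the paper.

```python
import random


def advance_to(local_time, sync_time, rng):
    """Advance one entity's local clock in irregular steps up to sync_time."""
    trace = [local_time]
    while local_time < sync_time:
        step = rng.randint(1, 5)  # irregular progress, unknown to the other entity
        local_time = min(local_time + step, sync_time)
        trace.append(local_time)
    return trace


def simulate(sync_points, seed_a=1, seed_b=2):
    """Run entities A and B between successive interaction points.

    sync_points is assumed to be an increasing list of interaction times.
    """
    rng_a, rng_b = random.Random(seed_a), random.Random(seed_b)
    time_a = time_b = 0
    history = []
    for sync_time in sync_points:
        trace_a = advance_to(time_a, sync_time, rng_a)
        trace_b = advance_to(time_b, sync_time, rng_b)
        time_a, time_b = trace_a[-1], trace_b[-1]
        # At the interaction point, both time values must be identical.
        assert time_a == time_b == sync_time
        history.append((sync_time, trace_a, trace_b))
    return history


if __name__ == "__main__":
    for sync_time, trace_a, trace_b in simulate([10, 25, 40]):
        print(sync_time, trace_a, trace_b)
```

Between two synchronization points the two traces generally differ in both the number and the size of their steps, which is the sense in which neither entity has precise knowledge of the other's rate of progress.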
output O~1 is generated. Where the output is identical to its previous value, the event driven simulation considers the case to be insignificant since there is no change at the output of the entity. In the case that O~1 is different from its previous value, the event driven simulation considers the case significant and the new value is propagated to other entities that are connected to the entity in question. Thus, in event driven simulation, only changes in the logical value of input and output ports of the entities are important.

In Fig. 2, assume that the circuit is powered, the circuit is stable, and that the starting point of the circuit operation is considered as time 0. Clearly, new signals are asserted at the input ports of gate G1 and it must be executed. In contrast, gate G2 does not have a new signal asserted at its input at time 0, so it need not be executed. Thus, at time 0, only gate G1 needs to be executed and the potential concurrency is 1. Consider that gate G1 generates a new signal value at time 10 which then is asserted at the input of gate G2. Also assume that a new signal value is asserted at the primary input ports of gate G1 at time 10. There is no activity between time 0 and time 10, so none of the gates need to be executed and concurrency is 0. At time 10, however, both G1 and G2 must be executed, implying a potential concurrency of 2. This describes the first source of concurrency. That is, if the host computer system had two separate processors available, they could simultaneously execute G1 and G2 and accelerate the overall execution task. Note that despite G2's dependence on G1 for signal values, both G1 and G2 may be executed simultaneously at time 10. An automobile assembly line manufacturing scenario may serve as an excellent analogy. Consider that worker W1 attaches a car body onto a chassis and, upon completion, the conveyor belt forwards it to the next station where worker W2 inserts the headlamps onto the body and chassis. While W2 is working on the headlamps, W1 is working to attach another body onto the subsequent chassis. Clearly, both workers W1 and W2 are working simultaneously, despite W2's dependence on W1 for a body attached to a chassis, and as a result, the overall rate of auto production is high.

2.2.4. Communication between the entities

Although, in theory, entities may choose to interact with others or remain completely aloof, in reality, most real-world systems are characterized by some interaction between the entities. Conceivably, an entity may not possess total and accurate knowledge of everything it may need for its continued functioning at every instant of time, due to the universal properties of intervening space and the finite speed of propagation of electromagnetic radiation. Therefore, the sharing of data and knowledge through interaction with other entities may constitute an important and integral character of the system. We argue that the case where the entities are completely aloof is uninteresting.

In ADDM algorithms, entities are assumed to interact with one another. The nature of the interaction may assume different forms. First, each set of entities that interact among themselves is identified, and these sets reflect the corresponding real-world system exactly. In an extreme case, any entity may interact with any other entity. Second, the necessary data and information must be shared between the interacting entities and appropriate message structures must be developed. Third, the information may be shared on a need-to-know basis, to ensure privacy and precision. Fourth, all message communication is assumed to be guaranteed. That is, once a sender propagates a message, it is guaranteed to be received by the receiver(s). In principle, when an entity possesses an input port, it expects to receive input stimulus through it. The input stimulus arrives from the external world or another entity, although the entity may not possess exact knowledge. Upon receiving the stimulus, the entity then utilizes it to compute its own response. A missing stimulus is viewed as a system failure, unless the ADDM algorithm is specially designed for such exceptions. For an output port of an entity, failure to successfully propagate an output response also corresponds to system failure.

Clearly, the sharing of data and information among entities implies a communication network that interconnects the entities and is an integral component of the ADDM algorithm. The topology of the network is determined by the nature of the interactions.
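
The Fig. 2 walk-through of gates G1 and G2 can be traced with a small event-driven scheduler. The sketch below is only illustrative: the netlist dictionary, the gate delay of 10 time units, and the assumption that every execution produces a changed (significant) output are hypothetical choices made to reproduce the numbers in the text, not details given by the paper.

```python
from collections import defaultdict

# Hypothetical netlist for the walk-through: G1 drives G2; each gate has delay 10.
FANOUT = {"G1": ["G2"], "G2": []}
DELAY = {"G1": 10, "G2": 10}


def potential_concurrency(primary_events, end_time):
    """Return {time: sorted list of gates that must be executed at that time}.

    primary_events maps a time to the gates whose primary inputs change then.
    Every execution is assumed to change the gate's output, so the new value
    is propagated to the gates connected to that output after the gate delay.
    """
    pending = defaultdict(set)
    for time, gates in primary_events.items():
        pending[time].update(gates)

    schedule = {}
    for time in range(end_time + 1):
        ready = pending.pop(time, set())  # gates with a new input signal now
        schedule[time] = sorted(ready)
        for gate in ready:
            for successor in FANOUT[gate]:
                pending[time + DELAY[gate]].add(successor)
    return schedule


schedule = potential_concurrency({0: ["G1"], 10: ["G1"]}, end_time=10)
print(len(schedule[0]), len(schedule[5]), len(schedule[10]))  # prints: 1 0 2
```

The length of each entry is the potential concurrency at that time: only G1 at time 0, no gate between times 0 and 10, and both G1 and G2 at time 10, so two processors could execute the two gates simultaneously at time 10.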
In Fig. 2, now consider both the upper and lower circuits. At time 0, gates G1 and G3 both receive new signals at their input ports. They do not depend on one another for signal values. Therefore, they may be executed simultaneously, yielding a potential concurrency of 2 at time 0. Thus, if a host computer makes two distinct processors available, G1 and G3 may be executed simultaneously, thereby speeding up the execution. This describes the second source of concurrency. This scenario is analogous to two automobiles traveling simultaneously from point A to point B along two parallel lanes of a highway.

2.2.5. Proof of correctness of ADDM algorithms

Flynn (1996) notes that while the mathematical representations constitute the basis for our representation of reality, they were invented to facilitate the sequential reasoning by the human mind. Thus, by their very nature, ADDM algorithms, involving hundreds of autonomous entities executing simultaneously and asynchronously with no centralized agent responsible for controlling their activities, are difficult for the sequential human mind to comprehend. To ensure accuracy, correctness, and safety of a real-world system
under ADDM algorithm control, it is therefore crucial to develop a proof of correctness.

Fundamentally, the proof of correctness must guarantee that the system operates correctly. Where the purpose of the ADDM algorithm is to simulate a natural process, correctness is synonymous with the simulation accurately reflecting reality. Where the ADDM algorithm constitutes an implementation of a real-world system, correctness implies the lack of any inconsistency, i.e. there should be no violation of any basic premise. The requirement is especially important since each decision-making entity utilizes a subset, though a relevant fraction, of the system-wide information and the data from other entities is subject to latency. Furthermore, the execution of the ADDM algorithm must ensure that the system progresses towards its unique objective. For the system to operate correctly, while the execution of each entity and the interactions between the entities must be correct, the set of all individual decisions generated by the entities must be consistent and imply the progress of the system towards the final solution.

System characteristics may also impose special requirements on the proof of correctness. Where a system is expected to converge to a solution following the execution of a finite number of steps, the proof of correctness must include a proof of termination of the ADDM algorithm. However, the proof of termination is irrelevant for systems that, by design, must run continuously.

Deadlock manifests in the form of two or more independent processes, each of which is waiting for the other to respond first before executing its response. Analysis of the occurrence of deadlock points to a process's lack of knowledge of the states of other processes as the primary reason and the presence of feedback loops as the chief cause. Clearly, in a uniprocessor where a single centralized, global scheduler is completely aware of the state of every process, deadlock will not occur. For asynchronous, distributed simulation of queueing networks, first Peacock et al. (1979) and later Chandy and Misra (1981) proposed algorithms to avoid deadlock through the use of "probe" and "null" messages, which essentially violate the principles of event driven simulation. Later, Chandy et al. (1983) developed an asynchronous, distributed simulation algorithm which did not utilize null messages but suffered from the problem of deadlocks. Implementations of their approach are reported to be non-linear and highly inefficient and, in the course of its execution, the algorithm runs from deadlock to deadlock. Thus, it is crucial to develop mathematical proofs of freedom from deadlock (Ghosh, 1996) for any ADDM algorithm, especially if the underlying system contains feedback loops.

In a traditional centralized algorithm, the centralized decision-maker has complete access to the state of every entity. One may reason that where an overall goal is to be achieved, the system's decisions, synthesized by the centralized decision-maker, will move in a coordinated manner to approach that goal. If one were to argue, based on this reasoning, that these decisions are "correct", a counter-argument may be made as follows. First, the data obtained from the different geographical sites are subject to latency. Second, the collection of all of the data at a central unit is time consuming, especially if the number of sites is large. Third, the propagation of the decisions from the central decision-maker to the individual entities at different sites is again subject to latency.

2.2.6. Robustness of ADDM algorithms

A natural outcome of utilizing an ADDM algorithm is a robust system which, unlike a centralized system, is much less susceptible to natural and artificial disasters. Each geographically dispersed entity is a decision-making unit and the likelihood of a disaster affecting every entity is remote. Thus, even if one or more of the asynchronous entities fail, the remaining entities are not likely to be completely affected and the system is likely to continue to function in a degraded mode. For an ADDM algorithm to operate under partial failures, exception handling must be incorporated into the design of the entities and their asynchronous interactions.

2.2.7. Performance of ADDM algorithms

The use of multiple processors, executing concurrently under any ADDM algorithm, implies superior performance. The degree to which a specific ADDM algorithm is able to exploit the parallelism inherent in the underlying real-world problem is reflected in its performance metric. While every problem is likely to require its unique set of metrics, two criteria may be applied uniformly across all ADDM algorithms.

The first, "performance scalability" (Iyer and Ghosh, 1995), reflects the ability of the ADDM algorithm to continue to function, i.e. achieve the primary performance objective, despite increasing system size, i.e. increase in the number of entities constituting the system. Since the computation underlying the system-wide decision-making is distributed among all of the entities, as the system size increases, both the demand for increased computational power and the number of computational engines increase. Assuming that the communication network experiences proportional expansion, the ratio of available computational power to the required computational power is expected to decrease only by a marginal fraction, implying that the achievement of the primary performance objective will be affected only marginally. Unless the nature of the real-world system is one where the increase in the communications network is non-linear with an increase in the number of entities, the corresponding ADDM algorithm must exhibit
performance scalability. Second, consider a hypothetical mechanism that is capable of determining the absolute performance of any given real-world problem which, in turn, may serve as the ideal metric against which the performance of any ADDM algorithm may be evaluated. For further details, the reader is referred to Ghosh et al. (2000).

2.2.8. Stability of ADDM algorithms

Where the ADDM algorithm constitutes an implementation of a real-world system, it is likely to be subject to unexpected changes in the operating conditions. The property of stability refers to the behavior of the ADDM algorithm under representative perturbations to the operating environment. For further details, the reader is referred to Lee and Ghosh (1999, 2000).

2.3. A meta issue: on the existence and synthesis of ADDM algorithms for arbitrary real-world problems

The concern of whether an ADDM algorithm exists for a given problem and, if so, how one can synthesize it for a given asynchronous, distributed, real-world system constitutes a meta issue. Although the literature is rich on the development of sequential algorithms and substantial research is reported on the design of synchronous, distributed algorithms, a general principle towards the synthesis of ADDM algorithms for natural and physical problems remains elusive. Although ADDM algorithms have been successfully developed for a number of real-world problems, as described in the subsequent sections, the approach utilized has been to address each problem individually. First, the problem is analyzed carefully, following which the ADDM algorithm is synthesized.

The lack of a guarantee that an ADDM algorithm does exist for any arbitrary real-world problem, characterized by the eight primary characteristics, plus the absence of a general mathematical theory to synthesize one, may be viewed as a limitation. However, from a different perspective, the task of ADDM algorithm development offers a unique opportunity for human imagination, creativity, and a determination to solve a problem. In addition to the successful development of ADDM algorithms for over a dozen real-world problems, enumerated later in Section 3, a great source of confidence in the existence of ADDM algorithms for arbitrary real-world problems is one of the most important real-world problems in the universe: human civilization. In this world, each human being may be viewed as an entity: independent, concurrent, and interacting asynchronously with other human beings. The degree to which an individual may be viewed as an entity has fluctuated with time: less independence during oppressive regimes and ruthless rulers and greater freedom during the reign of philosopher-kings and governments built on democratic principles. The belief in the existence of an ADDM algorithm underlying the human society finds evidence in the unmistakable observation that the civilization as a whole has progressed with the march of time. Furthermore, while greater freedom, fair assignment of responsibility of tasks, and a wide latitude of autonomy among people to determine their own destiny, subject to certain basic principles such as non-injury to others, etc., have generally been associated with prosperity, oppression and lack of independence have invariably contributed to misery and stifled progress. It may be conjectured that perhaps the human civilization is a grand manifestation of an ADDM algorithm conceived by the creator. However, even as complex and as large a problem as the human society has lent itself to a successfully architected ADDM algorithm. By its very nature, the underlying ADDM algorithm is unceasing in its drive to bring equality to all individuals and its ultimate goal may well be a society with no hierarchy.

3. Examples of ADDM algorithms

According to the literature, ADDM algorithms have been successfully conceived, designed, and simulated for a number of real-world systems from different disciplines, all utilizing the basic principles elaborated earlier in Section 2 of this paper. A notable example is the autonomous decentralized transport operation system (ATOS) (Yamanouchi, 1999), which controls the world's largest transportation system, the East Japan Railway Company. Designed and built by Hitachi, ATOS includes 5000 autonomous computers that control 6200 trains/day. Other examples include (Iyer and Ghosh, 1995; Lee and Ghosh, 1994; Lee and Ghosh, 1998; Chen and Ghosh, 1997; Lee and Ghosh, 1994; Utamaphethai and Ghosh, 1996) and (Debenedictis et al., 1991). The CCITT (Hac and Mutlu, 1989) had originally conceived an ADDM algorithm for the distributed routing of ATM cells in a B-ISDN network. It has been subsequently implemented in Chai and Ghosh (1993) and refined by the ATM Forum (1995). The DARYN (Iyer and Ghosh, 1995) effort represents a successful algorithm for distributed scheduling in railway networks and it is observed to be both superior to the centralized approach and scalable. The effort reported in Lee and Ghosh (1994) represents a real-time, distributed payments processing scheme for a partially-connected set of banks. The effort described in Lee and Ghosh (1996) refers to a decentralized command and control strategy for a representative battlefield scenario involving tanks and artillery units. The approach is observed to yield performance superior to the traditional, centralized command and control strategy, for both offensive and defensive scenarios, under representative battlefield
conditions. The RYNSORD (Lee and Ghosh, 1998) effort represents a successful, new distributed routing principle for railway networks, utilizing soft reservation. The approach has been modeled for the actual railway network covering the eastern US, and performance results attest to its significance relative to the absolute or ideal performance. HDDI (Chen and Ghosh, 1997) reflects a successful distributed inventory management scheme, where not only are the decisions fast relative to the traditional scheme, but their quality is also high. The NOVAHID (Lee and Ghosh, 1994) approach represents a novel, distributed approach to international payments processing that is guaranteed to be accurate. The DICAF (Utamaphethai and Ghosh, 1996) effort represents a new architecture for intelligent vehicle highway system that is potentially cost-effective and delivers strong performance, relative to the centralized traffic management scheme. The most significant success in ADDM algorithm design is YADDES (Debenedictis et al., 1991), which realizes, for the first time, asynchronous, distributed, deadlock-free, and null message-free event driven simulation on a network of workstations, configured as a loosely-coupled parallel processor. P2EDAS (Walker and Ghosh, 1997) represents a successful algorithm for accurate, concurrent execution of VHDL models of hardware systems on a loosely-coupled parallel processor system. P2EDAS marks a significant improvement over YADDES in that it detects and preempts inconsistent events correctly, thereby guaranteeing accuracy.

In the remainder of this section, the nature of three real-world problems from different disciplines is analyzed, the motivations to develop ADDM algorithms for them are noted, and brief descriptions of the ADDM algorithm designs are presented. The problems relate to the disciplines of real-time payments processing, scheduling in railway networks, and distributed routing in B-ISDN networks. Detailed descriptions of each of the algorithms, including the conception of the entities, their behaviors that encapsulate the underlying ADDM control algorithm, the nature of the communication between them, proof of correctness, and stability analysis, are lengthy and beyond the scope of this paper. The reader is referred to the corresponding references cited here.

3.1. An ADDM algorithm for real-time, domestic payments processing

The payment-processing system in the banking industry consists of deposits, withdrawals, and transfers of monies through the use of cash, checks, magnetic tapes, and electronic transactions. Of these, nearly all of the check processing through the United States Federal Reserve System, most of the check processing in the private networks, and a part of the electronic transactions are realized through the utilization of the principles of batch-mode processing. Batch-mode processing is a conservative and secure means of transaction processing wherein transactions, initiated by users, are stored within the system for a certain length of time, typically a few hours to a day or a week, and completed during off-hours, i.e., when the bank is closed to users. Batch-mode processing suffers from many limitations, the principal ones being that (i) users are denied real-time access to their money and that (ii) a user's banking privileges cannot be extended anywhere in the US. A centralized banking algorithm, similar to the Swiss interbank clearing system (SIC), is inadequate for the United States of America with nearly 12,700 financial institutions and is extremely vulnerable to a natural calamity or an act of terrorism.

Analysis of the payment-processing problem reveals the following. First, it is geographically distributed. A check may be deposited by a payee at a financial institution while the payer's bank may be situated at a distant location. Second, the processing of a check requires both computation and communication between the relevant banks. Third, at any given time instant, the system may consist of multiple checks involving different financial institutions. Fourth, the introduction of a check into the system may occur at any time, i.e. the system is asynchronous. An ADDM algorithm has been designed and implemented (Lee and Ghosh, 1994) for payments processing. In this approach, every banking node is characterized by the functions of routing and processing transactions. Each banking node has exclusive access to, and maintains the most recent balances for, all of its accounts. Any transaction introduced into the system is routed by the network to the appropriate target bank where it is executed. Following the execution, an acknowledgment is returned to the originating bank. In addition, every banking node is assumed to possess complete topological knowledge of the network and computes routes to every other node in the network. A banking transaction that is initiated at a bank, termed the depositing bank, may be propagated to a different bank, termed the payer bank, on which the instrument of payment is drawn. The information to be propagated is encapsulated as a packet, and given the partially connected nature of the network, a packet may be routed by one or more nodes to the ultimate destination. Following the completion of processing at the payer bank, a new event in the form of an acknowledgment is propagated to the payee bank, which may or may not be identical to the depositing bank. The message may state either that the transaction succeeded or that it failed. Thus, information may flow in a cycle and, consequently, the banking system corresponds to a discrete-event simulation system with feedback loops. Given the fact that (i) the banking network is characterized by geographically distributed processing nodes, (ii) multiple, concurrent
computational engines, and (iii) the fact that transactions are asynchronous, i.e., they are introduced at irregular time instants, this approach utilizes a variation of the discrete event simulation algorithm (Chandy and Misra, 1981), the details of which are presented in Lee and Ghosh (1994).

3.2. An ADDM algorithm for railway networks

In the traditional centralized control systems for railways, including the North American advanced train control system (ATCS), the states of all trains are fed periodically to the centralized unit, which first computes the decision for each train sequentially and then relays it to the corresponding train. Consequently, where the number of trains is large, the decision-making is slow. A careful examination of the railway system reveals the following. First, the decision-making consists in dynamically allocating tracks to railways, as needed. Second, trains are inserted into the system asynchronously, i.e. at irregular times. Third, the railway system consists of two basic types of entities: trains and tracks. An ADDM algorithm, DARYN (Iyer and Ghosh, 1995), has been successfully designed and implemented. It exploits the ability of trains to carry computing engines and to communicate, while in motion, with other, stationary computers. RYNSORD (Lee and Ghosh, 1998) constitutes a sophisticated refinement of DARYN; for the details of a stability analysis study of RYNSORD, the reader is referred to Seshasayi et al. (1999).

In DARYN, the overall decision process is distributed onto every train and station of the system. The decision process for every train is executed by an on-board processor that negotiates, dynamically and progressively, for temporary ownership of the tracks with the respective station controlling the tracks, through explicit processor-to-processor communication primitives. This processor then computes its own route utilizing the results of its negotiation, its knowledge of the track layout of the entire system, and its evaluation of the cost function. Every station's decision process is also executed by a dedicated processor that, in addition, maintains absolute control over a given set of tracks and participates in the negotiation with the trains. As more trains and stations are added to the system, both the requirement for computational power and the number of computational engines increase. Assuming that a train's origin and destination are uncorrelated with those of any other train, the ratio of available computational power to the required computational power is expected to decrease only by a marginal fraction. Consequently, the time required by a train to travel from location A to location B at a given speed is expected, in general, to be only marginally affected by an increase in the network size and the number of trains.

3.3. An ADDM algorithm for routing in ATM networks

The traditional approach to routing of calls in telephone networks consists of a centralized scheme such as AT&T's dynamic non-hierarchical routing (DNHR) (Key and Cope, 1990). While the centralized approach is slow and inefficient, especially for a large number of calls and a network spread over a wide geography, the centralized unit is highly vulnerable to natural and artificial disasters. Recognizing the limits of centralized routing, the designers of the high-speed B-ISDN network at CCITT in the late 1980s conceived a distributed routing algorithm, which is reflected in the B-ISDN design and constitutes an ADDM algorithm. The algorithm has been validated through a large-scale modeling and simulation effort (Chai and Ghosh, 1993). The distributed routing algorithm has been significantly refined by the ATM Forum through its PNNI specification (ATM Forum, 1995). An implementation of a significant subset of PNNI has been utilized in a study (Ghosh and Robinson, 1999) relative to the vulnerability analysis of ATM networks.

In this approach, every node performs the dual functions of determining the routes for the users' calls and switching the cells of the calls towards their final destination. For every call, while the originating node undertakes a significant fraction of the computational burden, other nodes in the network also participate in the routing decision. In this approach, before the cells of a call are launched into the switches, first, a route, the virtual path, is established from the source to the destination, based on a set of criteria. Then, the necessary resources, such as bandwidth, are reserved. To determine the virtual path, the originating node requires knowledge of the resource status of the remote nodes. This is achieved through flooding, i.e. periodic broadcasting, of its status by every node. Thus, the routing-related decision-making is distributed among all nodes of the network. The benefits are threefold. First, routing decisions are achieved faster relative to the centralized scheme. Second, as a result of distributed routing, the overall network is more robust. Third, the impact of increasing the network size on the routing performance is far less severe than in the centralized scheme.

4. Conclusions

This paper has introduced a new class of algorithms, ADDM algorithms, to constitute the underlying control of most large-scale, complex, real-world systems. ADDM algorithms constitute the natural and logical choice for systems control, and hold the promise of extracting the maximal parallelism inherent in these systems. Furthermore, true asynchronous systems can
be described accurately only by asynchronous, distributed algorithms. This paper has also reasoned about the nature of most complex real-world systems from first principles, presented the principle of ADDM algorithms, and argued in favor of their increasing importance in future large-scale systems design. It has also detailed the principal characteristics of ADDM algorithms, enumerated a number of successful ADDM algorithms reported in the literature, and briefly reviewed the nature of three of them: (1) real-time, domestic payments processing system, (2) distributed scheduling in railway networks, and (3) distributed routing in ATM networks.

Acknowledgements

The author gratefully acknowledges the insight, encouragement, thoughts, and support of many individuals over the last several years. Without support from the US BMDO and Army Research Offices under the grants DAAL03-91-G-0158, DAAH04-93-G-0126, and DAAH04-95-1-0101, this research might not have been possible.

References

Arrouye, Y., 1996. Scope: an extensible interactive environment for the performance evaluation of parallel systems. Microprocessing and Microprogramming 41 (8–9), 609–623.
ATM Forum Technical Committee. Private network-network interface specification version 1.0 (PNNI 1.0). www.atmforum.com/atmforum/specs/approved.html, ATM Forum, March 1996.
Barr, R.S., Hickman, B.L., 1992. On reporting the speedup of parallel algorithms: a survey of issues and experts. In: Computer Science and Operations Research. New Developments in their Interfaces, Williamsburg, VA, January, pp. 279–293.
Bertsekas, D.P., Tsitsiklis, J.N., 1988. Distributed asynchronous algorithms. In: Proceedings of the 1988 IEEE International Conference on Systems, Man, and Cybernetics, pp. 591–593.
Bilardi, G., Herley, K., Pietrecaprina, A., Pucci, G., Spirakis, P., 1996. BSP vs. LogP. In: Proceedings of the 1996 8th ACM Symposium on Parallel Algorithms and Architectures, Padua, Italy, June 24–
Chai, A., Ghosh, S., 1993. Modeling and distributed simulation of broadband-ISDN network on a network of Sun workstations configured as a loosely-coupled parallel processor system. IEEE Computer 26 (9), 37–51.
Chandy, K.M., Misra, J., 1981. Asynchronous distributed simulation via a sequence of parallel computations. Communications of the ACM 24 (4), 198–206.
Chandy, K.M., Haas, L.M., Misra, J., 1983. Distributed deadlock detection. ACM Transactions on Computer Systems 1 (2), 144–156.
Chen, L.R., Ghosh, S., 1997. Modeling and simulation of a hierarchical, distributed, dynamic inventory management (HDDI) scheme. Simulation: The Journal of the Society for Computer Simulations 68 (6), 340–362.
Christian, F., Fetzer, C., 1999. The timed asynchronous distributed model. IEEE Transactions on Parallel and Distributed Systems 10 (6), 642–657.
Clement, M.J., Quinn, M.J., 1994. Architectural scaling and analytical performance prediction. In: Seventh International Conference on Parallel and Distributed Computing Systems, Las Vegas, NV, October, pp. 16–21.
Cormen, T.H., Leiserson, C.E., Rivest, R.L., 1990. Introduction to Algorithms. MIT Press, Cambridge, MA.
Coulouris, G.F., Dollimore, J., 1988. Distributed Systems: Concepts and Design. Addison-Wesley, Reading, MA.
Culler, D.E., Karp, R.M., Patterson, D., Sahay, A., Santos, E.E., Schauser, K.E., Subramonian, R., von Eicken, T., 1996. LogP: a practical model of parallel computation. Communications of the ACM 39 (11), 78–85.
Debenedictis, E., Ghosh, S., Yu, M.L., 1991. An asynchronous distributed discrete event simulation algorithm for cyclic circuits using data-flow network. IEEE Computer 24 (6), 21–33.
Ertel, W., 1994. On the definition of speedup (Parallel algorithms). In: 6th International PARLE Conference Proceedings, Athens, Greece, July, pp. 289–300.
Feynman, R.P., 1990. The Character of Physical Law. The Messenger Lectures, The MIT Press, Cambridge.
Flynn, M.J., 1996. Parallel processors were the future ... and may yet be. IEEE Computer 29 (12), 151–152.
Ghosh, S., 1996. On the proof of correctness of yet another asynchronous distributed discrete event simulation algorithm (YADDES). IEEE Transactions on Systems, Man, and Cybernetics 26 (1), 68–74.
Ghosh, S., Robinson, P., 1999. A framework for investigating security attacks in ATM networks. In: Proceedings of the IEEE MILCOM'99 Conference, Atlantic City Convention Center, NJ, October 31–November 3, pp. 724–728.
Ghosh, S., Lee, T., Joo, S.S., 2000. A frame of reference for the performance evaluation of asynchronous, distributed decision-making algorithms. Journal of Systems and Software 55 (1), 45–56.
26, pp. 25±32. Gupta, A., Kumar, V., 1993a. Analyzing performance of large scale
Braddock, R.L., Claunch, M.R., Rainbolt, J.W., Corwin, B.N., 1992. parallel systems. In: Proceedings of the Twenty-Sixth Hawaii
Operational performance metrics in a distributed system, metrics International Conference on System Sciences, Wailea, Hawaii,
and interpretation. In: Proceedings of the ACM/SIGAPP Sympo- January, pp. 144±153.
sium on Applied Computing, Kansas, MO, March, pp. 873±882. Gupta, A., Kumar, V., 1993b. Performance properties of large scale
Brehm, J., Madhukar, M., Smirni, E., Dowdy, L., 1995. PerPreT ± a parallel systems. Journal of Parallel and Distributed Computing 19
performance prediction tool for massively parallel systems. In: 8th (3), 234±244.
International Conference on Modelling Techniques and Tools for Hac, A., Mutlu, H.B., 1989. Synchronous optical network and
Computer Performance Evaluation, Heidelberg, Germany, Sep- broadband ISDN protocols. IEEE Computer 22 (11), 26±34.
tember, pp. 284±298. Ihara, H., Mori, K., 1984. Autonomous decentralized computer
Capon, P.C., 1992. Understanding the behaviour of parallel systems. control systems. IEEE Computer 17 (8), 57±66.
In: Proceedings of the Workshop on Performance Measurement Ishida, Y., 1997. The immune system as a prototype of autonomous
and Visualization of Parallel Systems, Moravany, Czechoslovakia, decentralized systems: an overview. In: Proceedings of the Third
October, pp. 201±223. IEEE International Symposium on Autonomous Decentralized
Celenk, M., Yang W., 1994. Performance evaluation of the networks Systems, Berlin, Germany, April 9±11, pp. 85±92.
of workstations for parallel processing applications. In: Proceed- Iyer, R.V., Ghosh, S., 1995. DARYN, a distributed decision-making
ings of the 26th Southeastern Symposium on System Theory, algorithm for railway networks: modeling and simulation. IEEE
Athens, OH, March, pp. 540±544. Transactions on Vehicular Technology 44 (1), 180±191.
166 S. Ghosh / The Journal of Systems and Software 58 (2001) 153–167
Key, P.B., Cope, G.A., 1990. Distributed dynamic routing schemes. IEEE Communications Magazine, October, pp. 54–64.
Kohavi, Z., 1978. Switching and Finite Automata Theory. McGraw-Hill, New York.
Kremien, O., 1995. Scalability in distributed systems, parallel systems and supercomputers. In: Proceedings of the International Conference on High-Performance Computing and Networking, Milan, Italy, May, pp. 532–541.
Kremien, O., Kramer, J., Magee, J., 1990. Rapid Assessment of Decentralized Algorithms. In: Proceedings of the IEEE International Conference on Computer Systems and Software Engineering (COMPSAC), pp. 329–335.
Kumar, A., Ramakrishnan, S., Deshpande, C., Dunning, L., 1994. Performance comparison of two algorithms for task assignment. In: Proceedings of the 1994 IEEE International Conference on Parallel Processing, Raleigh, NC, August, pp. 83–87.
Kushwaha, R., 1993. Methodology for predicting performance of distributed and parallel systems. Performance Evaluation 18 (3), 189–204.
Lalgudi, K.N., Bhattacharya, D., Agrawal, P., 1994. On the performance prediction of parallel algorithms. In: Seventh International Conference on Parallel and Distributed Computing Systems, Las Vegas, NV, October, pp. 330–335.
Lecuivre, J., Song, Y.Q., 1995. A framework for validating distributed real time applications by performance evaluation of communication profiles. In: Proceedings of the 1995 IEEE International Workshop on Factory Communication Systems, Leysin, Switzerland, October, pp. 37–46.
Lee, P.C., Ghosh, S., 1994. International payments processing in real-time: a distributed architecture. IEEE Computational Science and Engineering 1 (3).
Lee, T., Ghosh, S., 1994. A distributed approach to real-time payments-processing in a partially-connected network of banks: modeling and simulation. Simulation – The Journal of the Society for Computer Simulations 62 (3), 180–201.
Lee, T., Ghosh, S., 1996. A novel approach to asynchronous decentralized decision-making in military command and control. IEEE Computational Science and Engineering 3 (4), 69–79.
Lee, T., Ghosh, S., 1998. RYNSORD: a novel decentralized algorithm for railway networks with soft reservation and dynamic routing by autonomous trains towards efficient resources utilization. IEEE Transactions on Vehicular Technology 47 (4), 1350–1364.
Lee, T., Ghosh, S., 1999. On the concept of stability in asynchronous distributed decision-making systems. In: Proceedings of the Fourth International Symposium on Autonomous Decentralized Systems, ISADS99, Tokyo, Japan, March 21–23, pp. 85–92.
Lee, T., Ghosh, S., 2000. On stability in asynchronous distributed decision-making systems. IEEE Transactions on Systems, Man, and Cybernetics Part B 30 (4), August.
Leighton, F.T., 1992. Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann, San Mateo, CA.
Lynch, N., 1996. Distributed Algorithms. Morgan Kaufmann, San Mateo, CA.
Manwaring, M., Chowdhury, M., Malbasa, V., 1994. An architecture for parallel interpretation: performance measurements. In: Proceedings of the 20th EUROMICRO Conference, Liverpool, UK, September, pp. 531–537.
Markas, T., Royals, M., Kanopoulos, N., 1990. On distributed fault simulation. IEEE Computer 23 (1), pp. 40–52, January.
Peacock, J.K., Wong, J.W., Manning, E.G., 1979. Distributed simulation using a network of processors. Computer Networks 3 (1), 44–56.
Ronngren, R., Barriga, L., Ayani, R., 1996. An incremental benchmark suite for performance tuning of parallel discrete event simulation. In: Proceedings of the Twenty-Ninth Hawaii International Conference on System Sciences, Wailea, Hawaii, January, pp. 373–382.
Rotithor, H.G., 1991. Enhanced Bayesian decision model for decentralized decision making in a dynamic environment. In: Proceedings of the IEEE International Conference on System, Man, and Cybernetics, vol. 3, Charlottesville, VA, June, pp. 2091–2096.
Seshasayi, P., Lee, T., Ghosh, S., 1999. Stability of RYNSORD, a decentralized algorithm for railway networks, under perturbations. In: Proceedings of the 49th IEEE Annual Vehicular Technology Conference (VTC 1999), Amsterdam, The Netherlands, September 21–23, pp. 805–809.
Gerard, T., 1991. Topics in distributed algorithms, Cambridge International Series on Parallel Computation, Press Syndicate of the University of Cambridge, p. 1 (Chapter 1).
Tron, C., Arrouye, Y., de Kergommeaux, J.C., Kitajima, J.P., Maillet, E., Plateau, B., Vincent, J.M., 1993. Performance evaluation of parallel systems. ALPES environment. In: Proceedings of the International Conference on Parallel Computing: Trends and Applications, Grenoble, France, September, pp. 715–718.
Tsitsiklis, J.N., Athans, M., 1984. Convergence and asymptotic agreement in distributed decision problems. IEEE Transactions on Automatic Control 29 (1), 42–50.
Tsitsiklis, J.N., Athans, M., 1985. On the complexity of decentralized decision making and detection problems. IEEE Transactions on Automatic Control 30 (5), 440–446.
Tsitsiklis, J., Stamoulis, G., 1995. On the average communication complexity of asynchronous distributed algorithms. Journal of the ACM 42 (2), 382–400.
Utamaphethai, N., Ghosh, S., 1996. DICAF, a distributed architecture for intelligent transportation. IEEE Computer 31 (3), 78–84.
Walker, P., Ghosh, S., 1997. Exploiting temporal independence in distributed preemptive circuit simulation Approach. In: Proceedings of the IEEE/ACM European Design and Test Conference, ED&TC97, Paris, France, March 17–20, pp. 378–382.
Westphal, H., Popovic, D., 1994. Performance evaluation of distributed, intelligent real-time control systems. In: Proceedings of the 1994 American Control Conference, Baltimore, MD, June, pp. 2662–2666.
Wieland, F., Jefferson, D., Reiher, P., 1992. Experiences in parallel performance measurement: the speedup bias. In: Symposium on Experiences with Distributed and Multiprocessor Systems, Newport Beach, CA, March, pp. 205–215.
Yamanouchi, S., 1999. Essential information systems for railways and intensive application of ADS technology – COSMOS and ATOS. In: Keynote Address: International Symposium on Autonomous Decentralized Systems, Tokyo, October, pp. 2–9.
Yan, J.C., Listgarten, S., 1993. Intrusion compensation for performance evaluation of parallel programs on a multicomputer. In: Sixth International Conference on Parallel and Distributed Computing Systems, Louisville, KY, October, pp. 427–431.

Sumit Ghosh is the Thomas E. Hattrick Professor of Information Systems Engineering in the Department of Electrical and Computer Engineering at Stevens Institute of Technology in Hoboken, New Jersey. He also serves as the director of the computer engineering program. Prior to Stevens, he had served as the associate chair for research and graduate programs in the Computer Science and Engineering Department at Arizona State University, before which he had been on the faculty at Brown University, Rhode Island, and, before that, he had been a member of technical staff (principal investigator) at Bell Laboratories Research in Holmdel, New Jersey. He received his B. Tech degree from the Indian Institute of Technology at Kanpur, India, and his M.S. and Ph.D. degrees from Stanford University, California. Sumit's additional industrial experience includes Silvar-Lisco in Menlo Park, CA, Fairchild Advanced Research and Development, and Schlumberger Palo Alto Research Center. His research focuses on fundamental and challenging yet practical problems that are of potential benefit to society. Principal areas include next generation nVHDL, next generation secure ATM network design, next generation IP router architecture, determining network operating point for operational networks, deep space networking and distributed visualization, and next generation asynchronous distributed simulation-based netcentric complex system design, validation, and testing. A more detailed list of current research pursuits may be viewed at the URL http://attila.stevens-tech.edu/sghosh2. Sumit is the author/co-author of three original monographs/books: Hardware Description Languages: Concepts and Principles (IEEE Press, 2000); Modeling and Asynchronous Distributed Simulation of Complex Systems (IEEE Press, 2000); and Intelligent Transportation Systems: New Principles and Architectures (CRC Press, 2000). Sumit has written 85+ transactions/journal papers and 80+ refereed conference papers. He serves as an associate editor for the Transactions of the Society for Computer Simulation International, IEEE Transactions on Fuzzy Systems, IEEE Transactions on Systems, Man, and Cybernetics, and is on the editorial board of the IEEE Press Book Series on Microelectronic Systems Principles and Practice. Sumit is the founder (1995) of the Networking and Distributed Algorithms Lab. at ASU. Sumit is a US citizen. Sumit has held visiting professor positions at Federal University of Rio de Janeiro (Brazil), University of Marseilles (France), and Kuwait University (Kuwait).