PROGRAMMING DISTRIBUTED SYSTEMS

J. A. Feldman, J. R. Low, and P. D. Rovner

Computer Science Department
The University of Rochester
Rochester, New York 14627

Key words: Distributed computing; programming systems and languages; computer networks.

1. Introduction

For a number of technical and economic reasons, distributed computing is increasing rapidly in importance. The main technical considerations are the increased performance of small machines and of communications systems relative to that of giant computers and the opportunity for greater system reliability through redundancy. The economic considerations include all of the technical ones plus additional aspects including the geographic dispersal of needs and the organizational advantages of more specialized facilities (cf. Bank of America article in Datamation [1976]). The Computer Science Department of the University of Rochester has, from its inception in 1974, directed much of its research effort toward a better understanding of distributed computing.

2. RIG, an Intelligent Gateway

The initial laboratory facility of the Department was a network of five (now six) minicomputers linked into larger campus and national resources. This was designed to allow for very flexible usage of a wide variety of computing facilities and is called RIG, for Rochester's Intelligent Gateway (Figure 1).

The term "gateway system" has been coined to describe a computer system designed primarily to connect users to other computers and computer networks. An "intelligent gateway" goes beyond the simple function of communication. It plays a central role in the handling of information, actively translating the intentions of users into communication forms which can be understood by other computers. Because a "user" may in fact be another machine, an intelligent gateway can serve as a communication link between the various computers and computer networks to which it is connected.

The RIG system development effort has been divided into three major parts:

(1) An operating system kernel, called Aleph, has been developed. Aleph's primary function is to oversee the transfer of information and control between processes running within the system.
(2) Server Processes have been developed to handle requests for resources available to the system. The use of independent processes to handle resource allocation allows RIG to provide intelligent access to its facilities.
(3) The writing of User Processes has begun. These are the intelligent agents which translate user desires into requests to Server Processes.

The fundamental design decision in the implementation of RIG was to allow a strict message discipline with no shared data structures. All communication among both user and server processes is through messages which are routed by the Aleph kernel. This message discipline has proven to be very flexible and reliable. The basic system has been described in [Ball et al., 1976], and more detailed descriptions are available as internal memos.

There are two other distributed computing projects that we started shortly after RIG and which were originally quite separate. The first of these was the development of low-level communication protocols for efficient and reliable transmission of messages within our local network. This has led to a sophisticated connectionless protocol [Rovner, 1977] which seems to have significant advantages over previous proposals; Section 4 contains a discussion of this work. At some point in the development, it occurred to us that we were using very similar message systems within RIG and externally in the network. We have made some progress in unifying these two projects with the third project, PLITS.

3. PLITS, a Language for Distributed Computing

The PLITS project originally had no direct relation to distributed computing, but was concerned with developing a non-trivially new programming language. There were two basic underlying assumptions: (1) that programming languages had changed little in the previous decade despite advances in many related areas and (2) that one could hypothesize compilers of the sophistication of the best current artificial intelligence programs. We began by trying to isolate the most important concepts currently available in programming systems and to see where they were compatible and incompatible. The project was called PLITS (Programming Language in the Sky) and, although it has come down a little closer to the ground, the name has stuck.

The two fundamental building blocks underlying any PLITS system are modules and messages. A module is a self-contained entity, something like a Simula or Smalltalk class, a SAIL process, or a CLU module. It is not important for the moment which programming language is used to encode the body of a module; we wish explicitly to account for the case in which various modules are coded in different languages on a variety of machines. For now, let's consider modules to be programmed in Algol-60 and also assume that there are some modules available for input, output, and file manipulation.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery, Inc. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1978 ACM 0-89791-000-1/78/0012/0310 $00.75

Figure 1. RIG Hardware Configuration

Modules communicate with one another solely through messages. In order to have communication, there must be something that is understood by both communicating modules. The common element in PLITS is a name which may be thought of as an uninterpreted string of characters. A message is a set of (name~value) pairs called slots. The value portion of a slot will be an element of some primitive domain (think of integers) whose representation is also generally understood.

The modules of any PLITS system will have to be able to compose, send, receive, and decompose messages. For this purpose, we must add some data types and operations to ALGOL or any other body language. In this case, the primitive data types of ALGOL will have to be extended to include module and message. Each module also will contain an explicit declaration (Public) of every slot name that it can deal with along with the data type of that slot. There is a process analogous to link-editing that insures that public slot names are used consistently.

For a first example, suppose there were a module, Fibonacci, which provided the service of supplying consecutive positive Fibonacci numbers, and a module, George, which wanted to make use of this service. The code for this would be something like that shown in Example 1 (George and Fibonacci are actual constant named modules, not module prototypes or class definitions).

We see that George and Fibonacci both know the slot names "Object" and "Recipient" and thus can communicate. At the appropriate time, George composes a message with one slot, having as a value the system identifier for the module George itself. After sending the message to Fibonacci, George is suspended until a message from Fibonacci is received, i.e., this is essentially a subroutine call. The Fibonacci module simply waits for a request and fulfills it. The syntax for accessing and modifying messages treats them like the records of, e.g., Pascal [Hoare, 1973].

Starting from a survey of the "powerful ideas" of programming systems, we attempted to see if there were inherent incompatibilities among them. It was immediately clear that one could not combine all of the useful language primitives in a consistent way--so PLITS had to include different languages. Networking was clearly here to stay and had to be accounted for. Structured programming seemed to be attacking the right problem with unreasonable methods. Messages were known to be a very good control primitive and were the coin of networking. The experience with RIG convinced us that messages also seemed to be a good mechanism for producing reliable yet still flexible software.

The message-module paradigm became established quickly as the fundamental solution for PLITS.

The decision to have public names as the basis of communication seems obvious in retrospect, but was difficult to arrive at. By sharing names rather than variables or sequential positions in some structure, modules could be written in a way that was clear, but did not have the problems of shared storage.

It was apparent from work in automatic programming and verification that more declarative information was needed--hence we included the general notion of assertions. Although many difficult questions remain, enough clean solutions have been found to convince us that there is something fundamentally sound in the PLITS world view.

Example 1 is basically bad PLITS code; the module Fibonacci contains no error checking. Let us consider an expanded but still weak version which will not cause integer overflow (Example 2).

The first new notion occurs on line 4, where a public slot name of type "problem_type" is declared. The type problem_type is a fixed sequence of uninterpreted symbols exactly like the Pascal "enumeration" type. There will be several public enumerations in a PLITS system.

In lines 9-11, a prepackaged message is assembled and stored in the message variable, My_Complaint. The other new code is in lines 21-27; the Recipient slot of My_Complaint is filled in from the Request. If there is a Complaint_Dept slot in this request, the module which is its value will be sent the complaint. Otherwise, some default complaint handler, City_Hall, will hear about it. The name of the Recipient module (which may have been awaiting an answer) is passed along to the Complaint_Dept, because there might be some appropriate response to the problem. For example, there could be some double precision Fibonacci module which would be able to return an appropriate value if George were prepared to

 1  Begin "George"
 2    Public Integer Object;
 3    Public module Recipient;
 4    Begin Comment George's thing;
 5      Integer I, J, Next_Fib;
 6      Message Mess1, Mess2;
 7      Send {Recipient~Me} to Fibonacci;
 8      Mess2←Receive from Fibonacci;
 9      Next_Fib←Mess2.Object;
10    End
11  End "George"

    Begin "Fibonacci"
      Public Integer Object;
      Public module Recipient;
      Message Request;
      Integer This, Last, Previous;
      Last←0; This←1;
      While true do
      Begin
        Request←Receive from Any;
        Previous←Last;
        Last←This;
        This←Last+Previous;
        Send {Object~This} to Request.Recipient;
      End
    End "Fibonacci"

Example 1
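The handshake of Example 1 can be imitated in a conventional language. The following is a minimal Python sketch, not PLITS: it assumes that a message is a dictionary of slots and that each module owns one mailbox queue. The names `mailboxes`, `send`, `receive`, and `run_example`, and the empty message used to stop the server thread, are conveniences of the sketch and have no counterpart in the paper.

```python
import queue
import threading

# Hypothetical mailbox registry: one queue of messages per named module.
mailboxes = {"George": queue.Queue(), "Fibonacci": queue.Queue()}


def send(slots, to):
    """Post a message (a dict of slot name -> value) to a module's mailbox."""
    mailboxes[to].put(dict(slots))


def receive(me):
    """Suspend the calling module until a message arrives in its mailbox."""
    return mailboxes[me].get()


def fibonacci():
    """Counterpart of the Fibonacci module of Example 1 (no error checking)."""
    last, this = 0, 1
    while True:
        request = receive("Fibonacci")
        if "Recipient" not in request:  # sketch-only shutdown signal
            return
        previous, last = last, this
        this = last + previous
        send({"Object": this}, to=request["Recipient"])


def george(count):
    """Request `count` numbers, one send/receive handshake per number."""
    results = []
    for _ in range(count):
        send({"Recipient": "George"}, to="Fibonacci")
        results.append(receive("George")["Object"])
    return results


def run_example(count=5):
    """Run both modules and return the numbers George collected."""
    server = threading.Thread(target=fibonacci)
    server.start()
    numbers = george(count)
    send({}, to="Fibonacci")  # stop the server thread
    server.join()
    return numbers
```

The two modules share only the slot names "Object" and "Recipient"; neither reads the other's variables, which is the property the public-name discipline is meant to guarantee.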

 1  Begin "Fibonacci"
 2    Public Integer Object;
 3    Public module Recipient, Complaint_Dept, Complainer;
 4    Public problem_type Problem;
 5    message Request, My_Complaint;
 6    module Complainee;
 7    integer This, Last, Previous, Biggest;
 8    Last←0; This←1; Biggest←2^35 - 1;
 9    My_Complaint←{Problem~Overflow,
10                   Complainer~Me
11                  };
12    While True do
13    Begin
14      Request←Receive from any;
15      Previous←Last;
16      Last←This;
17      If Biggest - Last > Previous
18      then Begin This←Last+Previous;
19        Send {Object~This} to Request.Recipient
20      End
21      else Begin
22        Put (Recipient~Request.Recipient) In My_Complaint;
23        Complainee←If Present Request.Complaint_Dept then
              Request.Complaint_Dept else City_Hall;
24        Send My_Complaint to Complainee
25      End
26    End While Loop
27  End "Fibonacci"

Example 2

 1  Begin "Fibonacci"
 2    Public integer Object;
 3    Public module Recipient;
 4    Public action_type Action;
 5    message Request;
 6    map This[transaction] => integer;
 7    map Last[transaction] => integer;
 8    Integer Biggest, Previous, ActiveCount;
 9    transaction ThisTrans;
10    transaction set Active;
11    Biggest←2^35 - 1;
12    ActiveCount←0;
13    Active←{StartNewSequence};
14    While true do
15    Begin
16      Request←receive from any about Active;
17      case Request.Action of
18      Begin
19      [Start]
20        Begin
21          ThisTrans←new transaction;
22          ActiveCount←ActiveCount + 1;
23          If ActiveCount = MaxCount then
24            remove StartNewSequence from Active;
25          put ThisTrans in Active;
26          Last[ThisTrans]←1;
27          This[ThisTrans]←1;
28          send {Object~This[ThisTrans]} to Request.Recipient
                about ThisTrans;
29        End;
30      [Generate]
31        Begin
32          ThisTrans←Request.About;
33          Previous←Last[ThisTrans];
34          Last[ThisTrans]←This[ThisTrans];
35          This[ThisTrans]←Last[ThisTrans] + Previous;
36          send {Object~This[ThisTrans]} to Request.Recipient
                about ThisTrans;
37        End;
38      [Terminate]
39        Begin
40          if ActiveCount = MaxCount then
41            put StartNewSequence in Active;
42          ActiveCount←ActiveCount - 1;
43          destroy This[Request.About], Last[Request.About];
44          remove Request.About from Active;
45        End;
46      End;
47    End;
48  End "Fibonacci"

Example 3
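The control structure of Example 3 rests on the selective receive "about Active". As an illustration only, the Python sketch below models that behavior in a single thread: messages wait in a pending list, and the module accepts only those whose About key is currently in its Active set. The class `FibonacciServer`, its methods, and the "T1", "T2" key format are assumptions of the sketch. The overflow check is omitted, and the active count is derived from the size of the map rather than kept in a separate variable.

```python
import itertools

START_NEW_SEQUENCE = "StartNewSequence"  # globally known transaction key


class FibonacciServer:
    """Single-threaded sketch of the multiplexing module of Example 3."""

    def __init__(self, max_count):
        self.max_count = max_count
        self.this = {}  # map: transaction key -> integer
        self.last = {}  # map: transaction key -> integer
        self.active = {START_NEW_SEQUENCE}
        self.pending = []  # messages sent to the module, not yet received
        self._keys = itertools.count(1)

    def post(self, message):
        """Deliver a message to the module; it waits until acceptable."""
        self.pending.append(message)

    def receive_about_active(self):
        """Selective receive: oldest message whose About key is in Active."""
        for i, message in enumerate(self.pending):
            if message["About"] in self.active:
                return self.pending.pop(i)
        return None  # a real module would remain suspended here

    def handle(self, request):
        """Process one request; return the reply message, or None."""
        action = request["Action"]
        if action == "Start":
            key = f"T{next(self._keys)}"
            if len(self.this) + 1 == self.max_count:
                self.active.discard(START_NEW_SEQUENCE)  # turn deaf to Start
            self.active.add(key)
            self.last[key] = self.this[key] = 1
            return {"Object": 1, "About": key}
        key = request["About"]
        if action == "Generate":
            previous = self.last[key]
            self.last[key] = self.this[key]
            self.this[key] = self.last[key] + previous
            return {"Object": self.this[key], "About": key}
        if action == "Terminate":
            del self.this[key], self.last[key]
            self.active.discard(key)
            self.active.add(START_NEW_SEQUENCE)  # listen for Start again
            return None
        raise ValueError(f"unknown action {action!r}")

    def run(self):
        """Handle every currently acceptable request; return the replies."""
        replies = []
        while True:
            request = self.receive_about_active()
            if request is None:
                return replies
            reply = self.handle(request)
            if reply is not None:
                replies.append(reply)
```

With `max_count=1`, a second Start request stays in the pending list until the first stream is terminated, which is the "turns deaf" behavior: the request is not rejected, it is simply not received yet.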

accept it. This would require that George handle double-size integers: that is not hard to arrange, for example, by an extra slot for the high order part.

There is a more interesting problem in the control discipline used in the coding of the module George given in Example 1. The statement on line 8 is:

    Mess2←Receive from Fibonacci.

But we saw in the expanded Fibonacci module of Example 2 that there might be an error recovery module that would supply the answer if Fibonacci could not. The coding style of line 8 requires that the answer be conveyed back to Fibonacci and then to George, but there is nothing to be gained by retracing our steps. To solve this and a number of other control problems, we will add one more construct, transaction, to PLITS. Intuitively, a transaction is a key which can be used in the regulation of message traffic. We could replace 8 with:

    8'  Mess2←Receive about Key4;

where Key4 is a transaction which is identified with the generation of this sequence of Fibonacci numbers. Selective receives based on transaction keys allow a receiving module to be programmed without regard to which module will ultimately send it the message. Yet the receiving module is still able to keep separate "conversations" distinct.

This leads us to the use of transactions as tags for different streams of communication to and from multiplexing modules. In Example 3, we have a new Fibonacci module which can simultaneously maintain several different streams of Fibonacci sequences for one or many users.

In the example, "map" (lines 6-7, 26, 27, 34, 35, etc.) indicates a data structure which maps a transaction key into an integer. Think of it as an integer array indexed by transaction keys. "Action_type" is an enumeration type consisting of START, GENERATE, and TERMINATE. MaxCount is a constant which is the maximum number of active sequences allowed.

Every PLITS message contains two slots placed there by the system. The "From" slot contains the source of the message. The "About" slot contains the transaction key associated with the message. (A dummy key is supplied if the user has not provided one.)

To obtain a Fibonacci stream, a user module will send a message with Action field equal to "Start" and with the globally known constant transaction key "StartNewSequence." The Fibonacci module sends the first element of a new sequence back along with a new transaction key, which will be used by the user module in each future request.

Note the power of the selective receive in line 16. The Fibonacci module has complete control over what it responds to. Originally (line 16), it is listening for any request for service. When a stream is opened, it adds that stream to its set (line 25). When the maximum allowable number of active streams is reached, it turns deaf to new requests for service (line 24). Finally, when a stream dies, it turns deaf to that stream and again allows new requests for service (lines 40-44).

4. DSYS--A Distributed System

With the PLITS style of programming as background and a source of examples, we are developing an experimental system (DSYS) to support high-level distributed computing. DSYS will run on the seven computers in our laboratory: four ALTOs, two Eclipses, and a PDP/10. It will provide facilities for defining and running PLITS distributed jobs (DJOBs).

The remainder of this section outlines both our progress to date and our present design ideas for DSYS. The discussion in Sections 4.10 and 4.11 summarizes our goals and states some questions that we find useful for DSYS planning.

4.1 System Overview

Even on a single machine, there will have to be some underlying programs which handle messages. We will call this collection of programs the kernel for a PLITS site. The kernel is a conventional multi-programming monitor which sequences through the modules on its "ready" queue. The kernel also maintains data structures describing modules which are "suspended," waiting to Receive a message of a specified sort. These data structures, together with analogous ones for messages which result from Send statements, suffice to implement the PLITS message primitives.

A problem arises if the modules are written in different body languages. It may be the case that languages differ in their representation of primitive data types (e.g., real). We require that the representation of primitive data types be uniform within a site. This, as well as other considerations, may give rise to the situation where there is more than one site on a given machine involved in an individual distributed job (DJOB). Figure 2 is a graphic representation of the breakdown of functions and terminology which we have adopted. It is convenient to divide the PLITS support functions into two subsets carried out by the site kernel and by the DSYS Host Control Program (DHCP) respectively. In the example, there are two DJOBs, A and B, which have no connection but happen to be both distributed over Machines 1 and 2. DJOB A consists of three sites: S11 and S12 on Machine 1 and S21 on Machine 2. Each site has a kernel associated with it as described above. The kernel performs the following functions:

(1) distributes messages within the site;
(2) forwards messages to and from other sites;
(3) carries out needed representation shifts for inter-site messages;
(4) allocates resources within the site;
(5) generates unique (world-wide) names;
(6) checks for errors and assertion violations.

We have briefly discussed the first three functions. The fourth function, resource allocation within the site, is concerned with storage allocation and reclamation, scheduling of ready modules, etc. The fifth function is the generation of unique names for modules and transaction keys (see Section 4.6). Error and assertion checking is discussed in Section 4.8.

Each DHCP is an extension of its machine's operating system. It performs four main functions (see Section 4.5 for more detail):

(1) distributes messages among sites local to this machine;
(2) forwards messages to and from other machines;
(3) starts and stops DJOBs, and provides access to other operating system services;
(4) checks for errors and assertion violations.

Let us first consider the problem of setting up a DJOB. If there are two sites on the same machine with the same representations, the DHCP only has to check that the use of public slot names is compatible--essentially the same process as combining the externals of two load modules. If there are several machines involved and there is an incompatibility in representation of a primitive data type, then some conversion routines will have to be automatically invoked. The ARPA network voice protocol [Cohen, 1976] presents a good model of a scheme in which a dialog between machines is used to reconcile representation differences before messages containing data are sent. All of this is fairly messy, but should only be necessary when a new PLITS language processor is brought up on a machine. In the usual case, the standard conversions between sites will have been established and the negotiations between machines will be simple.

When a PLITS message is sent by a module in a site, its destination is checked. If it is within a site, the site kernel handles it; if not, it is given to the local DHCP. If the destination is within another site on the same machine, it is given to the kernel for that site; if not, the DHCP has it forwarded to the appropriate machine. This is the job of DHCP functions 1 and 2 above. To do this effectively requires quite a lot of mechanism beneath the surface. Problems faced include reliable transmission, flow control, error handling, and providing user services in a distributed operating system.
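The routing decision just described depends only on where the destination lives, and Section 4.6 encodes that location in the module name itself. The Python sketch below illustrates one way the two could fit together. It is built on assumptions: the paper states that a name is a 32-bit number with four fields (computer, incarnation, site, local module) but gives no field widths, so the four 8-bit fields here are invented for illustration, as are the `Name` class and the `route` function. Incarnation checking is left out.

```python
from typing import NamedTuple

# Assumed layout: four 8-bit fields, most significant first. The paper gives
# the 32-bit total and the field names, but not the individual widths.
FIELD_BITS = 8
FIELD_MASK = (1 << FIELD_BITS) - 1


class Name(NamedTuple):
    computer: int
    incarnation: int
    site: int
    module: int

    def pack(self) -> int:
        """Encode the four fields into a single 32-bit integer."""
        value = 0
        for field in self:
            if not 0 <= field <= FIELD_MASK:
                raise ValueError("field out of range")
            value = (value << FIELD_BITS) | field
        return value

    @classmethod
    def unpack(cls, value: int) -> "Name":
        """Decode a 32-bit integer back into its four fields."""
        return cls(*((value >> shift) & FIELD_MASK for shift in (24, 16, 8, 0)))

    def kind(self) -> str:
        """Apply the zero-field conventions of Section 4.6."""
        if self.site == 0:
            return "DHCP"
        if self.module == 0:
            return "site kernel"
        return "module"


def route(sender: Name, destination: Name) -> str:
    """Say which component takes the next step in delivering a message."""
    if destination.computer != sender.computer:
        return "DHCP forwards to another machine"
    if destination.site != sender.site:
        return "DHCP hands to the kernel of another local site"
    return "site kernel delivers within the site"
```

For example, with a sender `Name(1, 3, 1, 7)`, a destination `Name(1, 3, 1, 9)` stays inside the site kernel, `Name(1, 3, 2, 4)` goes through the local DHCP to a sibling site, and `Name(2, 1, 1, 1)` is forwarded to another machine.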

Figure 2. Example Overview of PLITS DJOBs

4.2 Implicit Connections

Our desire to allow free collaboration among modules has led to a design in which there are no explicit "connections." Modules that want to communicate do not need to explicitly establish a communication path (i.e., connection) before communicating. All a module needs in order to send a message to another module is its name (see Section 4.6). If Module 1 receives Module 2's name in a message from Module 3, Module 1 can send a message to Module 2 without opening a connection first, or checking to see if a (1-2) connection already exists. Also, Module 1 does not have to close the connection when it finishes. Thus, programs can deal with intra-site communication and inter-site communication uniformly. It is the responsibility of the DHCP to route messages to foreign receivers, and to arrange for flow control and reliable transmission over a machine-to-machine communication network if necessary. From the point of view of the high-level logic of its task, the user program is not concerned with the location of modules.

Traditional communication systems (e.g., ARPANET) use connections as loci for strategies that do resource management (e.g., buffer space allocation, flow control) and error handling. The need to develop such facilities for our scheme has provided an opportunity to take a fresh look at these issues. Some of our present ideas are outlined below.

4.3 Resource Management

Resource management is the central design problem for any operating system. In a multi-processing, message-based system such as RIG, the basic resources to be managed include: buffer space for messages and system data structures; and processor cycles. By controlling the allocation of these resources, RIG controls the rate of message flow between senders and receivers. DSYS uses similar methods to manage resources in a distributed computing environment.

An operating system that provides services (e.g., disk files, ARPANET, printer, terminal I/O, text editor, facilities for running user programs) for multiple users must keep track of which services are in use by which users. That is, another aspect of the resource management problem is the provision of such services to user jobs, and the ability to recover a job's resources when it finishes.

4.4 DJOBs

A distributed job (DJOB) which uses the facilities of the distributed system (DSYS) will in general use resources on more than one computer. A DJOB might consist of sites on several computers and use the services of several computers. For example, a distributed vision application might consist of an image processing site on the PDP/10, an interactive site on an ALTO, a site on the Eclipse for managing the Grinnell color display, and file system services on the PDP/10 and on the Eclipse. One of the sites in each DJOB is the "controlling site" for the DJOB. In the example, the controlling site might be the one on the ALTO. The controlling site for a DJOB is responsible for initializing and terminating the DJOB and for taking appropriate action when one of the other sites of the DJOB fails. A DJOB is uniquely identified in DSYS by the "name" of its controlling site (see Section 4.6).

4.5 DSYS Organization

DSYS is distributed among the operating systems in our network. We refer to the DSYS component of each operating system as the DSYS Host Control Program (DHCP). Each DHCP has two parts, a "DSYS Job Manager" (for distributed jobs) and a "DSYS Communications Manager" (DCM). This organization reflects the two separate facilities of DSYS: operating system support and services for PLITS DJOBs, and basic message communication in the PLITS style. The former facility is actually an application of the latter. That is, each local Job Manager uses a message communication protocol with the other Job Managers in the network to get its work done. The DCM provides the same communication services for the local Job Manager as it does for local PLITS (user) sites.

4.5.1 The DSYS Job Manager

Each Job Manager:

(1) provides services for DJOBs (i.e., start, stop, access to operating system services [see Section 4.7]);
(2) remembers which services are allocated to which DJOBs, and which site is the controlling site for each DJOB;
(3) arranges to recover resources used by such services when a DJOB finishes.

In addition, each Job Manager keeps track (for each DJOB whose controlling site is local) of the other computers that are involved with the DJOB. That is, the Job Manager for the controlling site of a DJOB knows which other DHCPs to notify when the DJOB finishes (or dies).

4.5.2 The DSYS Communications Manager (DCM)

The DCM on each computer is responsible for forwarding messages to and from modules on other computers. The DCM accepts messages for forwarding to foreign modules from local site kernels, and passes messages being forwarded from other DCMs to local site kernels.

In addition to dealing with communications I/O devices, the DCM controls the flow of messages from local senders on the basis of the rate of acceptance by intended receivers ("back pressure") and the availability of local buffer space. The DCM also provides a "reliable transmission" service.

Like the site kernel, the DCM allocates buffer space for messages on a "destination" basis. Each (receiving module, transaction) pair is considered a "destination" for messages, and DSYS maintains a data structure (the "destination descriptor") for it. Each such destination has its own allotment of buffer space for messages. This space is not committed a priori, but is rather an estimate (changeable) of how much of a backlog of messages should be allowed for the destination. The basic flow control mechanism is this: a sending module is kept suspended by its site kernel until space on the destination's primary input queue becomes available. If the destination is local, the site kernel does the queue management and flow control. If the destination is foreign, the DCM (i.e., the forwarder) does the queue management and flow control.
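The destination-based flow control of Section 4.5.2 can be pictured with a small model. The Python sketch below is a single-machine illustration only, under several assumptions: the `Destination` and `SiteKernel` classes, the default allotment of two messages, and the way a suspended sender is represented as a waiting (sender, destination, message) entry are all invented for the sketch. It covers only the local case handled by a site kernel, not forwarding by a DCM.

```python
from collections import deque


class Destination:
    """Input queue for one (receiving module, transaction) pair."""

    def __init__(self, allotment):
        self.allotment = allotment  # adjustable estimate of allowed backlog
        self.queue = deque()


class SiteKernel:
    """Sketch of per-destination buffering with sender suspension."""

    def __init__(self, default_allotment=2):
        self.default_allotment = default_allotment
        self.destinations = {}  # (module, transaction) -> Destination
        self.suspended = []  # (sender, destination key, message) entries

    def _destination(self, key):
        """Create the destination descriptor on first use."""
        if key not in self.destinations:
            self.destinations[key] = Destination(self.default_allotment)
        return self.destinations[key]

    def send(self, sender, module, transaction, message):
        """Return True if the message was posted, False if sender must wait."""
        key = (module, transaction)
        destination = self._destination(key)
        if len(destination.queue) < destination.allotment:
            destination.queue.append(message)
            return True
        self.suspended.append((sender, key, message))
        return False

    def receive(self, module, transaction):
        """Take the oldest message, then resume one waiting sender, if any."""
        key = (module, transaction)
        destination = self._destination(key)
        if not destination.queue:
            return None  # a real receiver would be suspended instead
        message = destination.queue.popleft()
        for i, (_sender, waiting_key, waiting_message) in enumerate(self.suspended):
            if waiting_key == key:
                self.suspended.pop(i)
                destination.queue.append(waiting_message)
                break
        return message
```

Because the backlog limit is kept per destination, a full queue for one (module, transaction) pair holds back only the senders addressing that pair; traffic for other transactions of the same module continues to flow.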

A destination descriptor is a distributed data structure. The destination's site kernel has a portion, and each computer upon which there is at least one module sending messages to the destination has a portion (maintained by its DCM). One can view the portions on foreign computers as queue extensions. The primary job of the DCM is to maintain this distributed data structure to support module-to-module communication across computer boundaries.

Flow Control

The DCM extends the basic flow control strategy that is used by site kernels to include flow control for messages to foreign modules. This is done by providing (limited) local queue space for each foreign destination, a mechanism for forwarding messages to foreign site kernels, and a mechanism for communicating state information about a destination from its site kernel to (forwarding) DCMs. Thus, for message communication to a foreign module, the DCM acts as (an agent for) the foreign site kernel. That is, the DCM makes it appear to the sending site kernel as if it (the DCM) were the foreign site kernel. From the point of view of the sending module, its site kernel responds uniformly to messages sent to a local module or to a foreign one: the SENDMESSAGE call returns (perhaps after a delay) with a code that specifies either an error condition (see Section 4.8 below) or that the message was posted for delivery.

Reliable Transmission

One of the special problems of network communication is "reliable transmission." In general, messages sent over a communication line may be lost, garbled, or duplicated, and may arrive in a different order than they were sent. A communication system can provide a reliable transmission service in any of several ways, all of which depend on feedback from the receiver to the sender. DSYS provides reliable transmission on an "end-to-end" basis, rather than between each pair of computers along the way. The sending end is a (forwarding) DCM, and the receiving end is a destination, i.e., a (module, transaction) pair. It is the responsibility of the receiving site kernel to remember for each of its destinations the state (i.e., message sequence number) of the message stream from each foreign DCM. A "positive-acknowledgment, retransmission" protocol for reliable transmission is used between receiving site kernels and forwarding DCMs. This is exactly the communication path needed for end-to-end flow control! It turns out that the mechanisms for end-to-end flow control can be used (with very small additional cost) for reliable transmission as well. If the transmission line error rate is low enough (our experience indicates that it is), the expense of an (occasional) end-to-end retransmission is offset by the advantages of a simple and flexible low-level (i.e., computer-to-computer) protocol.

Computer-to-Computer Flow Control

A separate (but related) issue is flow control between adjacent computers. If a receiving computer cannot keep up with a sending one, it can either discard information when buffer space is exhausted or somehow ask the sender to "wait a while." The former strategy has the advantage of simplicity, but causes information to be lost, thus effectively increasing the communication line error rate. It is usually a bad idea to increase the effective line error rate to compensate for too simple a design.

4.6 Names

There is a question of how to generate unique names in a distributed system. If there were a central source of names, it might take a long time to get one, and the central source might be sometimes inaccessible. If each site created its own, there would either have to be a lot of handshaking or there would be a danger of duplications. Our solution is simple and quite general: a name (in the present design) is a 32-bit number, composed of four fields--a computer number, an "incarnation number," a site number, and a "local module number." The computer number uniquely identifies one of the computers in our network. The incarnation number is used to distinguish old incarnations of the operating system on the indicated computer from the most recent one. DSYS uses this information to trap references to defunct operating system incarnations (see the discussion on error handling in Section 4.8). The site number identifies a site on the indicated computer, and the local module number identifies a module at the site. Thus, a DSYS module name uniquely identifies a module in the distributed system.

One consequence of these definitions is that a given module instance always resides on the same machine, somewhat contrary to current fantasies about distributed computing. In our view, a module will be compiled to take full advantage of the hardware and software resources of its machine. There will be equivalent modules on various machines, and programs will be able to choose between them, but each will have a distinct unique name and machine of residence.

DHCPs and site kernels have names too. If the site number in a name is zero, the name identifies the DHCP on the indicated computer. If the site number is non-zero but the local module number is zero, the name identifies the indicated site kernel on the indicated computer.

The system uses the names of DHCPs and site kernels in its protocols for connections, flow control, and reliable transmission. That is, the distributed system uses messages to get its work done, just as a user DJOB does.

4.7 Access to Services

One of the tasks of the DSYS Host Control Program (DHCP) on each computer is to provide DJOBs with access to the services that the local operating system provides. Typical services include file system access, ARPANET, printer, TELNET, text editor, and facilities for creating and running a site as part of a DJOB. Each DHCP is equipped with a built-in module called "RequestFielder" which provides DJOBs with such access. There is a DSYS call that allows any module to find the name of the "RequestFielder" module at any DHCP. This is one application of a general "name service" facility within DSYS (not described here). A special message protocol is used to arrange with a DSYS RequestFielder for a service.

4.8 Error Handling

Many of the special problems of distributed computing relate to handling errors. In a conventional programming style (i.e., not message-based), subroutines are used as the primary structuring mechanism. The usual assumption about subroutines is that they are available when called, and that they function properly. The analog of a subroutine call in a message-based programming style is the "handshake": send a message, wait for a reply. In general, of course, message activity can be pipelined or multiplexed, and the relationships between incoming and outgoing messages can be much richer than
DSYS uses a straightforward version of the "wait a while" a direct response to each query.
idea to control flow between computers. The basic strat-
egy is the same for computer-to-computer flow control 4.8.1 Errors Unlque to Distributed Computing
as it is for end-to-end flow control on DSYS connections: In addition to all of the ways in which a subroutine can
when the receiver finds that its remaining buffer space is fair (bugs, bad specifications, name conflicts, etc.), mes-
.critically low, it sends a "stop sending" request to the sages can fail in (at least) the following other ways:
sender. As soon as enough buffer space becomes avail- 1. "Synchronous" errors
able, it sends a "continue sending" request. Enough extra Synchronous errors are those that can be detected
space at the receiver is allocated to accommodate data when a module executes a system call to send or
that arrives while the "stop sending" message is in tran- receive a message. Synchronous errors arise because
sit. A sender that has been stopped will resume sending call parameters are bad in some way. Such errors
after a time if no "continue" message is received (it may can be reported as "failure" of the system call, Ex-
have been lost). In pathological cases, data will be lost, amples:
and end-to-end retransmission will be necessary. Once (a) Specified site (or module) does not exist.
again, we assume that this will happen very infrequently, (b) Specified computer is down.
and that the parameter values for the space and time (c) Incarnation number of specified computer oper-
thresholds can be adjusted for an acceptable trade-off ating system is out of date.
between minimal expected Iossage and efficient normal
operation.
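The three synchronous error examples (a) to (c) can be pictured as failure results returned directly by the send call. The following is only a minimal sketch in Python under stated assumptions: the names SendStatus, Name, Computer, and send are hypothetical and are not part of DSYS, and the order in which the checks are made is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class SendStatus(Enum):
    POSTED = auto()               # message was posted for delivery
    NO_SUCH_DESTINATION = auto()  # (a) specified site or module does not exist
    COMPUTER_DOWN = auto()        # (b) specified computer is down
    STALE_INCARNATION = auto()    # (c) incarnation number is out of date


@dataclass(frozen=True)
class Name:
    """Hypothetical module name: computer, incarnation, site, local module."""
    computer: str
    incarnation: int
    site: int
    module: int


@dataclass
class Computer:
    """Hypothetical record of what is known about one computer."""
    up: bool
    incarnation: int
    modules: set = field(default_factory=set)  # existing (site, module) pairs


def send(network, dest, message, outbox):
    """Check the call parameters; report failure or post the message."""
    host = network.get(dest.computer)
    # Assumption: a computer we know nothing about is treated as down.
    if host is None or not host.up:
        return SendStatus.COMPUTER_DOWN
    if dest.incarnation != host.incarnation:
        return SendStatus.STALE_INCARNATION
    if (dest.site, dest.module) not in host.modules:
        return SendStatus.NO_SUCH_DESTINATION
    outbox.append((dest, message))  # queued; delivery itself can still fail
    return SendStatus.POSTED
```

A caller would inspect the returned status before assuming that delivery was even attempted. A POSTED result means only that the message was queued, not that it arrived.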

2. "Asynchronous" errors
   Asynchronous errors are not immediately detectable as problems with the parameters to a system call, and can occur at any time. Examples:
   (a) A message that was previously queued for delivery couldn't be delivered after all (because of a, b, or c above).
   (b) A "demon" has discovered a problem. A "demon" is a service provided by DSYS whereby a module can request explicit (asynchronous) notification (via an EMERGENCY message) when a specified (other) module ceases to exist.
   (c) A foreign site or service that was being used by this DJOB terminated abnormally.
Unfortunately, there are more opportunities for errors to occur in systems for distributed computing than in conventional systems. The programmer of a distributed computation must therefore give more thought to the problem of dealing with errors and "exceptional conditions" to provide an adequately robust program. Systematic conventions for how to deal with such errors should help, and we are developing some ideas along these lines for DSYS.

4.8.2 Emergency Messages
The present DSYS design provides "emergency" messages as the mechanism that the system uses to report asynchronous errors to a module. If a module has an emergency message on its input queue, the system will include a notice that there is a pending emergency message as part of the normal response to any call that sends or receives a message. This is only an initial attempt at providing a uniform mechanism for errors and other asynchronous conditions.

4.9 Implementation
An experimental version of DSYS is up and working in our local network. There are experimental DHCPs for the ALTOs and for the PDP/10, and the Eclipse DHCP is in the final stages of debugging. Each DHCP has most of a Communications Manager, a name server, and a rudimentary Job Manager (presently a RequestFielder that provides file service).

4.10 Summary
There is a rapidly growing awareness [Hoare, 1977] that the paradigm of a collection of communicating sequential processes is a useful and powerful concept for solving problems and for developing computer systems. In the usual way, progress requires the development of concrete systems which both test ideas and lead to new ones.
Our work on DSYS is motivated by the requirements of PLITS and by our experience with RIG. Our desire to provide flexible communication facilities for user jobs in a distributed operating system has led us to take a fresh look at some of the problems of distributed computing. In particular, we are developing a scheme that provides both a uniform user view of inter-module communication and a flexible system view of resource management.
Further, we are developing the idea of a distributed user job, and designing mechanisms for handling errors and exceptional conditions in distributed systems. At the low level, we are working on communication protocols that use end-to-end flow control and reliable transmission, allow fine control over buffer space allotments for arriving messages, and provide detailed feedback for intelligent flow control when such information is available.

4.11 DSYS Research Questions
To help guide the work on DSYS, we find it useful to express design goals as questions. The present collection of such questions is outlined below.

4.11.1 Medium-Range Questions
What kind of a system is required to support a programming methodology in which sequential processes ("modules") communicate via messages? How can such a system be designed to present a uniform user view of inter-module communication, independent of whether the communicating modules run on the same computer?
What can be done to provide systematic conventions for dealing with the errors and exceptional conditions that occur in distributed computations? In particular, how can such a system be made robust? What can be done to maintain the integrity of a distributed system (and of innocent user jobs) when either a user job or a part of the system fails?
How should "user job" be defined? What services should the distributed system provide, and how should user jobs deal with the distributed system? What are the special problems of user jobs in such an environment, and how can the distributed system help?

4.11.2 Longer-Range Questions
How can performance be monitored and distributed computations (and systems) be tuned? In general, how should the programmer think about an execution of his computation? What tools can the system provide to help in this regard? Such tools should also be helpful to the system designer.
How can such a system be made reliable? Are there practical descriptive techniques for the protocols of real distributed computations? How can such a description be used effectively to uncover design problems or generate tests? How much of this can be automated?

References
Ball, et al., "RIG, Rochester's Intelligent Gateway: System Overview," TR5, Computer Science Department, University of Rochester, April 1976; also appeared in IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, December 1976.
Cohen, D., "Specifications for the Network Voice Protocol," ISI/RR-75-39, Information Sciences Institute, University of Southern California, March 1976.
Foster, John D., "Distributive Processing for Banking," Datamation, July 1976.
Hoare, C. A. R., "Communicating Sequential Processes," Computer Science Department, Queen's University, Belfast, March 1977.
Hoare, C. A. R. and Wirth, N., "An Axiomatic Definition of the Programming Language Pascal," Acta Informatica, Vol. 2, 1973.
Rovner, P. D., working paper, to appear as TR22, Computer Science Department, University of Rochester, 1977.
