Group Communication

The document discusses the concept of groups in both human society and distributed computing systems, emphasizing their role in enhancing efficiency and coordination for common purposes. It explores various models of process failures, particularly crash and arbitrary failure models, and the implications for group communication protocols. The document also highlights ongoing research and prototype systems aimed at improving group management and communication in diverse network environments.

David Powell, Guest Editor

Illustration: Will Terry

Groups are ubiquitous in human society. We often use collective names for groups of people as a convenient means for referring to or addressing some part of the population as if it were a single entity, like a school class, an age group, or a social category. People get together in groups whenever concerted action can be expected to provide gains in efficiency or to improve the chances of success in some endeavor, like a road crew, a military platoon, or a research team.

Similarly, groups can be used in distributed computing systems to help master the complexity of large applications or to help provide non-functional properties, such as availability or security. Computation groups could be limited to the convenience of collectively designating a set of processes¹ using a common name or address. Such facilities are already offered in many local-area networks (LANs), and we are all accustomed to using Internet news groups or mailing lists. The full benefits of the group concept, however, can be reaped only if we know how to set up and coordinate groups of processes that work together to fulfill a common purpose, like sharing a computational load, increasing performance, or providing a fault-tolerant service. This special section presents some of the current ideas on how such groups can be created and managed.

¹ The term "process" is used here for simplicity of expression. Of course, many different sorts of computing entities can be considered in groups, such as physical processors, database servers, and sub-networks of a larger communication network.

What facilities or semantics a group management or group communication service should provide is still the subject of much debate within the research community. The ease with which a given group's service semantics can be provided or, indeed, whether or not such a service can be implemented at all, depends heavily on what can be said or assumed about the computation environment in which the service is to be provided. The most important assumptions concern how processes can fail and how well they communicate with each other.

The strongest and most common assumption about process failures is the crash failure model: a process acts in full accordance with its specification until it suddenly ceases all activity. A crash failure model is appropriate if the probability of less well-behaved failures can be neglected. A crash failure model might be appropriate, for example, for a general-purpose computing network in which the most common problem is the unavailability of certain sites. In such an environment, any service that can tolerate process crash failures provides a useful improvement in dependability over one that cannot tolerate any failures at all. A crash failure model can also be appropriate for ultra-dependable distributed systems if nodes have extensive built-in self-checking.

At the other end of the failure spectrum is the arbitrary failure model in which no restrictive assumptions are made about the way processes can fail. For example, they could fail by sending erroneous messages, by saturating the network, or even by colluding with other faulty processes to bring down the system. This is a "worst-case" failure model that frees the system designer of any obligation to justify the realism of a more restrictive assumption. It is particularly appropriate for building ultra-dependable systems or for dealing with processes under the control of a malicious intruder. Unfortunately, protocols that can tolerate such arbitrary failures require more redundancy and more messages than if they were designed to tolerate only crash failures; they are also much more difficult to design and validate.

The strongest assumption that can be made about inter-process communication is that any message sent by a correct process to another correct process is always received within a given delay (the so-called synchronous communication assumption). The nice thing about this assumption is that one process can reliably detect whether another process is alive just by sending it a query and waiting a known bounded time for a response. Unfortunately, in a system where processes must communicate over a shared network, such perfection is guaranteed only with a certain probability, by using multiple communication paths and/or message retransmissions. Often, however, it is impossible to give even a probabilistic guarantee, since the actual load on the network may be totally unpredictable.

The opposite approach is to consider that there is no known limit on the time it takes for a message to reach its destination. Protocols designed without knowledge of time limits could easily be ported from one environment to another, since they would operate correctly whatever the performance of the network. Unfortunately, with such totally asynchronous communication, a process cannot decide whether another process has crashed or whether its query or the expected response is still on its way across the network. In practice, it is essential to introduce some notion of time so that processes know how long to wait for an expected response before suspecting that the originator of the response might have failed.

Note, however, that suspicion of a crash is not the same as detection of a crash; the suspected process might still be perfectly healthy. It is easy to see, therefore, that it is impossible to achieve any sort of deterministic agreement between correct processes. The best that can be done in such an environment is to ensure that certain safety properties are guaranteed whatever the communication delays, and that useful progress is made whenever the network performs well enough for processes to communicate with each other in a timely manner.

It might be instructive at this point to draw a few analogies between groups of processes and groups of people. Imagine that a committee meeting is convened and attendees gather round a table in a meeting room. It's a long meeting, so the attendees get very tired, and some of them doze off now and again. However, when they are awake they can easily see who else is awake, since all are sitting round the same table. With a little organization (a protocol), those that stay awake can (for example) take turns to address the meeting, and they should all know who else is awake, what they heard, and what they should have learned.

This analogy illustrates several points:

• There are at least three sorts of groups to be considered: the people eligible to attend the meeting (the committee); those who attend the meeting (the attendees); and those who participate in the committee's work at a given instant because they are awake (the participants).
• The meeting room setting is analogous to the synchronous communication model discussed earlier in which communication is reliable and timely, and it is easy for people (processes) to detect whether some of them have fallen asleep (crashed). Thus, they can make strong statements about who is awake (the current membership of the group of participants), what they have heard (the messages delivered), and what they have all learned (changes to internal state resulting from the order of message delivery).
• It shows that we must also worry about how new attendees (people who join the meeting after it has started) and recovered attendees (participants who fall asleep and later awaken) are brought up to date with what has been decided while they were absent or asleep.

Now let us consider another setting for the committee meeting. Let us suppose that, instead of meeting round a table in a quiet room, the committee tries to conduct its business in a large and very busy hotel lobby. Because of the hustle and bustle, the attendees cannot always see or talk directly to one another. Even when one attendee can see another attendee, it's not certain that the latter is looking at the former. Some of the attendees could fall asleep or go home without the others ever noticing. It's quite plain that in this setting the committee has a much harder job to process its agenda in some consistent way. We can suppose that the attendees try to gather together to get some work done, but to do so they also have to reach some sort of agreement about who they all think are in their particular gathering (for example, to decide who will act as chair).
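
The timeout-based suspicion described above for asynchronous communication can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the class name `Suspector`, its methods, and the policy of doubling the timeout after a wrong suspicion are hypothetical choices, not taken from the article or from any of the systems it mentions. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class Suspector:
    """Timeout-based crash suspicion (hypothetical helper for illustration)."""
    timeout: float                                  # silence tolerated before suspecting
    last_heard: dict = field(default_factory=dict)  # peer -> time of last message
    suspected: set = field(default_factory=set)
    false_suspicions: int = 0

    def heard_from(self, peer: str, now: float) -> None:
        # Any message from a peer shows it had not crashed when it sent it.
        if peer in self.suspected:
            # The peer was only slow: the suspicion was not a detection.
            self.suspected.discard(peer)
            self.false_suspicions += 1
            self.timeout *= 2  # assumed policy: wait longer next time
        self.last_heard[peer] = now

    def check(self, now: float) -> set:
        # Suspect every known peer that has been silent longer than the timeout.
        for peer, last in self.last_heard.items():
            if now - last > self.timeout:
                self.suspected.add(peer)
        return set(self.suspected)


s = Suspector(timeout=1.0)
s.heard_from("p1", now=0.0)
s.heard_from("p2", now=0.0)
s.heard_from("p1", now=0.9)
first = s.check(now=1.5)     # p2 has been silent for longer than the timeout
s.heard_from("p2", now=1.6)  # a late message arrives: p2 was healthy after all
second = s.check(now=1.7)
```

In this run `p2` is suspected and then cleared when its late message arrives, which is exactly the gap between suspicion and detection: the observer cannot distinguish a crashed peer from a slow network until a message eventually shows up.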


52 April 1996/Vol. 39, No. 4 COMMUNICATIONS OF THE ACM

As people drift in and out of one another's sight, they have to successively reach new decisions about who is in their gathering. Furthermore, there could be many such gatherings in different parts of the lobby at the same time. In this case, different gatherings of attendees could end up making conflicting decisions. The only way to avoid such conflicts is to impose a rule stating, for example, that only gatherings with a majority of committee members are allowed to make any decisions. Alternatively, the committee could attempt to reconcile conflicting decisions whenever two or more gatherings are able to merge and form a new gathering.

This meeting-in-a-crowded-lobby scenario is analogous to the asynchronous communication model discussed earlier. It illustrates several points:

• Since people (processes) cannot reliably detect whether some of them are absent or, equivalently, have fallen asleep (crashed), they cannot decide exactly who is attending the meeting.
• It is impossible to prevent the attendees from splitting into separate sub-meetings (gatherings). Such gatherings or groups of participants are sometimes called "views" of the meeting, since the participants consider that their gathering is the meeting. This logical partitioning of "meetings" is unavoidable in asynchronous settings, so asynchronous group protocols must be able to deal with it.
• When an attendee joins an existing gathering (and thus forms a new, larger gathering), the participants have to work out what this new participant already knows about the committee's work and bring him or her up to date. This reporting has to be done whether the new participant just woke up from a nap (recovered from a crash) or lost touch with an earlier gathering (became disconnected). In an asynchronous setting, it's difficult to tell the difference. If the participants are not going to have to constantly tell "new" participants everything they have done since the beginning of time, they have to remember (keep on stable storage) what they did in earlier gatherings, and the protocol has to ensure that all work done in successive gatherings is done in some consistent fashion.

* * *

Over about the last decade, there has been considerable research into the management of process groups and protocols for communication within and between such groups. The articles in this special section present some of the prototype systems and current research activities typical of the prevalent ideas in this fascinating area.

The article by Louise Moser, Michael Melliar-Smith, and their team at the University of California, Santa Barbara, describes the Totem system, which provides a totally ordered multicast service to application process groups. It is particularly suitable for supporting fault-tolerant soft real-time applications. Totem is a scalable system built using a hierarchy of group communication protocols for groups of processors on a LAN, for groups of interconnected LANs, and for groups of application processes.

In their article on the Transis system, Danny Dolev and Dalia Malki from the Hebrew University of Jerusalem consider some of the difficult problems that arise in diverse network settings. The authors discuss how different components of a partitioned network can operate autonomously and then merge operations when they become reconnected (such partitioned operation is of special interest in mobile applications). They also consider the need for different protocols for fast local communication and for the unavoidably slower communication between local clusters.

Most articles in this special section assume that processes fail only by crashing or by just being slow. The exception is the short article by Mike Reiter from AT&T Bell Laboratories, sketching some of the novel group communication ideas embodied in the Rampart system to provide tolerance for malicious intrusion. Here, faulty processes can collude and fail in quite arbitrary fashion, and groups are used to mask the malicious actions of a minority of the group members.

There are many different ways of defining and using process groups. Features that may be useful in one setting may be impediments in others. The Horus system, described in the article by Robbert van Renesse, Ken Birman, and Silvano Maffeis of Cornell University, aims to provide a very flexible environment for system programmers to configure group protocols specifically adapted to the problem at hand.

The last two articles do not describe specific systems but address instead the fundamentals of group communication. André Schiper (EPFL, Switzerland) and Michel Raynal (IRISA, France) trace some interesting directions for future research into various sorts of process groups. In particular, they consider the differences between groups for replicated, fault-tolerant objects and groups for implementing atomic transactions. They propose a multicast primitive for carrying out a specific class of transactions on a set of replicated, fault-tolerant objects.

Flaviu Cristian of the University of California, San Diego, compares the properties of group communication protocols for the synchronous and asynchronous communication models. This comparison underlines the advantages and drawbacks of both models and should be considered essential reading for anyone interested in group communication.

DAVID POWELL is Directeur de Recherche CNRS at the Laboratoire d'Analyse et d'Architecture des Systèmes where he works in the Dependable Computing and Fault Tolerance Research Group. Current Address: LAAS-CNRS, 7 Avenue du Colonel Roche, 31077 Toulouse, France; email: [email protected]
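
The majority rule from the crowded-lobby analogy (only gatherings holding a majority of committee members may decide) can be written down in a few lines of Python. This is a minimal sketch, assuming that the committee and each gathering ("view") are plain sets of names; the function name `may_decide` and the sample data are hypothetical and do not come from any of the systems described in this section.

```python
def may_decide(view: set, committee: set) -> bool:
    """Return True if the gathering holds a strict majority of the committee."""
    members_present = view & committee  # ignore anyone who is not on the committee
    return 2 * len(members_present) > len(committee)


committee = {"a", "b", "c", "d", "e"}

# Two gatherings formed in different parts of the lobby.
lobby_views = [{"a", "b", "c"}, {"d", "e"}]
deciding = [view for view in lobby_views if may_decide(view, committee)]

# An even split of a four-member committee leaves nobody able to decide.
tie = may_decide({"a", "b"}, {"a", "b", "c", "d"})
```

Because two disjoint gatherings cannot both contain more than half of the committee, at most one of them can pass this test at any time, which is what rules out conflicting decisions. The price is that no gathering may decide when the committee splits evenly or into many small fragments.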
