Michel Raynal
Fault-Tolerant Message-Passing Distributed Systems
An Algorithmic Approach
Michel Raynal
IRISA-ISTIC Université de Rennes 1
Institut Universitaire de France
Rennes, France
Parts of this work are based on the books “Fault-Tolerant Agreement in Synchronous Message-
Passing Systems” and “Communication and Agreement Abstractions for Fault-Tolerant Asynchronous
Distributed Systems”, author Michel Raynal, © 2010 Morgan & Claypool Publishers
(www.morganclaypool.com). Used with permission.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
I have reached the day when I no longer remember when I stopped being immortal.
In Livro de Crónicas, António Lobo Antunes (1942)
1 But now I have traveled a very long way, and the time has come to unyoke my steaming horses.
What is distributed computing? Distributed computing was born in the late 1970s when researchers
and practitioners started taking into account the intrinsic characteristics of physically distributed
systems. The field then emerged as a specialized research area distinct from networking, operating
systems, and parallel computing.
Distributed computing arises when one has to solve a problem in terms of distributed entities
(usually called processors, nodes, processes, actors, agents, sensors, peers, etc.) such that each entity
has only partial knowledge of the many parameters involved in the problem to be solved.
While parallel computing and real-time computing can be characterized, respectively, by the terms
efficiency and on-time computing, distributed computing can be characterized by the term uncertainty.
This uncertainty is created by asynchrony, multiplicity of control flows, absence of shared memory
and global time, failures, dynamicity, mobility, etc. Mastering one form or another of uncertainty is
pervasive in all distributed computing problems. A main difficulty in designing distributed algorithms
comes from the fact that no entity cooperating in the achievement of a common goal can have
instantaneous knowledge of the current state of the other entities; it can only know their past local
states.
Although distributed algorithms are often made up of only a few lines, their behavior can be difficult
to understand and their properties hard to state and prove. Hence, distributed computing is not only
a fundamental topic but also a challenging one, where simplicity, elegance, and beauty are first-class
citizens.
Why this book? In the book “Distributed algorithms for message-passing systems” (Springer, 2013),
I addressed distributed computing in failure-free message-passing systems, where the computing
entities (processes) have to cooperate in the presence of asynchrony. In contrast, in my book “Concurrent
programming: algorithms, principles and foundations” (Springer, 2013), I addressed distributed
computing where the computing entities (processes) communicate through a read/write shared memory
(e.g., multicore), and the main adversary lies in the net effect of asynchrony and process crashes
(unexpected definitive stops).
The present book considers synchronous and asynchronous message-passing systems, where
processes can commit crash failures or Byzantine failures (arbitrary behavior). Its aim is to present, in a
comprehensive way, the basic notions, concepts, and algorithms in the context of these systems. The main
difficulty comes from the uncertainty created by the adversaries managing the environment (mainly
asynchrony and failures), which, by its very nature, is not under the control of the system.
A quick look at the content of the book The book is composed of four parts: the first two are on
communication abstractions, the other two on agreement abstractions. These are the most important
abstractions that distributed applications rely on in asynchronous and synchronous message-passing
systems where processes may crash or commit Byzantine failures. The book addresses what can and
cannot be done in the presence of such adversaries. Consequently, it presents both impossibility
results and distributed algorithms. All impossibility results are proved, and all algorithms are
described in a simple algorithmic notation and proved correct.
• Parts on communication abstractions.
– Part I is on the reliable broadcast abstraction.
• Parts on agreement.
– Part III is on agreement in synchronous systems.
– Part IV is on agreement in asynchronous systems.
On the presentation style When known, the names of the authors of a theorem, or of an algorithm,
are indicated together with the date of the associated publication. Moreover, each chapter has a
bibliographical section, where a short historical perspective and references related to that chapter are
given.
Each chapter ends with a few exercises and problems, whose solutions can be found in the
article cited at the end of the corresponding exercise/problem.
From a vocabulary point of view, the following terms are used: an object implements an abstraction,
defined by a set of properties, which allows a problem to be solved. Moreover, each algorithm
is first presented intuitively in words, and then proved correct. Understanding an algorithm is a
two-step process:
• First, gain a good intuition of its underlying principles and its possible behaviors. This is
necessary, but remains informal.
• Then, prove that the algorithm is correct in the model it was designed for. The proof consists of
logical reasoning based on the properties provided by (i) the underlying model, and (ii) the
statements (code) of the algorithm. More precisely, each property defining the abstraction the
algorithm is assumed to implement must be satisfied in all its executions.
Only when these two steps have been done can we say that we understand the algorithm.
Audience This book has been written primarily for people who are not familiar with the topic and
the concepts that are presented. These include mainly:
• Senior-level undergraduate students and graduate students in informatics or computing
engineering, who are interested in the principles and algorithmic foundations of fault-tolerant distributed
computing.
• Practitioners and engineers who want to be aware of the state-of-the-art concepts, basic
principles, mechanisms, and techniques encountered in fault-tolerant distributed computing.
Prerequisites for this book include undergraduate courses on algorithms, basic knowledge of
operating systems, and notions of concurrency in failure-free distributed computing. One-semester courses
based on this book are suggested in the section titled “How to Use This Book” in the Afterword.
Origin of the book and acknowledgments This book has two complementary origins:
• The first is a set of lectures for undergraduate and graduate courses on distributed computing I
gave at the University of Rennes (France), the Hong Kong Polytechnic University, and, as an
invited professor, at several universities all over the world.
Hence, I want to thank the numerous students for their questions that, in one way or another,
contributed to this book.
• The second is the two monographs I wrote in 2010, on fault-tolerant distributed computing,
titled “Communication and agreement abstractions for fault-tolerant asynchronous distributed
I also want to thank my colleagues (in no particular order) A. Mostéfaoui, D. Imbs, S. Rajsbaum,
V. Gramoli, C. Delporte, H. Fauconnier, F. Taïani, M. Perrin, A. Castañeda, M. Larrea, and Z. Bouzid,
with whom I have collaborated in recent years. I also thank the Polytechnic University of Hong
Kong (PolyU), and more particularly Professor Jiannong Cao, for hosting me while I was writing parts
of this book. My thanks also to Ronan Nugent (Springer) for his support and his help in putting it all
together.
Last but not least (and maybe most importantly), I thank all the researchers whose results are
presented in this book. Without their work, this book would not exist. (Finally, since I typeset the entire
text myself – LaTeX2ε for the text and xfig for figures – any typesetting or technical errors that remain
are my responsibility.)
Academia Europaea
Institut Universitaire de France
Professor IRISA-ISTIC, Université de Rennes 1, France
Chair Professor, Hong Kong Polytechnic University
June–December 2017
Rennes, Saint-Grégoire, Douelle, Saint-Philibert, Hong Kong,
Vienna (DISC’17), Washington D.C. (PODC’17), Mexico City (UNAM)
Contents
I Introductory Chapter 1
8 A Broadcast Abstraction
Suited to the Family of Read/Write Implementable Objects 131
8.1 The SCD-broadcast Communication Abstraction . . . . . . . . . . . . . . . . . . . . . 132
8.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.1.2 Implementing SCD-broadcast in CAMP n,t [t < n/2] . . . . . . . . . . . . . . 133
8.1.3 Cost and Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.1.4 An SCD-broadcast-based Communication Pattern . . . . . . . . . . . . . . . . 139
8.2 From SCD-broadcast to an MWMR Register . . . . . . . . . . . . . . . . . . . . . . . 139
8.2.1 Building an MWMR Atomic Register in CAMP n,t [SCD-broadcast] . . . . . . 139
8.2.2 Cost and Proof of Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.2.3 From Atomicity to Sequential Consistency . . . . . . . . . . . . . . . . . . . 142
8.2.4 From MWMR Registers to an Atomic Snapshot Object . . . . . . . . . . . . . 143
8.3 From SCD-broadcast to an Atomic Counter . . . . . . . . . . . . . . . . . . . . . . . . 144
8.3.1 Counter Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.3.2 Implementation of an Atomic Counter Object . . . . . . . . . . . . . . . . . . 145
8.3.3 Implementation of a Sequentially Consistent Counter Object . . . . . . . . . . 146
8.4 From SCD-broadcast to Lattice Agreement . . . . . . . . . . . . . . . . . . . . . . . . 147
8.4.1 The Lattice Agreement Task . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.4.2 Lattice Agreement from SCD-broadcast . . . . . . . . . . . . . . . . . . . . . 148
8.5 From SWMR Atomic Registers to SCD-broadcast . . . . . . . . . . . . . . . . . . . . 148
8.5.1 From Snapshot to SCD-broadcast . . . . . . . . . . . . . . . . . . . . . . . . 148
8.5.2 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
8.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
8.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
8.8 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11 Expediting Decision
in Synchronous Systems with Process Crash Failures 189
11.1 Early Deciding and Stopping Interactive Consistency . . . . . . . . . . . . . . . . . . . 189
11.1.1 Early Deciding vs Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . 189
11.1.2 An Early Decision Predicate . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
11.1.3 An Early Deciding and Stopping Algorithm . . . . . . . . . . . . . . . . . . . 191
11.1.4 Correctness Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
11.1.5 On Early Decision Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . 194
11.1.6 Early Deciding and Stopping Consensus . . . . . . . . . . . . . . . . . . . . . 195
11.2 An Unbeatable Binary Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . . . 196
11.2.1 A Knowledge-Based Unbeatable Predicate . . . . . . . . . . . . . . . . . . . 196
11.2.2 PREF0() with Respect to DIFF() . . . . . . . . . . . . . . . . . . . . . . . . 197
11.2.3 An Algorithm Based on the Predicate PREF0(): CGM . . . . . . . . . . . . . 197
11.2.4 On the Unbeatability of the Predicate PREF0() . . . . . . . . . . . . . . . . . 200
11.3 The Synchronous Condition-based Approach . . . . . . . . . . . . . . . . . . . . . . . 200
16 Consensus:
Power and Implementability Limit in Crash-Prone Asynchronous Systems 287
16.1 The Total Order Broadcast Communication Abstraction . . . . . . . . . . . . . . . . . 287
16.1.1 Total Order Broadcast: Definition . . . . . . . . . . . . . . . . . . . . . . . . 287
16.1.2 A Map of Communication Abstractions . . . . . . . . . . . . . . . . . . . . . 288
16.2 From Consensus to TO-broadcast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
16.2.1 Structure of the Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
16.2.2 Description of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
16.2.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
16.3 Consensus and TO-broadcast Are Equivalent . . . . . . . . . . . . . . . . . . . . . . . 292
16.4 The State Machine Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
16.4.1 State Machine Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
16.4.2 Sequentially-Defined Abstractions (Objects) . . . . . . . . . . . . . . . . . . 294
16.5 A Simple Consensus-based Universal Construction . . . . . . . . . . . . . . . . . . . . 295
16.6 Agreement vs Mutual Exclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
16.7 Ledger Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
16.7.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
16.7.2 Implementation of a Ledger in CAMP n,t [TO-broadcast] . . . . . . . . . . . . 299
16.8 Consensus Impossibility in the Presence of Crashes and Asynchrony . . . . . . . . . . 300
16.8.1 The Intuition That Underlies the Impossibility . . . . . . . . . . . . . . . . . . 300
16.8.2 Refining the Definition of CAMP n,t [∅] . . . . . . . . . . . . . . . . . . . . . 301
16.8.3 Notion of Valence of a Global State . . . . . . . . . . . . . . . . . . . . . . . 303
16.8.4 Consensus Is Impossible in CAMP n,1 [∅] . . . . . . . . . . . . . . . . . . . . 304
16.9 The Frontier Between Read/Write Registers and Consensus . . . . . . . . . . . . . . . 309
16.9.1 The Main Question . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
16.9.2 The Notion of Consensus Number in Read/Write Systems . . . . . . . . . . . 310
16.9.3 An Illustration of Herlihy’s Hierarchy . . . . . . . . . . . . . . . . . . . . . . 310
16.9.4 The Consensus Number of a Ledger . . . . . . . . . . . . . . . . . . . . . . . 313
16.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
16.11 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
16.12 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
17.4 Enriching CAMP n,t [t < n/2] with an Eventual Leader . . . . . . . . . . . . . . . . . 323
17.4.1 The Weakest Failure Detector to Implement Consensus . . . . . . . . . . . . . 323
17.4.2 Implementing Consensus in CAMP n,t [t < n/2, Ω] . . . . . . . . . . . . . . 324
17.4.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
17.4.4 Consensus Versus Eventual Leader Failure Detector . . . . . . . . . . . . . . 329
17.4.5 Notions of Indulgence and Zero-degradation . . . . . . . . . . . . . . . . . . 329
17.4.6 Saving Broadcast Instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
17.5 Enriching CAMP n,t [t < n/2] with Randomization . . . . . . . . . . . . . . . . . . . 330
17.5.1 Asynchronous Randomized Models . . . . . . . . . . . . . . . . . . . . . . . 330
17.5.2 Randomized Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
17.5.3 Randomized Binary Consensus in CAMP n,t [t < n/2, LC] . . . . . . . . . . . 331
17.5.4 Randomized Binary Consensus in CAMP n,t [t < n/2, CC] . . . . . . . . . . . 334
17.6 Enriching CAMP n,t [t < n/2] with a Hybrid Approach . . . . . . . . . . . . . . . . . 337
17.6.1 The Hybrid Approach: Failure Detector and Randomization . . . . . . . . . . 337
17.6.2 A Hybrid Binary Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . 338
17.7 A Paxos-inspired Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 339
17.7.1 The Alpha Communication Abstraction . . . . . . . . . . . . . . . . . . . . . 340
17.7.2 Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
17.7.3 An Implementation of Alpha in CAMP n,t [t < n/2] . . . . . . . . . . . . . . 341
17.8 From Binary to Multivalued Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . 344
17.8.1 A Reduction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
17.8.2 Proof of the Reduction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 345
17.9 Consensus in One Communication Step . . . . . . . . . . . . . . . . . . . . . . . . . . 346
17.9.1 Aim and Model Assumption on t . . . . . . . . . . . . . . . . . . . . . . . . 346
17.9.2 A One Communication Step Algorithm . . . . . . . . . . . . . . . . . . . . . 346
17.9.3 Proof of the Early Deciding Algorithm . . . . . . . . . . . . . . . . . . . . . 347
17.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
17.11 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
17.12 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
18 Implementing Oracles
in Asynchronous Systems with Process Crash Failures 353
18.1 The Two Facets of Failure Detectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
18.1.1 The Programming Point of View: Modular Building Block . . . . . . . . . . . 354
18.1.2 The Computability Point of View: Abstraction Ranking . . . . . . . . . . . . 354
18.2 Ω in CAMP n,t [∅]: a Direct Impossibility Proof . . . . . . . . . . . . . . . . . . . . . . 355
18.3 Constructing a Perfect Failure Detector (Class P ) . . . . . . . . . . . . . . . . . . . . 356
18.3.1 Reminder: Definition of the Class P of Perfect Failure Detectors . . . . . . . . 356
18.3.2 Use of an Underlying Synchronous System . . . . . . . . . . . . . . . . . . . 357
18.3.3 Applications Generating a Fair Communication Pattern . . . . . . . . . . . . . 358
18.3.4 The Theta Assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
18.4 Constructing an Eventually Perfect Failure Detector (Class ◇P) . . . . . . . . . . . . . 361
18.4.1 Reminder: Definition of an Eventually Perfect Failure Detector . . . . . . . . 361
18.4.2 From Perpetual to Eventual Properties . . . . . . . . . . . . . . . . . . . . . . 361
18.4.3 Eventually Synchronous Systems . . . . . . . . . . . . . . . . . . . . . . . . 361
18.5 On the Efficient Monitoring of a Process by Another Process . . . . . . . . . . . . . . 363
18.5.1 Motivation and System Model . . . . . . . . . . . . . . . . . . . . . . . . . . 363
18.5.2 A Monitoring Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
18.6 An Adaptive Monitoring-based Algorithm Building ◇P . . . . . . . . . . . . . . . . . 366
18.6.1 Motivation and Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
18.6.2 A Monitoring-Based Adaptive Algorithm for the Failure Detector Class ◇P . . 366
18.6.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
18.7 From the t-Source Assumption to an Ω Eventual Leader . . . . . . . . . . . . . . . . . 369
18.7.1 The ◇t-Source Assumption and the Model CAMP n,t [◇t-SOURCE] . . . . . 369
18.7.2 Electing an Eventual Leader in CAMP n,t [◇t-SOURCE] . . . . . . . . . . . . 370
18.7.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
18.8 Electing an Eventual Leader in CAMP n,t [◇t-MS PAT] . . . . . . . . . . . . . . . . . 372
18.8.1 A Query/Response Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
18.8.2 Electing an Eventual Leader in CAMP n,t [◇t-MS PAT] . . . . . . . . . . . . 374
18.8.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
18.9 Building Ω in a Hybrid Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
18.10 Construction of a Biased Common Coin from Local Coins . . . . . . . . . . . . . . . . 377
18.10.1 Definition of a Biased Common Coin . . . . . . . . . . . . . . . . . . . . . . 377
18.10.2 The CORE Communication Abstraction . . . . . . . . . . . . . . . . . . . . . 377
18.10.3 Construction of a Common Coin with a Constant Bias . . . . . . . . . . . . . 380
18.10.4 On the Use of a Biased Common Coin . . . . . . . . . . . . . . . . . . . . . . 381
18.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
18.12 Bibliographic notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
18.13 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
VI Appendix 409
Afterword 425
Bibliography 431
Index 453
Notation
Symbols
The notation broadcast TYPE(m), where TYPE is a message type and m a message content, is used
as a shortcut for “for each j ∈ {1, · · · , n} do send TYPE(m) to pj end for”. Hence, if pi is not faulty
while executing this statement, it sends the message TYPE(m) to every process, including itself.
Otherwise, there is no guarantee on which processes, if any, receive TYPE(m).
(In Chap. 1 only, j ∈ {1, · · · , n} is replaced by j ∈ neighborsi .)
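To make this shortcut concrete, here is a minimal Python sketch (not taken from the book) that expands broadcast TYPE(m) into one point-to-point send per destination. The names `broadcast`, `run`, and `crash_after`, and the (TYPE, content) tuple layout, are hypothetical. `crash_after` is an illustrative way to model a sender that crashes partway through the loop, so that only some processes receive the message. Sending in the fixed order 1, ..., n is also an assumption; the notation itself imposes no order on the sends.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical message representation: a (TYPE, content) pair, e.g. ("EST", 7).
Message = Tuple[str, object]


def broadcast(msg_type: str, content: object, n: int,
              send: Callable[[int, Message], None],
              crash_after: Optional[int] = None,
              neighbors: Optional[Iterable[int]] = None) -> None:
    """Expand 'broadcast TYPE(m)' into 'for each j do send TYPE(m) to p_j'.

    crash_after (hypothetical fault injection): if set, the sender crashes
    after performing that many sends, so later destinations get nothing.
    neighbors (optional): restricts the destinations, as in Chapter 1.
    """
    destinations = list(neighbors) if neighbors is not None else range(1, n + 1)
    for sent, j in enumerate(destinations):
        if crash_after is not None and sent >= crash_after:
            return  # the sender crashed: the remaining sends never happen
        send(j, (msg_type, content))


def run(n: int, crash_after: Optional[int] = None) -> Dict[int, List[Message]]:
    """Broadcast EST(7) among p_1..p_n and return every process's inbox."""
    inbox: Dict[int, List[Message]] = {j: [] for j in range(1, n + 1)}
    broadcast("EST", 7, n, lambda j, m: inbox[j].append(m), crash_after)
    return inbox


if __name__ == "__main__":
    print(run(4))                 # every p_j, the sender included, gets EST(7)
    print(run(4, crash_after=2))  # the sender crashes: only p_1 and p_2 get it
```

The optional `neighbors` argument mirrors the Chapter 1 variant, in which j ranges only over the neighbors of pi. A real crash could leave any subset of processes with the message; modeling it as a prefix of the loop is a simplification.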
Acronyms (1)
Acronyms (2)
CO Causal order
FIFO First in first out
TO Total order
SCD Set-constrained delivery
FC Fair channel
CRDT Conflict-free replicated data type
MS PAT Message pattern
ADV Adversary
FD Failure detector
HB Heartbeat
MS PAT Message pattern
SO Send omission
GO General omission
MS Message scheduling assumption
LC Local coin
CC Common coin
BCCB Binary common coin with bias
3.1 Uniform reliable broadcast in CAMP n,t [- FC, t < n/2] (code for pi ) . . . . . . . . 45
3.2 Building Θ in CAMP n,t [- FC, t < n/2] (code for pi ) . . . . . . . . . . . . . . . . 50
3.3 Quiescent uniform reliable broadcast in CAMP n,t [- FC, Θ, P ] (code for pi ) . . . . 53
3.4 Quiescent uniform reliable broadcast in CAMP n,t [- FC, Θ, HB ] (code for pi ) . . . 56
3.5 An example of a network with fair paths . . . . . . . . . . . . . . . . . . . . . . . . 60
List of Figures and Algorithms
7.1 Building a failure detector of the class Σ in CAMP n,t [t < n/2] . . . . . . . . . . . 120
7.2 An algorithm for an atomic SWSR register in CAMP n,t [Σ] . . . . . . . . . . . . . 121
7.3 Extracting Σ from a register D-based algorithm A . . . . . . . . . . . . . . . . . . 122
7.4 Extracting Σ from a failure detector-based register algorithm A (code for pi ) . . . . 124
7.5 From atomic registers to URB-broadcast (code for pi ) . . . . . . . . . . . . . . . . 127
7.6 From the failure detector class Σ to the URB abstraction (1 ≤ t < n) . . . . . . . . 128
7.7 Two examples of the hybrid communication model . . . . . . . . . . . . . . . . . . 129
8.1 An implementation of SCD-broadcast in CAMP n,t [t < n/2] (code for pi ) . . . . . 134
8.2 Message pattern introduced in Lemma 16 . . . . . . . . . . . . . . . . . . . . . . . 137
8.3 SCD-broadcast-based communication pattern (code for pi ) . . . . . . . . . . . . . . 139
8.4 Construction of an MWMR atomic register in CAMP n,t [SCD-broadcast] (code for
pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.5 Construction of an MWMR sequentially consistent register in CAMP n,t [SCD-broadcast]
(code for pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.6 Example of a run of an MWMR atomic snapshot object . . . . . . . . . . . . . . . 143
8.7 Construction of an MWMR atomic snapshot object in CAMP n,t [SCD-broadcast] . . 144
8.8 Construction of an atomic counter in CAMP n,t [SCD-broadcast] (code for pi ) . . . . 145
10.1 A simple (unfair) t-resilient consensus algorithm in CSMP n,t [∅] (code for pi ) . . . . 175
10.2 A simple (fair) t-resilient consensus algorithm in CSMP n,t [∅] (code for pi ) . . . . . 176
10.3 The second case of the agreement property (with t = 3 crashes) . . . . . . . . . . . 177
10.4 A t-resilient interactive consistency algorithm in CSMP n,t [∅] (code for pi ) . . . . . 179
10.5 Three possible one-round extensions from Et−1 . . . . . . . . . . . . . . . . . . . . 183
10.6 Extending the k-round execution Ek . . . . . . . . . . . . . . . . . . . . . . . . . . 184
10.7 Extending two (k + 1)-round executions . . . . . . . . . . . . . . . . . . . . . . . 185
10.8 Extending again two (k + 1)-round executions . . . . . . . . . . . . . . . . . . . . 185
13.1 A consensus-based NBAC algorithm in CSMP n,t [∅] (code for pi ) . . . . . . . . . . 232
13.2 Impossibility of having both fast commit and fast abort when t ≥ 3 (E3) . . . . . . . 234
13.3 Impossibility of having both fast commit and fast abort when t ≥ 3 (E4, E5) . . . . 235
13.4 Fast commit and weak fast abort NBAC in CSMP n,t [3 ≤ t < n] (code for pi ) . . . . 237
13.5 Fast abort and weak fast commit NBAC in CSMP n,t [3 ≤ t < n] (code for pi ) . . . . 242
13.6 Fast commit and fast abort NBAC in the system model CSMP n,t [t ≤ 2] (code for pi ) 243
14.1 Interactive consistency for four processes despite one Byzantine process (code for pi ) 248
14.2 Proof of the interactive consistency algorithm in BSMP n,t [t = 1, n = 4] . . . . . . 249
14.3 Communication graph (left) and behavior of the t Byzantine processes (right) . . . . 251
14.4 EIG tree for n = 4 and t = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
14.5 Byzantine EIG consensus algorithm for BSMP n,t [t < n/3] . . . . . . . . . . . . . 253
14.6 EIG trees of the correct processes at the end of the first round . . . . . . . . . . . . 254
14.7 EIG tree tree2 at the end of the second round . . . . . . . . . . . . . . . . . . . . . 255
14.8 Constant message size Byzantine consensus in BSMP n,t [t < n/4] . . . . . . . . . . 258
14.9 From binary to multivalued Byzantine consensus in BSMP n,t [t < n/3] (code for pi ) 260
14.10 Proof of Property PR2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
14.11 Deterministic vs non-deterministic scenarios . . . . . . . . . . . . . . . . . . . . . 263
14.12 A Byzantine signature-based consensus algorithm in BSMP n,t [SIG; t < n/2]
(code for pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
15.1 Stacking of abstraction layers for distributed renaming in CAMP n,t [t < n/2] . . . . 273
15.2 A simple snapshot-based size-adaptive (2p − 1)-renaming algorithm (code for pi ) . 274
15.3 A simple snapshot-based approximate algorithm (code for pi ) . . . . . . . . . . . . 277
15.4 What is captured by Lemma 62 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
15.5 Safe agreement in CAMP n,t [t < n/2] (code for process pi ) . . . . . . . . . . . . . 281
16.1 Adding total order message delivery to various URB abstractions . . . . . . . . . . 288
16.2 Adding total order message delivery to the URB abstraction . . . . . . . . . . . . . 289
16.3 Building the TO-broadcast abstraction in CAMP n,t [CONS] (code for pi ) . . . . . . 290
16.4 Building the consensus abstraction in CAMP n,t [TO-broadcast] (code for pi ) . . . . 293
16.5 A TO-broadcast-based universal construction (code for pi ) . . . . . . . . . . . . . . 295
16.6 A state machine does not allow us to retrieve the past . . . . . . . . . . . . . . . . . 298
16.7 Building the consensus abstraction in CAMP n,t [LEDGER] (code for pi ) . . . . . . 298
16.8 A TO-broadcast-based ledger construction (code for pi ) . . . . . . . . . . . . . . . 299
16.9 Synchrony rules out uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
16.10 To wait or not to wait in presence of asynchrony and failures? . . . . . . . . . . . . 301
16.11 Bivalent vs univalent global states . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
16.12 There is a bivalent initial configuration . . . . . . . . . . . . . . . . . . . . . . . . 305
16.13 Illustrating the sets S1 and S2 used in Lemma 70 . . . . . . . . . . . . . . . . . . . 306
16.14 Σ2 contains 0-valent and 1-valent global states . . . . . . . . . . . . . . . . . . . . 307
16.15 Valence contradiction when i = i . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
16.16 Valence contradiction when i = i . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
16.17 k-sliding window register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
16.18 Solving consensus for k processes from a k-sliding window (code for pi ) . . . . . . 311
16.19 Schedule illustration: case 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
16.20 Schedule illustration: case 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
16.21 Building the TO-broadcast abstraction in CAMP n,t [- FC, CONS] (code for pi ) . . . 316
17.1 Binary consensus in CAMP n,t [t < n/2, MS] (code for pi ) . . . . . . . . . . . . . 319
17.2 A coordinator-based consensus algorithm for CAMP n,t [P ] (code for pi ) . . . . . . 322
17.3 Ω is a consensus computability lower bound . . . . . . . . . . . . . . . . . . . . . . 325
17.4 An algorithm implementing consensus in CAMP n,t [t < n/2, Ω] (code for pi ) . . . 326
17.5 The second phase for AS n,t [t < n/3, Ω] (code for pi ) . . . . . . . . . . . . . . . . 330
17.6 A randomized binary consensus algorithm for CAMP n,t [t < n/2, LC] (code for pi ) 332
17.7 What is broken by a random oracle . . . . . . . . . . . . . . . . . . . . . . . . . . 333
17.8 A randomized binary consensus algorithm for CAMP_{n,t}[t < n/2, CC] (code for p_i)
17.9 A hybrid binary consensus algorithm for CAMP_{n,t}[t < n/2, Ω, LC] (code for p_i)
17.10 An Alpha-based consensus algorithm in CAMP_{n,t}[t < n/2, Ω] (code for p_i)
17.11 An algorithm implementing Alpha in CAMP_{n,t}[t < n/2]
17.12 A reduction of multivalued to binary consensus in CAMP_{n,t}[BC] (code for p_i)
17.13 Consensus in one communication step in CAMP_{n,t}[t < n/3, CONS] (code for p_i)
17.14 Is this consensus algorithm for CAMP_{n,t}[t < n/2, AΩ] correct? (code for p_i)
19.1 Binary consensus in BAMP_{n,t}[t < n/3, TMS] (code for p_i)
19.2 An algorithm implementing BV-broadcast in BAMP_{n,t}[t < n/3] (code for p_i)
19.3 A BV-broadcast-based binary consensus algorithm for the model BAMP_{n,t}[n > 3t, CC] (code for p_i)
19.4 From multivalued to binary Byzantine consensus in BAMP_{n,t}[t < n/3, BBC] (code of p_i)
19.5 VBB-broadcast on top of reliable broadcast in BAMP_{n,t}[t < n/3] (code of p_i)
19.6 From multivalued to binary consensus in BAMP_{n,t}[t < n/3, BBC] (code for p_i)
19.7 Local blockchain representation
14.1 Upper bounds on the number of faulty processes for consensus
Part I
Introductory Chapter
capable of producing so much sound. I have never observed this
habit upon a dull or cloudy day.”
Mr Nuttall having presented me with the nest of this species
attached to the twig to which the bird had fastened it, my amiable
friend Miss Martin has figured it for me, as well as the plant, about
which these lovely creatures are represented. The nest, which
measures two inches and a quarter in height, and an inch and three
quarters in breadth, at the upper part, is composed externally of
mosses, lichens, and a few feathers, with slender fibrous roots
interwoven, and lined with fine cottony seed-down.
Length to end of tail 3 7½/12 inches; bill along the ridge 8¾/12; wing from flexure 1 10/12; tail 1 1½/12.
Cleome heptaphylla.
Strix Tengmalmi, Gmel. Syst. Nat. vol. i. p. 291.—Lath. Ind. Ornith. vol. i. p. 65.
Strix Tengmalmi, Tengmalm’s Owl, Swains. and Richards. Fauna Bor.-Amer. vol. ii. p. 94.
Anas hyperborea, Gmel. Syst. Nat. vol. i. p. 504.—Lath. Ind. Orn. vol. ii. p. 837.
Snow Goose, Anas hyperborea, Wils. Amer. Ornith. vol. viii. p. 76, pl. 68, fig. 3, Male, and p. 89, pl. 69, fig. 5, Young.
Anser hyperboreus, Ch. Bonaparte, Synopsis of Birds of United States, p. 376.
Anser hyperboreus, Snow Goose, Richards. and Swains. Fauna Bor.-Amer. vol. ii. p. 467.
Snow Goose, Nuttall, Manual, vol. ii. p. 344.
Tetrao Phasianellus, Linn. Syst. Nat. vol. i. p. 273.—Lath. Ind. Ornith. vol. ii. p. 635.—Ch. Bonaparte, Synopsis of Birds of United States, p. 127.
Tetrao Phasianellus, Sharp-tailed Grous, Ch. Bonaparte, Amer. Ornith. vol. iii. p. 37, pl. 19.
Tetrao (centrocercus) Phasianellus, Swains. Sharp-tailed Grous, Richards. and Swains. Fauna Bor.-Amer. vol. ii. p. 361.
Sharp-tailed Grous, Nuttall, Manual, vol. i. p. 669.
This Owl is much more abundant in our Middle and Eastern Atlantic
Districts than in the Southern or Western parts. My friend Dr
Bachman has never observed it in South Carolina; nor have I met
with it in Louisiana, or any where on the Mississippi below the
junction of the Ohio. It is not very rare in the upper parts of Indiana,
Illinois, Ohio, and Kentucky, wherever the country is well wooded. In
the Barrens of Kentucky its predilection for woods is rendered
apparent by its not being found elsewhere than in the “Groves;” and
it would seem that it very rarely extends its search for food beyond
the skirts of those delightful retreats. In Pennsylvania, and elsewhere
to the eastward, I have found it most numerous on or near the banks
of our numerous clear mountain streams, where, during the day, it is
not uncommon to see it perched on the top of a low bush or fir. At
such times it stands with the body erect, but the tarsi bent and
resting on a branch, as is the manner of almost all our Owls. The
head then seems the largest part, the body being much more
slender than it is usually represented. Now and then it raises itself
and stands with its legs and neck extended, as if the better to mark
the approach of an intruder. Its eyes, which were closed when it was
first observed, are opened on the least noise, and it seems to squint
at you in a most grotesque manner, although it is not difficult to