Fault-Tolerant Message-Passing Distributed Systems
An Algorithmic Approach

Michel Raynal
IRISA-ISTIC Université de Rennes 1
Institut Universitaire de France
Rennes, France

Parts of this work are based on the books “Fault-Tolerant Agreement in Synchronous Message-Passing
Systems” and “Communication and Agreement Abstractions for Fault-Tolerant Asynchronous
Distributed Systems”, author Michel Raynal, © 2010 Morgan & Claypool Publishers
(www.morganclaypool.com). Used with permission.

ISBN 978-3-319-94140-0 ISBN 978-3-319-94141-7 (eBook)


https://doi.org/10.1007/978-3-319-94141-7

Library of Congress Control Number: 2018953101

© Springer Nature Switzerland AG 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

La recherche du temps perdu passait par le Web. [...]
La mémoire était devenue inépuisable, mais la profondeur du temps [...] avait disparu.
On était dans un présent infini.
In Les années (2008), Annie Ernaux (1940)

Sed nos immensum spatiis confecimus aequor,
Et iam tempus equum fumantia solvere colla.¹
In Georgica, Liber II, 541-542, Publius Virgilius (70 BC–19 BC)

Je suis arrivé au jour où je ne me souviens plus quand j’ai cessé d’être immortel.
In Livro de Crónicas, António Lobo Antunes (1942)

C’est une chose étrange à la fin que le monde
Un jour je m’en irai sans en avoir tout dit.
In Les yeux et la mémoire (1954), chant II, Louis Aragon (1897–1982)

Tout garder, c’est tout détruire.
Jacques Derrida (1930–2004)

¹ French: Mais j’ai déjà fourni une vaste carrière, il est temps de dételer les chevaux tout fumants.
English: But now I have traveled a very long way, and the time has come to unyoke my steaming horses.


What is distributed computing? Distributed computing was born in the late 1970s when researchers
and practitioners started taking into account the intrinsic characteristics of physically distributed
systems. The field then emerged as a specialized research area distinct from networking, operating
systems, and parallel computing.
Distributed computing arises when one has to solve a problem in terms of distributed entities
(usually called processors, nodes, processes, actors, agents, sensors, peers, etc.) such that each entity
has only partial knowledge of the many parameters involved in the problem to be solved.
While parallel computing and real-time computing can be characterized, respectively, by the terms
efficiency and on-time computing, distributed computing can be characterized by the term uncertainty.
This uncertainty is created by asynchrony, multiplicity of control flows, absence of shared memory
and global time, failure, dynamicity, mobility, etc. Mastering one form or another of uncertainty is
pervasive in all distributed computing problems. A main difficulty in designing distributed algorithms
comes from the fact that no entity cooperating in the achievement of a common goal can have
instantaneous knowledge of the current state of the other entities; it can only know their past local
states.
Although distributed algorithms are often made up of a few lines, their behavior can be difficult
to understand and their properties hard to state and prove. Hence, distributed computing is not only
a fundamental topic but also a challenging topic where simplicity, elegance, and beauty are first-class
citizens.

Why this book? In the book “Distributed algorithms for message-passing systems” (Springer, 2013),
I addressed distributed computing in failure-free message-passing systems, where the computing
entities (processes) have to cooperate in the presence of asynchrony. By contrast, in my book
“Concurrent programming: algorithms, principles and foundations” (Springer, 2013), I addressed
distributed computing where the computing entities (processes) communicate through a read/write
shared memory (e.g., multicore), and the main adversary lies in the net effect of asynchrony and
process crashes (unexpected definitive stops).
The present book considers synchronous and asynchronous message-passing systems, where
processes can commit crash failures or Byzantine failures (arbitrary behavior). Its aim is to present,
in a comprehensive way, basic notions, concepts, and algorithms in the context of these systems. The
main difficulty comes from the uncertainty created by the adversaries managing the environment
(mainly asynchrony and failures), which, by its very nature, is not under the control of the system.

A quick look at the content of the book. The book is composed of four parts: the first two are on
communication abstractions, the other two on agreement abstractions. These are the most important
abstractions that distributed applications rely on in asynchronous and synchronous message-passing
systems where processes may crash or commit Byzantine failures. The book addresses what can be done
and what cannot be done in the presence of such adversaries. It consequently presents both
impossibility results and distributed algorithms. All impossibility results are proved, and all
algorithms are described in a simple algorithmic notation and proved correct.
• Parts on communication abstractions.
– Part I is on the reliable broadcast abstraction.
– Part II is on the construction of read/write registers.

• Parts on agreement.
– Part III is on agreement in synchronous systems.
– Part IV is on agreement in asynchronous systems.

On the presentation style. When known, the names of the authors of a theorem or of an algorithm
are indicated together with the date of the associated publication. Moreover, each chapter has a
bibliographical section, where a short historical perspective and references related to that chapter
are given.
Each chapter ends with a few exercises and problems, whose solutions can be found in the
article cited at the end of the corresponding exercise/problem.
From a vocabulary point of view, the following terms are used: an object implements an
abstraction, defined by a set of properties, which allows a problem to be solved. Moreover, each
algorithm is first presented intuitively with words, and then proved correct. Understanding an
algorithm is a two-step process:
• First, acquire a good intuition of its underlying principles and its possible behaviors. This is
necessary, but remains informal.
• Then, prove that the algorithm is correct in the model it was designed for. The proof consists of
logical reasoning based on the properties provided by (i) the underlying model and (ii) the
statements (code) of the algorithm. More precisely, each property defining the abstraction the
algorithm is assumed to implement must be satisfied in all its executions.

Only when these two steps have been completed can we say that we understand the algorithm.

Audience. This book has been written primarily for people who are not familiar with the topic and
the concepts that are presented. These include mainly:
• Senior-level undergraduate students and graduate students in informatics or computing
engineering who are interested in the principles and algorithmic foundations of fault-tolerant
distributed computing.
• Practitioners and engineers who want to be aware of the state-of-the-art concepts, basic
principles, mechanisms, and techniques encountered in fault-tolerant distributed computing.

Prerequisites for this book include undergraduate courses on algorithms, basic knowledge of
operating systems, and notions of concurrency in failure-free distributed computing. One-semester
courses based on this book are suggested in the section titled “How to Use This Book” in the Afterword.

Origin of the book and acknowledgments. This book has two complementary origins:
• The first is a set of lectures for undergraduate and graduate courses on distributed computing I
gave at the University of Rennes (France), the Hong Kong Polytechnic University, and, as an
invited professor, at several universities all over the world.
Hence, I want to thank the numerous students for their questions that, in one way or another,
contributed to this book.
• The second is the two monographs I wrote in 2010 on fault-tolerant distributed computing,
titled “Communication and agreement abstractions for fault-tolerant asynchronous distributed
systems” and “Fault-tolerant agreement in synchronous distributed systems”. Parts of them
appear in this book, after having been revised, corrected, and improved.
Hence, I want to thank Morgan & Claypool, and more particularly Diane Cerra, for their
permission to reuse parts of this work.

I also want to thank my colleagues (in no particular order) A. Mostéfaoui, D. Imbs, S. Rajsbaum,
V. Gramoli, C. Delporte, H. Fauconnier, F. Taïani, M. Perrin, A. Castañeda, M. Larrea, and Z. Bouzid,
with whom I have collaborated in recent years. I also thank the Polytechnic University of Hong
Kong (PolyU), and more particularly Professor Jiannong Cao, for hosting me while I was writing parts
of this book. My thanks also to Ronan Nugent (Springer) for his support and his help in putting it all
together.

Last but not least (and maybe most importantly), I thank all the researchers whose results are
presented in this book. Without their work, this book would not exist. (Finally, since I typeset the
entire text myself, with LaTeX2ε for the text and xfig for figures, any typesetting or technical errors
that remain are my responsibility.)

Professor Michel Raynal

Academia Europaea
Institut Universitaire de France
Professor IRISA-ISTIC, Université de Rennes 1, France
Chair Professor, Hong Kong Polytechnic University

June–December 2017
Rennes, Saint-Grégoire, Douelle, Saint-Philibert, Hong Kong,
Vienna (DISC’17), Washington D.C. (PODC’17), Mexico City (UNAM)
Contents

I Introductory Chapter

1 A Few Definitions and Two Introductory Examples
  1.1 A Few Definitions Related to Distributed Computing
  1.2 Example 1: Common Decision Despite Message Losses
    1.2.1 The Problem
    1.2.2 Trying to Solve the Problem: Attempt 1
    1.2.3 Trying to Solve the Problem: Attempt 2
    1.2.4 An Impossibility Result
    1.2.5 A Coordination Problem
  1.3 Example 2: Computing a Global Function Despite a Message Adversary
    1.3.1 The Problem
    1.3.2 The Notion of a Message Adversary
    1.3.3 The TREE-AD Message Adversary
    1.3.4 From Message Adversary to Process Mobility
  1.4 Main Distributed Computing Models Used in This Book
  1.5 Distributed Computing Versus Parallel Computing
  1.6 Summary
  1.7 Bibliographic Notes
  1.8 Exercises and Problems

II The Reliable Broadcast Communication Abstraction

2 Reliable Broadcast in the Presence of Process Crash Failures
  2.1 Uniform Reliable Broadcast
    2.1.1 From Best Effort to Guaranteed Reliability
    2.1.2 Uniform Reliable Broadcast (URB-broadcast)
    2.1.3 Building the URB-broadcast Abstraction in CAMP_{n,t}[∅]
  2.2 Adding Quality of Service
    2.2.1 “First In, First Out” (FIFO) Message Delivery
    2.2.2 “Causal Order” (CO) Message Delivery
    2.2.3 From FIFO-broadcast to CO-broadcast
    2.2.4 From URB-broadcast to CO-broadcast: Capturing Causal Past in a Vector
    2.2.5 The Total Order Broadcast Abstraction Requires More
  2.3 Summary
  2.4 Bibliographic Notes
  2.5 Exercises and Problems


3 Reliable Broadcast in the Presence of Process Crashes and Unreliable Channels
  3.1 A System Model with Unreliable Channels
    3.1.1 Fairness Notions for Channels
    3.1.2 Fair Channel (FC) and Fair Lossy Channel
    3.1.3 Reliable Channel in the Presence of Process Crashes
    3.1.4 System Model
  3.2 URB-broadcast in CAMP_{n,t}[- FC]
    3.2.1 URB-broadcast in CAMP_{n,t}[- FC, t < n/2]
    3.2.2 An Impossibility Result
  3.3 Failure Detectors: an Approach to Circumvent Impossibilities
    3.3.1 The Concept of a Failure Detector
    3.3.2 Formal Definitions
  3.4 URB-broadcast in CAMP_{n,t}[- FC] Enriched with a Failure Detector
    3.4.1 Definition of the Failure Detector Class Θ
    3.4.2 Solving URB-broadcast in CAMP_{n,t}[- FC, Θ]
    3.4.3 Building a Failure Detector Θ in CAMP_{n,t}[- FC, t < n/2]
    3.4.4 The Fundamental Added Value Supplied by a Failure Detector
  3.5 Quiescent Uniform Reliable Broadcast
    3.5.1 The Quiescence Property
    3.5.2 Quiescent URB-broadcast Based on a Perfect Failure Detector
    3.5.3 The Class HB of Heartbeat Failure Detectors
    3.5.4 Quiescent URB-broadcast in CAMP_{n,t}[- FC, Θ, HB]
  3.6 Summary
  3.7 Bibliographic Notes
  3.8 Exercises and Problems

4 Reliable Broadcast in the Presence of Byzantine Processes
  4.1 Byzantine Processes and Properties of the Model BAMP_{n,t}[t < n/3]
  4.2 The No-Duplicity Broadcast Abstraction
    4.2.1 Definition
    4.2.2 An Impossibility Result
    4.2.3 A No-Duplicity Broadcast Algorithm
  4.3 The Byzantine Reliable Broadcast Abstraction
  4.4 An Optimal Byzantine Reliable Broadcast Algorithm
    4.4.1 A Byzantine Reliable Broadcast Algorithm for BAMP_{n,t}[t < n/3]
    4.4.2 Correctness Proof
    4.4.3 Benefiting from Message Asynchrony
  4.5 Time and Message-Efficient Byzantine Reliable Broadcast
    4.5.1 A Message-Efficient Byzantine Reliable Broadcast Algorithm
    4.5.2 Correctness Proof
  4.6 Summary
  4.7 Bibliographic Notes
  4.8 Exercises and Problems

III The Read/Write Register Communication Abstraction

5 The Read/Write Register Abstraction
  5.1 The Read/Write Register Abstraction
    5.1.1 Concurrent Objects and Registers
    5.1.2 The Notion of a Regular Register
    5.1.3 Registers Defined from a Sequential Specification
  5.2 A Formal Approach to Atomicity and Sequential Consistency
    5.2.1 Processes, Operations, and Events
    5.2.2 Histories
    5.2.3 A Formal Definition of Atomicity
    5.2.4 A Formal Definition of Sequential Consistency
  5.3 Composability of Consistency Conditions
    5.3.1 What Is Composability?
    5.3.2 Atomicity Is Composable
    5.3.3 Sequential Consistency Is Not Composable
  5.4 Bounds on the Implementation of Strong Consistency Conditions
    5.4.1 Upper Bound on t for Atomicity
    5.4.2 Upper Bound on t for Sequential Consistency
    5.4.3 Lower Bounds on the Durations of Read and Write Operations
  5.5 Summary
  5.6 Bibliographic Notes
  5.7 Exercises and Problems

6 Building Read/Write Registers Despite Asynchrony and Less than Half of Processes Crash (t < n/2)
  6.1 A Structural View
  6.2 Building an SWMR Regular Read/Write Register in CAMP_{n,t}[t < n/2]
    6.2.1 Problem Specification
    6.2.2 Implementing an SWMR Regular Register in CAMP_{n,t}[t < n/2]
    6.2.3 Proof of the SWMR Regular Register Construction
  6.3 From an SWMR Regular Register to an SWMR Atomic Register
    6.3.1 Why the Previous Algorithm Does Not Ensure Atomicity
    6.3.2 From Regularity to Atomicity
  6.4 From SWMR Atomic Register to MWMR Atomic Register
    6.4.1 Replacing Sequence Numbers by Timestamps
    6.4.2 Construction of an MWMR Atomic Register
    6.4.3 Proof of the MWMR Atomic Register Construction
  6.5 Implementing Sequentially Consistent Registers
    6.5.1 How to Address the Non-composability of Sequential Consistency
    6.5.2 Algorithms Based on a Total Order Broadcast Abstraction
    6.5.3 A TO-broadcast-based Algorithm with Local (Fast) Read Operations
    6.5.4 A TO-broadcast-based Algorithm with Local (Fast) Write Operations
    6.5.5 An Algorithm Based on Logical Time
    6.5.6 Proof of the Logical Time-based Algorithm
  6.6 Summary
  6.7 Bibliographic Notes
  6.8 Exercises and Problems

7 Circumventing the t < n/2 Read/Write Register Impossibility: the Failure Detector Approach
  7.1 The Class Σ of Quorum Failure Detectors
    7.1.1 Definition of the Class of Quorum Failure Detectors
    7.1.2 Implementing a Failure Detector Σ When t < n/2
    7.1.3 A Σ-based Construction of an SWSR Atomic Register
  7.2 Σ Is the Weakest Failure Detector to Build an Atomic Register
    7.2.1 What Does “Weakest Failure Detector Class” Mean
    7.2.2 The Extraction Algorithm
    7.2.3 Correctness of the Extraction Algorithm
  7.3 Comparing the Failure Detectors Classes Θ and Σ
  7.4 Atomic Register Abstraction vs URB-broadcast Abstraction
    7.4.1 From Atomic Registers to URB-broadcast
    7.4.2 Atomic Registers Are Strictly Stronger than URB-broadcast
  7.5 Summary
  7.6 Bibliographic Notes
  7.7 Exercise and Problem

8 A Broadcast Abstraction Suited to the Family of Read/Write Implementable Objects
  8.1 The SCD-broadcast Communication Abstraction
    8.1.1 Definition
    8.1.2 Implementing SCD-broadcast in CAMP_{n,t}[t < n/2]
    8.1.3 Cost and Proof of the Algorithm
    8.1.4 An SCD-broadcast-based Communication Pattern
  8.2 From SCD-broadcast to an MWMR Register
    8.2.1 Building an MWMR Atomic Register in CAMP_{n,t}[SCD-broadcast]
    8.2.2 Cost and Proof of Correctness
    8.2.3 From Atomicity to Sequential Consistency
    8.2.4 From MWMR Registers to an Atomic Snapshot Object
  8.3 From SCD-broadcast to an Atomic Counter
    8.3.1 Counter Object
    8.3.2 Implementation of an Atomic Counter Object
    8.3.3 Implementation of a Sequentially Consistent Counter Object
  8.4 From SCD-broadcast to Lattice Agreement
    8.4.1 The Lattice Agreement Task
    8.4.2 Lattice Agreement from SCD-broadcast
  8.5 From SWMR Atomic Registers to SCD-broadcast
    8.5.1 From Snapshot to SCD-broadcast
    8.5.2 Proof of the Algorithm
  8.6 Summary
  8.7 Bibliographic Notes
  8.8 Exercises and Problems

9 Atomic Read/Write Registers in the Presence of Byzantine Processes
  9.1 Atomic Read/Write Registers in the Presence of Byzantine Processes
    9.1.1 Why SWMR (and Not MWMR) Atomic Registers?
    9.1.2 Reminder on Possible Behaviors of a Byzantine Process
    9.1.3 SWMR Atomic Registers Despite Byzantine Processes: Definition
  9.2 An Impossibility Result
  9.3 Reminder on Byzantine Reliable Broadcast
    9.3.1 Specification of Multi-shot Reliable Broadcast
    9.3.2 An Algorithm for Multi-shot Byzantine Reliable Broadcast
  9.4 Construction of SWMR Atomic Registers in BAMP_{n,t}[t < n/3]
    9.4.1 Description of the Algorithm
    9.4.2 Comparison with the Crash Failure Model
  9.5 Proof of the Algorithm
    9.5.1 Preliminary Lemmas
    9.5.2 Proof of the Termination Properties
    9.5.3 Proof of the Consistency (Atomicity) Properties
    9.5.4 Piecing Together the Lemmas
  9.6 Building Objects on Top of SWMR Byzantine Registers
    9.6.1 One-shot Write-snapshot Object
    9.6.2 Correct-only Agreement Object
  9.7 Summary
  9.8 Bibliographic Notes
  9.9 Exercises and Problems

IV Agreement in Synchronous Systems

10 Consensus and Interactive Consistency in Synchronous Systems Prone to Process Crash Failures
  10.1 Consensus in the Crash Failure Model
    10.1.1 Definition
    10.1.2 A Simple (Unfair) Consensus Algorithm
    10.1.3 A Simple (Fair) Consensus Algorithm
  10.2 Interactive Consistency (Vector Consensus)
    10.2.1 Definition
    10.2.2 A Simple Example of Use: Build Atomic Rounds
    10.2.3 An Interactive Consistency Algorithm
    10.2.4 Proof of the Algorithm
    10.2.5 A Convergence Point of View
  10.3 Lower Bound on the Number of Rounds
    10.3.1 Preliminary Assumptions and Definitions
    10.3.2 The (t + 1) Lower Bound
    10.3.3 Proof of the Lemmas
  10.4 Summary
  10.5 Bibliographic Notes
  10.6 Exercises and Problems

11 Expediting Decision in Synchronous Systems with Process Crash Failures 189
11.1 Early Deciding and Stopping Interactive Consistency . . . . . . . . . . . . . . . . . . . 189
11.1.1 Early Deciding vs Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . 189
11.1.2 An Early Decision Predicate . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
11.1.3 An Early Deciding and Stopping Algorithm . . . . . . . . . . . . . . . . . . . 191
11.1.4 Correctness Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
11.1.5 On Early Decision Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . 194
11.1.6 Early Deciding and Stopping Consensus . . . . . . . . . . . . . . . . . . . . . 195
11.2 An Unbeatable Binary Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . . . 196
11.2.1 A Knowledge-Based Unbeatable Predicate . . . . . . . . . . . . . . . . . . . 196
11.2.2 PREF0() with Respect to DIFF() . . . . . . . . . . . . . . . . . . . . . . . . 197
11.2.3 An Algorithm Based on the Predicate PREF0(): CGM . . . . . . . . . . . . . 197
11.2.4 On the Unbeatability of the Predicate PREF0() . . . . . . . . . . . . . . . . . 200
11.3 The Synchronous Condition-based Approach . . . . . . . . . . . . . . . . . . . . . . . 200
11.3.1 The Condition-based Approach in Synchronous Systems . . . . . . . . . . . . 200


11.3.2 Legality and Maximality of a Condition . . . . . . . . . . . . . . . . . . . . . 201
11.3.3 Hierarchy of Legal Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.3.4 Local View of an Input Vector . . . . . . . . . . . . . . . . . . . . . . . . . . 204
11.3.5 A Synchronous Condition-based Consensus Algorithm . . . . . . . . . . . . . 204
11.3.6 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
11.4 Using a Global Clock and a Fast Failure Detector . . . . . . . . . . . . . . . . . . . . . 207
11.4.1 Fast Perfect Failure Detectors . . . . . . . . . . . . . . . . . . . . . . . . . . 207
11.4.2 Enriching the Synchronous Model to Benefit from a Fast Failure Detector . . . 208
11.4.3 A Simple Consensus Algorithm Based on a Fast Failure Detector . . . . . . . 208
11.4.4 An Early Deciding and Stopping Algorithm . . . . . . . . . . . . . . . . . . . 209
11.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
11.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
11.7 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

12 Consensus Variants: Simultaneous Consensus and k-Set Agreement 215


12.1 Simultaneous Consensus: Definition and Its Difficulty . . . . . . . . . . . . . . . . . . 215
12.1.1 Definition of Simultaneous Consensus . . . . . . . . . . . . . . . . . . . . . . 215
12.1.2 Difficulty Early Deciding Before (t + 1) Rounds . . . . . . . . . . . . . . . . 216
12.1.3 Failure Pattern, Failure Discovery, and Waste . . . . . . . . . . . . . . . . . . 216
12.1.4 A Clean Round and the Horizon of a Round . . . . . . . . . . . . . . . . . . . 217
12.2 An Optimal Simultaneous Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . 218
12.2.1 An Optimal Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
12.2.2 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
12.3 The k-Set Agreement Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
12.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
12.3.2 A Simple Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
12.4 Early Deciding and Stopping k-Set Agreement . . . . . . . . . . . . . . . . . . . . . . 224
12.4.1 An Early Deciding and Stopping Algorithm . . . . . . . . . . . . . . . . . . . 224
12.4.2 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
12.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
12.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
12.7 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

13 Non-blocking Atomic Commitment in Synchronous Systems with Process Crash Failures 231
13.1 The Non-blocking Atomic Commitment (NBAC) Abstraction . . . . . . . . . . . . . . 231
13.1.1 Definition of Non-blocking Atomic Commitment . . . . . . . . . . . . . . . . 231
13.1.2 A Simple Non-blocking Atomic Commitment Algorithm . . . . . . . . . . . . 232
13.2 Fast Commit and Fast Abort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
13.2.1 Looking for Efficient Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 233
13.2.2 An Impossibility Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
13.3 Weak Fast Commit and Weak Fast Abort . . . . . . . . . . . . . . . . . . . . . . . . . 236
13.4 Fast Commit and Weak Fast Abort Are Compatible . . . . . . . . . . . . . . . . . . . 236
13.4.1 A Fast Commit and Weak Fast Abort Algorithm . . . . . . . . . . . . . . . . 236
13.4.2 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
13.5 Other Non-blocking Atomic Commitment Algorithms . . . . . . . . . . . . . . . . . . 241
13.5.1 Fast Abort and Weak Fast Commit . . . . . . . . . . . . . . . . . . . . . . . . 241
13.5.2 The Case t ≤ 2 (System Model CSMP n,t [1 ≤ t < 3 ≤ n]) . . . . . . . . . . . 242
13.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
13.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243


13.8 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

14 Consensus in Synchronous Systems Prone to Byzantine Process Failures 245


14.1 Agreement Despite Byzantine Processes . . . . . . . . . . . . . . . . . . . . . . . . . 246
14.1.1 On the Agreement and Validity Properties . . . . . . . . . . . . . . . . . . . . 246
14.1.2 A Consensus Definition for the Byzantine Failure Model . . . . . . . . . . . . 246
14.1.3 An Interactive Consistency Definition for the Byzantine Failure Model . . . . 247
14.1.4 The Byzantine General Agreement Abstraction . . . . . . . . . . . . . . . . . 247
14.2 Interactive Consistency for Four Processes Despite One Byzantine Process . . . . . . . 247
14.2.1 An Algorithm for n = 4 and t = 1 . . . . . . . . . . . . . . . . . . . . . . . . 247
14.2.2 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
14.3 An Upper Bound on the Number of Byzantine Processes . . . . . . . . . . . . . . . . . 249
14.4 A Byzantine Consensus Algorithm for BSMP n,t [t < n/3] . . . . . . . . . . . . . . . . 251
14.4.1 Base Data Structure: a Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
14.4.2 EIG Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
14.4.3 Example of an Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
14.4.4 Proof of the EIG Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
14.5 A Simple Consensus Algorithm with Constant Message Size . . . . . . . . . . . . . . 257
14.5.1 Features of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
14.5.2 Presentation of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 257
14.5.3 Proof and Properties of the Algorithm . . . . . . . . . . . . . . . . . . . . . . 258
14.6 From Binary to Multivalued Byzantine Consensus . . . . . . . . . . . . . . . . . . . . 259
14.6.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
14.6.2 A Reduction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
14.6.3 Proof of the Multivalued to Binary Reduction . . . . . . . . . . . . . . . . . . 261
14.6.4 An Interesting Property of the Construction . . . . . . . . . . . . . . . . . . . 263
14.7 Enriching the Synchronous Model with Message Authentication . . . . . . . . . . . . . 263
14.7.1 Synchronous Model with Signed Messages . . . . . . . . . . . . . . . . . . . 263
14.7.2 The Gain Obtained from Signatures . . . . . . . . . . . . . . . . . . . . . . . 264
14.7.3 A Synchronous Signature-Based Consensus Algorithm . . . . . . . . . . . . . 264
14.7.4 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
14.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
14.9 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
14.10 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

V Agreement in Asynchronous Systems 269

15 Implementable Agreement Abstractions Despite Asynchrony and a Minority of Process Crashes 271
15.1 The Renaming Agreement Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . 271
15.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
15.1.2 A Fundamental Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
15.1.3 The Stacking Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
15.1.4 A Snapshot-based Implementation of Renaming . . . . . . . . . . . . . . . . 274
15.1.5 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
15.2 The Approximate Agreement Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . 276
15.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
15.2.2 A Read/Write-based Implementation of Approximate Agreement . . . . . . . 277
15.2.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277


15.3 The Safe Agreement Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
15.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
15.3.2 A Direct Implementation of Safe Agreement in CAMP n,t [t < n/2] . . . . . . 280
15.3.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
15.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
15.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
15.6 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284

16 Consensus: Power and Implementability Limit in Crash-Prone Asynchronous Systems 287
16.1 The Total Order Broadcast Communication Abstraction . . . . . . . . . . . . . . . . . 287
16.1.1 Total Order Broadcast: Definition . . . . . . . . . . . . . . . . . . . . . . . . 287
16.1.2 A Map of Communication Abstractions . . . . . . . . . . . . . . . . . . . . . 288
16.2 From Consensus to TO-broadcast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
16.2.1 Structure of the Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
16.2.2 Description of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
16.2.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
16.3 Consensus and TO-broadcast Are Equivalent . . . . . . . . . . . . . . . . . . . . . . . 292
16.4 The State Machine Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
16.4.1 State Machine Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
16.4.2 Sequentially-Defined Abstractions (Objects) . . . . . . . . . . . . . . . . . . 294
16.5 A Simple Consensus-based Universal Construction . . . . . . . . . . . . . . . . . . . . 295
16.6 Agreement vs Mutual Exclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
16.7 Ledger Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
16.7.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
16.7.2 Implementation of a Ledger in CAMP n,t [TO-broadcast] . . . . . . . . . . . . 299
16.8 Consensus Impossibility in the Presence of Crashes and Asynchrony . . . . . . . . . . 300
16.8.1 The Intuition That Underlies the Impossibility . . . . . . . . . . . . . . . . . . 300
16.8.2 Refining the Definition of CAMP n,t [∅] . . . . . . . . . . . . . . . . . . . . . 301
16.8.3 Notion of Valence of a Global State . . . . . . . . . . . . . . . . . . . . . . . 303
16.8.4 Consensus Is Impossible in CAMP n,1 [∅] . . . . . . . . . . . . . . . . . . . . 304
16.9 The Frontier Between Read/Write Registers and Consensus . . . . . . . . . . . . . . . 309
16.9.1 The Main Question . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
16.9.2 The Notion of Consensus Number in Read/Write Systems . . . . . . . . . . . 310
16.9.3 An Illustration of Herlihy’s Hierarchy . . . . . . . . . . . . . . . . . . . . . . 310
16.9.4 The Consensus Number of a Ledger . . . . . . . . . . . . . . . . . . . . . . . 313
16.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
16.11 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
16.12 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

17 Implementing Consensus in Enriched Crash-Prone Asynchronous Systems 317


17.1 Enriching an Asynchronous System to Implement Consensus . . . . . . . . . . . . . . 317
17.2 A Message Scheduling Assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
17.2.1 Message Scheduling (MS) Assumption . . . . . . . . . . . . . . . . . . . . . 318
17.2.2 A Binary Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 318
17.2.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
17.2.4 Additional Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
17.3 Enriching CAMP n,t [∅] with a Perpetual Failure Detector . . . . . . . . . . . . . . . . 321
17.3.1 Enriching CAMP n,t [∅] with a Perfect Failure Detector . . . . . . . . . . . . . 321
17.4 Enriching CAMP n,t [t < n/2] with an Eventual Leader . . . . . . . . . . . . . . . . . 323
17.4.1 The Weakest Failure Detector to Implement Consensus . . . . . . . . . . . . . 323
17.4.2 Implementing Consensus in CAMP n,t [t < n/2, Ω] . . . . . . . . . . . . . . 324
17.4.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
17.4.4 Consensus Versus Eventual Leader Failure Detector . . . . . . . . . . . . . . 329
17.4.5 Notions of Indulgence and Zero-degradation . . . . . . . . . . . . . . . . . . 329
17.4.6 Saving Broadcast Instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
17.5 Enriching CAMP n,t [t < n/2] with Randomization . . . . . . . . . . . . . . . . . . . 330
17.5.1 Asynchronous Randomized Models . . . . . . . . . . . . . . . . . . . . . . . 330
17.5.2 Randomized Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
17.5.3 Randomized Binary Consensus in CAMP n,t [t < n/2, LC] . . . . . . . . . . . 331
17.5.4 Randomized Binary Consensus in CAMP n,t [t < n/2, CC] . . . . . . . . . . . 334
17.6 Enriching CAMP n,t [t < n/2] with a Hybrid Approach . . . . . . . . . . . . . . . . . 337
17.6.1 The Hybrid Approach: Failure Detector and Randomization . . . . . . . . . . 337
17.6.2 A Hybrid Binary Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . 338
17.7 A Paxos-inspired Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 339
17.7.1 The Alpha Communication Abstraction . . . . . . . . . . . . . . . . . . . . . 340
17.7.2 Consensus Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
17.7.3 An Implementation of Alpha in CAMP n,t [t < n/2] . . . . . . . . . . . . . . 341
17.8 From Binary to Multivalued Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . 344
17.8.1 A Reduction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
17.8.2 Proof of the Reduction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 345
17.9 Consensus in One Communication Step . . . . . . . . . . . . . . . . . . . . . . . . . . 346
17.9.1 Aim and Model Assumption on t . . . . . . . . . . . . . . . . . . . . . . . . 346
17.9.2 A One Communication Step Algorithm . . . . . . . . . . . . . . . . . . . . . 346
17.9.3 Proof of the Early Deciding Algorithm . . . . . . . . . . . . . . . . . . . . . 347
17.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
17.11 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
17.12 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

18 Implementing Oracles in Asynchronous Systems with Process Crash Failures 353
18.1 The Two Facets of Failure Detectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
18.1.1 The Programming Point of View: Modular Building Block . . . . . . . . . . . 354
18.1.2 The Computability Point of View: Abstraction Ranking . . . . . . . . . . . . 354
18.2 Ω in CAMP n,t [∅]: a Direct Impossibility Proof . . . . . . . . . . . . . . . . . . . . . . 355
18.3 Constructing a Perfect Failure Detector (Class P ) . . . . . . . . . . . . . . . . . . . . 356
18.3.1 Reminder: Definition of the Class P of Perfect Failure Detectors . . . . . . . . 356
18.3.2 Use of an Underlying Synchronous System . . . . . . . . . . . . . . . . . . . 357
18.3.3 Applications Generating a Fair Communication Pattern . . . . . . . . . . . . . 358
18.3.4 The Theta Assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
18.4 Constructing an Eventually Perfect Failure Detector (Class ◇P) . . . . . . . . . . . . . 361
18.4.1 Reminder: Definition of an Eventually Perfect Failure Detector . . . . . . . . 361
18.4.2 From Perpetual to Eventual Properties . . . . . . . . . . . . . . . . . . . . . . 361
18.4.3 Eventually Synchronous Systems . . . . . . . . . . . . . . . . . . . . . . . . 361
18.5 On the Efficient Monitoring of a Process by Another Process . . . . . . . . . . . . . . 363
18.5.1 Motivation and System Model . . . . . . . . . . . . . . . . . . . . . . . . . . 363
18.5.2 A Monitoring Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
18.6 An Adaptive Monitoring-based Algorithm Building ◇P . . . . . . . . . . . . . . . . . 366
18.6.1 Motivation and Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
18.6.2 A Monitoring-Based Adaptive Algorithm for the Failure Detector Class ◇P . . 366
18.6.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
18.7 From the t-Source Assumption to an Ω Eventual Leader . . . . . . . . . . . . . . . . . 369
18.7.1 The ◇t-Source Assumption and the Model CAMP n,t [◇t-SOURCE] . . . . . 369
18.7.2 Electing an Eventual Leader in CAMP n,t [◇t-SOURCE] . . . . . . . . . . . . 370
18.7.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
18.8 Electing an Eventual Leader in CAMP n,t [◇t-MS PAT] . . . . . . . . . . . . . . . . . 372
18.8.1 A Query/Response Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
18.8.2 Electing an Eventual Leader in CAMP n,t [◇t-MS PAT] . . . . . . . . . . . . 374
18.8.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
18.9 Building Ω in a Hybrid Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
18.10 Construction of a Biased Common Coin from Local Coins . . . . . . . . . . . . . . . . 377
18.10.1 Definition of a Biased Common Coin . . . . . . . . . . . . . . . . . . . . . . 377
18.10.2 The CORE Communication Abstraction . . . . . . . . . . . . . . . . . . . . . 377
18.10.3 Construction of a Common Coin with a Constant Bias . . . . . . . . . . . . . 380
18.10.4 On the Use of a Biased Common Coin . . . . . . . . . . . . . . . . . . . . . . 381
18.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
18.12 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
18.13 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383

19 Implementing Consensus in Enriched Byzantine Asynchronous Systems 385


19.1 Definition Reminder and Two Observations . . . . . . . . . . . . . . . . . . . . . . . . 385
19.1.1 Definition of Byzantine Consensus (Reminder) . . . . . . . . . . . . . . . . . 385
19.1.2 Why Not to Use an Eventual Leader . . . . . . . . . . . . . . . . . . . . . . . 386
19.1.3 On the Weakest Synchrony Assumption for Byzantine Consensus . . . . . . . 386
19.2 Binary Byzantine Consensus from a Message Scheduling Assumption . . . . . . . . . 387
19.2.1 A Message Scheduling Assumption . . . . . . . . . . . . . . . . . . . . . . . 387
19.2.2 A Binary Byzantine Consensus Algorithm . . . . . . . . . . . . . . . . . . . . 387
19.2.3 Proof of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
19.2.4 Additional Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
19.3 An Optimal Randomized Binary Byzantine Consensus Algorithm . . . . . . . . . . . . 389
19.3.1 The Binary-Value Broadcast Abstraction . . . . . . . . . . . . . . . . . . . . 389
19.3.2 A Binary Randomized Consensus Algorithm . . . . . . . . . . . . . . . . . . 391
19.3.3 Proof of the BV-Based Binary Byzantine Consensus Algorithm . . . . . . . . 393
19.3.4 From Decision to Decision and Termination . . . . . . . . . . . . . . . . . . . 395
19.4 From Binary to Multivalued Byzantine Consensus . . . . . . . . . . . . . . . . . . . . 396
19.4.1 A Reduction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
19.4.2 Proof of the Reduction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 398
19.5 From Binary to No-intrusion Multivalued Byzantine Consensus . . . . . . . . . . . . . 399
19.5.1 The Validated Byzantine Broadcast Abstraction . . . . . . . . . . . . . . . . . 399
19.5.2 An Algorithm Implementing VBB-broadcast . . . . . . . . . . . . . . . . . . 399
19.5.3 Proof of the VBB-broadcast Algorithm . . . . . . . . . . . . . . . . . . . . . 401
19.5.4 A VBB-Based Multivalued to Binary Byzantine Consensus Reduction . . . . . 402
19.5.5 Proof of the VBB-Based Reduction Algorithm . . . . . . . . . . . . . . . . . 403
19.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
19.7 Appendix: Proof-of-Work (PoW) Seen as Eventual Byzantine Agreement . . . . . . . . . 405
19.8 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
19.9 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407

VI Appendix 409

20 Quorum, Signatures, and Overlays 411


20.1 Quorum Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
20.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
20.1.2 Examples of Use of a Quorum System . . . . . . . . . . . . . . . . . . . . . . 412
20.1.3 A Few Classical Quorums . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
20.1.4 Quorum Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
20.2 Digital Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
20.2.1 Cipher, Keys, and Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
20.2.2 How to Build a Secret Key: Diffie-Hellman’s Algorithm . . . . . . . . . . . . 416
20.2.3 How to Build a Public Key: Rivest-Shamir-Adleman’s (RSA) Algorithm . . . 417
20.2.4 How to Share a Secret: Shamir’s Algorithm . . . . . . . . . . . . . . . . . . . 417
20.3 Overlay Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
20.3.1 On Regular Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
20.3.2 Hypercube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
20.3.3 de Bruijn Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
20.3.4 Kautz Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
20.3.5 Undirected de Bruijn and Kautz Graphs . . . . . . . . . . . . . . . . . . . . . 422
20.4 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

Afterword 425

Bibliography 431

Index 453
Notation

Symbols

skip, no-op                                       empty statement
process                                           program in action
n                                                 number of processes
correct (or non-faulty) process                   process that does not fail during an execution
faulty process                                    process that fails during an execution
t                                                 upper bound on the number of faulty processes
f                                                 actual number of faulty processes
p_i                                               process whose index (or identity) is i
id_i                                              identity of process p_i (very often id_i = i)
τ                                                 time instant (from an external observer's point of view)
[1..m]                                            set {1, ..., m}
AA[1..m]                                          array with m entries (vector)
equal(a, I)                                       number of occurrences of a in the vector (or multiset) I
⟨a, b⟩                                            pair with elements a and b
⟨a, b, c⟩                                         triple with elements a, b, and c
XX                                                small capital letters: message type (message tag)
xx_i                                              italic lower-case letters: local variable of process p_i
xx_i ← v                                          assignment of value v to xx_i
XX                                                abstract variable known only by an external observer
xx_i^r, XX^r                                      values of xx_i, XX at the end of round r
⟨m_1; ...; m_q⟩                                   sequence of messages
a_i[1..s]                                         array of size s (local to process p_i)
for each i ∈ {1, ..., m} do statements end for    order irrelevant
for each i from 1 to m do statements end for      order relevant
wait(P)                                           while ¬P do no-op end while
return(v)                                         returns v and terminates the operation invocation
% blablabla %                                     comments
;                                                 sequentiality operator between two statements
⊕                                                 concatenation
ε                                                 empty sequence (list)
|σ|                                               size of the sequence σ
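
As a rough illustration of two entries in the table, equal(a, I) and wait(P) could be mirrored in Python as sketched below. The Python function names, the type hints, and the polling interval are assumptions made for this sketch; they are not part of the book's notation.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def equal(a: T, I: Iterable[T]) -> int:
    """Number of occurrences of a in the vector (or multiset) I."""
    return sum(1 for x in I if x == a)


def wait(P: Callable[[], bool], poll_interval: float = 0.001) -> None:
    """wait(P): repeat a no-op until the predicate P holds.

    The short sleep stands in for the no-op statement; it is an
    assumption of this sketch, chosen to avoid a busy loop.
    """
    while not P():
        time.sleep(poll_interval)
```

For instance, equal(1, [1, 2, 1, 3]) counts the two occurrences of 1, and wait returns as soon as the predicate it is given evaluates to true.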

The notation broadcast TYPE(m), where TYPE is a message type and m a message content, is used
as a shortcut for “for each j ∈ {1, ..., n} do send TYPE(m) to p_j end for”. Hence, if it is not faulty
during its execution, p_i sends the message TYPE(m) to each process, including itself. Otherwise there
is no guarantee on the reception of TYPE(m).
(In Chap. 1 only, j ∈ {1, ..., n} is replaced by j ∈ neighbors_i.)
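
The shortcut above can be sketched in Python as a plain loop over the destination indices. This is only an illustration: the send callback, the helper make_recording_send, and the message tag "INIT" are hypothetical names introduced here, and the sketch models a sender that does not fail during the loop.

```python
from typing import Any, Callable, Iterable, List, Tuple

Send = Callable[[str, Any, int], None]


def broadcast(send: Send, msg_type: str, m: Any, destinations: Iterable[int]) -> None:
    """broadcast TYPE(m): send TYPE(m) to p_j for each j in destinations.

    With destinations = range(1, n + 1) the sender also sends to itself;
    in Chap. 1 the destinations would instead be the sender's neighbors.
    """
    for j in destinations:
        send(msg_type, m, j)


def make_recording_send(log: List[Tuple[str, Any, int]]) -> Send:
    """Hypothetical stand-in for a network: records each send in log."""
    def send(msg_type: str, m: Any, j: int) -> None:
        log.append((msg_type, m, j))
    return send


n = 4
log: List[Tuple[str, Any, int]] = []
broadcast(make_recording_send(log), "INIT", 42, range(1, n + 1))
```

If the sender crashed partway through the loop, only a prefix of the destinations would be reached, which is why no delivery guarantee holds for a faulty sender.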


Acronyms (1)

SWMR single-writer/multi-reader register


MWSR multi-writer/single-reader register

CAMP Crash asynchronous message-passing


CSMP Crash synchronous message-passing
BAMP Byzantine asynchronous message-passing
BSMP Byzantine synchronous message-passing

EIG Exponential information gathering


RB Reliable broadcast
URB Uniform reliable broadcast
ND No-duplicity broadcast
BRB Byzantine reliable broadcast
BV Byzantine binary value broadcast
VBB Validated Byzantine broadcast
CC Consensus in the process crash model
BC Consensus in the Byzantine process model
SA Set-agreement
BBC Byzantine binary consensus
ICC Interactive consistency (vector consensus), crash model
SC Simultaneous (synchronous) consensus
CORE CORE-broadcast

CC-property Crash consensus property


BC-property Byzantine consensus property

Acronyms (2)

CO Causal order
FIFO First in first out
TO Total order
SCD Set-constrained delivery
FC Fair channel
CRDT Conflict-free replicated data type
MS PAT Message pattern

ADV Adversary
FD Failure detector
HB Heartbeat
SO Send omission
GO General omission
MS Message scheduling assumption

LC Local coin
CC Common coin
BCCB Binary common coin with bias

GST Global stabilization time


List of Figures and Algorithms

1.1 Basic structure of distributed computing . . . . . . . . . . . . . . . . . . . . . . . . 4


1.2 Three graph types of particular interest . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Synchronous execution (left) vs. asynchronous execution (right) . . . . . . . . . . . 5
1.4 Algorithm structure of a common decision-making process . . . . . . . . . . . . . . 8
1.5 A simple distributed computing framework . . . . . . . . . . . . . . . . . . . . . . 12
1.6 Examples of graphs produced by a message adversary . . . . . . . . . . . . . . . . 13
1.7 Distributed computation in SMP n [TREE-AD] (code for pi ) . . . . . . . . . . . . . 14
1.8 The property limiting the power of a TREE-AD message adversary . . . . . . . . . 14
1.9 Process mobility can be captured by a message adversary in synchronous systems . . 16
1.10 Sequential or parallel computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.1 An example of the uniform reliable broadcast delivery guarantees . . . . . . . . . . 25


2.2 URB-broadcast: architectural view . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Uniform reliable broadcast in CAMP n,t [∅] (code for pi ) . . . . . . . . . . . . . . . 26
2.4 From URB to FIFO-URB and CO-URB in CAMP n,t [∅] . . . . . . . . . . . . . . . 27
2.5 An example of FIFO-URB message delivery . . . . . . . . . . . . . . . . . . . . . 28
2.6 FIFO-URB uniform reliable broadcast: architecture view . . . . . . . . . . . . . . . 28
2.7 FIFO-URB message delivery in AS n,t [∅] (code for pi ) . . . . . . . . . . . . . . . . 29
2.8 An example of CO message delivery . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.9 A simple URB-based CO-broadcast construction in CAMP n,t [∅] (code for pi ) . . . 31
2.10 From FIFO-URB to CO-URB message delivery in AS n,t [∅] (code for pi ) . . . . . . 32
2.11 How the sequence of messages im_causal_past_i is built . . . . . . . . . . . . . . . 32
2.12 From URB to CO message delivery in AS n,t [∅] (code for pi ) . . . . . . . . . . . . . 35
2.13 How vectors are used to construct the CO-broadcast abstraction . . . . . . . . . . . 36
2.14 Proof of the CO-delivery property (second construction) . . . . . . . . . . . . . . . 37
2.15 Total order message delivery requires cooperation . . . . . . . . . . . . . . . . . . 38
2.16 Broadcast of lifetime-constrained messages . . . . . . . . . . . . . . . . . . . . . . 40

3.1 Uniform reliable broadcast in CAMP n,t [- FC, t < n/2] (code for pi ) . . . . . . . . 45
3.2 Building Θ in CAMP n,t [- FC, t < n/2] (code for pi ) . . . . . . . . . . . . . . . . 50
3.3 Quiescent uniform reliable broadcast in CAMP n,t [- FC, Θ, P ] (code for pi ) . . . . 53
3.4 Quiescent uniform reliable broadcast in CAMP n,t [- FC, Θ, HB ] (code for pi ) . . . 56
3.5 An example of a network with fair paths . . . . . . . . . . . . . . . . . . . . . . . . 60

4.1 Implementing ND-broadcast in BAMP n,t [t < n/3] . . . . . . . . . . . . . . . . . 64


4.2 An example of ND-broadcast with a Byzantine sender . . . . . . . . . . . . . . . . 65
4.3 Implementing BRB-broadcast in BAMP n,t [t < n/3] . . . . . . . . . . . . . . . . . 67
4.4 Benefiting from message asynchrony . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.5 Exploiting message asynchrony . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.6 Communication-efficient Byzantine BRB-broadcast in BAMP n,t [t < n/5] . . . . . 70

xxvi List of Figures and Algorithms

5.1 Possible behaviors of a regular register . . . . . . . . . . . . . . . . . . . . . . . . 78


5.2 A regular register has no sequential specification . . . . . . . . . . . . . . . . . . . 79
5.3 Behavior of an atomic register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.4 Behavior of a sequentially consistent register . . . . . . . . . . . . . . . . . . . . . 81
5.5 Example of a history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.6 Partial order on the operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.7 Developing op1 →H op2 →X op3 →H op4 . . . . . . . . . . . . . . . . . . . . . 86
5.8 The execution of the register R is sequentially consistent . . . . . . . . . . . . . . . 87
5.9 The execution of the register R′ is sequentially consistent . . . . . . . . . . . . . . 87
5.10 An execution involving the registers R and R′ . . . . . . . . . . . . . . . . . . . . . 87
5.11 There is no atomic register algorithm in CAMP n,t [∅] . . . . . . . . . . . . . . . . . 88
5.12 There is no algorithm for two sequentially consistent registers in CAMP n,t [t ≥ n/2] 89
5.13 Tradeoff duration(read) + duration(write) ≥ δ . . . . . . . . . . . . . . . . . . . 91
5.14 duration(write) ≥ u/2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

6.1 Building a read/write memory on top of CAMP n,t [t ≤ n/2] . . . . . . . . . . . . . 96


6.2 An algorithm that constructs an SWMR regular register in CAMP n,t [t < n/2] . . . 98
6.3 Regularity is not atomicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.4 SWMR register: from regularity to atomicity . . . . . . . . . . . . . . . . . . . . . 101
6.5 Construction of an atomic MWMR register in CAMP n,t [t < n/2] (code for any pi ) 103
6.6 Fast read algorithm implementing sequential consistency (code for pi ) . . . . . . . . 106
6.7 Benefiting from TO-broadcast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.8 Fast write algorithm implementing sequential consistency (code for pi ) . . . . . . . 108
6.9 Fast enqueue algorithm implementing a sequentially consistent queue (code for pi ) . 108
6.10 Construction of a sequentially consistent MWMR register in CAMP n,t [t < n/2] (code for pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.11 Message exchange pattern for a write operation . . . . . . . . . . . . . . . . . . . . 110
6.12 First message exchange pattern for a read operation . . . . . . . . . . . . . . . . . . 111
6.13 Logical time vs. physical time for write operations . . . . . . . . . . . . . . . . . . 112
6.14 An execution H_d|X in which resp(op1) <_d inv(read2) . . . . . . . . . . . . 113

7.1 Building a failure detector of the class Σ in CAMP n,t [t < n/2] . . . . . . . . . . . 120
7.2 An algorithm for an atomic SWSR register in CAMP n,t [Σ] . . . . . . . . . . . . . 121
7.3 Extracting Σ from a register D-based algorithm A . . . . . . . . . . . . . . . . . . 122
7.4 Extracting Σ from a failure detector-based register algorithm A (code for pi ) . . . . 124
7.5 From atomic registers to URB-broadcast (code for pi ) . . . . . . . . . . . . . . . . 127
7.6 From the failure detector class Σ to the URB abstraction (1 ≤ t < n) . . . . . . . . 128
7.7 Two examples of the hybrid communication model . . . . . . . . . . . . . . . . . . 129

8.1 An implementation of SCD-broadcast in CAMP n,t [t < n/2] (code for pi ) . . . . . 134
8.2 Message pattern introduced in Lemma 16 . . . . . . . . . . . . . . . . . . . . . . . 137
8.3 SCD-broadcast-based communication pattern (code for pi ) . . . . . . . . . . . . . . 139
8.4 Construction of an MWMR atomic register in CAMP n,t [SCD-broadcast] (code for pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.5 Construction of an MWMR sequentially consistent register in CAMP n,t [SCD-broadcast] (code for pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.6 Example of a run of an MWMR atomic snapshot object . . . . . . . . . . . . . . . 143
8.7 Construction of an MWMR atomic snapshot object in CAMP n,t [SCD-broadcast] . . 144
8.8 Construction of an atomic counter in CAMP n,t [SCD-broadcast] (code for pi ) . . . . 145

8.9 Construction of a sequentially consistent counter in CAMP n,t [SCD-broadcast] (code for pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.10 Solving lattice agreement in CAMP n,t [SCD-broadcast] (code for pi ) . . . . . . . . 148
8.11 An implementation of SCD-broadcast on top of snapshot objects (code for pi ) . . . . 149

9.1 Execution E1 (impossibility of an SWMR register in BAMP n,t [t ≥ n/3]) . . . . . 157


9.2 Execution E2 (impossibility of an SWMR register in BAMP n,t [t ≥ n/3]) . . . . . 158
9.3 Execution E3 (impossibility of an SWMR register in BAMP n,t [t ≥ n/3]) . . . . . 158
9.4 Reliable broadcast with sequence numbers in BAMP n,t [t < n/3] (code for pi ) . . . 160
9.5 Atomic SWMR Registers in BAMP n,t [t < n/3] (code for pi ) . . . . . . . . . . . . 162
9.6 One-shot write-snapshot in BAMP n,t [t < n/3] (code for pi ) . . . . . . . . . . . . . 167
9.7 Correct-only agreement in BAMP n,t [t < n/(w + 1)] . . . . . . . . . . . . . . . . 168

10.1 A simple (unfair) t-resilient consensus algorithm in CSMP n,t [∅] (code for pi ) . . . . 175
10.2 A simple (fair) t-resilient consensus algorithm in CSMP n,t [∅] (code for pi ) . . . . . 176
10.3 The second case of the agreement property (with t = 3 crashes) . . . . . . . . . . . 177
10.4 A t-resilient interactive consistency algorithm in CSMP n,t [∅] (code for pi ) . . . . . 179
10.5 Three possible one-round extensions from Et−1 . . . . . . . . . . . . . . . . . . . . 183
10.6 Extending the k-round execution Ek . . . . . . . . . . . . . . . . . . . . . . . . . . 184
10.7 Extending two (k + 1)-round executions . . . . . . . . . . . . . . . . . . . . . . . 185
10.8 Extending again two (k + 1)-round executions . . . . . . . . . . . . . . . . . . . . 185

11.1 Early decision predicate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191


11.2 An early deciding t-resilient interactive consistency algorithm (code for pi ) . . . . . 192
11.3 Early stopping synchronous consensus (code for pi , t < n) . . . . . . . . . . . . . . 195
11.4 The early decision predicate revealed0(i, r) in action . . . . . . . . . . . . . . . . . 197
11.5 Local graphs of p2 , p3 , and p4 at the end of round r = 1 . . . . . . . . . . . . . . . 198
11.6 Local graphs of p3 and p4 at the end of round r = 2 . . . . . . . . . . . . . . . . . 198
11.7 CGM : Early deciding synchronous consensus based on PREF0() (code for pi , t < n) 199
11.8 Hierarchy of classes of conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
11.9 A condition-based consensus algorithm (code for pi ) . . . . . . . . . . . . . . . . . 205
11.10 Synchronous consensus with a fast failure detector (code for pi ) . . . . . . . . . . . 209
11.11 Relevant dates for process pi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
11.12 Early deciding synchronous consensus with a fast failure detector (code for pi ) . . . 211
11.13 The pattern used in the proof of the CC-agreement property . . . . . . . . . . . . . 211

12.1 Clean round vs failure-free round . . . . . . . . . . . . . . . . . . . . . . . . . . . 217


12.2 Existence of a clean round . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
12.3 Optimal simultaneous consensus in the system model CSMP n,t [∅] (code for pi ) . . . 219
12.4 Computing the current horizon value . . . . . . . . . . . . . . . . . . . . . . . . . 219
12.5 A simple k-set agreement algorithm for the model CSMP n,t [∅] (code for pi ) . . . . 223
12.6 Early stopping synchronous k-set agreement (code for pi , t < n) . . . . . . . . . . . 224
12.7 The differential predicate PREF(i, r) for k-set agreement . . . . . . . . . . . . . . 224
12.8 A condition-based simultaneous consensus algorithm (code for pi ) . . . . . . . . . . 228
12.9 A simple k-set agreement algorithm for the model CSMP n,t [SO] (code for pi ) . . . 229

13.1 A consensus-based NBAC algorithm in CSMP n,t [∅] (code for pi ) . . . . . . . . . . 232
13.2 Impossibility of having both fast commit and fast abort when t ≥ 3 (E3) . . . . . . . 234
13.3 Impossibility of having both fast commit and fast abort when t ≥ 3 (E4, E5) . . . . 235
13.4 Fast commit and weak fast abort NBAC in CSMP n,t [3 ≤ t < n] (code for pi ) . . . . 237
13.5 Fast abort and weak fast commit NBAC in CSMP n,t [3 ≤ t < n] (code for pi ) . . . . 242

13.6 Fast commit and fast abort NBAC in the system model CSMP n,t [t ≤ 2] (code for pi ) 243

14.1 Interactive consistency for four processes despite one Byzantine process (code for pi ) 248
14.2 Proof of the interactive consistency algorithm in BSMP n,t [t = 1, n = 4] . . . . . . 249
14.3 Communication graph (left) and behavior of the t Byzantine processes (right) . . . . 251
14.4 EIG tree for n = 4 and t = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
14.5 Byzantine EIG consensus algorithm for BSMP n,t [t < n/3] . . . . . . . . . . . . . 253
14.6 EIG trees of the correct processes at the end of the first round . . . . . . . . . . . . 254
14.7 EIG tree tree2 at the end of the second round . . . . . . . . . . . . . . . . . . . . . 255
14.8 Constant message size Byzantine consensus in BSMP n,t [t < n/4] . . . . . . . . . . 258
14.9 From binary to multivalued Byzantine consensus in BSMP n,t [t < n/3] (code for pi ) 260
14.10 Proof of Property PR2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
14.11 Deterministic vs non-deterministic scenarios . . . . . . . . . . . . . . . . . . . . . 263
14.12 A Byzantine signature-based consensus algorithm in BSMP n,t [SIG; t < n/2] (code for pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

15.1 Stacking of abstraction layers for distributed renaming in CAMP n,t [t < n/2] . . . . 273
15.2 A simple snapshot-based size-adaptive (2p − 1)-renaming algorithm (code for pi ) . 274
15.3 A simple snapshot-based approximate algorithm (code for pi ) . . . . . . . . . . . . 277
15.4 What is captured by Lemma 62 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
15.5 Safe agreement in CAMP n,t [t < n/2] (code for process pi ) . . . . . . . . . . . . . 281

16.1 Adding total order message delivery to various URB abstractions . . . . . . . . . . 288
16.2 Adding total order message delivery to the URB abstraction . . . . . . . . . . . . . 289
16.3 Building the TO-broadcast abstraction in CAMP n,t [CONS] (code for pi ) . . . . . . 290
16.4 Building the consensus abstraction in CAMP n,t [TO-broadcast] (code for pi ) . . . . 293
16.5 A TO-broadcast-based universal construction (code for pi ) . . . . . . . . . . . . . . 295
16.6 A state machine does not allow us to retrieve the past . . . . . . . . . . . . . . . . . 298
16.7 Building the consensus abstraction in CAMP n,t [LEDGER] (code for pi ) . . . . . . 298
16.8 A TO-broadcast-based ledger construction (code for pi ) . . . . . . . . . . . . . . . 299
16.9 Synchrony rules out uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
16.10 To wait or not to wait in presence of asynchrony and failures? . . . . . . . . . . . . 301
16.11 Bivalent vs univalent global states . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
16.12 There is a bivalent initial configuration . . . . . . . . . . . . . . . . . . . . . . . . 305
16.13 Illustrating the sets S1 and S2 used in Lemma 70 . . . . . . . . . . . . . . . . . . . 306
16.14 Σ2 contains 0-valent and 1-valent global states . . . . . . . . . . . . . . . . . . . . 307
16.15 Valence contradiction when i = i . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
16.16 Valence contradiction when i = i . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
16.17 k-sliding window register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
16.18 Solving consensus for k processes from a k-sliding window (code for pi ) . . . . . . 311
16.19 Schedule illustration: case 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
16.20 Schedule illustration: case 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
16.21 Building the TO-broadcast abstraction in CAMP n,t [- FC, CONS] (code for pi ) . . . 316

17.1 Binary consensus in CAMP n,t [t < n/2, MS] (code for pi ) . . . . . . . . . . . . . 319
17.2 A coordinator-based consensus algorithm for CAMP n,t [P ] (code for pi ) . . . . . . 322
17.3 Ω is a consensus computability lower bound . . . . . . . . . . . . . . . . . . . . . . 325
17.4 An algorithm implementing consensus in CAMP n,t [t < n/2, Ω] (code for pi ) . . . 326
17.5 The second phase for AS n,t [t < n/3, Ω] (code for pi ) . . . . . . . . . . . . . . . . 330
17.6 A randomized binary consensus algorithm for CAMP n,t [t < n/2, LC] (code for pi ) 332
17.7 What is broken by a random oracle . . . . . . . . . . . . . . . . . . . . . . . . . . 333

17.8 A randomized binary consensus algorithm for CAMP n,t [t < n/2, CC] (code for pi ) 336
17.9 A hybrid binary consensus algorithm for CAMP n,t [t < n/2, Ω, LC] (code for pi ) . . 338
17.10 An Alpha-based consensus algorithm in CAMP n,t [t < n/2, Ω] (code for pi ) . . . . 340
17.11 An algorithm implementing Alpha in CAMP n,t [t < n/2] . . . . . . . . . . . . . . 342
17.12 A reduction of multivalued to binary consensus in CAMP n,t [BC] (code for pi ) . . . 344
17.13 Consensus in one communication step in CAMP n,t [t < n/3, CONS] (code for pi ) . 347
17.14 Is this consensus algorithm for CAMP n,t [t < n/2, AΩ] correct? (code for pi ) . . . 351

18.1 A simple process monitoring algorithm implementing P (code for pi ) . . . . . . . . 357


18.2 Building a perfect failure detector P from α-fair communication (code for pi ) . . . . 358
18.3 Building a perfect failure detector P in CAMP n,t [θ] (code for pi ) . . . . . . . . . . 360
18.4 Example message pattern in the model CAMP n,t [θ] with θ = 3 . . . . . . . . . . . 360
18.5 Building ◇P from eventual ◇α-fair communication (code for pi ) . . . . . . . . . . 362
18.6 Building ◇P in CAMP n,t [◇SYNC] (code for pi ) . . . . . . . . . . . . . . . . . . 362
18.7 The maximal value of timeouti [j] after GST . . . . . . . . . . . . . . . . . . . . . 363
18.8 Possible issues with timers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
18.9 A simple monitoring algorithm (pi monitors pj ) . . . . . . . . . . . . . . . . . . . . 365
18.10 The three cases for the arrival of ALIVE(j, sn) . . . . . . . . . . . . . . . . . . . . 365
18.11 An adaptive algorithm that builds ◇P in CAMP n,t [◇SYNC] (code for pi ) . . . . . 367
18.12 Building Ω in CAMP n,t [◇t-SOURCE] (code for pi ) . . . . . . . . . . . . . . . . . 371
18.13 Winning vs losing responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
18.14 An example illustrating the assumption ◇t-MS_PAT . . . . . . . . . . . . . . . . . 373
18.15 Building Ω in CAMP n,t [◇t-MS_PAT] (code for pi ) . . . . . . . . . . . . . . . . . 374
18.16 Algorithm implementing CORE-broadcast in CAMP n,t [t < n/2] (code for pi ) . . . 378
18.17 Definition of W [i, j] = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
18.18 Common coin with bias ρ ≥ 1/4 in CAMP n,t [t < n/2, LC,FM] (code for pi ) . . . . 380
18.19 Does it build a biased common coin in CAMP n,t [t < n/3, LC] (code for pi )? . . . . 383

19.1 Binary consensus in BAMP n,t [t < n/3, TMS] (code for pi ) . . . . . . . . . . . . . 387
19.2 An algorithm implementing BV-broadcast in BAMP n,t [t < n/3] (code for pi ) . . . 390
19.3 A BV-broadcast-based binary consensus algorithm for the model BAMP n,t [n > 3t, CC] (code for pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
19.4 From multivalued to binary Byzantine consensus in BAMP n,t [t < n/3, BBC] (code of pi ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
19.5 VBB-broadcast on top of reliable broadcast in BAMP n,t [t < n/3] (code of pi ) . . . 400
19.6 From multivalued to binary consensus in BAMP n,t [t < n/3, BBC] (code for pi ) . . 403
19.7 Local blockchain representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

20.1 An order two projective plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414


20.2 Structure of a cryptography system . . . . . . . . . . . . . . . . . . . . . . . . . . 415
20.3 Hypercubes H(1), H(2), and H(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
20.4 Hypercube H(4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
20.5 The de Bruijn directed networks dB(2,1), dB(2,2), and dB(2,3) . . . . . . . . . . . . 421
20.6 Kautz graphs K(2, 1) and K(2, 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
20.7 Kautz graph K(2, 3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
List of Tables

1.1 Four classic fault-prone distributed computing models . . . . . . . . . . . . . . . . . . 16

4.1 Comparing the three Byzantine reliable broadcast algorithms . . . . . . . . . . . . . . 72

6.1 Cost of algorithms implementing read/write registers . . . . . . . . . . . . . . . . . . 115

9.1 Crash vs Byzantine failures: cost comparisons . . . . . . . . . . . . . . . . . . . . . . 163

10.1 Crash pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180


10.2 Missing messages due to the crash of pi . . . . . . . . . . . . . . . . . . . . . . . . . 185

11.1 Examples of (maximal and non-maximal) legal conditions . . . . . . . . . . . . . . . 203

14.1 Upper bounds on the number of faulty processes for consensus . . . . . . . . . . . . . 245

16.1 Read/write register vs consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314


20.1 Defining quorums from a √n × √n grid . . . . . . . . . . . . . . . . . . . . . . . . . 413
20.2 Number of vertices for D = Δ = 4, 6, 8, 10 . . . . . . . . . . . . . . . . . . . . . . . 422

Part I

Introductory Chapter

capable of producing so much sound. I have never observed this
habit upon a dull or cloudy day.”
Mr Nuttall having presented me with the nest of this species
attached to the twig to which the bird had fastened it, my amiable
friend Miss Martin has figured it for me, as well as the plant, about
which these lovely creatures are represented. The nest, which
measures two inches and a quarter in height, and an inch and three
quarters in breadth, at the upper part, is composed externally of
mosses, lichens, and a few feathers, with slender fibrous roots
interwoven, and lined with fine cottony seed-down.

Trochilus rufus, Gmel. Syst. Nat. vol. i. p. 497.


Trochilus collaris, Lath. Ind. Ornith. vol. i. p. 318.
Trochilus (Selasphorus) Rufus, Swainson.
Cinnamon or Nootka Humming Bird, Richards. and Swains. Fauna Bor.-
Amer. vol. ii. p. 324.

Adult Male. Plate CCCLXXIX. Figs. 1, 2.


Bill long, straight, subulate, somewhat depressed at the base, acute;
upper mandible with the dorsal line straight, the ridge narrow at the
base, broad and convex toward the end, the sides convex, the edges
overlapping, the tip acuminate; lower mandible with the angle very
long and extremely narrow, the dorsal line straight, the edges erect,
the tip acuminate. Nostrils basal, linear.
Head of ordinary size, oblong; neck short; body slender. Feet very
small; tarsus very short, feathered more than half-way down, toes
small; the lateral equal, the middle toe not much longer, the hind toe
a little shorter than the lateral, anterior toes united at the base; claws
rather long, arched, compressed, laterally grooved, very acute.
Plumage soft and blended; feathers on the throat, fore part and sides
of the neck oblong-obovate, with the filaments towards the end
thickened and flattened, with metallic gloss, those on the sides of the
neck elongated and erectile. Wings rather short, extremely narrow,
falcate, pointed; the primaries rapidly graduated, the second being
longest, but only slightly longer than the first; these two quills taper to
a point; the rest are broader, and gradually become less pointed; the
secondaries are extremely short, and only five in number. Tail rather
long, broad, graduated, the lateral feathers four and a half twelfths of
an inch shorter than the central; the latter are extremely broad,
measuring four and a half twelfths across, and the rest gradually
diminish to the lateral, which are very narrow; all obtusely pointed.
Bill brownish-black; toes brown, claws dusky. The general colour of
the upper parts is bright cinnamon or reddish-orange; the head
bronzed green, the wings dusky, the coverts glossed with green, the
primaries with purplish; each of the tail-feathers has a narrow
longitudinal lanceolate median streak toward the end. The loral
space, a narrow band over the eye, another beneath it, and the
auriculars are reddish-orange; the scale-like feathers of the throat
and sides of the neck are splendent fire-red, purplish-red, yellowish-
red, greenish-yellow or yellowish-green, according to the light in
which they are viewed; behind them, on the lower part of the neck, is
a broad band of reddish-white; the rest of the lower parts are like the
upper, the abdomen inclining to white.
Length to end of tail 3 7/12 inches; bill along the ridge 7 1/2/12, along
the edge of lower mandible 9 1/4/12; wing from flexure 1 7 1/4/12; tail
1 3 1/2/12; tarsus 1 1/2/12; hind toe 1 1/2/12, its claw 1 1/4/12; middle toe
2 1/4/12, its claw 1 1/2/12.

Adult Female. Plate CCCLXXIX. Fig. 3.


The Female has the bill and feet coloured as in the male. The upper
parts are gold-green, the head inclining to brown; the wings as in the
male; the tail-feathers reddish-orange at the base, brownish-black
toward the end, the tip white. The lower parts are white, tinged with
rufous, of which colour, especially, are the sides; the throat marked
with roundish spots of metallic greenish-red.

Length to end of tail 3 7 1/2/12 inches; bill along the ridge 8 3/4/12; wing
from flexure 1 10/12; tail 1 1 1/2/12.

The above descriptions are from two individuals shot by Dr


Townsend on the “Columbia River, 30th May 1835.” A “young male,
Columbia River, 29th May 1835,” resembles the female as above
described, differing only in having the metallic spots on the throat
larger. A “young female, Columbia River, June 10th 1835,” differs
from the adult only in wanting the metallic spots on the throat, which
is spotted with greenish-brown.

Cleome heptaphylla.

The beautiful plant represented in the plate belongs to Tetradynamia


Siliquosa of the Linnæan arrangement, and to the genus Cleome,
characterized by having three nectariferous glandules at each corner
of the calyx, the lower excepted; all the petals ascending; the
germen stipitate; the siliqua unilocular, two-valved. The species, C.
heptaphylla, is distinguished by its septenate leaves, of which the
leaflets are lanceolate, acuminate, and of a deep green colour. It
grows in South Carolina and Georgia.
TENGMALM’S OWL.

Strix Tengmalmi, Gmel.


PLATE CCCLXXX. Male and Female.

I procured a fine male of this species at Bangor, in Maine, on the


Penobscot River, in the beginning of September 1832; but am
unacquainted with its habits, never having seen another individual
alive. Dr Townsend informs me that he found it first on the Malade
River Mountains, where it was so tame and unsuspicious, that Mr
Nuttall was enabled to approach within a few feet of it, as it sat
upon the bushes. Dr Richardson gives the following notice
respecting it in the Fauna Boreali-Americana:—“When it accidentally
wanders abroad in the day, it is so much dazzled by the light of the
sun as to become stupid, and it may then be easily caught by the
hand. Its cry in the night is a single melancholy note, repeated at
intervals of a minute or two. Mr Hutchins informs us that it builds a
nest of grass half-way up a pine tree, and lays two white eggs in the
month of May. It feeds on mice and beetles. I cannot state the extent
of its range, but believe that it inhabits all the woody country from
Great Slave Lake to the United States. On the banks of the
Saskatchewan it is so common that its voice is heard almost every
night by the traveller, wherever he selects his bivouac.”

Strix Tengmalmi, Gmel. Syst. Nat. vol. i. p. 291.—Lath. Ind. Ornith. vol. i. p.
65.
Strix Tengmalmi, Tengmalm’s Owl, Swains. and Richards. Fauna Bor.-
Amer. vol. ii. p. 94.

Adult Male. Plate CCCLXXX. Fig. 1.


Bill short, very deep, strong; upper mandible with its dorsal line
curved from the base, its ridge convex, as are the sides, the edges
sharp and incurved anteriorly, the tip very acute, and at its extremity
nearly perpendicular; the cere short, and bare on its upper part; the
lower mandible has the angle broad and short, the dorsal line slightly
convex, the edges inflected, towards the end incurved, with a notch
on each side close to the abruptly-rounded tip. Nostrils broadly
elliptical, oblique, in the fore part of the cere, which bulges
considerably behind them.
The head is extremely large, roundish, when viewed from above
somewhat triangular; the eyes large. The conch of the ear very large,
of an elliptical form, extending from the base of the lower jaw to near
the top of the head, being an inch and a quarter in length, with an
anterior semicircular operculum stretching along its whole length,
and an elevated margin behind. The neck is very short and thin; the
body very slender; but both appear very full on account of the vast
mass of plumage. The feet are rather short, and strong; the tarsi and
toes covered with very soft downy feathers, the extremities of the
latter with two scutella. The claws are slender, tapering to a fine
point, compressed, and curved.
The facial disk is complete, as is the ruff. The plumage is full, very
soft, and blended; the feathers broadly oblong and rounded. The
wings are rather long, very broad, much rounded; the third primary
longest, the fourth almost equal, the second four-twelfths of an inch
shorter, the first equal to the seventh; the barbs of the outer web of
the first, of half the second, and the terminal part of the third, free
and recurved. Tail of moderate length, arched, slightly rounded, of
twelve broad, rounded feathers.
Bill greyish-brown, yellowish-white at the end; claws yellowish-
brown, their tips dusky. The general colour of the upper parts is
greyish-brown tinged with olive. The feathers of the head have an
elliptical central white spot; those of the neck are similarly marked
with larger white spots, of which some are disposed so as to form a
semicircular band; the scapulars have two or four large round spots
near the end, and some of the dorsal feathers and wing-coverts have
single spots on the outer web. All the quills have marginal white
spots on both webs, arranged in transverse series, there being six
on the outer web of the third quill. On the tail are five series of
transversely elongated narrow white spots. The disk is yellowish-
white, anteriorly black; the ruff yellowish-white, mottled with dusky.
The throat is brown, the chin white. The general colour of the lower
parts is yellowish-white, longitudinally streaked with brown, some of
the feathers of the sides have two white spots near the end; the
tarsal and digital feathers greyish-yellow, with faint transverse bars of
brown.
Length to end of tail 11 inches; wing from flexure 6 10/12; bill along
the ridge 1; tarsus 11/12; hind toe 5/12, its claw 5/12; middle toe 9/12, its
claw 8/12.

Adult Female. Plate CCCLXXX. Fig. 2.


The Female resembles the male, but is considerably larger.
SNOW GOOSE.

Anser hyperboreus, Bonap.


PLATE CCCLXXXI. Adult Male and Young Female.

The geographical range of the Snow Goose is very extensive. It has


been observed in numerous flocks, travelling northward, by the
members of the recent overland expeditions. On the other hand, I
have found it in the Texas, and it is very abundant on the Columbia
River, together with Hutchins’s Goose. In the latter part of autumn,
and during winter, I have met with it in every part of the United States
that I have visited.
While residing at Henderson on the Ohio, I never failed to watch the
arrival of this and other species in the ponds of the neighbourhood,
and generally found the young Snow Geese to make their
appearance in the beginning of October, and the adult or white birds
about a fortnight later. In like manner, when migrating northward,
although the young and the adult birds set out at the same time, they
travel in separate flocks, and, according to Captain Sir George
Back, continue to do so even when proceeding to the higher
northern latitudes of our continent. It is not less curious that, during
the whole of the winter, these Geese remain equally divided, even if
found in the same localities; and although young and old are often
seen to repose on the same sand-bar, the flocks keep at as great a
distance as possible.
The Snow Goose in the grey state of its plumage is very abundant in
winter, about the mouths of the Mississippi, as well as on all the
muddy and grassy shores of the bays and inlets of the Gulf of
Mexico, as far as the Texas, and probably still farther to the south-
west. During the rainy season, it betakes itself to the large prairies of
Attacapas and Oppellousas, and there young and adult procure their
food together, along with several species of Ducks, Herons, and
Cranes, feeding, like the latter, on the roots of plants, and nibbling
the grasses sideways, in the manner of the Common Tame Goose.
In Louisiana I have not unfrequently seen the adult birds feeding in
wheat fields, when they pluck up the plants entire.
When the young Snow Geese first arrive in Kentucky, about
Henderson for instance, they are unsuspicious, and therefore easily
procured. In a half-dry half-wet pond, running across a large tract of
land, on the other side of the river, in the State of Indiana, and which
was once my property, I was in the habit of shooting six or seven of
a-day. This, however, rendered the rest so wild, that the cunning of
any “Red Skin” might have been exercised without success upon
them; and I was sorry to find that they had the power of
communicating their sense of danger to the other flocks which
arrived. On varying my operations however, and persevering for
some time, I found that even the wildest of them now and then
suffered; for having taken it into my head to catch them in large
traps, I tried this method, and several were procured before the rest
had learned to seize the tempting bait in a judicious manner.
The Snow Goose affords good eating when young and fat; but the
old Ganders are tough and stringy. Those that are procured along
the sea-shores, as they feed on shell-fish, fry and marine plants,
have a rank taste, which, however suited to the palate of the epicure,
I never could relish.
The flight of this species is strong and steady, and its migrations over
the United States are performed at a considerable elevation, by
regular flappings of the wings, and a disposition into lines similar to
that of other Geese. It walks well, and with rather elevated steps; but
on land its appearance is not so graceful as that of our common
Canada Goose. Whilst with us they are much more silent than any
other of our species, rarely emitting any cries unless when pursued
on being wounded. They swim buoyantly, and, when pressed, with
speed. When attacked by the White-headed Eagle, or any other
rapacious bird, they dive well for a short space. At the least
appearance of danger, when they are on land, they at once come
close together, shake their heads and necks, move off in a contrary
direction, very soon take to wing, and fly to a considerable distance,
but often return after a time.
I am unable to inform you at what age the Snow Goose attains its
pure white plumage, as I have found that a judgment formed from
individuals kept in confinement is not to be depended upon. In one
instance at least, a friend of mine who had kept a bird of this species
four years, wrote to me that he was despairing of ever seeing it
become pure white. Two years after, he sent me much the same
message; but, at the commencement of next spring, the Goose was
a Snow Goose, and the change had taken place in less than a
month.
Dr Richardson informs us that this species “breeds in the barren
grounds of Arctic America, in great numbers. The eggs of a
yellowish-white colour, and regularly ovate form, are a little larger
than those of the Eider Duck, their length being three inches, and
their greatest breadth two. The young fly in August, and by the
middle of September all have departed to the southward. The Snow
Goose feeds on rushes, insects, and in autumn on berries,
particularly those of the Empetrum nigrum. When well fed it is a very
excellent bird, far superior to the Canada Goose in juiciness and
flavour. It is said that the young do not attain the full plumage before
their fourth year, and until that period they appear to keep in
separate flocks. They are numerous at Albany Fort in the southern
part of Hudson’s Bay, where the old birds are rarely seen; and, on
the other hand, the old birds in their migrations visit York Factory in
great abundance, but are seldom accompanied by the young. The
Snow Geese make their appearance in spring a few days later than
the Canada Geese, and pass in large flocks both through the interior
and on the coast.”
The young birds of this species begin to acquire their whiteness
about the head and neck after the first year, but the upper parts
remain of a dark bluish colour until the bird suddenly becomes white
all over; at least, this is the case with such as are kept in captivity.
Although it is allied to the White-fronted or Laughing Goose, Anser
albifrons, I was surprised to find that Wilson had confounded the
two species together, and been of opinion that the Bean Goose also
was the same bird in an imperfect state of plumage. That excellent
ornithologist tells us that “this species, called on the sea-coast, the
Red Goose, arrives in the river Delaware, from the north, early in
November, sometimes in considerable flocks, and is extremely noisy,
their notes being shriller and more squeaking than those of the
Canada, or common Wild Goose. On their first arrival, they make but
a short stay, proceeding, as the depth of winter approaches, farther
south; but from the middle of February, until the breaking up of the
ice in March, they are frequently numerous along both shores of the
Delaware, about and below Reedy Island, particularly near Old Duck
Creek, in the State of Delaware. They feed on roots of the reeds
there, which they tear up like hogs.”
This species is rare both in Massachusetts and South Carolina,
although it passes over both these States in considerable numbers,
and in the latter some have been known to alight among the
common domestic Geese, and to have remained several days with
them. My friend Dr Bachman, of Charleston, South Carolina, kept a
male Snow Goose several years along with his tame Geese. He had
received it from a friend while it was in its grey plumage, and the
following spring it became white. It had been procured in the autumn,
and proved to be a male. In a few days it became very gentle, and
for several years it mated with a common Goose; but the eggs
produced by the latter never hatched. The Snow Goose was in the
habit of daily frequenting a mill-pond in the vicinity, and returning
regularly at night along with the rest; but in the beginning of each
spring it occasioned much trouble. It then continually raised its head
and wings, and attempted to fly off; but finding this impossible, it
seemed anxious to perform its long journey on foot, and it was
several times overtaken and brought back, after it had proceeded
more than a mile, having crossed fences and plantations in a direct
course northward. This propensity cost it its life: it had proceeded as
far as the banks of the Cooper River, when it was shot by a person
who supposed it to be a wild bird.
In the latter part of the autumn of 1832, whilst I was walking with my
wife, in the neighbourhood of Boston in Massachusetts, I observed
on the road a young Snow Goose in a beautiful state of plumage,
and after making some inquiries, found its owner, who was a
gardener. He would not part with it for any price offered. Some
weeks after, a friend called one morning, and told me that this
gardener had sent his Snow Goose to town, and that it would be sold
by auction that day. I desired my friend to attend the sale, which he
did; and before a few hours had elapsed, the bird was in my
possession, having been obtained for 75 cents! We kept this Goose
several months in a small yard at the house where we boarded,
along with the young of the Sand-hill Crane, Grus Americana. It was
fed on leaves and thin stalks of cabbage, bread, and other vegetable
substances. When the spring approached, it exhibited great
restlessness, seeming anxious to remove northward, as was the
case with Dr Bachman’s bird. Although the gardener had kept it four
years, it was not white, but had the lower part of the neck and the
greater portion of the back, of a dark bluish tint, as represented in
the plate. It died before we left Boston, to the great regret of my
family, as I had anticipated the pleasure of presenting it alive to my
honoured and noble friend the Earl of Derby.
There can be little doubt that this species breeds in its grey plumage,
when it is generally known by the name of Blue-winged Goose, as is
the case with the young of Grus Americana, formerly considered as
a distinct species, and named Grus Canadensis.

Anas hyperborea, Gmel. Syst. Nat. vol. i. p. 504.—Lath. Ind. Orn. vol. ii. p.
837.
Snow Goose, Anas hyperborea, Wils. Amer. Ornith. vol. viii. p. 76, pl. 68,
fig. 3, Male, and p. 89, pl. 69, fig. 5, Young.
Anser hyperboreus, Ch. Bonaparte, Synopsis of Birds of United States, p.
376.
Anser hyperboreus, Snow Goose, Richards. and Swains. Fauna Bor.-
Amer. vol. ii. p. 467.
Snow Goose, Nuttall, Manual, vol. ii. p. 344.

Adult Male. Plate CCCLXXXI. Fig. 1.


Bill about the length of the head, much higher than broad at the
base, somewhat conical, compressed, rounded at the tip. Upper
mandible with the dorsal line sloping, the ridge broad and flattened at
the base, narrowed towards the unguis, which is roundish and very
convex, the edges beset with compressed, hard teeth-like lamellæ,
their outline ascending and slightly arched; lower mandible
ascending, nearly straight, the angle long and of moderate width,
the dorsal line beyond it convex, the sides erect, and beset with
lamellæ, similar to those of the upper, but more numerous, the
unguis obovate and very convex. Nasal groove oblong, parallel to
the ridge, filled by the soft membrane of the bill; nostrils medial,
lateral, longitudinal, narrow-elliptical, open, pervious.
Head of moderate size, oblong, compressed. Neck rather long and
slender. Body full, slightly depressed. Feet rather short, strong,
placed about the centre of the body; legs bare a little above the joint;
tarsus rather short, strong, a little compressed, covered all round
with hexagonal, reticulated scales, which are smaller behind; hind
toe very small, with a narrow membrane; third toe longest, fourth
considerably shorter, but longer than the second; all the toes
reticulated above at the base, but with narrow transverse scutella
towards the end; the three anterior connected by a reticulated
membrane, the outer having a thick margin, the inner with the margin
extended into a two-lobed web. Claws small, arched, rather
compressed, obtuse, that of the middle toe bent obliquely outwards,
and depressed, with a curved edge.
Plumage close, full, compact above, blended beneath, as well as on
the head and neck, on the latter of which it is disposed in longitudinal
bands, separated by narrow grooves; the feathers of the lateral parts
small and narrow, of the back ovato-oblong, and abruptly rounded, of
the lower parts curved and oblong. Wings rather long, broad;
primaries strong, incurved, broad, towards the end tapering, the
second longest, but only a quarter of an inch longer than the first,
which scarcely exceeds the third; the first and second sinuate on the
inner web, the second and third on the outer. Secondaries long, very
broad, rounded, the inner curved outwards. Tail very short, rounded,
of sixteen broad rounded feathers.
Bill carmine-red, the unguis of both mandibles white, their edges
black. Iris light brown. Feet dull lake, claws brownish-black. The
general colour of the plumage is pure white; the fore part of the head
tinged with yellowish-red; the primaries brownish-grey, towards the
end blackish-brown, their shafts white unless toward the end.
Length to end of tail 31 3/4 inches, to end of claws 33 1/2, to end of
wings 31 3/4, to carpus 14; extent of wings 62; wing from flexure
19 1/2; tail 6 1/4; bill along the ridge 2 5/8, along the edge of lower
mandible 3 1/4; bare part of tibia 3/4; tarsus 3 5/8; hind toe 1/2, its claw
4 1/2/12; middle toe 3, its claw 4/12. Weight 6 3/4 lb.

Young Female, in first winter. Plate CCCLXXXI. Fig. 2.


The colours of the young bird, in its first plumage, are unknown; but
in its second plumage, in autumn and winter, it presents the
appearance exhibited in the plate. The bill is pale flesh-colour, its
edges black, and the unguis bluish-white; the feet flesh-colour, the
claws dusky. The head and upper part of the neck are white, tinged
above with grey, the lower part of the neck all round, the fore part of
the back, the scapulars, the fore part of the breast, and the sides,
blackish-grey; paler beneath. The hind part of the back and the
upper tail-coverts, are ash-grey; as are the wing-coverts; but the
secondary coverts are greyish-black in the middle; and all the quills
are of that colour, the secondaries margined with greyish-white; the
tail-feathers dusky-grey, broadly margined with greyish-white. The
dark colour of the fore part of the breast gradually fades into greyish-
white, which is the colour of the other inferior parts, excepting the
axillar feathers, and some of the lower wing-coverts, which are white.
Length of an individual in this plumage, kept four years, to end of tail
26 inches, to end of claws 25; extent of wings 55; bill along the ridge
2 1/4, from frontal angle 2 1/2; tarsus 2 7 1/2/12; hind toe 6/12, its claw
4 1/2/12; middle toe 2 1/4, its claw 4/12. Weight 2 lb. 13 oz. The bird
very poor.
In an adult male preserved in spirits, the roof of the mouth is
moderately concave, with five series of strong conical papillæ
directed backwards. The posterior aperture of the nares is linear,
margined with two series of extremely slender papillæ. The marginal
lamellæ of the upper mandible are 25, of the lower about 45. The
tongue is 2 inches 5 twelfths long, nearly cylindrical, with strong
pointed papillæ at the base, and on each side a series of flattened,
sharp lamellæ, directed backwards, together with very numerous
bristle-like filaments. It is fleshy, has a soft prominent pad at the base
above, and towards the end has a median groove, the point rounded,
thin, and horny. The œsophagus, which is 17 inches long, has a
diameter of 9 twelfths at the upper part, and at the lower part of the
neck is dilated to 1 inch. The proventricular glands are cylindrical,
simple, and arranged in a belt nearly 1 inch in breadth. The other
parts were removed.
The reddish tint on the head affords no indication of the age of the
bird, some individuals of all ages having that part pure white, while
others have it rusty. The same remark applies to our two Swans.
SHARP-TAILED GROUS.

Tetrao Phasianellus, Linn.


PLATE CCCLXXXII. Male and Female.

This is another species of our birds with the habits of which I am
entirely unacquainted. Dr Richardson’s account of it is as follows:
—“The northern limits of the range of the Sharp-tailed Grous is Great
Slave Lake, in the sixty-first parallel; and its most southern recorded
station is in latitude 41°, on the Missouri. It abounds on the outskirts
of the Saskatchewan plains, and is found throughout the woody
districts of the Fur Countries, haunting open glades or low thickets
on the borders of lakes, particularly in the neighbourhood of the
trading paths, where the forests have been partially cleared. In
winter it perches generally on trees, in summer is much on the
ground; in both seasons assembling in coveys of from ten to sixteen.
Early in spring, a family of these birds select a level spot, whereon
they meet every morning, and run round in a circle of fifteen or
twenty feet in diameter, so that the grass is worn quite bare. When
any one approaches the circle, the birds squat close to the ground,
but in a short time stretch out their necks to survey the intruder; and,
if they are not scared by a nearer advance, soon resume their
circular course, some running to the right, others to the left, meeting
and crossing each other. These “Partridge dances” last for a month
or more, or until the hens begin to hatch. When the Sharp-tailed
Grous are put up, they rise with the usual whirring noise, and alight
again at the distance of a few hundred yards, either on the ground,
or on the upper branches of a tree. Before the cock quits his perch,
he utters repeatedly the cry of cuck, cuck, cuck. In winter they roost
in the snow like the Willow Grous, and they can make their way
through the loose wreaths with ease. They feed on the buds and
sprouts of the Betula glandulosa, of various willows, and of the
aspen and larch; and in autumn on berries. Mr Hutchins says that
the hen lays thirteen white eggs with coloured spots early in June;
the nest being placed on the ground and formed of grass, lined with
feathers.”
Dr Townsend informs me that while crossing the north branch of the
Platte (Lorimie’s Fork), he found this species breeding, and that as
an article of food it proved to be a very well-flavoured and plump
bird, considerably superior to any of the other larger species that
occur in the United States.

Tetrao Phasianellus, Linn. Syst. Nat. vol. i. p. 273.—Lath. Ind. Ornith. vol.
ii. p. 635.—Ch. Bonaparte, Synopsis of Birds of United States, p. 127.
Tetrao Phasianellus, Sharp-tailed Grous, Ch. Bonaparte, Amer. Ornith.
vol. iii. p. 37, pl. 19.
Tetrao (centrocercus) Phasianellus, Swains. Sharp-tailed Grous,
Richards. and Swains. Fauna Bor.-Amer. vol. ii. p. 361.
Sharp-tailed Grous, Nuttall, Manual, vol. i. p. 669.

Adult Male. Plate CCCLXXXII. Fig. 1.


Bill short, strong, as broad as high; upper mandible with the dorsal
line arcuato-declinate, the ridge narrow at the base on account of the
great extent of the nasal sinus, which is feathered, the sides convex
toward the end, the edges overlapping and thin, the tip declinate and
blunt, but thin-edged; lower mandible with the angle of moderate
length and width, the dorsal line ascending and convex, the edges
sharp and inclinate, the tip obtuse.
Head rather small, oblong; neck of moderate length; body full. Feet
rather short, stout; tarsus roundish, feathered, bare and reticulated
behind. Toes of moderate size, with numerous scutella above, but
covered over at the base by the hair-like feathers which grow from
the sides and the intervening basal membranes, laterally pectinate
with long slender projecting flattened scales; first toe small, second a
little longer than fourth, third much longer. Claws slender, arched,
moderately compressed, rather obtuse; that of the third toe with the
inner edge dilated.
Plumage dense, soft, rather compact, the feathers in general broadly
ovate; those on the head and upper part of the neck short, but some
on the upper and hind part of the former elongated and forming a
slight crest. There is a papillate coloured membrane over the eye, as
in the other species; and on each side of the neck is a large bare
space, concealed by the plumage, which I have no doubt is inflated,
as in Tetrao Cupido and T. Urophasianus, during the love season.
Wings rather short, concave, much rounded; the primaries stiff and
very narrow, so as to leave large intervals when the wing is
extended; the third quill longest, the fourth next, the second shorter
than the fifth, the sixth longer than the first. Tail short, much
graduated, of sixteen feathers, of which the lateral are three inches
shorter than the central; all the feathers are more or less concave,
excepting the two middle worn along the inner edge, obliquely and
abruptly terminated, the two middle projecting an inch beyond the
next.
Bill dusky above, brown beneath; iris light hazel; superciliary
membrane vermilion; toes brownish-grey, claws brownish-black. The
upper parts are variegated with light red or brownish-orange,
brownish-black and white; the black occupying the central part of the
feathers, the light red forming angular processes from the margin,
generally dotted with black, and a lighter bar near the end; the white
being in terminal, triangular, or guttiform spots on the scapulars and
wing-coverts. The alula, primary coverts, secondary coverts, and
quills are greyish-brown, the coverts spotted and tipped with white;
the primaries with white spots on the outer web, the inner tipped with
white, as are all the secondaries, of which the outer have two bars of
white spots, and the inner are coloured like the back. The tail is
white, at the base variegated, and the two middle feathers like the
back. Loral space, and a line behind the eye, white; a dusky streak
beneath the eye, succeeded by a light coloured one. The throat is
reddish-white, with some dusky spots; the fore part and sides of the
neck barred with dusky and reddish-white: on the lower part of the
neck and fore part of the breast, the dusky bars become first curved,
and then arrow-shaped, and so continue narrowing on the hind part
of the breast, and part of the sides, of which the upper portion is
barred; the abdomen, lower tail-coverts, axillar feathers, and most of
the lower wing-coverts, white. The hair-like feathers of the tarsi are
light brownish-grey, faintly barred with greyish-white.
Length to end of tail 17 1/2 inches, to end of wings 14, to end of
claws 17; extent of wings 23; wing from flexure 8 1/4, tail 4 1/2; bill
along the ridge 10 1/2/12, along the edge of lower mandible 1 1 1/2/12;
tarsus 1 7 1/2/12; hind toe 6/12, its claw 6/12; middle toe 1 7 1/2/12, its
claw 7/12.

Adult Female. Plate CCCLXXXII. Fig. 2.


The Female is considerably smaller, but is coloured like the male,
the tints being duller.
LONG-EARED OWL.

Strix otus, Linn.


PLATE CCCLXXXIII. Male.

This Owl is much more abundant in our Middle and Eastern Atlantic
Districts than in the Southern or Western parts. My friend Dr
Bachman has never observed it in South Carolina; nor have I met
with it in Louisiana, or any where on the Mississippi below the
junction of the Ohio. It is not very rare in the upper parts of Indiana,
Illinois, Ohio, and Kentucky, wherever the country is well wooded. In
the Barrens of Kentucky its predilection for woods is rendered
apparent by its not being found elsewhere than in the “Groves;” and
it would seem that it very rarely extends its search for food beyond
the skirts of those delightful retreats. In Pennsylvania, and elsewhere
to the eastward, I have found it most numerous on or near the banks
of our numerous clear mountain streams, where, during the day, it is
not uncommon to see it perched on the top of a low bush or fir. At
such times it stands with the body erect, but the tarsi bent and
resting on a branch, as is the manner of almost all our Owls. The
head then seems the largest part, the body being much more
slender than it is usually represented. Now and then it raises itself
and stands with its legs and neck extended, as if the better to mark
the approach of an intruder. Its eyes, which were closed when it was
first observed, are opened on the least noise, and it seems to squint
at you in a most grotesque manner, although it is not difficult to
