Science
Leslie Lamport
18 October 2024
Contents

1 Introduction
  1.1 Who Am I?
  1.2 Who Are You?
  1.3 The Origin of the Science
    1.3.1 The Origin of the Theory
    1.3.2 The Origin of the Practice
  1.4 Correctness
  1.5 A Preview
  1.6 Why Math?

2 Ordinary Math
  2.1 Arithmetic as a Mathematical Theory
  2.2 The Mathematical Theory of Algebra
  2.3 Mathglish
  2.4 Boolean Arithmetic (Propositional Logic)
  2.5 ZF
  2.6 Meaningless Expressions
  2.7 Quantification and Bound Variables
    2.7.1 Quantification
    2.7.2 Bound Variables
  2.8 Defining Mappings and Functions
    2.8.1 Mappings
    2.8.2 Functions
    2.8.3 Sequences and Tuples
  2.9 Some Useful Notation
    2.9.1 if/then/else
    2.9.2 Conjunction and Disjunction Lists
5 Interlude
  5.1 Possibility and Accuracy
    5.1.1 Possibility Conditions
    5.1.2 Expressing Possibility in TLA
    5.1.3 Checking Accuracy
  5.2 Real-Time Programs
    Math VI
    5.2.1 Fischer’s Algorithm
    5.2.2 Correctness of Fischer’s Algorithm
    5.2.3 Fairness and Zeno Behaviors
    5.2.4 Discrete Time
    5.2.5 Hybrid Systems

6 Refinement
  6.1 A Sequential Algorithm
    Math VII
    6.1.1 A One-Step Program
    6.1.2 Two Views of Refinement Mappings
    6.1.3 A Step and Data Refinement
  6.2 Invariance Under Refinement
Appendix

A Miscellany
  A.1 Ordinary Math Summary
    A.1.1 Arithmetic
    A.1.2 Propositional Logic
    A.1.3 Predicate Logic
    A.1.4 Sets
    A.1.5 The choose Operator
    A.1.6 Functions
    A.1.7 Sequences
    A.1.8 Notation
    A.1.9 Recursive Definitions
  A.2 Structured Proofs
  A.3 Why Not All Mappings Are Sets
  A.4 How Not to Write x′′′
  A.5 Hoare Logic
  A.6 Another Way to Look at Safety and Liveness
    A.6.1 Metric Spaces
    A.6.2 The Metric Space of Behaviors

B Proofs
  B.1 Invariance Proof of Increment
  B.2 Proof of Theorem 4.3
  B.3 Proof of Theorem 4.4
  B.4 Proof of Theorem 4.5
  B.5 Proof of Theorem 4.6
  B.6 Proof of Theorem 4.7
  B.7 Proof of Theorem 4.8
  B.8 Proof Sketch of Theorem 4.9
  B.9 Proof of Theorem 7.2
  B.10 Proof Sketch of Theorem 7.3
  B.11 Proof Sketch of Theorem 7.6
  B.12 Proof of Theorem 8.2
  B.13 Proof of Theorem 8.3

Bibliography

Index
Acknowledgments
This book presents things that I have learned over decades. During that
time, I have discussed them with many wonderful colleagues. I learned
many things from them, and I cannot possibly remember who taught me
what. Here is a list containing only my coauthors of published papers who
I can remember contributing to the content of this book. Omitted are a
number of colleagues who taught me a lot that does not appear in the book,
and a few who taught me things contained here but with whom I never
wrote a published paper.
The following people not in the list above read preliminary versions of
the book and reported errors or sent me comments that led to significant
changes.
Chapter 1
Introduction
1.1 Who Am I?
Dear Reader. I am inviting you to spend many pages with me. Before
deciding whether to accept my invitation, you may want to know who I am.
I was educated as a mathematician; my doctoral thesis was on partial
differential equations. While a student, I worked part-time and summers
as a programmer. At that time, almost all programs were what I will call
traditional programs—ones with a single thread of control that take input,
produce output, and stop.
After obtaining my doctorate, I began working on concurrent algorithms—ones comprising multiple independent threads of control, called processes, that are executed concurrently. The first concurrent algorithms were
meant to be executed on a single computer, and processes communicated
through a shared memory. Later came distributed algorithms—concurrent
algorithms designed to be executed on multiple computers in which processes
communicate by message passing.
This is not the place for modesty. I was very good at concurrency—both
writing concurrent algorithms and developing the theory underlying them.
The first concurrent algorithm I wrote, published in 1974, is still taught at
universities. In 1978 I published what is probably the first paper on the
theory of distributed computing. I have received many awards and honors
for this work, including a Turing award (generally considered the Nobel
prize of computer science) for “fundamental contributions to the theory and
practice of distributed and concurrent systems.”
I quickly learned that concurrent algorithms were hard to get right. The many possible orders in which the operations of different processes can be executed lead to an enormous number of possible executions that have to be considered. The only way to ensure that an algorithm worked correctly was to prove that it did.
By the 1970s, a standard approach had been developed for proving correctness of traditional programs. Around 1975, I and a few other computer scientists began extending that approach to concurrent algorithms [4, 24, 29, 45]. Concurrent algorithms were usually written in pseudocode plus some informal explanation of what the pseudocode meant. I came to realize that all these methods for proving correctness could be explained by describing a concurrent algorithm as what I am now calling an abstract program; and an abstract program could be described mathematically.
Correctness of an algorithm was expressed by properties required of its executions. I came to realize that correctness can also be expressed by an abstract program—a more abstract, higher-level one than the abstract program describing the algorithm. Proving correctness means showing that the abstract program describing the algorithm implements the abstract program describing its correctness, and I developed a method for doing that.
This work culminated around 1990 with a way to write an abstract program as a single formula [32]. The formula is written in an obscure form of math called temporal logic. The particular temporal logic is called TLA (for the Temporal Logic of Actions). Most of the TLA formula for an abstract program consists of ordinary math that expresses essentially what was described by pseudocode. Temporal logic replaces the informal explanation of the pseudocode. That one abstract program implements another is expressed as logical implication together with mathematical substitution.
Throughout this period, I was writing correctness proofs of the algorithms I was inventing. This showed me that my way of reasoning with abstract programs worked in practice. However, I discovered that as my algorithms got more complicated and the formulas describing them became larger, the method of writing proofs used by mathematicians became unreliable. It could not ensure that all the details were correct. I had to devise a method of hierarchically structuring proofs to keep track of those details.
Our science gives engineers both intellectual tools to help them think better and programs to help them detect errors before they are implemented in code. These tools are based on what I learned by writing and reasoning about concurrent algorithms.
Programming is not just coding. It requires thinking before we code.
Writing algorithms taught me that there are two things we need to decide
before writing and debugging the code: what the program should do and
how the program should do it. Most programmers think that the code
itself adequately describes “how the program should do it”, but I learned
that we need a higher-level, more abstract description of what the program
does. To emphasize that programming is more than just coding, I now
use the name coding language for what are commonly called programming
languages. That name is used in this book.
An algorithm is an example of a description of how a program should do
something. Concurrent algorithms are hard to understand. To invent them,
I had to be able to write them in the simplest way possible. Algorithms were
usually written in pseudocode to avoid the complexity that real code requires
to permit efficient execution. I developed a way to describe concurrent
algorithms in math that was more precise and no harder to understand
than pseudocode.
Engineers who build complex systems usually recognize the need for
describing what their programs do in a simpler, more abstract way than
with code. I decided that abstract programs written in math provided such
a way for describing the aspects of a system that involve concurrency. By
about 1995, I had designed a complete language called TLA+ that engineers
could use to write abstract programs in TLA.
The abstract programs I know of that have been written by engineers to
describe what a system should do generally consist of about 200–2000 lines
of TLA+. All but a few of those lines are ordinary math, not temporal logic.
As with code, those formulas are made easy to understand by decomposing
them into smaller pieces. This is done using simple definitions, rather than
the more complex constructs of coding languages.
To formalize mathematics and make it easier to write long formulas, I had to add to TLA+ some concepts and syntax not present in the math commonly used by mathematicians—for example, variable declarations, grouping definitions into modules, and notation for substitution. This book uses TLA, but not TLA+, because the examples with which it illustrates our science are short and simple.
The kind of hierarchically structured proofs I devised can also be written in TLA+, and there is a program for checking the correctness of those proofs. However, with today’s proof-checking technology, writing machine-checked proofs takes more time than engineers generally have. By the time
I designed TLA+, model checking had become a practical tool for checking
the correctness of abstract programs. A model checker can essentially check
correctness of all possible executions of a very small instance of an abstract
program. This turns out to be very effective at detecting errors. There
are two model checkers for abstract programs written in TLA+, using two
complementary approaches. Model checking is the standard way engineers
check those programs.
A program’s code can, in principle, be described by a (concrete) abstract
program and could, in principle, be written as a TLA+ formula. For a simple
program (or part of a program), the code can be hand-translated to TLA+
and checked with the TLA+ tools. Usually, the length of the program and the complexity of the coding language make this impractical.
From the point of view of our science, it makes no difference how long a
formula an abstract program is. We therefore consider a program written in
a coding language to be an abstract program. And since we are considering
only abstract programs, we will let program mean abstract program. We
will call an (abstract) program written in code a concrete program.
Although we don’t write them as formulas, viewing concrete programs as
abstract programs provides a new way of thinking about them. One benefit
of this way of thinking is that understanding what it means for a concrete
program to implement a higher-level abstract program can help avoid coding
errors.
1.4 Correctness
Thus far, our science has been described as helping to build concurrent
programs that work correctly. Working correctly is a vague concept. Here
is precisely what it is taken to mean in this book.
We define a behavioral property to be a condition that is or is not satisfied by an individual execution of a program. For example, termination is a behavioral property. An execution either terminates or else it doesn’t terminate, meaning that it keeps executing forever. We say that a program satisfies a behavioral property if every possible execution of the program satisfies it. A program is considered to work correctly, or simply to be correct, if it satisfies its desired behavioral properties.
That every possible execution of a program satisfies its behavioral properties may seem like an unreasonably strong requirement. I would be happy if a program that I use does the right thing 99.99% of the times I run it. For many programs, extensive testing can ensure that it does. But it can’t for
most concurrent programs. What a concurrent program does can depend on
the relative order in which operations by different processes are executed.
This makes the program nondeterministic, meaning that different executions
can do different things, even if the program receives identical inputs. This
can result in an enormous number of possible executions, and testing can
examine only a tiny fraction of them. Moreover, a concurrent program that
has run correctly for years can start producing frequent errors because a
small change to the computer hardware, the operating system, or even the
other programs running at the same time causes incorrect executions that
have never occurred before. The only way to prevent this is to ensure that
every possible execution satisfies the behavioral properties.
Model checking is more effective at finding errors in concurrent programs
than ordinary testing because it checks all possible executions. However,
it does this only on a few small instances of the program—for example,
an instance with few processes or one that allows only a small number of
messages to be in transit at any time.1 Engineering judgment is required
to decide if correctness of those instances provides enough confidence in the
correctness of the program.
There is one way testing could find errors in concrete programs. When building a concurrent system, an abstract program is often used to model how the processes interact with one another, and correctness of that program is checked. The concrete program is then coded by implementing each process of the more abstract program by a separate process in the code. Since there is no concurrency within an individual process, testing that the concrete program implements the more abstract program has a good chance of finding coding errors. Research on this approach is in progress.
1.5 A Preview
To give you an idea of what our science is like, this section describes informally a simple abstract program for Euclid’s algorithm—a traditional algorithm that computes a value and stops. It’s a very simple concurrent program in which the number of processes equals 1. Our science applies to single-process programs, although there are simpler sciences that work quite well for them.
¹There are techniques for proving the correctness of a program by model checking a simpler program, but they have not been implemented for abstract programs written in TLA+.
The states in the sequence are separated with arrows because we naturally
think of an execution going from one state to the next. But in terms of our
science, the algorithm and its execution just are; they don’t go anywhere.
What an algorithm does in the future depends on its current state, not
on what happened in the past. This means that in the final state of the
execution, in which x and y are equal, they equal GCD(M , N ) because of
some property that is true of every state of the execution. To understand
Euclid’s algorithm, we must know what that property is.
That property is GCD(x , y) = GCD(M , N ). (Chapter 3 explains how
we show that every state satisfies this property.) Because an execution
stops only when x and y are equal, and GCD(i, i) equals i for any positive integer i, this property implies that x and y equal GCD(M, N) in the final state of the execution.

²You may have seen a more efficient modern version of Euclid’s algorithm that replaces the larger of x and y by the remainder when it is divided by the smaller. For the purpose of this example, it makes little difference which we use.
That the formula GCD(x , y) = GCD(M , N ) is true in every state of a
program’s execution is a behavioral property. A behavioral property that
asserts a formula is true in all states of an execution is called an invariance
property, and the formula is called an invariant of the program. Correctness
of any concurrent program depends on it satisfying an invariance property.
To understand why the program is correct, we have to know the invariant
of the program that explains its correctness.
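Though this chapter describes Euclid’s algorithm only informally, it may help to see what an abstract program for it can look like when written down. The following sketch uses TLA+ notation that is explained later in the book; the module name and layout are invented here for illustration, and the operator GCD is assumed to be defined elsewhere.

---- MODULE Euclid ----
EXTENDS Naturals
CONSTANTS M, N    \* the positive integers whose GCD is computed
VARIABLES x, y

Init == (x = M) /\ (y = N)

\* A step subtracts the smaller of x and y from the larger.
\* No step is possible when x = y, so the algorithm then stops.
Next == \/ /\ x > y
           /\ x' = x - y
           /\ y' = y
        \/ /\ y > x
           /\ y' = y - x
           /\ x' = x
====

Every Next step leaves the value of GCD(x, y) unchanged, which is why GCD(x, y) = GCD(M, N) is an invariant of the program.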
The invariant GCD(x, y) = GCD(M, N) shows that, if Euclid’s algorithm terminates, then it produces the correct output. A traditional program must also satisfy the behavioral property of termination. The two behavioral properties
are special cases of the following two classes of behavioral properties that
can be required of a concurrent program:
These two classes of properties are defined precisely in Section 4.1. Termination is the only liveness property required of a traditional program. There are many kinds of liveness properties that can be required of concurrent programs.
Euclid’s algorithm satisfies its safety requirement (being allowed to terminate only if x and y equal GCD(M, N)) because the only thing it is allowed to do is start with x = M and y = N and execute its action. That is, it satisfies its safety requirement because it is assumed to satisfy the safety property of doing only what the description of the algorithm allows it to do.
Euclid’s algorithm satisfies its liveness requirement (eventually terminating) because it is assumed to satisfy the liveness property of eventually performing any action that its description allows it to perform. (Section 3.4.2.8 shows how we prove that the algorithm terminates.)
I have found it best to describe and reason about safety and liveness
in different ways. In our science, temporal logic plays almost no role in
handling safety, but it is central to handling liveness. The TLA formula for
This result was unusual. It was possible only because the design of the
entire system was described with TLA+. Usually, TLA+ is used to describe
only critical aspects of a system that involve concurrency, which represent
a small part of the system’s code. But this example dramatically illustrates
that describing abstract programs with mathematics can produce better
programs.
³The book states the reduction in code size to be a factor of 5–10. Verhulst explained to me that it was impossible to measure the reduction precisely, so the book gave a conservative estimate; but he believes it was actually a factor of 10.
Chapter 2
Ordinary Math
numbers until eventually you learned about the real numbers, which include integers, rational numbers like 3/4, and lots of other numbers like −√2 and π (which equals 3.14159. . . ). Although the numbers we will use are almost always integers, most of our discussion here applies to all real numbers, so we let number mean real number.
We use the same notation for the operators of arithmetic that you learned
in school—for example +, / (division), and ≥ ; except that multiplication is
written “∗” because mathematicians use × to mean something else.
An operator like ∗ is what we call a mapping. A mapping M takes some
fixed number of arguments. If M takes two arguments, then M (v , w ) is a
value for some values v and w . For the mapping ∗, if v and w are numbers,
then ∗(v , w ) equals the number we usually write v ∗ w . For the mapping
= , if v and w are numbers, then =(v , w ) is a value we call true if v equals
w , and it’s a different value we call false if v doesn’t equal w .1 The values
true and false are called Booleans. A mapping such as = whose value is
a Boolean for all values of its arguments is called a predicate.
A mathematical theory contains expressions. An expression in the theory
of arithmetic is a syntactically correct sequence of numbers, the operators of
arithmetic, and parentheses—for example 2 ∗ (3 + 42). It’s best to think of
2 ∗ (3 + 42) as a way of writing the expression ∗(2, +(3, 42)). Since it’s the
normal way of writing expressions, we’ll write 2 ∗ (3 + 42); but we’ll think
of it as ∗(2, +(3, 42)).
An expression like 2∗(3+42) whose value is a number is called a numeric
expression. An expression like 2 + 2 > 22 whose value is a Boolean is called
a Boolean expression or, more commonly, a formula.
The semantics of a mathematical theory is a mapping that assigns a
meaning to each expression. We write the meaning of an expression exp as
[[exp]]. But you spent years learning the meaning of arithmetic expressions,
so there’s no need for me to explain it to you. If I define something in terms
of arithmetic expressions, then you will understand it.
There are similar rules for − and the other operators of algebra. There are
also two other rules:
[[3 ∗ x − 2 ∗ y]](Υ)
= [[3 ∗ x ]](Υ) − [[2 ∗ y]](Υ) by the rule for −
= [[3]](Υ) ∗ [[x ]](Υ) − [[2]](Υ) ∗ [[y]](Υ) by the rule for ∗
= 3 ∗ Υ(x ) − 2 ∗ Υ(y) by the two rules above
An important class of formulas are ones that equal true no matter what
values are substituted for their variables. Such a formula is said to be valid ;
and the assertion that F is valid is written |= F . For example, we write
(2.1) |= p ∗ (q + r ) = p ∗ q + p ∗ r
2.3 Mathglish
Math is precise, but this book isn’t written in math. It’s written in English
that explains math. Explaining the precise meaning of math in the imprecise
language of English is not easy. To help them do this, English-speaking
mathematicians speak and write in a dialect of English I call Mathglish. (I
expect mathematicians use similar dialects of other languages.) Mathglish
differs from English in two ways: It eliminates some of the imprecision of
English by giving a precise meaning to some imprecise English words, and it
makes the written language more compact by using mathematical formulas
to replace English phrases.
This book is written in Mathglish. This chapter explains the Mathglish
you need to know to read the book. This section discusses the second feature
of Mathglish—the use of formulas to replace prose.
Consider these two sentences that might appear in a math book:
1. Substituting y + 1 for z in formula (42) yields x ≥ y + 1.
2. Formula (42) shows us that x ≥ y + 1.
Grammatically, we can see that the two uses of “x ≥ y + 1” are different.
In sentence 1 it’s a noun, while in sentence 2 it’s a complete clause. In
sentence 1, “x ≥ y + 1” is a formula; in sentence 2 it’s an abbreviation for
“x is greater than or equal to y + 1”. This grammatical difference tells us
that the two sentences have very different meanings. Sentence 2 asserts that
the formula x ≥ y + 1 is true. The first doesn’t tell us whether it’s true or
false. For example, sentence 1 could be followed by:
It isn’t always possible to tell from the grammar which way a formula is
being used in a sentence. Sometimes we have to look at the context in
which the sentence appears. The formula x ≥ y + 1 can be true only in a
context in which some assumptions have been made about the values of x
and y—assumptions that are expressed by formulas that are assumed to be
true. Without such assumptions, the formula can be used only as a formula,
which may be true or false. I have tried to make it clear by grammar or
context what it means when a formula appears in a sentence in this book.
When these operators are incorporated into algebra, they have lower precedence than arithmetic operators like + and >. However, it’s best to put parentheses around purely algebraic formulas (like those of (2.2)) when using operators from Boolean arithmetic.
I like the name Boolean arithmetic because it makes the subject sound as
simple as it really is. However, it’s usually called propositional logic, so that’s
what we’ll call it from now on. You can find propositional logic calculators
on the Web that will check whether a formula like (A ⇒ B ) ⇒ (¬B ⇒ ¬A)
is true for all Boolean values A and B . They can help you become facile
with propositional logic.
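For example, such a calculator would in effect compute this truth table for (A ⇒ B) ⇒ (¬B ⇒ ¬A); since the last column equals true in all four rows, the formula is true for all Boolean values of A and B:

A       B       A ⇒ B    ¬B ⇒ ¬A    (A ⇒ B) ⇒ (¬B ⇒ ¬A)
true    true    true     true        true
true    false   false    false       true
false   true    true     true        true
false   false   true     true        true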
2.5 ZF
Computers do a lot more than numerical computation. To describe what
computer systems do mathematically, our math needs more kinds of values
than just numbers. The simplest way I know to make the math we need
rigorous is to base it on what is called ZF set theory or simply ZF, where Z
and F stand for the mathematicians Ernst Zermelo and Abraham Fraenkel.
One thing that makes ZF simple is that every value is a set. In ZF, the terms
set and value mean exactly the same thing. Sometimes I will write set/value
instead of set or value to remind you that the two words are synonyms.
You’ve probably learned that a set is a collection of things. In ZF, those
things are sets, so a set is a collection of sets. However, we will see below
that not all collections are sets—in particular, the collection of all sets isn’t
a set. The fundamental operator on sets is ∈, which is read is an element
of or simply in. For every value/set S , the formula e ∈ S equals true iff
e is one of the values/sets that S is a collection of. We call the values in a
set S the elements of S . Two values/sets are equal iff they have the same
elements.
Two sets that we will use are the set R of all real numbers and the set I of integers. Thus, √2 is an element of R but not an element of I. Since the elements of a set are sets, this means a number must be a set. Logicians have used the operators of ZF set theory to define the set of real numbers. There’s no need for us to do that; we just assume the real numbers exist and the arithmetic operators satisfy their usual properties. This means that 42 and √2 are sets, but we don’t specify what their elements are. We know that √2 ∈ 42 equals either true or false, but we don’t know which. The
Booleans true and false are also values.³ We generally use the term value for a set/value like 42 or true for which we don’t know what its elements are; and we use the term set for a set/value when we know what its elements are.

³As usually defined, ZF does not consider true and false to be sets. Making them sets will allow the value of a program variable to be a Boolean.
We define the semantics of ZF the same way we defined the semantics
of algebra. The meaning [[exp]] of an expression exp of ZF is a mapping
from interpretations to values, where an interpretation is a mapping from
variables to values. The operators of propositional logic are incorporated
into ZF the same way they were incorporated into algebra. A formula is an
expression F such that [[F ]](Υ) is a Boolean for every interpretation Υ.
You’ve probably learned a number of operations on sets, and you will
need them if you want to write abstract programs that describe real systems.
But the examples in this book are so simple that we’ll need few of them.
We often speak of one set being a subset of another—for example, I is a
subset of R because every integer is a real number. The assertion that S is
a subset of T is written S ⊆ T. It is effectively defined by this axiom:
|= S ⊆ T ≡ ((v ∈ S ) ⇒ (v ∈ T ))
where S , T , and v are variables.
A simple way to describe a set is by enumerating its elements. If v₁, . . . , vₙ are any values, then they are the (only) elements of the set {v₁, . . . , vₙ}. This set need not have n elements. For example, the set {3, √2, 3, 2 + 1, 42, 3} contains only the three elements √2, 3, and 42. It is equal to the set {42, 42, 3, √2}. (It is as silly to say that a set has two copies of the number 42 as it is to say that a football team has two copies of one of its players.) For n = 0, this defines {} to be the empty set, which has no elements.
ZF contains the construct that mathematicians call set comprehension,
but that I prefer to call subsetting. The expression {v ∈ S : F } equals the
set whose elements are all the elements in the set S for which the formula
F is true. For example, we can define the set N of natural numbers, which
consists of all non-negative integers, by:
N ≜ {n ∈ I : n ≥ 0}
In the expression {v ∈ S : F }, the symbol v is what is called a bound variable.
It can be used only in the formula F , but not in S . Bound variables are
discussed in Section 2.7.
A kind of set that is often used in describing programs is a finite set of
consecutive integers, such as {−1, 0, 1, 2} . This set is written −1 . . 2. In
general, we define:
m . . n ≜ {i ∈ I : m ≤ i ≤ n}
(x ∈ R) ∧ (y ∈ R) ⇒ (x + y = y + x )
(2.5) |= (A ∧ B ) ≡ (B ∧ A) .
The assumption that A and B are Booleans isn’t needed because it can be
inferred that A and B have type Boolean.4 We can maintain that simplicity
without having to introduce types by assuming that rules of predicate logic
like (2.5) are true for all values A and B , not just for Booleans. We do that
by assuming a mapping Bool such that Bool (true) = true, Bool (false) =
false, and Bool (v ) ∈ {true, false} for all values v . We can then define
predicate logic operators like ∧ such that v ∧w equals Bool (v )∧Bool (w ) for
all values v and w . This means that, although we don’t know what 2 ∧ {3}
equals, we know that it’s a Boolean value and that it equals {3} ∧ 2.
For example, the following formula asserts that there exists a (real) number
whose square equals y :
(2.6) ∃ x ∈ R : y = x²

Since x² is a non-negative real number for any real number x, and a real number y has a square root (that’s a real number) iff y ≥ 0, this formula is equivalent to (y ∈ R) ∧ (y ≥ 0).
The two bounded quantifiers are related by these theorems:
(2.7) |= (∀ v ∈ S : F ) ≡ (¬ ∃ v ∈ S : ¬F )
|= (∃ v ∈ S : F ) ≡ (¬ ∀ v ∈ S : ¬F )
You should be able to check that they follow from the quantifiers’ informal
definitions. You should also be able to check that ∀ v ∈ {} : F equals true
and ∃ v ∈ {} : F equals false for any formula F .
When parsing a formula, the scope of a quantifier extends as far as possible—for example, until terminated by the end of the formula or by a right parenthesis whose matching left parenthesis precedes the quantifier. We abbreviate ∀ v ∈ S : ∀ w ∈ T : F as ∀ v ∈ S, w ∈ T : F, and we abbreviate ∀ v ∈ S : ∀ w ∈ S : F as ∀ v, w ∈ S : F, with similar abbreviations for ∃ and for the unbounded quantifiers.
Quantifiers and the rules for reasoning about them form what is called
predicate logic. Predicate and propositional logic are the basis for reasoning
in ordinary mathematics. A valid formula whose validity is based solely on
the laws of predicate and propositional logic, and not on the meanings of
any other operators in the formula, is called a tautology. For example, the
truth of
|= (∃ v ∈ N : M (v , x ) ∧ (v +1 > x )) ⇒ (∃ v ∈ N : M (v , x ))
provides no way to write such a formula.6 Logicians call the error produced
by the naive substitution variable capture.
Fortunately, there’s an easy way to avoid variable capture. The underlying mathematics doesn’t depend on what we name variables. If we change the name of the bound variable in (2.6) to anything other than y, for example writing it as ∃ w ∈ R : y = w², then we get an equivalent formula. Naively substituting x² + 1 for y in that formula yields a formula equivalent to (x² + 1 ∈ R) ∧ (x² + 1 ≥ 0), as it should.
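Spelling out the two substitutions side by side makes the capture visible:

naive substitution in (2.6):        ∃ x ∈ R : x² + 1 = x²    — equals false, since it asserts 1 = 0
after renaming the bound variable:  ∃ w ∈ R : x² + 1 = w²    — asserts x² + 1 has a square root, as intended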
We can avoid having to rename the bound variable in (2.6) if we use a
name different from the name of any variable declared in its context. That
is, we can avoid such variable capture by obeying this rule:
Safe Scoping Rule Never declare a bound variable within the scope
of a declaration of a variable with the same name.
The rule prevents writing (2.6) in any context in which we could write a formula like x² + 1 to substitute for y.
The Safe Scoping Rule will help keep you out of trouble, but it isn’t
enough. There are unlikely situations involving definitions in which this
rule alone will not prevent variable capture. Section 2.8.1 below mentions
another way variable capture can still occur. So when writing hand proofs,
you should always be aware that variable capture is a potential problem,
and it may require renaming bound variables to avoid it.
The definition

(2.8) M(a, b) ≜ a² + b²
defines the mapping M such that M(exp₁, exp₂) equals (exp₁)² + (exp₂)², for any expressions exp₁ and exp₂. A mapping that has one or more arguments is not a value. The string M + 42 is not an expression; it’s nonsense, like 3 + = 7. (Appendix A.3 explains why mappings can’t all be values.) The mapping M can appear in an expression only in a subexpression of the form M(exp₁, exp₂). However, (2.8) defines this subexpression to equal (exp₁)² + (exp₂)², for all expressions exp₁ and exp₂ — even if exp₁ equals R and exp₂ equals false.
The symbols a and b in (2.8) are called the parameters of the definition.
Definition parameters are bound variables, whose scope is the right-hand
side of the definition. However, they pose no problem of variable capture
because they are replaced by expressions when the defined symbol appears
in a formula. The Safe Scoping Rule should still be applied to them because
the definition (2.8) would be confusing if it appeared within the scope of the
declaration of a variable named a or b.
While definition parameters are bound variables that can’t do any “capturing”, variable capture is still possible with a definition such as
P(a, b) ≜ (a > b) ∧ ∃ x ∈ R : b = x²
whose right-hand side declares the bound variable x . The Safe Scoping Rule
ensures that this definition does not occur in the scope of a variable named
x . However, P may later be used in an expression that is in the scope of a
variable x . Avoiding variable capture when substituting in that expression
may require changing the name of the bound variable in the definition of P .
A fundamental rule of mathematics you may have learned as a teenager
is that circular definitions are forbidden. We can’t define M in terms of P ,
and P in terms of Q, and Q in terms of M . The obvious way to enforce
this rule is to require that a definition use only previously defined mappings.
However, to understand the definitions, it’s often best to write them in the
opposite order, so higher-level concepts are defined before defining the lower-
level concepts on which they are built. So, we will order definitions to make
them easier to understand, while avoiding circular definitions.
A special class of circular definitions are allowed in which a mapping
is defined in terms of itself. They’re called recursive definitions and are
introduced in Section 3.5.
2.8.2 Functions
You may have learned about functions, and you undoubtedly did if you
studied calculus. If you did learn about them, you may have wondered why
I use the term mapping rather than function. Most mathematicians consider
them to be the same. We’ll see why they must be different.
Here, we describe functions of a single argument. Functions of multiple
arguments are defined in Section 2.8.3 below. We consider a function to be
a special kind of mapping that differs from other mappings in two ways:
defines M (S ) to equal the Boolean N ∈ S for all values S because its value
is specified for every set/value S . We know that M (S ) equals N ∈ S , which
is a Boolean, for any value S . But M is not a function, because its domain
would have to be the set of all sets/values, and we saw in Section 2.5 that
the collection of all sets can’t be a set.
If you’ve studied calculus, you’ve seen functions like the function f, whose domain is the set {x ∈ R : x ≠ 0} of non-zero real numbers, defined by letting f(x) equal 1/x² for all numbers x in that set. Mathematicians seem to have no convenient notation for writing such a function. We will write that function f as v ∈ {x ∈ R : x ≠ 0} ↦ 1/v². In general,
f ≜ v ∈ S ↦ exp
and the value it assigns to elements of that domain. Thus, if f and g are
two functions with domain S and f (v ) = g(v ) for all v ∈ S , then the two
functions are equal.
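In TLA+, the mapping-to-function notation v ∈ S ↦ exp is written [v \in S |-> exp]. Here is a small illustrative sketch (the module and the names sq and sq2 are invented for this example):

---- MODULE FunctionExample ----
EXTENDS Naturals

sq == [n \in Nat |-> n * n]    \* the function assigning n² to each n in Nat
sq2 == [m \in Nat |-> m * m]   \* the same function, with the bound variable renamed

\* sq and sq2 have the same domain, Nat, and assign the same value to
\* every element of it, so the two functions are equal.
THEOREM sq = sq2
====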
2.9.1 if/then/else

A programmer who read enough math would notice that mathematicians lack anything corresponding to the if/then/else statement of coding languages. Instead, they use either prose or a very awkward typographical convention. We let the expression
∧ ∨ A ∧ B
  ∨ C
∧ D ⇒ E
∧ ∨ ∃ x : F
  ∨ ∧ G ≡ H
    ∧ J

equals

((A ∧ B) ∨ C) ∧ (D ⇒ E) ∧ ((∃ x : F) ∨ ((G ≡ H) ∧ J))
Note how the implicit parentheses in the bulleted lists delimit the scope of
the ⇒ and ∃ x operators in this formula.
Making indentation significant is a feature of the currently popular Python
coding language, but it works even better in this notation because the use
of ∧ and ∨ as “bullets” makes the logical structure easier to see.
Chapter 3
Describing Abstract Programs with Math
In this chapter, we take a leisurely path that begins with a conventional
mathematical method of describing computer systems and ends with the
definition of almost all of TLA. Along the way, you will learn how to describe
the safety part of an abstract program, how to prove it satisfies invariance
properties, and the temporal logic that will be used to describe its safety
and liveness properties as a single formula.
describing the three spatial coordinates of the planet’s position and three describing the direction and magnitude of its momentum. Let’s call those state variables v₁, . . . , v₆; we won’t worry about which of the six values each represents. The quantities these variables represent change with time, so the value of each variable vᵢ is a function, where vᵢ(t) represents the value at time t. The behavior of the system is described mathematically by the function σ with domain R≥0 such that σ(t) is the tuple ⟨v₁(t), . . . , v₆(t)⟩ of numbers, for every t ∈ R≥0. Physicists call σ(t) the state of the system at time t.
In this description, the planet is modeled as a point mass. Real planets are more complicated, composed of things like mountains, oceans, and atmospheres. For simplicity, the model ignores those details. This limits the model’s usefulness. For example, it’s no good for predicting a planet’s weather. But models of planets as point masses are sometimes used to plan the trajectories of a real spacecraft. It’s also not quite correct to say that the model ignores details like mountains and oceans. The mass of the model’s point mass is the total mass of the planet, including its mountains and oceans, and its position is the planet’s center of mass. The model abstracts those details, it doesn’t ignore them.
The laws that determine the point-mass planet’s behavior σ are expressed by six differential equations of this form:

(3.1)  (dvᵢ/dt)(t) = fᵢ(t)

where t ∈ R≥0 and each fᵢ is a function with domain R≥0 such that fᵢ(t) is a formula containing the expressions v₁(t), . . . , v₆(t). Don’t worry if you haven’t studied calculus and don’t know what equation (3.1) means. All you need to know is that it asserts the following approximate equality for small non-negative values of dt:

vᵢ(t + dt) ≈ vᵢ(t) + fᵢ(t) ∗ dt

and the approximation gets better as dt gets smaller. (It reaches equality
when dt = 0.) The differential equations (3.1) have the property that for any time t > t₀ and any time r > t, the values of the six numbers vᵢ(t) and the functions fᵢ completely determine the six values vᵢ(r) and hence the value of σ(r). That is, the equations imply:

History Independence  For any time t ∈ R≥0, the state σ(r) of the system at any time r > t depends only on its state σ(t) at time t, not on anything that happened before time t.
(m = d ∗ n + r ) ∧ (0 ≤ r < n)
This rule is a meta-formula, and the Mathglish terms and and implies are
used to represent the Boolean operators ∧ and ⇒ to make it clear that it is
a meta-formula and not a formula.
Strong mathematical induction allows proving that the formula is true
for n + 1 by assuming that it is true for all numbers in 0 . . n, not just true
for n. It is stated as:
|= ∀ n ∈ N : (∀ m ∈ 0 . . (n − 1) : P (m)) ⇒ P (n)
implies |= ∀ n ∈ N : P (n)
In this rule, for n = 0 the hypothesis implies
|= (∀ m ∈ 0 . . − 1 : P (m)) ⇒ P (0)
variables x = 1, y = 1;
while true do
  a: x := x + y + 2;
     y := y + 2
end while
(3.3)  (x(0) = 1) ∧ (y(0) = 1)

(3.4)  ∀ j ∈ N : ∧ x(j + 1) = x(j) + y(j) + 2
                 ∧ y(j + 1) = y(j) + 2
We call (3.3) the initial predicate. It determines the initial state. Formula
(3.4) is called the step predicate. It’s the discrete analog of the differential
equations (3.1) that describe the orbiting planet. Instead of describing how
the values of the variables change in the continuous behavior when time
increases by the infinitesimal amount dt, the step predicate (3.4) describes
how they change when the state number of the discrete behavior increases
by one.
You can check that (3.3) and (3.4) define a behavior that begins as follows where, for example, [x :: 16, y :: 7]₃ indicates that state number 3 assigns the values 16 to x and 7 to y, and the arrows are purely decorative.

[x :: 1, y :: 1]₀ → [x :: 4, y :: 3]₁ → [x :: 9, y :: 5]₂ → [x :: 16, y :: 7]₃ → [x :: 25, y :: 9]₄ → · · ·
These first few states of the behavior suggest that in the complete behavior,
x and y equal the following functions:
(3.5)  x = (j ∈ N ↦ (j + 1)²)
       y = (j ∈ N ↦ 2 ∗ j + 1)
To prove that (3.3) and (3.4) imply (3.5), we must prove that they imply:
(3.6)  ∀ j ∈ N : (x(j) = (j + 1)²) ∧ (y(j) = 2 ∗ j + 1)
A proof by (simple) mathematical induction that (3.3) and (3.4) imply (3.6)
is a nice exercise in algebraic calculation.
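For instance, here is the calculation for x in the inductive step, assuming that (3.6) holds for j:

x(j + 1) = x(j) + y(j) + 2              by the step predicate (3.4)
         = (j + 1)² + (2 ∗ j + 1) + 2   by (3.6) for j
         = j² + 4 ∗ j + 4
         = ((j + 1) + 1)²

which is the first conjunct of (3.6) for j + 1; the calculation for y is even simpler.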
We can think of (3.5) as the solution of (3.3) and (3.4), just as the formulas describing the position and momentum of the planet at each time t are solutions of the differential equations (3.1). It is mathematically impossible to find solutions to the differential equations describing arbitrary multi-planet systems. It is mathematically possible to write explicit descriptions of variables as functions of the state number like (3.5) for the abstract programs written in practice, but those descriptions are almost always much too complicated to be of any use. Instead, we reason about the initial predicate and the step predicate, though in Section 3.4.1 we’ll see how to write them in a more convenient way.
The interesting thing about program Sqrs is that the sequence of values
assumed by x in an execution of the program is the sequence of all positive integers that are perfect squares, and this is accomplished using only
addition. This is obvious from (3.5), but for nontrivial examples we won’t
have such an explicit description of each state of a behavior. Remember
that history independence implies that, at any point in a behavior, what
the program does in the future depends only on its current state. What is
it about the current state that ensures that if x is a perfect square in that
state, then it will equal all greater perfect squares in the future? There is
a large body of work on reasoning about traditional programs, initiated by
Robert Floyd in 1967 [15], that shows how to answer this question. If you’re
familiar with that work, the answer may seem obvious. If not, it may seem
like it was pulled out of a magician’s hat. Obvious or magic, the answer is
that the following formula is true for every state number j in the behavior
of Sqrs:
(3.7)  ∧ (x(j) ∈ N) ∧ (y(j) ∈ N)
       ∧ y(j) % 2 = 1
       ∧ x(j) = ((y(j) + 1)/2)²
This formula implies that x(j) is a perfect square, since the first two conjuncts imply that y(j) is an odd natural number. Moreover, since y(j + 1) = y(j) + 2, the last conjunct implies that x(j + 1) is the next larger perfect square after x(j). So, the truth of (3.7) for every state number j explains why the algorithm sets x to all perfect squares in increasing order.
A predicate like (3.7) that is true for every state number j of a behavior is called an invariant of the behavior. By mathematical induction, we can prove that a predicate is an invariant by proving these two conditions:

I1. The predicate is true for j = 0.

I2. For any k ∈ N, if the predicate is true for j = k then it’s true for j = k + 1.
For (3.7), I1 follows from the initial predicate (3.3), and I2 follows from the
step predicate (3.4). (You should have no trouble writing the proof if you’re
used to writing proofs; otherwise, it might be challenging.)
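The key step in proving I2 for (3.7) is this calculation, which shows that the last conjunct is preserved:

x(j + 1) = x(j) + y(j) + 2                by (3.4)
         = ((y(j) + 1)/2)² + y(j) + 2     by (3.7) for j
         = (y(j)² + 6 ∗ y(j) + 9)/4
         = ((y(j) + 3)/2)²
         = ((y(j + 1) + 1)/2)²            since y(j + 1) = y(j) + 2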
A predicate that can be proved to be an invariant by proving I1 from an
initial predicate and I2 from a step predicate is called an inductive invariant.
Model checkers can check whether a state predicate is an invariant of small
instances of an abstract program. But the only way to prove it is an invariant
is to prove that it either is or is implied by an inductive invariant. For
any invariant P , there is an inductive invariant that implies P . However,
writing an inductive invariant for which we can prove I1 and I2 is a skill
that can be acquired only with practice. Tools to find it for you have been
developed [16, 40], but I don’t know how well they would work on industrial
examples.
The first conjunct of the invariant (3.7) asserts the two invariants
x (j ) ∈ N and y(j ) ∈ N. An invariant of the form v (j ) ∈ S for a variable v
is called a type invariant for v . An inductive invariant almost always must
imply a type invariant for each of its variables. For example, without the
hypotheses that x (j ) and y(j ) are numbers, we can deduce nothing about
the values of x (j + 1) and y(j + 1) from the step predicate (3.4).
Most mathematicians would not bother to write the first conjunct of (3.7), simply assuming it to be obvious. However, mathematicians aren’t good at getting things exactly right. They can easily omit some uninteresting corner case—for example, the assumption that a set is nonempty. Those “uninteresting corner cases” are the source of many errors in programs. To avoid such errors, we need to state explicitly all necessary requirements, including type invariants.
variables x = 1, y = 1, pc = a;
while true do
  a: x := x + y + 2;
  b: y := y + 2
end while
Program FGSqrs has a finer grain of atomicity than Sqrs. Having a finer grain of atomicity implies that the step predicate is more complicated.

Having a finer grain of atomicity also implies that the inductive invariant that explains why the abstract program works will be more complicated.
However, there is a trick for obtaining the invariant for FGSqrs from the
invariant (3.7) of Sqrs. Define yy(j ) to equal y(j ) if execution of FGSqrs is
at label a, and to equal the value y(j ) will have after executing statement
b if execution is at b. The mathematical definition is:
yy(j) ≜ if pc(j) = a then y(j) else y(j) + 2
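Substituting yy(j) for y(j) in (3.7) then yields this invariant of FGSqrs (spelled out here for concreteness):

∧ (x(j) ∈ N) ∧ (yy(j) ∈ N)
∧ yy(j) % 2 = 1
∧ x(j) = ((yy(j) + 1)/2)²

You can check that executing statement a increases x(j) and yy(j) just the way a step of Sqrs increases x(j) and y(j), while executing statement b changes neither of them.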
3.3 Nondeterminism
Math II
The # Operator The operator # is defined so that if S is a finite set,
then #(S ) equals the number of elements in S . If S is not a finite set (so it
must be an infinite set), then #(S ) is a meaningless expression.
|= f ∈ (D → S) ≡ ∧ f = (v ∈ D ↦ f(v))
                 ∧ ∀ v ∈ D : f(v) ∈ S
An array in coding languages is described mathematically as a function,
where the expression f [x ] in the language means f (x ). For a variable f
Failure Physical devices don’t always behave the way they’re supposed to.
In particular, they can fail in various ways. Programs that tolerate
failures describe a failure as an operation that may or may not be
executed.
variables x = 0;
process p ∈ Procs
  variables t = 0, pc = a;
  a: t := x;
  b: x := t + 1
end process
Figure 3.3: The Increment abstract program for a set Procs of processes.
Initial Predicate
  ∧ x(0) = 0
  ∧ t(0) = (p ∈ Procs ↦ 0)
  ∧ pc(0) = (p ∈ Procs ↦ a)

Step Predicate
  ∀ j ∈ N : PgmStep(j) ∨ Stutter(j)
where
  PgmStep(j) ≜ ∃ p ∈ Procs : aStep(p, j) ∨ bStep(p, j)

  aStep(p, j) ≜ ∧ pc(j)(p) = a
                ∧ x(j + 1) = x(j)
                ∧ t(j + 1) = (t(j) except p ↦ x(j))
                ∧ pc(j + 1) = (pc(j) except p ↦ b)

  bStep(p, j) ≜ ∧ pc(j)(p) = b
                ∧ x(j + 1) = t(j)(p) + 1
                ∧ t(j + 1) = t(j)
                ∧ pc(j + 1) = (pc(j) except p ↦ done)

  Stutter(j) ≜ ∧ ∀ p ∈ Procs : pc(j)(p) = done
               ∧ ⟨x(j + 1), t(j + 1), pc(j + 1)⟩ = ⟨x(j), t(j), pc(j)⟩
The possible steps in a behavior are described by a predicate that, for each
j , gives the values of x (j + 1), t(j + 1), and pc(j + 1) for any assignment
of values to x (j ), t(j ), and pc(j ). It asserts that there are two possibilities,
described by formulas PgmStep(j ) and Stutter (j ), that are explained below.
PgmStep(j ) describes the possible result of some process executing one step
starting in state j . The predicate equals true iff there exists a process
p for which aStep(p, j ) or bStep(p, j ) is true, where:
aStep(p, j) describes a step in which process p executes its statement labeled a in state number j. Its last three conjuncts describe the values of the three variables x, t, and pc in state j + 1. Many people are tempted to write t(j + 1)(p) = x(j) and pc(j + 1)(p) = b instead of the third and fourth conjuncts. But that would permit t(j + 1)(q) and pc(j + 1)(q) to equal any values for q ≠ p. Instead we must use the except operator defined in Section 2.8.2. The first conjunct is a predicate that is true or false of state j. It is an enabling condition, allowing the step described by the following three conjuncts to occur iff that condition is true.
bStep(p, j ) describes a step in which process p executes its statement
labeled b in state number j . It is similar to aStep(p, j ). Its
enabling condition is pc(j )(p) = b. The step sets pc(j + 1)(p)
to done, which is a value indicating that the process has reached
the end of its code and terminated.
Stutter(j) describes a stuttering step starting in state j. It is enabled iff pc(j)(p) equals done for all p ∈ Procs, so all processes have terminated. At that point, PgmStep(j) is not enabled, so only an infinite sequence of stuttering steps can occur, as required for a terminated abstract program. The second conjunct in the definition of Stutter(j) uses the fact that two tuples are equal iff their corresponding elements are equal to write the following formula more compactly:

(x(j + 1) = x(j)) ∧ (t(j + 1) = t(j)) ∧ (pc(j + 1) = pc(j))
A property we might like to prove about abstract program Increment is that,
when it has terminated, the value of x lies between 1 and the number of
processes. Let’s define N to equal #(Procs), the number of processes. Since
a process has terminated iff its local pc variable equals done, the property
we want to prove is that this formula is an invariant of Increment—that is,
true for every j ∈ N:
(3.8) (∀ p ∈ Procs : pc(j )(p) = done) ⇒ (x (j ) ∈ 1 . . N )
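The lower bound of 1 can actually be attained. For example, if Procs contains the two processes p and q, the step predicate allows this execution, which terminates with x = 1 even though both processes increment x:

aStep(p):  t(p) is set to x, which equals 0
aStep(q):  t(q) is set to x, which still equals 0
bStep(p):  x is set to t(p) + 1 = 1
bStep(q):  x is set to t(q) + 1 = 1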
|= (P ⇒ Q) ≡ (P ∧ ¬Q ⇒ false)
|= (P ⇒ Q) ≡ (P ∧ ¬Q ⇒ Q)
1. It’s short. This usually means one paragraph of less than about a
dozen lines.
Proofs that are not short should be structured. The simplest structured proof consists of a sequence of numbered steps, each consisting of an assertion and its prose proof satisfying conditions 1 and 2. The assertion of the last step is Q.E.D., which stands for the goal of the proof—that is, what must be proved to prove the theorem. Each step’s proof may assume the assertions of previous steps. General structured proofs, in which a step’s proof may also be a structured proof, are introduced in Section 6.4.
Initial Predicate
  Init ≜ ∧ x = 0
         ∧ t = (p ∈ Procs ↦ 0)
         ∧ pc = (p ∈ Procs ↦ a)

Step Predicate
  Next ≜ PgmStep ∨ Stutter
where
  PgmStep ≜ ∃ p ∈ Procs : aStep(p) ∨ bStep(p)

  aStep(p) ≜ ∧ pc(p) = a
             ∧ x′ = x
             ∧ t′ = (t except p ↦ x)
             ∧ pc′ = (pc except p ↦ b)

  bStep(p) ≜ ∧ pc(p) = b
             ∧ x′ = t(p) + 1
             ∧ t′ = t
             ∧ pc′ = (pc except p ↦ done)

  Stutter ≜ ∧ ∀ p ∈ Procs : pc(p) = done
            ∧ ⟨x′, t′, pc′⟩ = ⟨x, t, pc⟩

Inductive Invariant
  Inv ≜ ∧ TypeOK
        ∧ ∀ p ∈ Procs : (pc(p) = b) ⇒ (t(p) ≤ NumberDone)
        ∧ x ≤ NumberDone

Figure 3.5: Abstract program Increment and its invariant Inv in simpler math.
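Figure 3.5 can be transcribed almost verbatim into TLA+, the language with which engineers would actually model check this example. The module below is such a transcription sketch; since the definitions of TypeOK and NumberDone do not appear above, the ones here are reconstructions consistent with the surrounding text (NumberDone counts the processes that have terminated), and the labels a, b, and done are represented as strings.

---- MODULE Increment ----
EXTENDS Naturals, FiniteSets
CONSTANT Procs
VARIABLES x, t, pc

vars == <<x, t, pc>>

TypeOK == /\ x \in Nat
          /\ t \in [Procs -> Nat]
          /\ pc \in [Procs -> {"a", "b", "done"}]

\* The number of processes that have terminated.
NumberDone == Cardinality({p \in Procs : pc[p] = "done"})

Init == /\ x = 0
        /\ t = [p \in Procs |-> 0]
        /\ pc = [p \in Procs |-> "a"]

aStep(p) == /\ pc[p] = "a"
            /\ x' = x
            /\ t' = [t EXCEPT ![p] = x]
            /\ pc' = [pc EXCEPT ![p] = "b"]

bStep(p) == /\ pc[p] = "b"
            /\ x' = t[p] + 1
            /\ t' = t
            /\ pc' = [pc EXCEPT ![p] = "done"]

Stutter == /\ \A p \in Procs : pc[p] = "done"
           /\ UNCHANGED vars

Next == (\E p \in Procs : aStep(p) \/ bStep(p)) \/ Stutter

Inv == /\ TypeOK
       /\ \A p \in Procs : (pc[p] = "b") => (t[p] <= NumberDone)
       /\ x <= NumberDone
====

Asking TLC to check Inv as an invariant, with Procs instantiated to a small set, checks conditions I1 and I2 on every reachable state of that instance.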
Since this book is about a science of programs, we will henceforth use the
name variable for program variables. Mathematical variables like Procs will
be called constants. When describing a program mathematically, variables
correspond to what we normally think of as program variables. Constants
are parameters of the program, such as a fixed set of processes. Early coding
languages had constants as well as variables. In modern coding languages,
constants are buried in the code, where they are called static final variables
of an object.
In this book, the variables in pseudocode are explicitly declared, and undeclared identifiers like Procs are constants. For formulas, the text indicates which identifiers are variables and which are constants.
In addition to having both variables and constants, the formulas in Figure 3.5 have primed variables, like x′. An expression that may contain primed and unprimed variables, constants, and the operators and values of ordinary math (which means everything described in Chapter 2) is called a step expression. A Boolean-valued step expression is called an action. The math whose formulas are actions is called the Logic of Actions, or LA for short.
For an unprimed variable v, we define [[v]](s → t) to equal s(v), the value assigned to variable v by state s. For a primed variable v′, we define [[v′]](s → t) to equal t(v).
We call an LA expression a step expression and an LA formula an action.
For an action A and step s → t, we say that s → t satisfies A or is an A
step iff [[A]](s → t) equals true.
A state expression is an LA expression that contains no primed variables, and a state formula is a Boolean-valued state expression. For a state expression exp, the value of [[exp]](s → t) depends only on s, so we can write it as [[exp]](s).
Because the meaning of an LA expression assigns different values to v and v′, we can treat v and v′ as two unrelated variables. This means that we can reason about LA formulas as if constants, unprimed variables, and primed variables were all different mathematical variables. Thus, for LA as defined so far, we can regard LA as ordinary math with some mathematical variables having names like v′ ending with ′.
exp′ is equivalent to the step expression obtained by priming all the variables
in exp. The priming operator (′) can be applied only to state expressions.
In LA, priming an expression that contains a prime is a syntax error. That
means that it is illegal to prime an expression containing a defined symbol
whose definition contains a prime. For example, if e is defined to equal
x′ + 1, then e′ is syntactically illegal.
A constant has the same value in both states of a step. Therefore,
|= c′ = c is true for any constant c. More generally, a constant expression is
an expression with no (primed or unprimed) variable; and |= exp′ = exp is
true for any constant expression exp. The bound identifiers of predicate logic
are like ordinary mathematical variables, which means they are treated like
constants in the Logic of Actions. For example, (∃ i ∈ ℕ : y = x + i)′ equals
∃ i ∈ ℕ : y′ = x′ + i. We therefore call bound identifiers bound constants.
Appendix Section A.4 gives an example of how you can get into trouble by
forgetting that bound identifiers are constants.
The semantics of LA imply that the prime operator distributes over
the operators and constructs of ordinary math—for example, that (F ∨ G)′
equals F′ ∨ G′. By expanding all definitions and distributing primes in this
way, we obtain a formula in which the prime operator is applied only to vari-
ables. We don’t have to expand all definitions to obtain such a formula. We
need only expand definitions that contain a prime or that appear within a
primed expression and contain a variable. Once we have reached an expres-
sion in which only variables are primed, we can reason about the resulting
expression as if constants, variables, and primed variables were all ordinary
mathematical variables. We therefore need no additional rules for reasoning
about LA formulas.
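For example (an illustration of mine, not from the text): if e is defined to
equal x + 2 ∗ c, with x a variable and c a constant, then e′ equals
(x + 2 ∗ c)′, which equals x′ + 2 ∗ c′, which equals x′ + 2 ∗ c because
c′ = c. We can then reason about x, x′, and c as three unrelated
mathematical variables.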
Section 3.2.2 defined an inductive invariant Inv of a program to be a
state predicate satisfying conditions I1 and I2, which we can restate as:
I1. Inv is implied by the program’s initial state.
I2. If Inv is true in a state, then the program’s next-state predicate implies
that it is true in the next state.
For program Increment, whose initial predicate is Init and whose next-state
action is Next, these two conditions can be expressed in LA as:
(3.10) |= Init ⇒ Inv
       |= Inv ∧ Next ⇒ Inv′
The proof of these conditions for program Increment is discussed in Ap-
pendix Section B.1.
Replacing the actions aStep(p) and bStep(p) with aStep(p) · bStep(p) in the
definition of the next-state action Next of program Increment produces a
program with a coarser grain of atomicity. Choosing the grain of atomicity
of an abstract program involves a tradeoff between making the program
detailed enough to be useful and simple enough to be usable. Section 8.1
addresses this tradeoff using action composition.
The operator “·” is associative, meaning (A · B) · C = A · (B · C) for any
actions A, B, and C. We can therefore omit parentheses and simply write
A · B · C.

   5. If you believe that the second and third conjuncts in this formula are in the wrong
order, then you’re thinking in terms of coding languages, not math. Remember that
F ∧ G is equivalent to G ∧ F.
For any action A, we define the action A⁺ to be satisfied by a step s → t
iff state t can be reached from state s by a sequence of one or more A steps.
In other words:

   A⁺ ≜ A ∨ (A · A) ∨ (A · A · A) ∨ (A · A · A · A) ∨ · · ·
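For example (assuming only the definitions above): if A is the action
x′ = x + 1, then a step satisfies A · A iff it increases the value of x by 2,
and a step s → t satisfies A⁺ iff the value of x in t exceeds its value in s
by some positive integer.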
In temporal logic formulas, the operator □ binds more tightly than the
operators of propositional logic. For example, □F ∨ G is parsed as (□F) ∨ G.
From now on, [[F]] means [[F]]RTLA for all RTLA formulas, including actions.
We will explicitly write [[A]]LA to denote the meaning of A as an LA formula.

For an action A, we define □A to be the temporal formula that is true
of a behavior iff A is true of all steps of the behavior. In other words, we
define the meaning [[□A]] of the RTLA formula □A by

(3.12) [[□A]](σ) ≜ ∀ n ∈ ℕ : [[A]]LA(σ(n) → σ(n + 1))
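For instance (a simple case of (3.12), using the action from my earlier
example): □(x′ = x + 1) is true of a behavior σ iff every step
σ(n) → σ(n + 1) increments x, so the value of x in state σ(n) equals n plus
its value in σ(0).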
Like most logics, RTLA contains the propositional logic operators, where
they have their standard meanings. For example, [[F ∧ G]](σ) equals
[[F ]](σ) ∧ [[G]](σ). We will write a quantified formula like ∃ i ∈ S : F with
F a temporal formula only when S is a constant expression, in which case
[[∃ i ∈ S : F ]](σ) equals ∃ i ∈ [[S ]] : [[F ]](σ) , where [[S ]] is the value of S under
the assumed assignment of values to constants. As in LA, bound identi-
fiers are called bound constants and they act like constants, having the same
value in all states of a behavior.
It’s important to remember that a behavior is any cardinal sequence of
states. It doesn’t have to be a behavior of any particular program. Since
any step is the first step of lots of behaviors, it’s obvious that if A is an LA
formula, then |= A is true when A is viewed as an RTLA formula iff it’s true
when A is viewed as an LA formula.
Now let’s return to the description of program Increment in Figure 3.5.
It tells us that a behavior σ is a behavior of the program iff (i) the initial
predicate Init is true of its first state σ(0) and (ii) the step predicate Next
is true for every step σ(n) → σ(n + 1) of σ. Condition (i) is expressed
by [[Init]], since (3.11) tells us that [[Init]](σ) equals [[Init]]LA(σ(0) → σ(1));
and since Init is a state predicate, it’s true of a step iff it’s true of the first
state of the step. By (3.12), condition (ii) is expressed as [[□Next]]. Thus
(the meaning of) the formula Init ∧ □Next is true of a behavior σ iff σ is a
behavior of program Increment.
Of course, this is true for an arbitrary program. The behaviors that
satisfy a program with initial predicate Init and next-state action Next
are described by the simple RTLA formula Init ∧ □Next. Any program is
described by an RTLA formula of this form. As promised, we can write any
program as a mathematical formula. It’s an RTLA formula rather than a
TLA formula, and we’ll see that it needs to be modified. But for now, it’s
close enough to the final TLA formula.
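As a tiny illustration (a made-up program, not one of the book’s examples):
a counter that starts at 0 and repeatedly increments x is described by the
formula Init ∧ □Next, where Init ≜ (x = 0) and Next ≜ (x′ = x + 1). A
behavior satisfies this formula iff x equals 0 in its first state and every step
increments x.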
By (3.12), the state predicate Inv is true in all states of a behavior iff
□Inv is true of that behavior. That Inv is an invariant of Increment means
that, for any behavior σ, if σ is a behavior of Increment then Inv is true in
all states of σ. Thus, that Inv is an invariant of Increment is expressed by
this condition:

(3.13) |= (Init ∧ □Next) ⇒ □Inv
Remember that in (3.10) and (3.13), when Init, Next, and Inv are the formu-
las defined in Figure 3.5, |= F means that F is true for all interpretations
satisfying the assumptions we made about the constants of Increment—
namely, that Procs is a nonempty finite set and the values of a, b, and done
are different from one another.
In general, the conditions I1 and I2 for showing that a state predicate Inv
is an invariant of a program Init ∧ □Next are expressed in LA by conditions
(3.10). It is an RTLA proof rule that these conditions imply (3.13). When we
prove a safety property like (3.13), the major part of the reasoning depends
on the definitions of the formulas Init, Next, and Inv . That reasoning is
reasoning about actions, which is formalized by LA. The temporal logic
reasoning, which is done in RTLA, is trivial. Describing the program with a
single formula is elegant. But it is really useful only when verifying liveness
properties, which requires nontrivial temporal reasoning.
so σ+n equals i ∈ ℕ ↦ σ(i + n).

For any RTLA formula F, the RTLA formula □F is true of a behavior
σ iff it is true of the behaviors σ+n for all n ∈ ℕ. In other words:

(3.15) [[□F]](σ) ≜ ∀ n ∈ ℕ : [[F]](σ+n)
Perhaps less obvious is this proof rule, which is sort of a converse of (3.16):

(3.20) |= F implies |= □F

(3.21) |= F ⇒ G implies |= □F ⇒ □G

This rule lies at the heart of much temporal logic reasoning. Another rule
we will need is
Make sure you understand why they are true from the meaning of ◇ as
eventually. These two tautologies can be derived from (3.16) and (3.17).
For example:

and |= □¬F ⇒ ¬F follows from (3.16). You should convince yourself that
◇(F ∧ G) and (◇F) ∧ (◇G) need not be equivalent. (For instance, if F is
true only in even-numbered states and G is true only in odd-numbered ones,
then (◇F) ∧ (◇G) is true but ◇(F ∧ G) is not.) The equivalence of
◇(F ∨ G) and ◇F ∨ ◇G generalizes to arbitrary disjunctions:

   |= ◇(∃ i ∈ S : Fᵢ) ≡ (∃ i ∈ S : ◇Fᵢ)
Here are three tautologies relating ◇ and □. The first is obtained by negating
◇F and its definition; the third by substituting ¬F for F in the first; and
the second by negating both sides of the equivalence in the third:

(3.25) |= ¬◇F ≡ □¬F      |= ◇¬F ≡ ¬□F      |= ¬◇¬F ≡ □F

Here is another useful tautology:

   |= □F ∧ ◇G ⇒ ◇(□F ∧ G)

It asserts that if F is true from now on and G is true at some time in the
future, then at some time in the future F is true from then on and G is true
then.
We can express liveness properties with ◇. For example, the assertion
that some state predicate P is eventually true is a liveness property. The
assertion that the program whose formula is F satisfies this property is
|= F ⇒ ◇P. Since the assertion that something eventually happens is a
liveness property, most of the formulas we write that contain ◇ express
liveness.
The rules for moving ¬ over □ and ◇ that are implied by the first two
tautologies of (3.25) yield the following two tautologies. For example, the
first comes from ¬□◇F ≡ ◇¬◇F ≡ ◇□¬F.

   |= ¬□◇F ≡ ◇□¬F        |= ¬◇□F ≡ □◇¬F

Here are two more tautologies:

   |= ◇□◇F ≡ □◇F         |= □◇□F ≡ ◇□F

The first one is obvious if we read ◇□◇ as eventually infinitely often, because
F is true at infinitely many times iff it is true at infinitely many times after
some time has passed. You can convince yourself that the second is true
by realizing that infinitely often F always true is equivalent to F being
always true starting at some time. Alternatively, you can show that the
first tautology implies the second by figuring out why each of the following
equivalences is true:
   |= (F ⇝ G) ∧ (G ⇝ H) ⇒ (F ⇝ H)

(3.31) |= ((F ∨ G) ⇝ H) ≡ (F ⇝ H) ∧ (G ⇝ H)

   |= (F ⇝ G) ∧ □(G ⇒ H) ⇒ (F ⇝ H)

(3.32) |= ((∃ i ∈ S : Fᵢ) ⇝ H) ≡ (∀ i ∈ S : (Fᵢ ⇝ H))

Here are three more tautologies involving ⇝; try to understand why they’re
true.

(3.34) |= □(P ⇒ Q′) ⇒ (P ⇝ Q)
3.4.2.9 Warning

Although elegant and useful, temporal logic is weird. It’s not ordinary math.
In ordinary math, any operator Op we can define satisfies the condition,
sometimes called substitutivity, that the value of an expression Op(e₁, ..., eₙ)
is unchanged if any eᵢ is replaced by an expression having the same value.
Temporal logic is not substitutive, and this
affects all temporal logics and makes temporal logic reasoning tricky.
3.5 TLA
Math IV
Simple Recursive Definitions A recursive definition of a mapping M
is one in which M appears in its definition. (Mathematicians call them
inductive definitions.) A simple recursive definition defines a function f
with domain ℕ by defining the value of f(0) to equal some expression not
containing f and, for every n > 0, defining f(n) in terms of f(n − 1). The
classic recursive definition is that of n! (pronounced n factorial), which
equals the product of the numbers from 1 through n, with 0! defined to
equal 1. If we consider n! to be an abbreviation of !(n) for the function !,
we can define ! by

   ! ≜ n ∈ ℕ ↦ if n = 0 then 1 else n ∗ !(n − 1)
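For example, expanding this definition gives !(3) = 3 ∗ !(2) =
3 ∗ 2 ∗ !(1) = 3 ∗ 2 ∗ 1 ∗ !(0) = 3 ∗ 2 ∗ 1 ∗ 1, which equals 6.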
An arbitrary definition f ≜ n ∈ ℕ ↦ exp where f appears in the expression
exp does not necessarily define f to equal n ∈ ℕ ↦ exp. For example, if
I had written !(n + 1) instead of !(n − 1) in the definition of !, then it’s
not obvious what that definition would mean. (Its meaning is defined in
Appendix Section A.1.9.) But all you need to know is that it would define
!(n) to be a meaningless expression for any value n.⁶
   6. This particular definition would define ! to be a function with domain ℕ, but with
!(n) a meaningless expression for any n. However, it’s usually not the case that such a
because HM requires every step to change the value of min, while σ must
change the value of sec in every step and the value of min in only every 60th
step.
It is just as crazy for an abstract program describing an hour-minute
clock not to describe a clock that also displays seconds as it is for a descrip-
tion of a planet’s motion no longer to describe that motion because of a
spacecraft that doesn’t affect the planet. It means that anything we’ve said
about the hour and minute display might be invalid if there’s also a second
display. And it doesn’t matter if the minute display is on a digital clock on
my desk and the second display is on a phone in my pocket. More generally,
it means if we’ve proved things about completely separate digital devices
and we look at those two devices at the same time, nothing we’ve proved
about them remains true unless those devices are somehow synchronized to
run in lock step. The more you think about it, the crazier it seems.
a finite behavior therefore asserts that the entire universe halts when the
program does. Those infinitely many stuttering steps, in which the value of
no variable of the program changes, allow other programs’ variables to keep
changing.
We can add those stuttering steps because of the observation that the
conversion from times to state numbers requires that a program variable be
allowed to change only at time t i for some i . It does not require that any
variable does change at that time. The mistake was writing descriptions
that, until the program halts, require some variable to change value at each
time t i . Instead, we should have added to the sequence of times t i times at
which no program variable changes. Adding such a time adds a step in which
other variables describing other programs can change while the program’s
variables remain unchanged. Thus, if the description allows a behavior σ,
then it should allow the behavior obtained by inserting stuttering steps of
the program in σ. This is easy to do. For the description of the hour/minute
display, we just change the definition of HM to
because ⟨hr, min⟩′ equals ⟨hr′, min′⟩, and two tuples are equal iff their
corresponding components are equal.
We can similarly fix every other example we’ve seen so far by changing
the next-state action Next in its RTLA description to Next ∨ (v′ = v), where
v is the tuple of all variables that appear in the RTLA formula. Since this
will have to be done all the time, we abbreviate A ∨ (v′ = v) as [A]v for any
action A and state expression v.
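For example (a sketch using this abbreviation): if v is the tuple ⟨x⟩ and A
is the action x′ = x + 1, then [A]v equals (x′ = x + 1) ∨ (x′ = x), which is
satisfied by steps that either increment x or leave it unchanged.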
We can add stuttering steps to a pseudocode description of an algorithm
by adding a separate process that just takes stuttering steps. However,
we won’t bother to do this. We will just consider all pseudocode to allow
stuttering steps.
When HM is defined by (3.37), if HMS is true of a behavior then HM
is also true of the behavior. This remains true when HMS is modified to
allow stuttering steps. Thus, HMS implements HM , and |= HMS ⇒ HM is
true. Implementation is implication. How elegant!
There is an apparent problem with formula HM of (3.37). It allows
behaviors in which the program takes a finite number of steps (possibly zero
steps) and then takes nothing but stuttering steps. In other words, it allows
behaviors in which the clock stops. Most computer scientists will say that
we should never allow behaviors in which an abstract program stops when
it is possible for it to continue executing. This is because they are used
to thinking about traditional programs. In many cases, we don’t want to
require a concurrent abstract program to do something just because it can.
Never stopping is a liveness property. Taking only steps satisfying [Next]v
is a safety property. My experience has taught me that we should describe
safety properties separately from liveness properties, because we reason
about them differently and we should think about them differently. For-
mula HM describes the safety property that the hour-minute clock should
satisfy. We will see in Section 4.2 how we conjoin a liveness property to HM
if we want to require the clock to run forever. It is a feature, not a problem,
that this definition of HM asserts only what the clock may do and not what
it must do.
In general, the safety property of an abstract program is written in the
form Init ∧ □[Next]v, where Init is the initial predicate and [Next]v is the
next-state action. The formula □[Next]v always allows stuttering steps be-
cause [Next]v has the form ... ∨ (v′ = v), and v′ = v allows stuttering steps.
However, v′ = v allows lots of non-stuttering steps. In particular, it allows
steps in which any variable that does not appear in v can have any values
in the two states of the step. To describe an abstract program, the state ex-
pression v in □[Next]v must ensure that v′ = v allows only steps that do not
change any of the program’s variables. Therefore, unless stated otherwise,
in a formula of the form □[Next]v where Next is the next-state action of a
program, the subscript v is assumed to be the tuple of all program variables.
(However, that subscript need not be called v.)
assertion depends only on the values a behavior assigns to the program’s vari-
ables, this condition is satisfied iff the assertion does not depend on whether
we add or remove steps that leave all variables unchanged. We’ve used the
term stuttering step to mean a step that leaves a program’s variables un-
changed. We will now call such a step a stuttering step of the program. We
define a stuttering step to be a step that leaves all variables unchanged.
A sensible predicate F on behaviors should satisfy the condition that the
value of [[F ]](σ) is not changed by adding stuttering steps to, or removing
them from, a behavior σ. This means that the value of [[F]](σ) is not changed
even if an infinite number of stuttering steps are added and an infinite
number removed. (However, the behavior must still be infinite, so if σ
ends in an infinite number of stuttering steps, those steps can’t be removed.)
A predicate on behaviors satisfying this condition for all behaviors σ is called
stuttering insensitive, or SI for short. When describing abstract programs
or the properties they satisfy, we should use only SI predicates on behaviors.
To define SI precisely, we first define ♮(σ) to be the behavior obtained by
removing from the behavior σ all stuttering steps except those belonging to
an infinite sequence of stuttering steps at the end. We do this by defining
♮(σ)(n) to equal σ(f_σ(n)), where the function f_σ is defined recursively by:
f_σ(0) = 0; and for n > 0, f_σ(n) equals either the smallest value i greater
than f_σ(n − 1) such that σ(i) is unequal to σ(f_σ(n − 1)), or else equals
f_σ(n − 1) + 1 if σ stutters forever after state number f_σ(n − 1).
To write the definition of f_σ, we first let Min(S) be the smallest element
of S for any set S of natural numbers. Such a smallest element exists for
any nonempty subset S of ℕ, even if S is infinite. We next let n^> be the
set {i ∈ ℕ : i > n} of all natural numbers greater than n. The recursive
definition of f_σ is then:

   f_σ ≜ n ∈ ℕ ↦
           if n = 0
             then 0
             else if ∀ i ∈ f_σ(n−1)^> : σ(i) = σ(f_σ(n−1))
                    then f_σ(n−1) + 1
                    else Min({i ∈ f_σ(n−1)^> : σ(i) ≠ σ(f_σ(n−1))})
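Here is a small worked example (mine, with s₀, s₁, and s₂ distinct states):
let σ be the behavior s₀ → s₀ → s₁ → s₂ → s₂ → s₂ → · · · that stutters
forever after its fourth state. Then f_σ(0) = 0; f_σ(1) = 2, since σ(2) is the
first state unequal to σ(0); f_σ(2) = 3; and f_σ(n) = n + 1 for all n ≥ 2,
because σ stutters forever after state number 3. Thus ♮(σ) is the behavior
s₀ → s₁ → s₂ → s₂ → · · · .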
We have been using the term property informally to mean some condition
on the behaviors of a system or abstract program. We now define it to mean
an SI predicate on behaviors. Behavior predicate still means any predicate
on behaviors, not just SI ones.
The formula □[A]v asserts that action [A]v is true of all steps of a be-
havior. For reasoning about liveness, we will need to assert that an action is
true in some step of a behavior. The formula ◇A is not SI for an arbitrary
action A because if A is true on some stuttering step, then ◇A might be
false on a behavior σ and true on a behavior obtained by adding such a stut-
tering step to σ. However, if A does not allow stuttering steps, then adding
or removing stuttering steps doesn’t alter whether a behavior satisfies ◇A,
so ◇A is SI. Since A ∧ (v′ ≠ v) does not allow stuttering steps, the formula
◇(A ∧ (v′ ≠ v)) is SI for any state expression v. We define ⟨A⟩v to equal
A ∧ (v′ ≠ v); and we let TLA contain all formulas ◇⟨A⟩v, for any action A
and state expression v.

We can also see that ◇⟨A⟩v is SI because of the tautology:

   |= ⟨A⟩v ≡ ¬[¬A]v

which holds because ¬[¬A]v equals ¬(¬A ∨ (v′ = v)), which equals
A ∧ (v′ ≠ v).
• A state predicate.
Abstract programs and the properties they satisfy should be TLA for-
mulas. However, we can use RTLA proof rules and even RTLA formulas
when reasoning about TLA formulas. For example, we can prove that Inv
is an invariant of Init ∧ □[Next]v by substituting [Next]v for Next in the
RTLA proof rule that (3.10) implies (3.13). This yields the following rule:

   |= (Init ⇒ Inv) ∧ (Inv ∧ [Next]v ⇒ Inv′)  implies
   |= (Init ∧ □[Next]v) ⇒ □Inv

In this rule, the first |= means validity in LA while the second |= means
validity in TLA. A feature of TLA is that as much reasoning as possible is
done in LA, which becomes ordinary mathematical reasoning when the nec-
essary definitions are expanded and primes are distributed across operators,
so only variables are primed.
Chapter 4

Safety, Liveness, and Fairness
4.1.1 Definitions
Safety and liveness properties have been described intuitively as specifying
what the program is allowed to do and what it must do. To define them
precisely, we begin by observing that they have these characteristics:
Safety If a behavior doesn’t satisfy a safety property, then we can point to
the place in the behavior where it violates the property. For example,
if a behavior doesn’t satisfy an invariance property, it violates the
property in the first state in which the invariant is false.
nonempty prefixes have the same initial state. The formula □[A]v is a safety
property for any action A and state expression v; here is the proof that every
nonempty finite prefix of a behavior σ satisfies □[A]v iff σ satisfies □[A]v.

It also follows easily from the definition of safety that the conjunction of
safety properties is a safety property. Therefore, as expected, the formula
Init ∧ □[Next]v that we have been calling the safety property of a program
is indeed a safety property.
The property that asserts that a program halts is a liveness property.
That property is true of a behavior σ iff σ ends with infinitely many steps
that leave the program’s variables unchanged. It’s a liveness property be-
cause every finite behavior ρ is a prefix of its completion ρ↑ , which satisfies
the property.
Safety and liveness are conditions on properties, which are SI behavior
predicates. When we say that a TLA formula □[A]v is a safety property,
we are conflating the formula with its meaning. It’s actually [[□[A]v]] that
is the safety property.
(4.2) S12 ≜ Init ∧ □[Next]⟨x,y⟩

      Init ≜ (x ≠ 2) ∧ (y = (x = 1))

      Next ≜ ∧ (x′ = 2) ⇒ y
             ∧ y′ = (y ∨ (x = 1))
It’s not obvious in what sense formula S12 expresses property F12, since
S12 contains the variables x and y while F12 describes only the values of x.
Intuitively, S12 makes the same assertion as F12 if we ignore the value of y.
Section 7.1 describes a TLA operator ∃ such that ∃ y : S12 means S12 if we
ignore the value of y. We’ll then see that [[∃ y : S12]] equals F12. However,
there’s no need to introduce ∃ here. The relevant condition that S12 satisfies
is that if G is any TLA formula that does not contain the variable y, then
a finite number of variables, and its value depends only on the values of
those variables. Remember that we are assuming that the language LA for
writing actions contains all the operators of ZF.
The theorem is expressed with the convention of letting a boldface iden-
tifier like x be the list x₁, ..., xₙ of subscripted non-bold versions of the
identifier, for some n. Thus, ⟨x⟩ is the tuple of those identifiers. The theo-
rem is a special case of Theorem 4.9 in Section 4.2.7 below, so the proof is
omitted.
However, we’ll see later why that’s not a good liveness property to use.
There’s another method of describing safety and liveness that helps me
understand them intuitively. It’s based on topology. The method and the
necessary topology are explained in Appendix Section A.6.
nothing about finite prefixes; they describe what must be true if the system
runs forever. Since we don’t live forever, why should we care about liveness
properties?
In theory, liveness is useless; but in practice it’s useful. Consider the
liveness property required of a traditional program: it eventually terminates.
In theory, that’s useless because it might not terminate in a billion years.
In practice, proving that a program will terminate within a given amount
of time isn’t easy. Proving that it eventually terminates is easier, and it is
useful because the program is certainly not going to terminate soon enough if
it never does. But proving liveness provides more than that. Understanding
why a program eventually terminates requires understanding what it must
do in order to finish. That understanding helps you decide if it will terminate
soon enough. This applies to other liveness properties as well.
Using a model checker doesn’t give you the understanding that you get
from writing a proof. However, using a model checker to check liveness prop-
erties is a good way to detect errors—both in the program you intended to
write and in what you actually wrote. A program that does nothing satis-
fies most safety properties, and an error in translating your intention into
mathematics might disallow behaviors in which the program fails to satisfy a
safety property. Checking that the program satisfies liveness properties that
it should can catch such errors, as well as errors in the program you wanted
to write. Section 5.1 discusses checking liveness to check if the program you
wrote is the one you wanted to write.
4.2 Fairness
Expressing mathematically the way computer scientists and engineers de-
scribed their algorithms and programs led us to describe the safety property
satisfied by an abstract program with the formula Init ∧ 2[Next]v , where
v is the tuple of all the program’s variables. We must conjoin to that for-
mula another formula to describe the program’s liveness property. To see
how this should be done, we first examine how scientists and engineers have
expressed liveness.
(in terms of program steps) a process that can execute a statement might
wait before executing it. Therefore, fairness came to mean simply that no
process should be starved.
In a program with a set Procs of processes, the next-state action is
defined by

   Next ≜ ∃ p ∈ Procs : PNext(p)
However, this is not the way fairness should be expressed, and it is not an
appropriate liveness property for multiprocess programs. To see why, we
consider mutual exclusion algorithms.
set {0, 1} of processes. But for reasons that will be discussed later, it isn’t
considered to be an acceptable algorithm.
This pseudocode program is the first one we’ve seen with an await state-
ment. For a state predicate P, the statement await P can be executed only
when control is at the statement and P equals true. We could write the
statement a: await P as:

   a: if ¬P then goto a end if

Executing this statement in a state with P equal to true just moves control
to the next statement. Executing it in a state with P equal to false does
not change the value of any program variable, so it’s a stuttering step of the
program. Since a stuttering step is always allowed, executing the statement
await P when P equals false is the same as not executing it. So, while we
can think of the statement await P as continually evaluating the expression
P and moving to the next statement iff it finds P equal to true, mathe-
matically that’s equivalent to describing it as an action A such that E(A)
equals (pc = a) ∧ P.
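As a sketch (my notation, reusing the except construct from Chapter 3): in
a multiprocess program, the statement a: await P of process p, with next
control point b, could be described by the action

   Await(p) ≜ ∧ pc(p) = a
              ∧ P
              ∧ pc′ = (pc except p ↦ b)
              ∧ all other program variables are unchanged

whose enabling condition is (pc(p) = a) ∧ P.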
This is also the first pseudocode we’ve seen with explicit array variables.
An array variable x is an array-valued variable, where an array is a function
and x [p] just means x (p). We’ve already seen implicit array variables—
namely, the local variables t and pc of program Increment are represented
by function-valued variables in Figure 3.5. I have decided to write x [p]
instead of x (p) in pseudocode to make the pseudocode look more like real
code. However, the value of an array variable can be any function, not just
(as in some coding languages such as C) a finite ordinal sequence; and we
write x (p) instead of x [p] when discussing the program mathematically. As
(4.6) ∧ TypeOK
      ∧ ∀ p ∈ {0, 1} : ∧ (pc(p) ∈ {w2, cs}) ⇒ x(p)
                       ∧ (pc(p) = cs) ⇒ (pc(1 − p) ≠ cs)

where TypeOK is the type-correctness invariant:

   TypeOK ≜ ∧ x ∈ ({0, 1} → {true, false})
            ∧ pc ∈ ({0, 1} → {ncs, wait, w2, cs, exit})
Let UMSafe be the safety property described by the pseudocode. We
want to conjoin a property UMLive to UMSafe to state a fairness require-
ment of the program’s behaviors. Let’s make the obvious choice of defining
UMLive to be formula (4.4) with Procs equal to {0, 1} and v equal to ⟨x, pc⟩.
This implies that both processes keep taking steps forever, executing their
critical sections infinitely often, which makes it seem like a good choice.
Actually, that makes it a bad choice.
Algorithm UM is unacceptable because formula UMSafe, which de-
scribes the pseudocode, permits deadlock. If both processes execute state-
ment wait before either executes w2, then the algorithm reaches the dead-
locked state in which neither await statement is enabled. Conjoining UMLive
to UMSafe produces a formula asserting that such a deadlocked state cannot
occur. It ensures the liveness property we want, that processes keep execut-
ing their critical sections. However, it does this not by requiring only that
processes keep taking steps, but also by preventing them from taking some
steps—namely, ones that produce a deadlocked state. A fairness property
shouldn’t do that.
Before going further, let’s see why UMSafe ∧ UMLive doesn’t allow such
a deadlocked state to be reached. The reason is that the formula satisfies
this invariant:
The equivalence of (4.9) and (4.10) follows from this RTLA theorem, which
is proved in the Appendix.

It makes weak fairness look stronger than the definition because □◇⟨A⟩v
is a stronger property than ◇⟨A⟩v. Here’s an informal proof of (4.14).
The definition of WFv(A) implies the right-hand side of the equivalence be-
cause ◇□E⟨A⟩v implies that eventually ⟨A⟩v is always enabled, whereupon
WFv(A) keeps forever implying that an ⟨A⟩v step occurs, so there must be
infinitely many ⟨A⟩v steps, making □◇⟨A⟩v true. The opposite implica-
tion is true because □E⟨A⟩v implies ◇□E⟨A⟩v, so the right-hand side of
the equivalence implies that □◇⟨A⟩v is true and hence ◇⟨A⟩v is true. A
rigorous proof of (4.14) is by the following RTLA⁴ reasoning, substituting
E⟨A⟩v for F and ⟨A⟩v for G:
algorithm OB, we define:⁶

(4.17) TypeOK ≜ ∧ x ∈ ({0, 1} → {true, false})
                ∧ pc ∈ ({0, 1} → {ncs, wait, w2, w3, w4, cs, exit})
                ∧ pc(0) ∉ {w3, w4}
However, the resulting inductive invariant isn’t strong enough for proving
liveness. We now consider liveness.
Let OBSafe, the safety property of OB described by the pseudocode, be
the formula Init ∧ □[Next]v, where v equals ⟨x, pc⟩ and

   Next ≜ ∃ p ∈ {0, 1} : PNext(p)

The fairness condition we want OB to satisfy is weak fairness of each pro-
cess’s next-state action, except when the process is in its noncritical section.
A process p remaining forever in its noncritical section is represented in our
abstract program by no PNext(p) step occurring when pc(p) equals ncs.
The fairness condition we assume of program OB is therefore:

   OBFair ≜ ∀ p ∈ {0, 1} : WFv((pc(p) ≠ ncs) ∧ PNext(p))

The formula OBSafe ∧ OBFair, which we call OB, satisfies the liveness
property that if process 0 is in its waiting section, then it will eventually
enter its critical section. That is, OB implies:

(4.18) (pc(0) ∈ {wait, w2}) ⇝ (pc(0) = cs)
This implies deadlock freedom, because if process 0 stops entering and leav-
ing its critical section, then it eventually stays forever in its noncritical
section. If process 1 is then in its waiting section, it will read x [0] equal to
false and enter its critical section.
The inductive invariant obtained from the inductive invariant of UM
isn’t strong enough because it doesn’t assert that x[p] = false when process
p is in its noncritical section, which is at the heart of why OB is deadlock
free. For that we need this stronger invariant, where TypeOK is defined by
(4.17):

(4.19) ∧ TypeOK
       ∧ x(0) ≡ (pc(0) ∈ {w2, cs, exit})
       ∧ x(1) ≡ (pc(1) ∈ {w2, w3, cs, exit})
       ∧ ∀ p ∈ {0, 1} : (pc(p) = cs) ⇒ (pc(1 − p) ≠ cs)

   6. For any infix predicate symbol like = or ∈, putting a slash through the symbol
negates it, so e ∉ S means ¬(e ∈ S).
By the meaning of leads to, the property asserted by each formula F in the
graph means that if the program is ever in a state for which F is true, then
it will eventually be in a state satisfying a formula pointed to by one of the
outgoing edges from F. The graph has a single sink node (one having no
outgoing edge). Every path in the graph, if continued far enough, leads to
the sink node. By transitivity of the ⇝ relation, this means that if all the
properties asserted by the diagram are true of a behavior, then the behavior
satisfies the property F ⇝ H, where H is the sink-node formula and F
is any formula in the lattice. In particular, the properties asserted by the
diagram imply formula (4.18). By (3.32), that every formula in the graph
leads to the sink-node formula means that the disjunction of all the formulas
in the graph leads to the sink-node formula.
Now to explain the box. Let Λ equal □Inv ∧ □[Next]v ∧ OBFair, the
formula that labels the box. Formula Λ is implicitly conjoined to each of the
formulas in the graph. It is a □ formula, since the conjunction of □ formulas
is a □ formula, and OBFair is the conjunction of WF formulas, which are
□ formulas.

Since Λ is conjoined to every formula in it, the leads-to lattice makes
assertions of the form

   Λ ∧ G ⇝ (Λ ∧ H₁) ∨ · · · ∨ (Λ ∧ Hⱼ)

Since Λ equals □Λ, and once □Λ is true it is true forever, this formula is
equivalent to Λ ∧ G ⇝ H₁ ∨ · · · ∨ Hⱼ. (This follows from (3.33c) and
propositional logic.)

If H is the unique sink node of the lattice, then proving the assertions
made by the lattice proves |= Λ ∧ G ⇝ H for every node G of the lattice. By
definition of ⇝ and (3.22), |= Λ ∧ G ⇝ H implies |= □Λ ⇒ (G ⇝ H). Thus,
if Λ is a □ formula, then proving |= Λ ∧ G ⇝ H proves |= Λ ⇒ (G ⇝ H).
In general, we label a box in a leads-to lattice only with a □ formula.
Remember what the proof lattice of Figure 4.4 is for. We want to prove
that OB implies (4.18). By proving the assertions made by the proof lattice,
we show that the formula Λ labeling the box implies (4.18). By definition of
OB and because OB implies □Inv, formula Λ is implied by OB. Therefore,
by proving the leads-to properties asserted by the proof lattice, we prove that
OB implies (4.18). Note how we had to use the □ formula □Inv ∧ □[Next]v
instead of OBSafe, which is true only initially.

To complete the proof that OB implies (4.18), we now prove the leads-
to properties asserted by Figure 4.4. The leads-to property asserted by the
edges numbered 1 is:

   Λ ∧ (pc(0) ∈ {wait, w2}) ⇝ (pc(0) = wait) ∨ (pc(0) = w2)

It is trivially true, since pc(0) ∈ {wait, w2} implies that pc(0) equals wait
or w2, and |= F ⇒ G implies F ⇝ G.
The formula Λ ∧ (pc(0) = wait) ⇝ (pc(0) = w2) asserted by edge
number 2 is true because Λ implies □Inv ∧ □[Next]v, which implies that
if pc(0) = wait is true then it must remain true until a PNext(0) step makes
pc(0) = w2 true, and such a step must occur by the weak fairness assumption
of process 0, which Λ also implies.

The formula

The formula □[Next2]v is implied by □Inv and □[Next]v and the conjunct
□(pc(0) ≠ cs) of the formula at the tail of the edge 2 arrow. (Note that the
prime in this formula is valid because pc(0) ≠ cs always true implies that
it’s always true in the next state.) We are using an invariance property of
one program to prove a liveness property of another program. This would
seem strange if we were thinking in terms of code. But we’re thinking
mathematically, and a mathematical proof contains lots of formulas. It’s
not surprising if one of those formulas looks like the formula that describes
a program.
The edges numbered 3 enter a box whose label is the same formula from
which those edges come. In general, an edge can enter a box with a label
□F if it comes from a formula that implies □F. This is because a box
labeled □F is equivalent to conjoining □F to all the formulas in the box,
and □F ⇝ (G₁ ∨ · · · ∨ Gₙ) implies □F ⇝ ((□F ∧ G₁) ∨ · · · ∨ (□F ∧ Gₙ)).
An arrow can always leave a box, since removing the formula it points to
from the box just weakens that formula.
Proofs of the assertions represented by the rest of the lattice’s edges are
sketched below.
edges 3 The formula represented by these edges is true because the dis-
junction of the formulas they point to asserts that pc(1) is in the set
{ncs, wait, w2, w3, w4, cs, exit}, which is implied by □Inv.

edges 4 If pc(1) equals cs or exit, then □Inv ∧ □[Next]v and the fairness
condition for process 1 imply that it will eventually be at ncs. Either
pc(1) equals ncs forever or eventually it will not equal ncs. In the
latter case, □[Next]v implies that the step that makes pc(1) = ncs
false must make pc(1) = wait true.

edge 8 □¬x(1), □(pc(0) = w2) (implied by the inner box’s label), and
OBFair imply that a process 0 step that makes pc(0) equal to cs must
eventually occur. (Equivalently, these three formulas are contradic-
tory, so they imply false, which implies anything.)
The proof sketches of the properties asserted by edges 4 and edge 6 skim
over more details than the proofs of the other properties asserted by the
lattice. A more detailed proof would be described by a lattice in which each
of the formulas pointed to by the edges numbered 3 were split into multiple
formulas—for example, the formula pc(1) ∈ {cs, exit, ncs} would be split
into the formulas pc(1) = cs, pc(1) = exit, and pc(1) = ncs. A good check
of your understanding is to draw the more detailed lattice and write proof
sketches for its new edges.
   P(sem): await sem = 1 ;          V(sem): sem := 1
           sem := 0
Locks were originally implemented with operating system calls. Modern
multiprocessor computers provide machine instructions to implement them.
variables sem = 1 ;
process p ∈ Procs
   while true do
      ncs:  skip ;
      wait: P(sem) ;
      cs:   skip ;
      exit: V(sem)
   end while
end process
Using a lock, mutual exclusion for any set Procs of processes can be imple-
mented with the trivial algorithm LM of Figure 4.6.
Let PNext(p) now be the next-state action of process p of program LM .
With weak fairness of (pc(p) ≠ ncs) ∧ PNext(p) for each process p as its
fairness property, algorithm LM satisfies the deadlock freedom condition
(4.16). However, deadlock freedom allows individual processes to be starved,
remaining forever in the waiting section.
Let Wait(p), Cs(p), and Exit(p) be the actions described by the state-
ments in process p with the corresponding labels wait, cs, and exit. Weak
fairness of (pc(p) ≠ ncs) ∧ PNext(p) is equivalent to the conjunction of
weak fairness of the actions Wait(p), Cs(p), and Exit(p). Program LM al-
lows starvation of individual processes because weak fairness of the Wait(p)
actions ensures only that if multiple processes are waiting to execute that
action, then some process will eventually execute it. But if processes con-
tinually reach the wait statement, some individual processes p may never
get to execute Wait(p).
It’s reasonable to require the stronger condition of starvation freedom,
which asserts that no process starves. This is the property
(4.21) ∀ p ∈ Procs : (pc(p) = wait) ⇝ (pc(p) = cs)
which asserts that any process reaching wait must eventually enter its critical
section. For LM to satisfy this property, it needs a stronger fairness property
than weak fairness of the Wait(p) actions.
The informal justification and the proof of (4.23) are similar to the ones for
(4.14). The proof of (4.24) is essentially the same as that of (4.15).
Exit(p). This is because the action is enabled iff pc(p) has the appropriate
value, so it remains enabled until a step of that action occurs to change
pc(p). Thus, when the action is enabled, it is continuously enabled until it
is executed. We can therefore write LMFair as the conjunction of strong
fairness of the three actions Wait(p), Cs(p), and Exit(p).
The same sort of reasoning that led to (4.13) of Section 4.2.3, as well as
Theorem 4.8 of Section 4.2.7, imply that the conjunction of strong fairness
of these three actions is equivalent to strong fairness of their disjunction.
Therefore, we can write LMFair as strong fairness of their disjunction, which
equals (pc(p) ≠ ncs) ∧ PNext(p).
While SFv((pc(p) ≠ ncs) ∧ PNext(p)) is compact, I prefer not to define
LMPFair(p) this way because it suggests to a reader of the formula that
strong fairness of Cs(p) and Exit(p) is required, although only weak fairness
is. Usually, the process’s next-state action will be the disjunction of many
actions, and strong fairness is required of only a few of them. I would define
LMFair to equal

   ∀ p ∈ Procs : WFv((pc(p) ≠ ncs) ∧ PNext(p)) ∧ SFv(Wait(p))

This is redundant because the first conjunct implies weak fairness of Wait(p)
and the second conjunct asserts strong fairness of it. But a little redundancy
doesn’t hurt, and its redundancy should be obvious because strong fairness
implies weak fairness.
Theorem 4.7 Let Init be a state predicate, Next an action, and v a tuple
of all variables occurring in Init and Next. If Ai is a subaction of Next for
all i in a countable set I , then the pair
where U is the until operator with which we first defined weak fairness
of PNext(p) as (4.9). Similarly to what we did for weak fairness, we can
remove the U by observing that F U G implies that if G is never true,
then F must remain true forever. That ⟨Aᵢ⟩v is never true is asserted by
¬◇⟨Aᵢ⟩v, which is equivalent to □[¬Aᵢ]v. Therefore (4.25) implies

While (4.25) implies (4.26), the formulas are not equivalent. Formula (4.26)
is strictly weaker than (4.25). However, it’s strong enough to imply that
strong or weak fairness of all the Aᵢ is equivalent to strong or weak fairness
of Q—assuming that Q is the disjunction of the Aᵢ. Here is the precise
theorem. Its proof is in the Appendix.
Theorem 4.8 Let Aᵢ be an action for each i ∈ I, let Q ≜ ∃ i ∈ I : Aᵢ,
and let XF be either WF or SF. Then

   |= ( ∀ i ∈ I : □( E⟨Aᵢ⟩v ∧ □[¬Aᵢ]v ⇒
                     □[¬Q]v ∧ □(E⟨Q⟩v ⇒ E⟨Aᵢ⟩v) ) )
      ⇒ ( XFv(Q) ≡ ∀ i ∈ I : XFv(Aᵢ) )
Chapter 5

Interlude
We have seen how to use TLA to write abstract programs and show that
they satisfy simple safety and liveness properties. In this chapter, we pause
in our development to consider two problems. The first is determining if an
abstract program written in TLA expresses what we want it to. We consider
an approach to this problem that is different from what we have been doing—
determining what the program might do, rather than what it must or must
not do. The second problem is describing and reasoning about the real-time
behavior of systems. I hope that seeing how this problem is addressed with
TLA helps you appreciate the power of thinking of an abstract program as
a predicate on behaviors rather than a generator of behaviors.
it if we see such a behavior. But seeing one behavior that doesn’t satisfy it
doesn’t tell us whether or not some other behavior might satisfy it. However,
we will see that possibility conditions can be expressed as properties that
explicitly mention the program’s actions.
Knowing that something might be true of a system, but knowing nothing
about the probability of its being true, is of almost¹ no practical use. The
only way I know of calculating such probabilities is to view the abstract
program as a state-transition system, attach probabilities to the various
transitions, and mathematically analyze that system—for example, using
Markov analysis [51]. Usually, the state-transition system would be a more
abstract program implemented by the program of interest.
While possibility conditions of systems are of little interest, we don’t
reason about systems; we reason about abstract programs that describe sys-
tems. Verifying that a program satisfies a possibility condition can be a way
of checking what we will call here the accuracy of an abstract program—that
it accurately describes the system it is supposed to describe. For example,
if the system doesn’t control when users send it input, a program that ac-
curately describes the system and its users should satisfy the condition that
it’s always possible for users to enter input.
model checker called TLC reports how many different steps satisfying Input
occur in behaviors of the program. If it finds no such steps, then it is not
always possible for an Input step to occur with either definition of “always
possible”. Finding too few such steps can also be an indication that the
program is not accurate.
Accuracy of an abstract program cannot be formally defined. It means
that a program really is correct if it implements the abstract program. In
other words, an abstract program is accurate iff it means what we want it
to mean, and our desires can’t be formally defined. That accuracy can’t
be formally defined does not mean it’s unimportant. There are quite a few
important aspects of programs that lie outside the scope of our science of
correctness.
Math VI
Definitions Within an Expression The scope of an ordinary definition
includes everything that comes logically after it in the current context, which
in this book might end at the end of the current section. It’s sometimes
convenient to make a definition whose scope is limited to a single expression
exp. This is often done for “common subexpression elimination”, where
exp contains multiple occurrences of the same subexpression subexp, and we
want to give that subexpression a name nm and rewrite exp as the expression
nmexp in which each occurrence of subexp has been replaced by nm. We do
that by rewriting exp as:

   let nm ≜ subexp in nmexp
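For example (a made-up illustration): the expression
(x + y) ∗ (x + y) + (x + y) can be written as

   let s ≜ x + y in s ∗ s + s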
Note that if exp occurs in the scope of bound variables and we wanted to do
this with an ordinary definition, we would have to define nm as an operator
with one argument for each of those bound variables. The general form of
the let/in construct is
   ∧ pc(p) = wait
   ∧ x = none
   ∧ pc′ = (pc except p ↦ w1)
   ∧ x′ = x
With no time constraints, mutual exclusion is easily violated. Two processes
can execute the wait statement when x equals none, then statements w1 and
w2 can both be executed by the first process and then by the second one,
putting both processes in the critical section. Mutual exclusion is ensured
by timing constraints.

We assume that each step is executed instantaneously at a certain time,
and that each process executes w1 at most δ seconds after it executes wait
and executes w2 at least ε seconds after it executes w1, for constants δ and
ε with δ < ε. (The algorithm doesn’t specify what the time units are; we
will call them seconds for convenience.) It’s a nice exercise to show that this
ensures mutual exclusion by assuming that two processes are in their critical
sections and showing that the necessary reads and writes of x that allowed
them both to enter the critical section must have occurred in an order that
violates the timing constraints if δ < ε. While it may be good enough for
   4. The expression {v ∈ S : v ∈ T} is ambiguous; it could be either a subsetting or a
mapping constructor. We will never write such an expression, so we won’t worry about
which it is.
variables x = none ;
process p ∈ Procs
   variables pc = ncs ;
   while true do
      ncs:  skip ;                            noncritical section
      wait: await x = none ;
      w1:   x := p ;
      w2:   if x ≠ p then goto wait end if ;
      cs:   skip ;                            critical section
      exit: x := none
   end while
end process
hand side of the :∈. The action can assign to now any value t greater than
its current value, subject to the condition that t ≤ rt(p) + δ for every process
p at control point w1. It is this condition that enforces the requirement that
a process must execute statement w1 within δ seconds of when it executes
the wait statement.
Fischer’s Algorithm illustrates the basic method of representing real-
time constraints in an abstract program. Lower bounds on how long it
must take to do something are described by enabling conditions on the
algorithm’s actions. Upper bounds are described by enabling conditions on
the action that advances time. There are a number of ways of enforcing these
bounds. The use of the variable rt in Fischer’s algorithm shows one way.
Another is to use variables whose values are the number of seconds remaining
before an action must be executed (lower bounds) or can be executed (upper
bounds)—variables whose values are decremented by the time-advancing
action.
The idea of an abstract program constraining the advance of time is
mind-boggling to most people, since they view a program as a set of in-
structions. They see it as the program stopping time. You should by now
realize that an abstract program is a description, not a set of instructions.
It describes a universe in which the algorithm is behaving correctly. That
description may constrain the algorithm’s environment, which is the part
of the universe that the algorithm doesn’t control—for example, its users.
Time is an important part of that environment if the amount of time it takes
to perform the algorithm’s actions is relevant to its correctness.
∀ p ∈ Procs :
   ∧ (pc(p) = w1) ⇒ (rt(p) ≤ now ≤ rt(p) + δ)
   ∧ (pc(p) ∈ {cs, exit}) ⇒ (x = p) ∧ (∀ q ∈ Procs : pc(q) ≠ w1)
   ∧ (pc(p) = w2) ∧ (x = p) ⇒
        ∀ q ∈ Procs : (pc(q) = w1) ⇒ (rt(q) + δ < rt(p) + ε)
You should understand why the three conjuncts in this formula are the three
assertions expressed informally above. Adding the type-correctness part and
proving that it is an inductive invariant is a good exercise if you want to
learn how to write proofs.
which asserts that the value of time is unbounded. However, this isn’t neces-
sarily a fairness property. It’s easy to write an abstract program that allows
only Zeno behaviors, so conjoining the liveness property (5.3) produces a
program that allows no behaviors. For example, we can add timing con-
straints to the program of Figure 5.1 that require a process both to execute
statement w1 within δ seconds after executing statement wait and to wait
at least ε seconds after executing wait before executing w1, with δ < ε. If a
process executes wait at time t, then now ≤ t + δ must remain true forever.
If we added fairness properties that required processes eventually to reach
the wait statement and execute it if it’s enabled, then the program would
allow only Zeno behaviors.
We can ensure that Fischer’s Algorithm satisfies (5.3) by having it require
an appropriate fairness condition on the advancing of time. The condition
we need is strong fairness of the action timeStep ∧ (now′ = exp), where exp
is the largest value of now′ permitted by the values of rt(p) for processes p
with control at w1, or now + 1 if there is no such process. More precisely:

   exp ≜ let T ≜ {rt(p) + δ : p ∈ {q ∈ Procs : pc(q) = w1}}
         in  if T = {} then now + 1 else Min(T)
where Min(T ) is the minimum of the nonempty finite set T of real numbers.
With this fairness condition on advancing time and the conjunction of the
fairness conditions for the processes in Procs, Fischer’s Algorithm satisfies
(5.3) and the proof sketch that the algorithm is deadlock free can be made
rigorous.
If we are interested only in safety properties, there is no need for an
abstract program to rule out Zeno behaviors. A program satisfies a safety
property iff all finite behaviors allowed by the program satisfy it, and a Zeno
behavior is an infinite behavior. In many real-time programs, liveness prop-
erties are of no interest. Correctness means not that something eventually
happens but that it happens within a certain length of time, which is a
safety property. Zeno behaviors then make no difference, and there is no
reason to disallow them.
Even if Zeno behaviors don’t matter, the absence of non-Zeno behaviors
can be a problem. Since real time really does increase without a bound,
an abstract program in which it is not always possible for time to become
arbitrarily large is unlikely to be accurate. Therefore, we almost always
want to ensure that a real-time program satisfies the condition that for any
t ∈ R, it is always possible for now > t to be true. This is true iff, for
any t ∈ R, from any reachable state of the program it is always possible
for now > t to be true. This is the kind of possibility condition considered
in Section 5.1. We saw there that if the program Safe is a safety property
that satisfies this condition, then we can verify that it does so by finding a
for some expression exp containing the bound variable t and other vari-
ables [31].
It may seem that a representation of the behavior of a continuous process
by a sequence of discrete states would not be sufficiently accurate. For
example, if it is required that the pressure not be too high, violation of that
requirement would not be found if it occurred during the time between two
successive states of the behavior. This is not a problem because correctness
means that a property is true of all possible behaviors, and the possibility of
the pressure being too high at some time is revealed by a behavior containing
a state in which now equals that time.
Other than the differences implied by the use of continuous math, such
as calculus in (5.5), rather than discrete math, proving properties of hybrid
programs is the same as proving properties of other real-time abstract pro-
grams. Automatic tools like model checkers for ordinary abstract programs
seem to be unsuitable for checking abstract programs in which variables
represent continuously varying quantities. Methods have been developed
for checking such programs [11].
Chapter 6
Refinement
Data Refinement A program refining another program can also refine the
representation of data used by the higher-level program. This will be
illustrated by refining a higher-level program that uses numbers with
a program that implements a number by a sequence of digits.
Refinement usually involves both step and data refinement, with step re-
finement manifest as operations on the lower-level data requiring more non-
stuttering steps than the corresponding operations on the higher-level pro-
gram’s data. As we saw with the example of the hour-minute and hour-
minute-second clocks in Section 3.5.2, without data refinement, a program
S is refined by a program T means that T implies S . We will see that
with data refinement, T refines S means that T implies the formula ob-
tained from S by substituting expressions containing the variables of T for
the variables of S . To describe this precisely, we need some notation for
substitution.
Mathematicians have no standard way of describing substitution, and
the notation I’ve seen used by computer scientists is impractical for the
formulas that arise in describing programs. The notation used in this book,
illustrated with substitution for three variables, is that
   (p ∗ (q + r) with q ← r, r ← q + s) = p ∗ (r + (q + s))

That T refines S is then expressed by a formula of the form

   T ⇒ (S with v₁ ← exp₁, ..., vₙ ← expₙ)

where the vᵢ are the variables of S and the expᵢ are expressions containing
the variables of T. The same thing applies to the constants of T and S,
so the vᵢ are both the variables and constants of S. If vᵢ is a constant,
then expᵢ is usually a constant expression.
if the expression exp i (which may be a constant or variable) of T has the
same name as the constant or variable v i of S , then we omit the substitution
v i ← v i . (This is a common case, because the programs S and T are usually
abstract views of the same system, where an expression of T and a variable
or constant of S have the same name only if they describe the same part of
the system’s state.)
Math VII
Sequence Operators We now define some operators for sequences. They are defined for both ordinal and cardinal sequences. For any sequences σ and τ such that σ is finite, we define σ ◦ τ to be the sequence obtained by concatenating σ and τ. For example, ⟨4, 5⟩ ◦ ⟨1, 2⟩ equals ⟨4, 5, 1, 2⟩, and (4 → 5) ◦ (1 → 2 → 3 → ⋯) equals 4 → 5 → 1 → 2 → 3 → ⋯ .
For any (finite or infinite) nonempty sequence σ, we define Head(σ) to be the first item of σ and Tail(σ) to be the sequence obtained by removing the first element of σ. For example, Head(⟨1, 2, 3⟩) and Head(1 → 2 → 3 → ⋯) both equal 1; Tail(⟨1, 2, 3⟩) equals ⟨2, 3⟩; and Tail(1 → 2 → 3 → ⋯) equals 2 → 3 → ⋯ .
Exactly what does it mean for AddS to refine Add ? I believe the natural
definition is: If we look at any behavior of AddS and interpret the numbers
represented by the sequences u, v , and w of digits to be the values of x ,
y, and z , and we interpret the value of fin to be the value of end , then we
get a behavior of Add . More precisely, let “←” mean “is represented by”.
That AddS refines Add means that a behavior satisfying AddS represents a
behavior satisfying Add with this representation of the variables of Add in
terms of the variables of AddS :
(6.1)   x ← Val(u)    y ← Val(v)    z ← Val(w)    end ← fin
Here’s an example to illustrate this, where the first two-state sequence is
a finite behavior satisfying AddS and the second two-state sequence is the
finite behavior it represents. Remember that AddS leaves unspecified the
value of w in an initial state and the values of u and v in a halting state.
A “?” in the second behavior means the value is unspecified because, as
explained in Section 2.6, Val (seq) is a meaningless expression if seq isn’t a
sequence of numbers.
   u ::  ⟨1, 2, 3⟩             u ::  ⟨5⟩
   v ::  ⟨3, 2⟩                v ::  −27
   w ::  √2           →       w ::  ⟨4, 4, 3⟩
   fin :: false               fin :: true

   x ::  321                  x ::  5
   y ::  23           →       y ::  ?
   z ::  ?                    z ::  344
is true for every S -behavior σ. And this equivalence is true if Add is replaced
by any RTLA formula containing only the variables of Add . It’s true because
the values of the variables of Add in the state s and the values of the variables
of AddS in the state f (s) are related by (6.1).
The algorithm’s define statement defines digit to equal the indicated expression within that statement. The value of ⌊n/10⌋ is the greatest integer less than or equal to n/10. To simplify the invariant, AddSeq specifies the initial value of carry to equal 0 and ensures that it equals 0 at the end. Since the low-order digit of a two-digit number n is n % 10 and its high-order digit is ⌊n/10⌋, it should be clear that AddSeq describes an algorithm for adding two decimal numbers. (If it’s not, execute it by hand on an example.)
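Here is a small Python sketch of that addition, assuming, as in the example above, that a digit sequence lists the low-order digit first (so [1, 2, 3] represents 321); the helper name is ours, not part of the algorithm:

def add_digits(u, v):
    result, carry = [], 0
    while u or v or carry:
        d = (u[0] if u else 0) + (v[0] if v else 0) + carry
        result.append(d % 10)     # low-order digit of the two-digit sum d
        carry = d // 10           # high-order digit: floor(d / 10)
        u, v = u[1:], v[1:]
    return result                 # carry is 0 again when the loop stops

assert add_digits([1, 2, 3], [3, 2]) == [4, 4, 3]   # 321 + 23 = 344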
The usual way to express correctness of a program that computes a value
sum and stops is with an invariant asserting that if the program has stopped
then sum has the correct value. We can’t do that with AddSeq because the
correct value of sum is the initial value of u ⊕ v , and those initial values
The key part of an inductive invariant to prove (6.4) is the assertion that
ans equals the final value of sum. A first approximation to the final value
of sum is:
   sum ◦ (⟨carry⟩ ⊕ (u ⊕ v))
in one non-stuttering step. And we don’t have to add the constant ans to
do it.
Under the refinement mapping, one step in an execution of AddSeq must
refine a NextA step of Add ; all the other steps must refine stuttering steps
of Add . The initial values of the variables x and y of Add should equal the
initial values of Val (u) and Val (v ). The initial values of u and v are no
longer deducible from the state after AddSeq takes its first step. This tells
us that the NextA step of Add must be refined by the first non-stuttering
step of AddSeq.
An Add step changes the value of its variable done from false to true.
So, the refinement mapping must assign to done an expression whose value
is changed from false to true by the first non-stuttering step of AddSeq.
Since further steps of AddSeq refine stuttering steps of Add , the expression
assigned to done must remain true for the rest of the execution of Add . A
suitable expression is sum ≠ ⟨ ⟩, so we let the refinement mapping include
   done ← (sum ≠ ⟨ ⟩).
In the initial state of AddSeq, the refinement mapping should assign to x
and y the values of u and v . Since Add allows x and y to have any values in
its final state, it doesn’t matter what values the refinement mapping assigns
to x and y after the first step of AddSeq. However, since later steps must
refine stuttering steps of Add , the values of x and y must not change. Zero
seems like a nice value to let x and y equal when their value no longer
matters, so we let the refinement mapping include:
never found out how their program implemented consensus. But based on
the state of the art of programming at the time, here is what their program
might have done.
A process called the leader, running on a single computer, receives all
input requests and decides what input should be chosen next. A new leader
will have to be selected if the initial leader fails, but we’ll worry about that
later. (Failure of a process usually means failure of the computer executing
the process.) For the system to keep running despite the failure of individual
computers, a set of processes called acceptors, each running on a different
computer, have to know what value was chosen. Moreover, only a subset of
the acceptors should have to be working (that is, not failed) for an input to
be chosen. If an input v is chosen by a leader and a set of acceptors, and
the leader and those acceptors fail, then a different leader and a different set
of acceptors must not choose an input different from v . The obvious way
to ensure that is to require a majority of the acceptors to agree upon the
input v in order for that input to be chosen. Any two majorities have at
least one acceptor in common, and that acceptor will know that it agreed
to the choice of v .
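This majority-intersection fact is easy to confirm by brute force for a small set of acceptors; here is a Python sketch (the acceptor names are arbitrary):

from itertools import combinations

acceptors = {"a1", "a2", "a3", "a4", "a5"}
majorities = [set(c)
              for k in range(len(acceptors) // 2 + 1, len(acceptors) + 1)
              for c in combinations(sorted(acceptors), k)]

# Every pair of majorities shares at least one acceptor.
assert all(m1 & m2 for m1 in majorities for m2 in majorities)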
This reasoning leads to the following algorithm: The leader decides what
input v should be chosen. It sends a message to the acceptors saying that
they should agree to the choice of v . Any working acceptor that receives the
message replies to the leader with a message saying “v is OK”. When the
leader receives such an OK message from a majority of acceptors, it sends
a message to all the acceptors telling them that v has been chosen.
This algorithm works fine, and the system keeps choosing a sequence of
inputs, until the leader fails. At that point, a new leader is selected. The
new leader sends a message to all the acceptors asking them what they’ve
done. In particular, the new leader finds out from the acceptors if inputs
were chosen that it was unaware of. It also finds out if the previous leader
had begun trying to choose an input but failed before the input was chosen.
If it had, then the new leader completes the choice of that input. When the
new leader has received this information from a majority of acceptors, it
can complete any uncompleted choices of an input and begin choosing new
inputs. Let’s call this algorithm the naive consensus algorithm.
There’s one problem with the naive algorithm: How is the new leader
chosen? Choosing a single leader is just as hard as choosing a single input.
The naive consensus algorithm thus assumes the existence of a consensus
algorithm. However, because leader failures should be rare, choosing a leader
does not have to be done efficiently. So, programmers would probably have
approached the problem of choosing a leader the way they approached most
there are enough votes cast for the value in the ballot. More precisely, we define ChosenAt(b, v) to be true iff a majority of acceptors has voted for v in ballot b. The Voting algorithm implements the Consensus abstract program under the refinement mapping

(6.6)   chosen ← {v ∈ Value : ∃ b ∈ N : ChosenAt(b, v)}
In addition to votes, the algorithm has one other variable maxBal whose
value is a function that assigns to each acceptor a a number maxBal (a).
The significance of this number is that a will never in the future cast a vote
in any ballot numbered less than maxBal (a). The value of maxBal (a) is
initially 0 and is never decreased. The algorithm can increase maxBal (a) at
any time.
It may seem strange that the state does not contain any information
about what processes have failed. We are assuming that a failed process does
nothing. Since we are describing only safety, a process is never required to
do anything, so there is no need to tell it to do nothing. A failed process that
has been repaired can differ from a process that hasn’t failed because it may
have forgotten its prior state when it resumes running. A useful property
of a consensus algorithm is that, even if all processes fail, the algorithm can
resume its normal operation when enough processes are repaired. To achieve
this, we require that a process maintains its state in stable storage, so it is
restored when a failed process restarts. A process failing and restarting is
then no different from a process simply pausing.
The heart of the Voting algorithm is a state expression SafeAt(b, v) that is true iff ChosenAt(c, w) is false and will remain false forever for any c < b and w ≠ v. That it will remain false forever can be deduced from the current state, because the next-state action implies both that a process a will not cast a vote in ballot c when c < maxBal(a) and that maxBal(a) can never decrease. The key invariant maintained by the algorithm is

(6.7)   ∀ a ∈ Acceptor, b ∈ N, v ∈ Value :
           (⟨b, v⟩ ∈ votes(a)) ⇒ SafeAt(b, v)
where Acceptor is the set of acceptors. The next-state action allows a process
a to perform either of two actions:
• Increase maxBal (a). This action is always enabled.
• Vote for a value v in a ballot numbered b. As already explained, this
action is enabled only if no process has voted for a value other than
v in ballot b and b ≥ maxBal (a). An additional enabling condition is
required to maintain the invariance of (6.7).
I have given you all the information you need to figure out the definition
of SafeAt(b, v ) and the enabling condition on acceptors needed to maintain
the invariance of (6.7). Can you do it? Few people can. I was able to
only because I had simplified the problem to finding an abstract program
whose only processes are the acceptors and whose state consists only of the
set of votes cast and the value of maxBal . I had abstracted away leaders,
messages, and failures.
The Voting algorithm requires an acceptor to know the current state of
other acceptors to decide what vote it can cast. How can this lead to a
distributed consensus algorithm? I abstracted away leaders and messages; I
didn’t ignore them. I knew that an acceptor didn’t have to directly observe
the state of other acceptors to know that they hadn’t voted for some value
other than v in a ballot. The acceptor could know that because of a message
it received from a leader. I also knew that it could deduce that the other
enabling conditions were satisfied from messages it received. Abstracting
away leaders and messages enabled me to concentrate on the core problem
of achieving consensus. The solution to that problem told me what the
leaders should do and what messages needed to be sent.
a message can be inferred from the message, so we can just have a set of
messages. Paxos tolerates the same message being received multiple times
by a process, so there is no need to remove a message when it is received.
This means that if the same message is sent to multiple recipients, there
is no need for multiple copies of the message. There is also no need for a
separate action of receiving a message. An action that should be taken upon
receipt of a message simply has the existence of that message in the set of
sent messages as an enabling condition. Paxos tolerates message loss. But
since we are describing safety, there’s no difference between a lost message
and a message that is sent but never received. So, there is no need ever to
remove messages that have been sent.
We can therefore represent message passing with a variable msgs whose
value is the set of all messages that have been sent. A message is sent by
adding it to the set msgs. The presence of a message in msgs enables an
action that should be triggered by the receipt of the message. The algorithm
has a variable maxBal that implements the variable of the same name in
the Voting algorithm. It also has two other variables maxVBal and maxVal
whose values are functions with domain the set of acceptors. They are
explained below.
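Here is a minimal Python sketch of this representation of message passing; the message format is an arbitrary choice made for the sketch:

msgs = set()

def send(msg):
    msgs.add(msg)              # sending a message = adding it to the set

def receipt_enabled(msg):
    return msg in msgs         # receipt is just an enabling condition

send(("phase1", 3))            # e.g. a phase 1 message for ballot 3
assert receipt_enabled(("phase1", 3))
send(("phase1", 3))            # duplicate sends and receipts are harmless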
The Paxos consensus algorithm can be viewed as a multiprocess algo-
rithm containing two sets of processes: the acceptors that implement the
acceptors of the Voting algorithm, and an infinite set of processes, one for
each natural number, where process number b is the leader of ballot num-
ber b. More precisely, the ballot b leader orchestrates the voting by the
acceptors in ballot b of the Voting algorithm.
The next-state action of the algorithm could be (but isn’t literally) writ-
ten in the form ∃ b ∈ N : BA(b) where BA(b) describes how ballot b is per-
formed. The ballot consists of two phases. In phase 1, the ballot b leader
sends a message to the acceptors containing only the ballot number b. An
acceptor a ignores the message unless b > maxBal (a), in which case it sets
maxBal (a) to b and replies with a message containing a, b, maxVBal (a),
and maxVal (a). When the ballot b leader receives those messages from a
majority of the acceptors, it can pick a value v to be chosen, where v is ei-
ther a value picked by the leader of a lower-numbered ballot or an arbitrary
value. The complete algorithm describes how it picks v . Phase 2 begins
with the leader sending a message to the acceptors asking them to vote for
v in ballot b. An acceptor a ignores the message unless b ≥ maxBal (a), in
which case a sets maxBal (a) to b and replies with a message saying that it
has voted for v in ballot b.
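Here is a rough Python sketch of an acceptor’s part in the two phases just described, using the msgs-as-a-set representation. It is an illustration, not the complete algorithm; the message tags and initial values are choices made for the sketch.

msgs = set()
acceptors = {"a1", "a2", "a3"}
maxBal  = {a: 0 for a in acceptors}     # a never acts in a ballot < maxBal[a]
maxVBal = {a: -1 for a in acceptors}    # highest ballot in which a has voted
maxVal  = {a: None for a in acceptors}  # the value a voted for in that ballot

def phase1b(a, b):
    # a answers the ballot-b leader's phase 1 message
    if ("1a", b) in msgs and b > maxBal[a]:
        maxBal[a] = b
        msgs.add(("1b", a, b, maxVBal[a], maxVal[a]))

def phase2b(a, b, v):
    # a votes for the value v the ballot-b leader asks it to vote for
    if ("2a", b, v) in msgs and b >= maxBal[a]:
        maxBal[a] = b
        maxVBal[a], maxVal[a] = b, v
        msgs.add(("2b", a, b, v))

msgs.add(("1a", 1))             # the ballot 1 leader begins phase 1
phase1b("a1", 1)
assert ("1b", "a1", 1, -1, None) in msgs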
The Paxos algorithm implements the Voting algorithm under a refinement mapping.
The proof uses identifiers from the definition of OB and ones from the
definition of LM . To avoid confusion, we indicate to which program an
identifier belongs with a subscript. We defined OB to equal:
   Init ∧ □[Next]_v ∧ Fair
   where  Next ≜ ∃ p ∈ {0, 1} : PNext(p)
          Fair ≜ ∀ p ∈ {0, 1} : WF_v(PNext(p))

We now add subscripts to that definition, so OB equals:

   Init_OB ∧ □[Next_OB]_v_OB ∧ Fair_OB
We assume LM has the same definition, except with the subscripts LM. We sometimes use subscripts even when they aren’t necessary, for example writing x_OB even though LM has no variable named x.
We define OBSafe and OBFair as in Section 4.2.5.2, so OB equals OBSafe ∧ OBFair. We define LMSafe and LMFair similarly. As expected, safety and liveness are proved separately. We first show that OBSafe refines LMSafe. (By machine closure of ⟨OBSafe, OBFair⟩ and Theorems 4.3 and 4.5, OBFair isn’t needed to prove that OB refines LMSafe.) We then show that OB implies LMFair. However, first we must define the refinement mapping under which OB implements LM.
Math VIII
Hierarchical Proofs Thus far, our structured proofs have consisted of a
single list of steps. That doesn’t work for the long proofs needed to prove
complex results, such as correctness of the abstract programs that engineers
write. The method of handling complexity that’s obvious to an engineer is
hierarchical structuring. With structured proofs, a proof consists of either a
paragraph proof or a sequence of steps, each step having a proof. The last
step in a proof that consists of a sequence of steps is a Q.E.D. step.
The steps of a top-level proof are numbered 1, 2, etc. The steps of a
proof of step number 2 are numbered 2.1, 2.2, . . . . The steps of a proof of
step number 3.4 are 3.4.1, 3.4.2, . . . and so on. The lowest-level proofs are
paragraph proofs. A step can be used only in the proof of later steps of the
same proof. For example, the assertion proved as step 3.4.2 can be used in
the proofs of steps 3.4.4 and 3.4.5.1, but not in the proof of step 3.5. This
ensures that a step is used only where the assumptions under which the step
was proved still hold.
This numbering scheme works for three or four levels. For deeper proofs, we can abbreviate step number 2.7.4 as ⟨3⟩4 because it’s step number 4 of a depth-3 proof. Although many step numbers can have the same abbreviation, at most one of those steps can be used at any point in the proof [38].
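The abbreviation is purely mechanical, as this little Python illustration shows (written with ASCII brackets):

def abbreviate(step_number):
    # "2.7.4" names step 4 of a depth-3 proof, abbreviated <3>4
    parts = step_number.split(".")
    return f"<{len(parts)}>{parts[-1]}"

assert abbreviate("2.7.4") == "<3>4"
assert abbreviate("3.4.2") == "<3>2"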
For reliable proofs, the paragraph proofs should be short enough so
they’re easy to understand and obviously correct. If a paragraph proof isn’t
obviously correct, it should be decomposed into a sequence of steps. Some
steps need deeper proofs than others. The proofs of Q.E.D. steps should
usually be paragraphs. My rule of thumb is to decompose a proof until I’m
sure that every paragraph proof is correct, then decompose the paragraph
proofs one level further. The proofs in this book haven’t been carried down
to that level. This was to keep the book from being too long, and because no
program will crash if there’s a small mistake in one of the book’s theorems.
For machine-checked proofs, paragraph proofs are replaced by instruc-
tions to the prover. The proof must be decomposed into steps that are simple
enough for the prover to check, which may sometimes be infuriatingly sim-
ple. This should eventually change as machine learning is applied to proof
checking. Some proof checkers don’t support hierarchical structuring. They
require you to do the structuring by hand. If you don’t structure the proof,
you will wind up with an unmanageable unstructured mass of lemmas when
trying to prove the correctness of a complex abstract program.
Suffices: Assume: E
Prove: F
changes the current goal to F , and it adds E to the set of formulas that can
be assumed true by the following steps of the current proof. The proof of
this statement is the same as the proof of Suffices: E ⇒ F .
We’ll be using a lot of formulas that are obtained from formulas F_LM by making the substitutions defined by the refinement mapping for the variables of LM. To keep from having lots of withs, we use this abbreviation, for any formula F_LM:

   F̄_LM ≜ (F_LM with pc_LM ← pcBar_OB, sem_LM ← semBar_OB)
imply actions of the form ⟨A_LM⟩_v_LM. For proving that OB implies LMSafe, we need only the weaker assertions obtained by replacing such an action by A_LM. However, we will need the stronger assertions later for proving that OB implies LMFair.

R1. ⊨ Inv_OB ∧ Ncs_OB(p) ⇒ Ncs_LM(p)
R2. ⊨ Inv_OB ∧ Wait_OB(p) ⇒ (v′_LM = v_LM)
R3. ⊨ Inv_OB ∧ W2_OB(p) ⇒
        if p = 0 then ⟨Wait_LM(0)⟩_v_LM
        else if x_OB(0) then v′_LM = v_LM
        else ⟨Wait_LM(1)⟩_v_LM
R4. ⊨ Inv_OB ∧ W3_OB(p) ⇒ (v′_LM = v_LM)
R5. ⊨ Inv_OB ∧ W4_OB(p) ⇒ (v′_LM = v_LM)
R6. ⊨ Inv_OB ∧ Cs_OB(p) ⇒ ⟨Cs_LM(p)⟩_v_LM
R7. ⊨ Inv_OB ∧ Exit_OB(p) ⇒ ⟨Exit_LM(p)⟩_v_LM

Assertion R3 is equivalent to these three assertions:

R3a. ⊨ Inv_OB ∧ W2_OB(0) ⇒ ⟨Wait_LM(0)⟩_v_LM
R3b. ⊨ Inv_OB ∧ W2_OB(1) ∧ x_OB(0) ⇒ (v′_LM = v_LM)
R3c. ⊨ Inv_OB ∧ W2_OB(1) ∧ ¬x_OB(0) ⇒ ⟨Wait_LM(1)⟩_v_LM
All these assertions are proved by expanding the definitions of the actions
and of the refinement mapping. To see how this works, we consider R3a.
We haven’t written the definitions of the actions corresponding to the pseudocode statements of algorithms OB and LM. The definitions of W2_OB(0) and Wait_LM(0), as well as the other relevant definitions, are in Figure 6.2.
Here is the proof of R3a.
definition of W2_OB(0) and Inv_OB (which implies pc_OB is a function with domain {0, 1}) imply pc′_OB(0) = cs. Hence semBar′_OB = 0, so sem′_LM = 0.
4. Q.E.D.
Proof: Steps 2 and 3 and the definition of Wait_LM(0) imply Wait_LM(0). Step 3 implies sem′_LM ≠ sem_LM, which implies v′_LM ≠ v_LM, proving the goal ⟨Wait_LM(0)⟩_v_LM introduced by step 1.
How we decomposed the proof that OBSafe implies LMSafe into proving R1–R7 was determined by the structure of Next_OB as a disjunction of seven subactions, and by knowing which disjuncts of Next_LM each of those subactions implements, which followed directly from the definition of the refinement mapping. The decomposition of R3 into R3a–R3c followed from the structure of R3. As illustrated by the proof of R3a, the proof of each of the resulting nine formulas is reduced to ordinary mathematical reasoning by expanding the appropriate definitions. The only place where not understanding the algorithms could result in an error is in the definition of the invariant Inv_OB or of the refinement mapping. Catching such an error requires
only careful reasoning about simple set theory and a tiny bit of arithmetic,
using elementary logic. Someday, computers should be very good at such
reasoning.
We prove (6.15) by finding an action B_OB and state predicates P_OB and Q_OB satisfying the following conditions:
To show that these conditions imply (6.15), we have to show that they imply that in any behavior σ satisfying OBB, if □E⟨A_LM⟩_v_LM is true of σ^{+m}, then σ(n) → σ(n+1) is an ⟨A_LM⟩_v_LM step for some n ≥ m. Condition A1.1 implies □Q_OB is true of σ^{+m}, which by A1.2 implies □P_OB is true of σ^{+q} for some q ≥ m. By the definition of WF, conditions A2 imply σ(n) → σ(n+1) is a ⟨B_OB⟩_v_OB step for some n ≥ q, and A3 implies that ⟨B_OB⟩_v_OB step is an ⟨A_LM⟩_v_LM step.
Figure 6.3: Formulas B_OB, P_OB, and Q_OB for the actions A_LM, with p ∈ {0, 1}.
The formulas B_OB, P_OB, and Q_OB used for the six actions A_LM are shown in Figure 6.3. Condition A2.1 for the actions A_LM follows easily from the definitions of B_OB and P_OB. To show that A2.2 is satisfied, we apply Theorem 4.8 to write OBFair as the conjunction of weak fairness of the actions described by each process’s statements other than its ncs statement. That A3 is satisfied for the four actions A_LM in Figure 6.3 follows from conditions R3a, R3c, R6, and R7 of Section 6.4.2.
This leaves condition A1 for the actions. A1.1 is proved by using the type correctness invariant implied by Inv_LM to show that E⟨A_LM⟩_v_LM equals E(A_LM), and then substituting pcBar_OB for pc_LM and semBar_OB for sem_LM in E(A_LM). For our example, this actually shows that Inv_LM implies E⟨A_LM⟩_v_LM ≡ Q_OB for all the actions A_LM. A1.2 is trivially satisfied for Cs_LM(p) and Exit_LM(p), since Q_OB and P_OB are equal. The interesting conditions are A1.2 for Wait_LM(0) and Wait_LM(1). They are the kind of leads-to property we saw how to prove in Section 4.2.5. In fact, we now obtain a proof of A1.2 for Wait_LM(0) by a simple modification of the proof in Section 4.2.5.3 that OB implies:
Let’s drop the subscript OB, so the variables in any formula whose name has no subscript are the variables of OB. The proof of (6.16) is described by the proof lattice of Figures 4.4 and 4.5. A □ formula in a label on a box in a proof lattice means that the formula is conjoined to each formula inside the box. Since F ↝ G implies (□H ∧ F) ↝ (□H ∧ G) for any F, G, and H, we obtain a valid proof lattice (one whose leads-to assertions are all true) by conjoining □Inv_LM ∧ OBFair ∧ □Q to the labels of the outer boxes in the lattices of Figures 4.4 and 4.5. This makes those labels equal to OBB ∧ □Q. Since Q implies pc(0) ∈ {wait, w2}, we obtain a valid proof lattice by replacing the source node of the lattice in Figure 4.4 by □Q. Moreover, since the new label’s conjunct □Q implies □(pc(0) ≠ cs), so it’s impossible for pc(0) ever to equal cs, we can remove the sink node pc(0) = cs and the edges to and from it from the lattice of Figure 4.5.² Since the label on the inner box containing □¬x(1), which is the new sink node, implies □(pc(0) = w2), we now have a valid proof lattice that shows:
1. □Q ⇒ □¬x(0)
   1.1. □Q ⇒ □(pc(0) ∉ {wait, w2})
   Proof: We proved in Section 4.2.5.3 that pc(0) ∈ {wait, w2} leads to pc(0) = cs, and □Q implies □(pc(0) ≠ cs).
   1.2. □Q ∧ □(pc(0) ∉ {wait, w2}) ⇒ □(pc(0) = ncs)
   Proof: Q implies pc(0) ∉ {cs, exit}, which by Inv and pc(0) ∉ {wait, w2} implies pc(0) = ncs.
   1.3. Q.E.D.
   Proof: By steps 1.1 and 1.2, since Inv ∧ Q imply pc(0) = ncs, and Inv and pc(0) = ncs imply ¬x(0).
2. □Q ∧ □¬x(0) ↝ □P
   2.1. □Q ∧ □¬x(0) ↝ (pc(1) = w2)
   Proof: Q implies pc(1) ∈ {wait, w2, w3, w4}, and a straightforward proof using fairness of PNext(1) and □¬x(0) shows
        (pc(1) ∈ {wait, w2, w3, w4}) ↝ (pc(1) = w2)
   2.2. □Q ∧ □¬x(0) ∧ (pc(1) = w2) ⇒ □(pc(1) = w2)
   Proof: □Q implies □(pc(1) ≠ cs), and (pc(1) = w2) ∧ □[Next]_v ∧ □(pc(1) ≠ cs) implies □(pc(1) = w2).

² Equivalently, we can remove edge 8 and add an edge from pc(0) = cs to false and an edge from false to □¬x(1), since false implies anything.
   2.3. Q.E.D.
   Proof: Steps 2.1 and 2.2 imply □Q ∧ □¬x(0) ↝ □(pc(1) = w2), and □P equals □(pc(1) = w2) ∧ □¬x(0).
3. Q.E.D.
Proof: By steps 1 and 2.
equals

   (y − z) = x + (sym awith x′ ← y − z)

If sym ≜ √(2 ∗ x′), then this equals

   (y − z) = x + √(2 ∗ (y − z))

Now let A be an action and let x₁, …, xₙ be all the variables that appear in A. We can then write E(A) as:

(6.17)   E(A) ≜ ∃ c₁, …, cₙ : (A awith x′₁ ← c₁, …, x′ₙ ← cₙ)
6.4.4.2 Computing E
The syntactic definition (6.17) of E immediately provides rules for writing E(A) in terms of formulas E(B_i), for B_i subactions of A. From the rule

   ⊨ (∃ c : A ∨ B) ≡ (∃ c : A) ∨ (∃ c : B)

we have

   E1. ⊨ E(A ∨ B) ≡ E(A) ∨ E(B)

For example, in program LM defined in Section 4.2.6.1, the next-state action PNext(p) is the disjunction of actions Ncs(p), Wait(p), Cs(p), and Exit(p). Therefore, rule E1 implies

   E(PNext(p)) ≡ E(Ncs(p)) ∨ E(Wait(p)) ∨ E(Cs(p)) ∨ E(Exit(p))

The generalization of E1 is:

   E2. ⊨ E(∃ i ∈ S : A_i) ≡ ∃ i ∈ S : E(A_i)

where S is a constant or state expression.
Another rule of existential quantification is that if the constant c does not occur in A, then ∃ c : (A ∧ B) is equivalent to A ∧ (∃ c : B). From this we deduce:

   E3. If no variable appears primed in both A and B, then ⊨ E(A ∧ B) ≡ E(A) ∧ E(B).
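Over a finite state space, (6.17) can be evaluated by brute force, which makes rules E1 and E3 easy to check on small examples. Here is a Python sketch; the states and actions are arbitrary choices made for the illustration:

states = range(3)          # the possible values of a single variable x

def enabled(action, s):
    # E(A) holds in state s iff A relates s to some state t
    return any(action(s, t) for t in states)

A = lambda s, t: t == s + 1    # the action x' = x + 1
B = lambda s, t: t == s - 1    # the action x' = x - 1

# Rule E1: E(A or B) is equivalent to E(A) or E(B) in every state.
for s in states:
    assert enabled(lambda u, t: A(u, t) or B(u, t), s) == \
           (enabled(A, s) or enabled(B, s))

# The hypothesis of E3 matters: A and B prime the same variable, and at
# x = 1 each is enabled separately, but their conjunction is not, since
# x' can't equal 2 and 0 at the same time.
assert enabled(A, 1) and enabled(B, 1)
assert not enabled(lambda u, t: A(u, t) and B(u, t), 1)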
The first asserts that substitution distributes over ∨; the second asserts that substitution distributes over □; and the third asserts that substitution distributes over the [A]_v construct.
We expect substitution to distribute in this way over all mathematical operators, so we would expect E(Ā) and the bar of E(A), that is, E(A) with the refinement mapping’s substitutions applied, to be equal for any action A. In fact, they are equal for most actions encountered in practice. But here’s an action A for which they aren’t for the refinement mapping of (6.19):

   A ≜ ∧ pc′ = (p ∈ {0, 1} ↦ wait)
       ∧ sem′ = 0

Rules E3 and E5 imply that E(A) equals true, so the bar of E(A) equals true. By definition of the refinement mapping:

   Ā ≜ ∧ pcBar′ = (p ∈ {0, 1} ↦ wait)
       ∧ semBar′ = 0

Ā implies pcBar′(p) = wait for p ∈ {0, 1}. By definition of pcBar, this implies:
Both (1) and (2) can’t be true, so Ā must equal false and thus E(Ā) equals false. Therefore, E(Ā) does not equal the bar of E(A), so substitution does not always distribute over E.
The reason substitution doesn’t distribute over E is that the bar of E(A) performs the substitutions pc ← pcBar and sem ← semBar for the primed variables pc′ and sem′. However, as we see from (6.17), those primed variables do not occur in E(A); they are replaced by bound constants. The substitutions should be performed only on the unprimed variables. Therefore:
Instead, it equals
6.5 A Warning
We have defined correctness of a program S to mean ⊨ S ⇒ P for some property P. We have to be careful to make sure that we have chosen P so that this implies what we really want correctness of the program to mean. As discussed in Section 5.1, we have to be concerned with the accuracy of P.
When correctness asserts that S refines a program T, the property P is (T with …) for a refinement mapping “…”. That refinement mapping is as important a part of the property as the program T, and it must be examined just as carefully to be sure that proving refinement means what you want it to. As an extreme example, OB also implements LM under this refinement mapping:
occur. In such a case, it’s a good idea to make sure that S refines T when
fairness requirements are added to those actions in both programs. This is
an application of the general idea of adding fairness to verify possibility that
was introduced in Section 5.1.2.
Chapter 7
Auxiliary Variables
7.1.1 Introduction
Recall the behavior predicate F12 , discussed in Section 4.1.2, that is true of
a behavior iff the value of x can equal 2 in a state only if x equaled 1 in a
previous state. We gave a semantic definition of F12 ; it can’t be expressed
in RTLA or TLA as those languages have been defined so far. We observed
that F12 can be expressed as the abstract program S 12 , defined in (4.2), by
introducing an additional variable y.
The variable x that we’re interested in is called an interface variable.
The variable y that’s used only to describe how the values of x can change
is called an internal variable. There’s a problem with using the internal
variable y to describe F12 . Consider the abstract program S x that starts
with x = 0 and can keep incrementing x by one:
   Sx ≜ (x = 0) ∧ □[x′ = x + 1]_x
   ⊨ F ⇒ G  implies  ⊨ (∃ y₁, …, y_k : F) ⇒ G
   if no variable yᵢ occurs free in G.
mapping. If S has the form Init ∧ □[Next]_v ∧ L, then the answer is yes if we’re allowed to add auxiliary variables to T. Adding an auxiliary variable a (which does not occur in T) to T means writing a formula T^a such that ∃ a : T^a is equivalent to T. By this equivalence, we can verify ⊨ T ⇒ S by verifying ⊨ (∃ a : T^a) ⇒ S. By the ∃ Elimination rule, we do this by verifying ⊨ T^a ⇒ S. And to verify this, we can use a as well as the variables of T to define the refinement mapping. Auxiliary variables are the main topic of this chapter and are discussed after the definition of ∃.
Therefore, σ ≃_y τ asserts that behaviors σ and τ are the same except for the values assigned to y by their states. We then define ∃_rtla y : F to be satisfied by a behavior σ iff it is satisfied by some behavior τ with σ ≃_y τ.
The operator ∃_rtla is not a suitable hiding operator for properties, and hence not suitable for TLA, because the formula ∃_rtla y : F need not be SI, and thus not a property, even if F is. For example, let F be the following formula, where ⌊r⌋ is the largest integer less than or equal to r:
where Init^h and Next^h are obtained by augmenting Init and Next to describe, respectively, the initial value of h and how h can change; and v_h is the tuple v ◦ ⟨h⟩ of the variables of v and the variable h. Since h does not appear in L, the formula ∃ h : T^h equals
   InitS ≜ (inp = rdy) ∧ (avg = 0) ∧ (seq = ⟨ ⟩)

   User ≜ ∧ inp = rdy
          ∧ inp′ ∈ R
          ∧ (avg′ = avg) ∧ (seq′ = seq)

   Syst ≜ ∧ inp ∈ R
          ∧ seq′ = Append(seq, inp)
          ∧ avg′ = SeqSum(seq′) / Len(seq′)
          ∧ inp′ = rdy

   NextS ≜ User ∨ Syst

   IS ≜ InitS ∧ □[NextS]_⟨inp,avg,seq⟩

   S ≜ ∃ seq : IS
Using the internal variable seq to write the behavior predicate S is arguably
the clearest way to describe the values assumed by the interface variables inp
and avg. It’s a natural way to explain that the value of avg is the average of
the values that have been input. However, it’s not a good way to describe
how to implement the system. There’s no need for an implementation to re-
member the entire sequence of past inputs; it can just remember the number
of inputs and their sum. In fact, it doesn’t even need an internal variable to
remember the sum. We can implement it with an abstract program T that
implements S using only a single internal variable num whose value is the
number of inputs that the user has entered.
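Here is a Python sketch of that idea: one step of the implementation updates avg using only num, because a running average of n values determines their sum (it is n ∗ a). The function name is ours:

avg, num = 0.0, 0

def syst(inp):
    global avg, num
    num += 1
    avg = (avg * (num - 1) + inp) / num   # new average of all inputs so far

for x in [4.0, 8.0, 6.0]:
    syst(x)
assert avg == 6.0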
We first describe T in pseudocode and construct T h by adding a history
variable h to the code. The TLA translations of the pseudocode show how
to add a history variable to an abstract program described in TLA.
It’s natural to think of the user and the system in this example as two
separate processes. However, the abstract programs S and T are predicates
on behaviors, which are mathematical objects. Process is not a mathemat-
ical concept; it’s a way in which we interpret predicates on behaviors. For
simplicity, we write T as a single-process program.
The pseudocode for program T is in Figure 7.2. It uses the operator :∈
introduced in Figure 5.2, so statement usr sets inp to an arbitrary element
of R. Since we’re not concerned with implementing T , there’s no reason to
hide its internal variable num.
Because the sum of n numbers whose average is a is n ∗ a, it should be
clear that program T implements program S . But showing that T imple-
ments S requires defining a refinement mapping under which T implements
IS (program S without variable seq hidden). And that requires adding an
auxiliary variable that records the sequence of input values. Adding the
required auxiliary variable h is simple and obvious. We just add the two
pieces of code shown in black in Figure 7.3.
It is a straightforward exercise to prove

   T ≜ Init ∧ □[Next]_⟨inp,avg,num⟩
   where Next ≜ Usr ∨ Sys

Actions Usr and Sys are the actions executed from control points usr and sys, respectively. The TLA translation of the code in Figure 7.3 is

   T^h ≜ Init^h ∧ □[Next^h]_⟨inp,avg,num,h⟩
   where Init^h ≜ Init ∧ (h = ⟨ ⟩)
         Next^h ≜ Usr^h ∨ Sys^h
         Usr^h ≜ Usr ∧ (h′ = h)
         Sys^h ≜ Sys ∧ (h′ = Append(h, inp))
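Here is a Python sketch of these definitions, with a state represented as a dictionary; R is narrowed to sample numeric inputs, and the function names are ours:

def usr_h(s, inp):
    # Usr with h' = h: the user enters input inp; h is unchanged
    assert s["inp"] == "rdy"
    return dict(s, inp=inp)

def sys_h(s):
    # Sys with h' = Append(h, inp): average the input, record it in h
    num = s["num"] + 1
    return dict(s,
                num=num,
                avg=(s["avg"] * s["num"] + s["inp"]) / num,
                inp="rdy",
                h=s["h"] + [s["inp"]])

s = {"inp": "rdy", "avg": 0.0, "num": 0, "h": []}
s = sys_h(usr_h(s, 5.0))
s = sys_h(usr_h(s, 7.0))
assert s["h"] == [5.0, 7.0] and s["avg"] == 6.0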
Here is the general result that describes how to add a history variable to a
program. Its proof is a simple generalization of the proof for our example.
• exp is a state expression that does not contain the variable h, and the expᵢ are step expressions that do not contain h′,
then ⊨ T ≡ ∃ h : T^h.
Theorem 7.3 Let T equal Init ∧ □[Next]_⟨x⟩ where x is the list of all variables of S; let F be a safety property such that F(σ) depends only on the values of the variables x in σ, for any behavior σ; and let h be a variable not one of the variables x. We can add h as a history variable to T to obtain T^h and define a state predicate I_F in terms of F such that ⊨ ⟦T⟧ ⇒ F is true iff I_F is an invariant of T^h.
A simple example of the theorem is when F is the safety property F12 defined semantically by (4.1) of Section 4.1.2. That property asserts x must equal 1 before it can equal 2. A program Init ∧ □[Next]_v satisfies F12 iff the formula (x = 2) ⇒ h is an invariant of the program obtained by adding the history variable h to that program as follows:
Theorem 7.3 assumes only that F is a safety property. This might suggest
we can show that one program satisfies the safety part of another program by
verifying an invariance property. However, I have never seen this done, and
in practice it seems unlikely to be possible to describe any but the simplest
abstract programs with an invariant.
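For a property as simple as F12, though, the idea is easy to see in code. Here is a Python sketch that adds the history variable h to a behavior of values of x and checks the invariant (x = 2) ⇒ h:

def check_f12(behavior):
    h = False                    # history: has x equaled 1 yet?
    for x in behavior:
        h = h or (x == 1)
        if x == 2 and not h:     # the invariant (x = 2) => h is violated
            return False
    return True

assert check_f12([0, 1, 2])
assert not check_f12([0, 2, 1])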
Math X
Case Proof Steps A common proof method is case splitting: for example, splitting the proof of a formula containing a number x into proving it first if x ≥ 0 and then if x < 0. This is done with Case statements, where
if G is the current goal, then Case: F is an abbreviation of F ⇒ G . A
proof by case splitting usually ends with a sequence of Case steps followed
by a Q.E.D. step showing that those steps cover all possible cases.
   Cen1 ≜ ∃ aw : ICen1
   ICen1 ≜ Init ∧ □[Next1]_v

   v ≜ ⟨inp, disp, aw⟩

   Init ≜ ∧ inp = NotArt
          ∧ aw = ⟨ ⟩
          ∧ disp ∈ Art × {0, 1}

   Next1 ≜ Input ∨ DispOrNot ∨ Ack

   Input ≜ ∧ (inp = NotArt) ∧ (aw = ⟨ ⟩)
           ∧ inp′ ∈ Art
           ∧ aw′ = ⟨inp′⟩
           ∧ disp′ = disp

   DispOrNot ≜ ∧ aw ≠ ⟨ ⟩
               ∧ ∨ disp′ = ⟨aw(1), 1 − disp(2)⟩
                 ∨ disp′ = disp
               ∧ aw′ = ⟨ ⟩
               ∧ inp′ = inp

   Ack ≜ ∧ (inp ∈ Art) ∧ (aw = ⟨ ⟩)
         ∧ inp′ = NotArt
         ∧ (aw′ = aw) ∧ (disp′ = disp)

Figure 7.4: The program Cen1.
   Cen2 ≜ ∃ aw : ICen2
   ICen2 ≜ Init ∧ □[Next2]_v

   Next2 ≜ InpOrNot ∨ Display ∨ Ack

   InpOrNot ≜ ∧ (inp = NotArt) ∧ (aw = ⟨ ⟩)
              ∧ inp′ ∈ Art
              ∧ ∨ aw′ = ⟨inp′⟩
                ∨ aw′ = aw
              ∧ disp′ = disp

   Display ≜ ∧ aw ≠ ⟨ ⟩
             ∧ disp′ = ⟨aw(1), 1 − disp(2)⟩
             ∧ aw′ = ⟨ ⟩
             ∧ inp′ = inp

Figure 7.5: The program Cen2.
We will see here how to define the refinement mapping under which ICen2 implements ICen1. Section 7.4 shows how to define the refinement mapping under which ICen1 implements ICen2.
∆
S3. As = ∨ (s = 0) ∧ A ∧ (s 0 = exp)
∨ (s > 0) ∧ (v 0 = v ) ∧ (s 0 = s − 1)
where exp is an expression whose value is a natural number; it can
contain the original variables primed or unprimed.
∆
S4. Bjs = (s = 0) ∧ B j ∧ (s 0 = 0), for j ∈ J .
Ignoring the value of s, the behaviors satisfying T s are the same as behaviors
satisfying T , except each A step in a behavior of T is followed in T s by a
finite number (possibly 0) of steps that leave the variables of T unchanged.
Therefore, by stuttering insensitivity, T and ∃ s : T s are satisfied by the
same sets of behaviors, so they are equivalent.
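Here is a Python sketch of this correspondence, with a state represented as a pair (x, s) and exp fixed at 2 for simplicity; the names are ours:

def ts_behavior(t_states, exp=2):
    # After each A step, T^s inserts exp steps that change only s,
    # counting s down to 0.
    out = [(t_states[0], 0)]
    for x in t_states[1:]:
        out.append((x, exp))             # the A step sets s to exp
        for k in range(exp - 1, -1, -1):
            out.append((x, k))           # stuttering steps: only s changes
    return out

b = ts_behavior([0, 1, 2])
# Erasing s and collapsing repeated states recovers the behavior of T.
xs = [x for x, _ in b]
collapsed = [x for i, x in enumerate(xs) if i == 0 or x != xs[i - 1]]
assert collapsed == [0, 1, 2]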
To show that ICen2 implements ICen1, we define ICen2^s in this way, where A equals InpOrNot and the B_j are Ack and Display. In the definition of InpOrNot^s, we let:

   exp ≜ if aw′ = ⟨ ⟩ then 1 else 0
The proof of (7.11) is similar to, but simpler than, the refinement proof sketched in Section 6.4.2. Here, we give only the briefest outline of a proof to present results that will be used below when discussing liveness.
Let’s abbreviate (F with aw ← awBar) by F̄ for any formula F, so we must prove ⊨ ICen2^s ⇒ ICen1̄. The proof of ⊨ Init^s ⇒ Init̄ is trivial, since
then ∃ s : T^s equals T.
The theorem does not assume that the actions A_i and B are mutually disjoint. A step could be both an A_i and an A_j step for i ≠ j, or both an A_i and a B step. That should rarely be the case when applying the theorem, since it allows a nondeterministic choice of how many stuttering steps (if any) are added in some states. The action B will usually be the disjunction of actions B_j. In that case, B^s equals the disjunction of the actions (s = 0) ∧ B_j ∧ (s′ = s).
the step rejects the input, then it sets s to 1, in which case the only enabled action of Next2^s is (s = 1) ∧ InpOrNot^s; and L2 asserts no fairness condition for that action. To show that the (s = 0) ∧ InpOrNot^s step must be followed by an Ack^s step, we first show as follows that ∃ s : IC2^s is equivalent to IC2:

   ∃ s : IC2^s ≡ ∃ s : (ICen2^s ∧ L2)    by definition of IC2^s
               ≡ (∃ s : ICen2^s) ∧ L2    because s does not occur in L2
               ≡ ICen2 ∧ L2              by Theorem 7.4
               ≡ IC2                     by definition of IC2
step. By C3, this Display^s step is a ⟨DispOrNot⟩_v step, which implies the goal introduced by step 1.
5. Case: □(s ≠ 0)
Proof: The case assumption and the assumption □Inv2 imply □(s = 1). As shown above in the explanation of why a behavior of IC2^s can’t halt in a state with s = 1, the property WF_v(Ack) implies that, in such a state, an (s = 1) ∧ InpOrNot^s step must eventually occur. By C2, that is the ⟨DispOrNot⟩_v step that proves the step 1 goal.
6. Q.E.D.
Proof: Step 3 implies that the step 4 and 5 cases are exhaustive.
The proof of (7.14b) is similar but simpler, since it doesn’t have the complication of deducing from fairness of one action (Ack^s) that a step of another action (DispOrNot^s) of the same program must occur.
Theorem 7.2 shows how, after adding a history variable to a program,
we can rewrite the program’s fairness properties as fairness conditions of
subactions of the modified program’s next-state action. I don’t know if
there is a similar result for stuttering variables. Theorem 7.2 is relevant
to methods other than TLA for describing abstract programs. Those other
methods that I’m aware of do not assume stuttering insensitivity, so a similar
result for stuttering variables seems to be of no interest.
Two Set Operators If you’ve ever learned about sets, you should know that S ∪ T is the set of values that are in the set S or the set T (or both), and S ∩ T is the set of values that are in both S and T. We can define ∩ with the subsetting constructor, since S ∩ T equals {v ∈ S : (v ∈ T)}.¹ It is an axiom of ZF that S ∪ T is a set if S and T are sets.
¹The parentheses disambiguate this expression, telling us that v ∈ T is a formula while v ∈ S is syntax.
However, this is impossible for the following reason. Because the refinement
mapping substitutes the variables inp and disp of ICen1 for the correspond-
ing variables of ICen2, an Input step of ICen1 must implement an InpOrNot
step of ICen2. Besides choosing the input, the InpOrNot action of ICen2
also decides whether or not that input is to be displayed, recording its de-
cision in the value of aw . However, that decision is made by ICen1 later,
when executing the DispOrNot action. Immediately after the Input action,
there’s no information in the state of ICen1 to determine what the value of
variable aw of ICen2 should be.
The solution to this problem is to have the Input action guess what
DispOrNot will do, indicating its guess by setting a prophecy variable p to
a value that predicts whether the input will be displayed or rejected by the
DispOrNot step.
To make the generalization from this example more obvious, let’s write action DispOrNot of ICen1 as the disjunction of two actions: DorN_Yes that displays the input and DorN_No that doesn’t. Remember that:

   DispOrNot ≜ ∧ …
               ∧ ∨ disp′ = ⟨aw(1), 1 − disp(2)⟩
                 ∨ disp′ = disp
               ⋮

We can define DorN_i, for i = Yes and i = No, by modifying the definition of DispOrNot to get:

   DorN_i ≜ ∧ …
            ∧ ∨ (i = Yes) ∧ (disp′ = ⟨aw(1), 1 − disp(2)⟩)
              ∨ (i = No) ∧ (disp′ = disp)
            ⋮
We then replace DispOrNot in ICen1 by ∃ i ∈ Π : DorN_i, where Π equals {Yes, No}. We can then add to ICen1 an auxiliary variable p called a prophecy variable to obtain a formula ICen1^p in which the Input action is replaced by

   Input^p ≜ Input ∧ (p′ ∈ Π)

Thus the Input^p action predicts what the DispOrNot action will do, and DispOrNot^p is modified to ensure that the prediction comes true. To complete the definition of ICen1^p, we can let Init^p equal Init and Ack^p equal Ack, since the value of p matters only after an Input^p step and before the following DispOrNot^p step.
In ICen2, the value of aw is ⟨ ⟩ except after an InpOrNot step that chose to display the input. This implies

   v_p ≜ v ◦ ⟨p⟩,
   Init^p ≜ Init ∧ (p ∈ Π),
   Next^p ≜ (A_p ∧ (p′ ∈ Π)) ∨ (∃ j ∈ J : B_j ∧ C_j),
∆
   CenSeq1 ≜ ∃ aw : ICenSeq1
   ICenSeq1 ≜ InitSeq ∧ □[NextSeq1]_v

   v ≜ ⟨inp, disp, aw⟩

   InitSeq ≜ ∧ inp = NotArt
             ∧ aw = ⟨ ⟩
             ∧ disp ∈ Art × {0, 1}

   NextSeq1 ≜ InputSeq ∨ DispOrNotSeq ∨ AckSeq

   InputSeq ≜ ∧ inp = NotArt
              ∧ inp′ ∈ Art
              ∧ aw′ = Append(aw, inp′)
              ∧ disp′ = disp

   DispOrNotSeq ≜ ∧ aw ≠ ⟨ ⟩
                  ∧ ∨ disp′ = ⟨aw(1), 1 − disp(2)⟩
                    ∨ disp′ = disp
                  ∧ aw′ = Tail(aw)
                  ∧ inp′ = inp

   AckSeq ≜ ∧ inp ∈ Art
            ∧ inp′ = NotArt
            ∧ (aw′ = aw) ∧ (disp′ = disp)

Figure 7.6: The program CenSeq1.
   CenSeq2 ≜ ∃ aw : ICenSeq2
   ICenSeq2 ≜ InitSeq ∧ □[NextSeq2]_v

   NextSeq2 ≜ InpOrNotSeq ∨ DisplaySeq ∨ AckSeq

   InpOrNotSeq ≜ ∧ inp = NotArt
                 ∧ inp′ ∈ Art
                 ∧ ∨ aw′ = Append(aw, inp′)
                   ∨ aw′ = aw
                 ∧ disp′ = disp

   DisplaySeq ≜ ∧ aw ≠ ⟨ ⟩
                ∧ disp′ = ⟨aw(1), 1 − disp(2)⟩
                ∧ aw′ = Tail(aw)
                ∧ inp′ = inp

Figure 7.7: The program CenSeq2.
Let Π be the set {Yes, No} of predictions. The value of p should always be a sequence of elements of Π having the same length as the value of the variable aw of ICenSeq1. The initial predicate of ICenSeq1^p is:

   InitSeq^p ≜ InitSeq ∧ (p = ⟨ ⟩)
where DorNSeq_Yes displays the input and DorNSeq_No rejects it. The definition of DorNSeq_i is obtained by modifying DispOrNotSeq the same way we modified DispOrNot to obtain DorN_i for ICen1. We can then define:
Note that having DispOrNotSeq^p set p′ to Tail(p) ensures that every prediction is used only once. Since AckSeq^p neither makes nor satisfies a prediction, we define:

   AckSeq^p ≜ AckSeq ∧ (p′ = p)

where

   NextSeq1^p ≜ InputSeq^p ∨ DispOrNotSeq^p ∨ AckSeq^p
of two arguments with domain the set of pairs ⟨wsq, ysq⟩ where wsq is a sequence of elements of Art, ysq is a sequence of Yes or No values, and Len(wsq) = Len(ysq). (Remember that a function of two arguments was defined in Section 2.8.3 to be a function of one argument whose domain is a set of pairs.) The definition is a recursive one, justified by the well-founded relation ≻ where ⟨wsq₁, ysq₁⟩ ≻ ⟨wsq₂, ysq₂⟩ iff the length of sequences wsq₁ and ysq₁ is greater than the length of wsq₂ and ysq₂. Since we haven’t bothered to define a convenient syntax for writing recursive definitions of functions of two arguments, the definition is written somewhat informally as:

   OnlyYes(wsq, ysq) ≜
      if wsq = ⟨ ⟩ then ⟨ ⟩
      else ( if Head(ysq) = Yes then ⟨Head(wsq)⟩
             else ⟨ ⟩ )
           ◦ OnlyYes(Tail(wsq), Tail(ysq))

Defining awBar to equal OnlyYes(aw, p) makes (7.16) true.
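The recursion is easy to transcribe into Python, with both sequences as equal-length lists:

def only_yes(wsq, ysq):
    # Keep the datums whose corresponding prediction is Yes, in order.
    if not wsq:
        return []
    first = [wsq[0]] if ysq[0] == "Yes" else []
    return first + only_yes(wsq[1:], ysq[1:])   # + plays the role of ◦

assert only_yes(["a", "b", "c"], ["Yes", "No", "Yes"]) == ["a", "c"]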
It’s straightforward to modify Theorem 7.5 to describe an arbitrary prophecy variable p that makes a sequence of predictions. We replace the definition of Next^p in the hypothesis of the theorem by:

   Next^p ≜ (A_p(1) ∧ D) ∨ (∃ j ∈ J : B_j ∧ C_j), where
      D equals p′ = Tail(p) or ∃ i ∈ Π : p′ = Append(Tail(p), i)
      C_j equals p′ = p or ∃ i ∈ Π : p′ = Append(p, i)
However, there’s one problem: The empty sequence ⟨ ⟩ is the value of p indicating that no prediction is being made. When p = ⟨ ⟩, the value of the subscript p(1) in this definition is undefined. That doesn’t matter in our example because p and aw are sequences of the same length, so p = ⟨ ⟩ implies aw = ⟨ ⟩, which implies that DorNSeq_i equals false for i ∈ Π. Therefore, the value of the undefined subformula makes no difference. In general, to make the modified theorem valid, we need to add to its hypothesis the requirement that the following is an invariant of T^p:

   (p = ⟨ ⟩) ⇒ ¬E(∃ i ∈ Π : A_i)
The InputSet^p action must add a prediction of whether or not the picture inp′ that it adds to aw will be displayed. Thus, it must assert that p′ is the function obtained from p by adding inp′ to its domain and letting the value of p′(inp′) be either element in Π. To write that action, let’s define FcnPlus(f, w, d) to be the function obtained from a function f by adding an element w to its domain and letting that function map w to d. The domain
   CenSet1 ≜ ∃ aw : ICenSet1
   ICenSet1 ≜ InitSet ∧ □[NextSet1]_v

   v ≜ ⟨inp, disp, aw, old⟩

   InitSet ≜ ∧ inp = NotArt
             ∧ aw = { }
             ∧ disp ∈ Art × {0, 1}
             ∧ old = { }

   NextSet1 ≜ InputSet ∨ DispOrNotSet ∨ AckSet

   InputSet ≜ ∧ inp = NotArt
              ∧ inp′ ∈ Art \ old
              ∧ aw′ = aw ∪ {inp′}
              ∧ (disp′ = disp) ∧ (old′ = old ∪ {inp′})

   DispOrNotSet ≜ ∃ w ∈ aw :
                  ∧ ∨ disp′ = ⟨w, 1 − disp(2)⟩
                    ∨ disp′ = disp
                  ∧ aw′ = aw \ {w}
                  ∧ (inp′ = inp) ∧ (old′ = old)

   AckSet ≜ ∧ inp ∈ Art
            ∧ inp′ = NotArt
            ∧ (aw′ = aw) ∧ (disp′ = disp) ∧ (old′ = old)

Figure 7.8: The program CenSet1.
   AckSet^p ≜ AckSet ∧ (p′ = p)
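With functions represented as Python dictionaries, FcnPlus is a one-line extension; here is a sketch (the sample picture name is arbitrary):

def fcn_plus(f, w, d):
    # The function obtained from f by adding w to its domain, mapped to d.
    assert w not in f            # w is being added to the domain
    g = dict(f)
    g[w] = d
    return g

p = {}
p = fcn_plus(p, "pic1", "Yes")   # predict: pic1 will be displayed
assert p == {"pic1": "Yes"}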
• No two active prophecies can be predictions for the same action set.
The prophecy variables of CenSeq1 and CenSet1 made only predictions that
were likely to be fulfilled. We could instead have used prophecy variables
that a mathematician might consider simpler that make a lot more predic-
tions. For CenSeq1, instead of having each InputSeq step add a prediction
to p, we could have let the initial value of p be an infinite sequence of predic-
tions. The first element of the sequence would be the active one, and each
DispOrNotSeq action would remove that element from p. For CenSet1, we
could have let the initial value of p be any element of (Art → {Yes, No}),
predicting for each picture w whether or not it will be displayed if it is in-
put. Since the same picture can’t be input twice, the value of p(inp) could
be set by a DispOrNotSet action to a value indicating that its prediction is
inactive.
We could have used an even more extravagant prophecy variable for
CenSet1—one that predicts not only whether each picture will be displayed
or rejected, but in which order they will be input. The initial value of
p would be an infinite sequence of predictions ⟨w, d⟩, for w ∈ Art and d ∈ {Yes, No}, predicting not just if the next DispOrNotSet step will display or reject the input, but that it must occur with inp equal to w. Almost all
of those predictions will be impossible to fulfill because inp will not equal
w . But as we’ve seen, impossible predictions don’t matter because they
just require the behavior to halt, which is either allowed or is ruled out by
a liveness hypothesis. This may seem silly, but a prophecy variable that
makes predictions that are almost all impossible is used in the example of
Section 7.6 because it seems to provide the simplest way to define the needed
refinement mapping.
Theorem 7.6 Let x, y, and z be lists of variables, all distinct from one another; let the variables of T be x and z and the variables of IS be x and y; and let T equal Init ∧ □[Next]_⟨x,z⟩ ∧ L. Let the operator Φ map behaviors satisfying T to behaviors satisfying IS such that Φ(σ) ∼_y σ. By adding history, stuttering, and prophecy variables to T, we can define a formula
A simple example of such an object is a first in, first out queue, called
a fifo. We can think of the state of a fifo as an ordinal sequence queue
of elements from some set Data. A fifo provides two methods, usually
described as follows.
enqueue Takes an element of Data as an argument, and appends it to the end of queue. It returns no value.
example is to illustrate the use of auxiliary variables, which are added only
to the safety property of a program, we consider only the safety property of
a fifo.
We assume there is a set EnQers of processes that perform enqueue
operations. Execution of an enqueue operation by process e consists of
three steps: a BeginEnq(e) step that describes the call of the method, a
DoEnq(e) step that modifies the variable queue, and an EndEnq(e) step that
describes the return. The enqueuers communicate with the object through
the interface variable enq, whose value is a function with domain EnQers.
The value of enq(e) equals Done when enqueuer e is not performing an
enqueue operation, and it equals the data value it is appending to queue
when e is performing the operation, where Done is some constant not in
Data. There is also an internal variable enqInner , where enqInner (e) is set
to Busy by the BeginEnq(e) action and is set to Done by the DoEnq(e)
action.
Similarly, there is a set DeQers of dequeuer processes, each d ∈ DeQers
performing BeginDeq(d ), DoDeq(d ), and EndDeq(d ) steps. Dequeuers com-
municate with the object through the interface variable deq, where deq(d ) is
set to Busy by the BeginDeq(d ) action and to the value that was dequeued
by the EndDeq(d ) action. There is an internal variable deqInner , where
deqInner (d ) is set to Busy by the BeginDeq(d ) action and set by DoDeq(d )
to the value dequeued by the dequeue operation. The complete definition of
the abstract program is formula Fifo in Figure 7.9. It uses the unchanged operator, where unchanged exp equals exp′ = exp. Thus, if v is a tuple ⟨v₁, …, vₙ⟩ of variables, then unchanged v asserts that v′ᵢ = vᵢ for all i in 1‥n.
   Fifo ≜ ∃ queue, enqInner, deqInner : IFifo
   IFifo ≜ Init ∧ □[Next]_v

   v ≜ ⟨enq, deq, queue, enqInner, deqInner⟩

   Init ≜ ∧ enq = (e ∈ EnQers ↦ Done)
          ∧ deq ∈ (DeQers → Data)
          ∧ queue = ⟨ ⟩
          ∧ enqInner = (e ∈ EnQers ↦ Done)
          ∧ deqInner = deq

   Next ≜ ∨ ∃ e ∈ EnQers : BeginEnq(e) ∨ DoEnq(e) ∨ EndEnq(e)
          ∨ ∃ d ∈ DeQers : BeginDeq(d) ∨ DoDeq(d) ∨ EndDeq(d)

   BeginEnq(e) ≜ ∧ enq(e) = Done
                 ∧ ∃ D ∈ Data : enq′ = (enq except e ↦ D)
                 ∧ enqInner′ = (enqInner except e ↦ Busy)
                 ∧ unchanged ⟨deq, queue, deqInner⟩

   DoEnq(e) ≜ ∧ enqInner(e) = Busy
              ∧ queue′ = Append(queue, enq(e))
              ∧ enqInner′ = (enqInner except e ↦ Done)
              ∧ unchanged ⟨deq, enq, deqInner⟩

   EndEnq(e) ≜ ∧ enq(e) ≠ Done
               ∧ enqInner(e) = Done
               ∧ enq′ = (enq except e ↦ Done)
               ∧ unchanged ⟨deq, queue, enqInner, deqInner⟩

   BeginDeq(d) ≜ ∧ deq(d) ≠ Busy
                 ∧ deq′ = (deq except d ↦ Busy)
                 ∧ deqInner′ = (deqInner except d ↦ NoData)
                 ∧ unchanged ⟨enq, queue, enqInner⟩

   DoDeq(d) ≜ ∧ deq(d) = Busy
              ∧ deqInner(d) = NoData
              ∧ queue ≠ ⟨ ⟩
              ∧ deqInner′ = (deqInner except d ↦ Head(queue))
              ∧ queue′ = Tail(queue)
              ∧ unchanged ⟨enq, deq, enqInner⟩

   EndDeq(d) ≜ ∧ deq(d) = Busy
               ∧ deqInner(d) ≠ NoData
               ∧ deq′ = (deq except d ↦ deqInner(d))
               ∧ unchanged ⟨enq, queue, enqInner, deqInner⟩

Figure 7.9: The program Fifo.
Fifo, where queue is hidden, there is no way to know in which order the two
values appear in queue until that order is revealed by dequeue operations. In
their paper defining linearizability, Herlihy and Wing gave an algorithm that
implements a fifo in which, from a state immediately after both enqueue
operations have completed, it is possible for the two values to be dequeued in
either order by two successive non-concurrent dequeue operations. There is
no queue encoded in the algorithm. While their algorithm implements Fifo,
there is no refinement mapping under which it implements IFifo without the
addition of auxiliary variables. In particular, showing that their algorithm
implements Fifo requires adding a prophecy variable that predicts the order
in which data items enqueued by concurrent enqueue operations will be
dequeued.
What is encoded in the state of their algorithm is not a linearly ordered
queue of enqueued data values, but rather a partial order on the set of
enqueued values that indicates the possible orders in which the values can
be returned by dequeue operations. A partial order on a set S is a relation ≺
on S that is transitive and has no cycles (which implies a ⊀ a for any a ∈ S).
For the partial ordering ≺ on the set of enqueued values, the relation u ≺ w
means that value u must be dequeued before value w . Program IFifo is the
special case in which that partial order is a total order, meaning that either
u ≺ w or w ≺ u for any two distinct enqueued values u and w .
Presented here is a program POFifo that is equivalent to Fifo, but which
is obtained by hiding internal variables in a program IPOFifo that main-
tains a partially ordered set of enqueued values rather than a queue. The
Herlihy-Wing algorithm can be shown to implement IPOFifo under a refine-
ment mapping defined in terms of its variables, without adding a prophecy
variable.
F1. Each dequeued value has been enqueued, and an enqueued value is
dequeued at most once.
The set beingAdded need not be a subset of elts because it can contain
datums that were removed from elts by dequeue operations before the op-
erations that enqueued them have completed.
The program POFifo is defined in Figure 7.10. Here are explanations of
the four disjuncts of the next-state action PONext.
   POFifo ≜ ∃ elts, before, adding : IPOFifo
   IPOFifo ≜ POInit ∧ □[PONext]_POv

   POv ≜ ⟨enq, deq, elts, before, adding⟩

   POInit ≜ ∧ enq = (e ∈ EnQers ↦ Done)
            ∧ deq ∈ (DeQers → Data)
            ∧ elts = { }
            ∧ before = { }
            ∧ adding = (e ∈ EnQers ↦ NonElt)

   PONext ≜ ∨ ∃ e ∈ EnQers : BeginPOEnq(e) ∨ EndPOEnq(e)
            ∨ ∃ d ∈ DeQers : BeginPODeq(d) ∨ EndPODeq(d)

   BeginPOEnq(e) ≜
      ∧ enq(e) = Done
      ∧ ∃ D ∈ Data : ∃ id ∈ {i ∈ Ids : ⟨D, i⟩ ∉ (elts ∪ beingAdded)} :
           ∧ enq′ = (enq except e ↦ D)
           ∧ elts′ = elts ∪ {⟨D, id⟩}
           ∧ before′ = before ∪ {⟨el, ⟨D, id⟩⟩ : el ∈ (elts \ beingAdded)}
           ∧ adding′ = (adding except e ↦ ⟨D, id⟩)
      ∧ deq′ = deq

   EndPOEnq(e) ≜ ∧ enq(e) ≠ Done
                 ∧ enq′ = (enq except e ↦ Done)
                 ∧ adding′ = (adding except e ↦ NonElt)
                 ∧ unchanged ⟨deq, elts, before⟩

   BeginPODeq(d) ≜ ∧ deq(d) ≠ Busy
                   ∧ deq′ = (deq except d ↦ Busy)
                   ∧ unchanged ⟨enq, elts, before, adding⟩

   EndPODeq(d) ≜ ∧ deq(d) = Busy
                 ∧ ∃ el ∈ elts :
                      ∧ ∀ el2 ∈ elts : ¬(el2 ≺ el)
                      ∧ elts′ = elts \ {el}
                      ∧ deq′ = (deq except d ↦ el(1))
                      ∧ before′ = before ∩ (elts′ × elts′)
                 ∧ unchanged ⟨enq, adding⟩

Figure 7.10: The program POFifo.
EndPODeq(d) Enabled when deq(d) equals Busy and elts is not empty, which implies that elts contains at least one minimal datum (a datum not preceded in the ≺ relation by any other datum in elts), since the datum in elts that was added first to elts must be a minimal datum. The action chooses an arbitrary minimal datum el of elts, removes it from elts, sets deq(d) to its data value component, and modifies before to remove all relations el ≺ el2 for elements el2 of elts.
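Here is a Python sketch of that choice, with datums as (value, id) pairs and before as a set of pairs of datums; the names are ours:

def minimal_datums(elts, before):
    # Datums in elts that no other datum in elts precedes.
    return {el for el in elts
            if not any((el2, el) in before for el2 in elts)}

def end_po_deq(elts, before, el):
    # Remove el and restrict before to the remaining datums.
    elts2 = elts - {el}
    before2 = {r for r in before if r[0] in elts2 and r[1] in elts2}
    return elts2, before2

elts = {("a", 1), ("b", 2)}
before = {(("a", 1), ("b", 2))}          # ("a", 1) must be dequeued first
assert minimal_datums(elts, before) == {("a", 1)}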
The prediction made by the first item p(1) of the sequence p is the datum that the next EndPODeq(d) step will remove from elts. The datum p(1) is removed by this step iff elts′ = elts \ {p(1)} is true of the step. Since the
Q3. For each i ∈ 1 . . Len(pg) and each datum u ∈ elts, if u ≺ pg(i ) then
u = pg(j ) for some j ∈ 1 . . (i − 1).
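Condition Q3 by itself has a simple operational reading. A Python sketch of just this condition (conditions Q1 and Q2 are not repeated here), treating pg as a 0-indexed list, elts as a set, and ≺ as a set of pairs:

    def satisfies_Q3(pg, elts, before):
        # Q3: if a datum u in elts must precede pg[i], then u already
        # occurs among pg[0..i-1].
        return all(u in pg[:i]
                   for i, w in enumerate(pg)
                   for u in elts
                   if (u, w) in before)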
We have shown that pg must be a prefix of qBar . Our strategy for defining
IPOFifo pq by adding the history variable qBar to IPOFifo p is to keep qBar
equal to pg for as long as possible. To see how to do that, let’s see how pg
can change.
The sequence pg can become shorter only when an EndPODeq p step occurs, in which case p is not the empty sequence and pg is a nonempty prefix of p. The step removes the first element of p and pg, so p′ = Tail(p), pg′ = Tail(pg), and qBar′ = Tail(qBar).
The sequence pg can be made longer by a BeginPOEnq p step as follows.
Suppose the step appends the prediction u to p and adds the datum w to
elts. The value of pg at the beginning of the step is a proper prefix of p ◦ ⟨u⟩. If w equals the prediction in p ◦ ⟨u⟩ immediately after pg, then w will be
appended to pg iff doing so would not violate Q3. (We’ll see in a moment
when it would violate Q3.) If w can be appended to pg and the prediction
following w in p is already in elts, then it might be possible to append that
datum to pg as well. And so on. Thus, it’s possible for the BeginPOEnq p
step to append several datums to pg. If our strategy has been successful
thus far and qBar = pg at the beginning of the step, then a BeginPOEnq pq
step implies qBar′ = pg′. This makes qBar a prefix of qBar′, as it should
be because stuttering steps to be added after a BeginPOEnq p step should
change queueBar only by appending datums to it.
There is one situation in which it is impossible for any further datum to
be appended to pg. One or more datums can be appended to pg only by
a BeginPOEnq p that adds a datum w to elts that can be appended to pg.
However if there is a datum u in elts that is neither in the sequence pg nor in
the set beingAdded , then adding w to elts also adds the relation u ≺ w . This
relation means that w can’t be appended to pg because that would violate
condition Q3. Thus, if there is a datum u in elts that is neither in pg nor
beingAdded , then no datums can be added to pg. Moreover, the datum u
can never be removed from elts because it is not in pg and can never be in pg
because no more datums can be added to pg. (The datum u can’t be added
to beingAdded because a BeginPOEnq step can’t add a datum to elts that
is already in elts.) Let’s call a state in which there is a datum in elts that
is not in beingAdded or pg a blocked state. In a blocked state, datums can
be removed from the head of pg by EndPODeq p steps, but no new datums
can be appended to pg. So, if and when enough EndPODeq p steps have
occurred to remove all the datums from pg, then no more EndPODeq p steps
can occur. That means that any further dequeue operations that are begun
with a BeginPODeq p step must block, never able to complete.
Let’s consider the first step that caused a blocked state, that is, the first step causing there to be an element u in elts that is neither in pg nor beingAdded. Since
u was added to elts by a BeginPOEnq p step that put u in beingAdded , it
must be the EndPOEnq p step of the enqueue operation that added u to elts
that caused the blocked state by removing u from beingAdded . Until that
blocked state was reached, qBar equaled pg. However, since u has not been
dequeued, it must be in queueBar after that EndPOEnq p step because that
step must implement the EndEnq step of IFifo. Thus that EndPOEnq p step
must append u to qBar . Therefore, the first blocked state is the first state
in which qBar ≠ pg. In that state, qBar equals pg ◦ ⟨u⟩.
From that first blocked state on, no new datums can be added to pg, so
the datum u can never be dequeued. Therefore, whenever an EndPOEnq p
step occurs for an operation that enqueued a datum w , if w is in elts (so it
hasn’t been dequeued) and is not in pg, then that EndPOEnq p step must
append w to qBar .
To recapitulate, here is how we add the history variable qBar to IPOFifo p
to obtain the program IPOFifo pq . These rules imply that, at any point in the
behavior, qBar will equal pg ◦ eb where pg is the state function of IPOFifo p
defined above and eb is a sequence of datums in elts that are not in pg.
Initially, pg and eb equal ⟨ ⟩.
These rules imply that a datum can never be removed from eb, so once eb
Encoding in the value of the stuttering variable s for which of the three
cases the variable is being added, and in case 2 for which enqueuer e the
step is an EndPOEnq pq (e) step, allows the value of queueBar to be defined
in terms of the values of s, qBar , and (for case 2) adding.
We still have to define the state functions enqInnerBar and deqInnerBar
that are substituted for enqInner and deqInner in the refinement mapping
under which IPOFifo pqs implements IFifo. The value of enqInnerBar (e) for
an enqueuer e should equal Done except when adding(e) equals the datum
that e is enqueueing, and that datum is not yet in queueBar . This means
that enqInnerBar can be defined in terms of adding, queueBar , and s.
The value of deqInnerBar(d) for a dequeuer d should equal the value of deq(d) except between when d has removed the first element of queueBar (by executing the stuttering step added in case 1) and when the subsequent EndPODeq pqs(d) step occurs. In that case, deqInnerBar(d) should equal qBar(1). It’s therefore easy to define deqInnerBar as a state func-
tion of IPOFifo pqs if the value of the stuttering variable s added in case 1
contains the value of d for which the following EndPODeq pqs (d ) step is to
be performed.
This completes the sketch of how auxiliary variables are added to IPOFifo
to define a refinement mapping under which it implements IFifo, showing that
POFifo refines Fifo. The intellectually challenging part was discovering how
to define qBar . It took me quite a bit of thinking to find the definition. This
was not surprising. The example of the fifo had been studied for at least
15 years before Herlihy and Wing discovered that it could be implemented
without maintaining a totally ordered queue. Given the definition of qBar ,
constructing the refinement mapping required the ability to write abstract
programs mathematically—an ability that comes with practice.
(7.17) |= T ⇒ ∃ n ∈ N : 32(x = n)
This implies:
|= T ≡ ∃ n ∈ N : 32(x = n) ∧ T
(7.18) |= (n ∈ N) ∧ 32(x = n) ∧ T ⇒ ∃ y : IS
|= (c ∈ C ) ∧ L ∧ T ⇒ ∃ . . . : IS
Chapter 8

Loose Ends
This chapter covers two topics that, to my knowledge, have not yet seen any
industrial application. However, they might in the future become useful.
The first topic is reduction, which is about verifying that a program satisfies
a property by verifying that a coarser-grained version of the program satisfies
it. Even if you never use it, understanding the principles behind reduction
can help you choose the appropriate grain of atomicity for abstract programs.
For that purpose, skimming through sections 8.1.1–8.1.3 should suffice.
The second topic is about representing a program as the composition of
component programs. We have been representing the components that make
up a program, such as the individual processes in a multiprocess program, as
disjuncts of the next-state action. Section 8.2 explains how the components
that form a program can be described as programs. How this is done depends
on why it is done. Two reasons for doing it and the methods they lead to
are presented.
8.1 Reduction
8.1.1 Introduction
8.1.1.1 What Reduction Is
When writing an abstract program to describe some aspect of a concrete
one, we must decide what constitutes a single step of a behavior. Stated
another way, we must describe what the grain of atomicity of the next-state
action should be. The only advice provided thus far is that we should use
the coarsest grain of atomicity (the fewest steps) that is a sufficiently accurate representation of that aspect of the concrete program. “Sufficiently accurate” means that either we believe it is easy to make the concrete program implement that grain of atomicity, or we are deferring the problem of how those atomic actions are implemented.
Some work has addressed the problem of formalizing what makes an ab-
stract program “sufficiently accurate”, starting with a 1975 paper by Richard
Lipton [41]. This work used the approach called reduction, which replaces
a program S with an “equivalent” coarser-grained program S R called the
reduced version of S . Certain properties of S are verified by showing that
S R satisfies them. The program S R is obtained from S by replacing certain
nonatomic operations with atomic actions, each atomic action producing
the same effect as executing all the steps of the nonatomic operation it re-
places one after another, with no intervening steps of other operations. The
reduced program S R is therefore simpler and easier to reason about than
program S .
It was never clear in exactly what sense S R was equivalent to S , and the
results were restricted to particular classes of programs and of the proper-
ties that could be verified. TLA enabled a new way of viewing reduction.
In that view, the variables of S are replaced in S R by “virtual” variables,
and S implements S R under a refinement mapping. The refinement map-
ping is not made explicit, but the relation between the values of the actual
and the virtual variables is described by an invariant. This mathematical
view encompasses much of the previous work on reduction for concurrent
programs.
Our basic approach to writing a correct concrete program is showing
that it refines an abstract program. There are two aspects to refining one
program with another: data refinement and step refinement. Modern coding
languages have made data refinement easier by providing higher-level, more
abstract data structures. It is now almost as easy to write a program that
manipulates integers as one that manipulates bit strings representing a finite
set of integers. There has been much less progress in making step refinement
easier. As explained in Section 7.6.1, a linearizable object allows a coarse
grain of atomicity in descriptions of the code that executes operations on
the object. However, the only general method of implementing a linearizable
object still seems to be the one invented by Dijkstra in the 1960s: putting
the code that reads and/or modifies the state of the object in a critical
section.
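In coding terms, that method is just a lock around each operation. Here is a minimal Python sketch of a linearizable counter; the counter object is an invented example, not one from the text:

    import threading

    class LinearizableCounter:
        """Each operation reads/modifies the object's state inside a
        critical section, so every operation appears to be atomic."""
        def __init__(self):
            self._lock = threading.Lock()
            self._value = 0

        def increment(self):
            with self._lock:          # critical section
                self._value += 1
                return self._value

        def read(self):
            with self._lock:
                return self._value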
I believe that better ways of describing the grain of atomicity will be
needed if rigorous verification that concrete concurrent programs implement
abstract ones is to become common practice. Reduction may provide the
key to doing this. Section 8.1 provides a mathematical foundation for understanding reduction. The theorems presented here are not the most general
ones possible; some generalizations can be found elsewhere [8]. Also omit-
ted are rigorous proofs. I know of no industrial use of reduction or of tools
to support it; and I have no experience using the results described here in
practice. The problem it addresses is real, but I don’t know if reduction is
the solution.
R1. |= S ∧ T ⇒ ∃ X : S R ∧ 2I R
R2. |= S R ⇒ (P with x ← X)
R3. |= (P with x ← X) ∧ 2I R ⇒ P
|= S R ⇒ P and |= P ∧ 2I R ⇒ P
We first consider the case in which S is the usual TLA safety property
Init ∧ 2[Next]hxi for an abstract program. We then consider the program
described by S ∧ F , where F is the conjunction of fairness properties for S .
Conditions R1–R3 will then have S replaced by S ∧ F , the reduced program
R1a. |= S ⇒ ∃ X : S ⊗ S R
R1b. |= S ⊗ S R ∧ T ⇒ S R ∧ 2I R
8.1.2 An Example
To explain reduction, we start by examining this commonly assumed rule:
When reasoning about a multiprocess program in which interprocess com-
munication is performed by atomic operations to shared data objects, the
program can be represented with any grain of atomicity in which each atomic
action accesses at most one shared data object.1 The following is a state-
ment of the rule in our science and the argument generally used to justify
it.
Suppose S is a multiprocess program with a process that executes a
nonatomic operation, which we call RCL, described by the statements shown
in Figure 8.1. We assume this is “straight line” code, so an execution of RCL
consists of k +1+m steps. For now, we let S be the safety property described
by the code; liveness is discussed later. We assume statements Ri and Lj
can access only data local to the process, while statement C can also access
shared data. The rule asserts that we can replace S with its reduced version
S R obtained by removing all the labels between (but not including) r1 and o, so those k + 1 + m statements are executed as a single step, and replacing
the variables x with the variables X. It is usually claimed that we can do
this because other processes can’t observe when the statements Ri and Lj
are executed, so we can pretend that they are executed in the same step as
statement C.
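That justification can be pictured by commuting steps. The following Python sketch, with invented state components, shows a step that accesses only process-local data commuting with a step of another process:

    # States are dicts.  'local' belongs to the process executing RCL;
    # 'other' and 'shared' belong to its environment.

    def R1(s):   # accesses only data local to the process
        return {**s, "local": s["local"] + 1}

    def E(s):    # a step of another process
        return {**s, "other": s["other"] * 2,
                     "shared": s["shared"] + s["other"]}

    s0 = {"local": 0, "other": 3, "shared": 10}

    # R1 commutes with E, so no other process can tell whether R1 was
    # executed before or after E.
    assert E(R1(s0)) == R1(E(s0))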
We can reduce other operations of the same form to atomic actions as
well, reducing the operations one at a time. So, it suffices to see how it’s
done for just this single operation RCL, which may be executed multiple
times.
r1:  R1 ;
     ...
rk:  Rk ;
c:   C ;
l1:  L1 ;
     ...
lm:  Lm ;
o:   ...

Figure 8.1: The statements executed by operation RCL.
[Figure 8.2: A behavior containing the steps R1, C, L1, L2 of an execution of RCL interleaved with steps E1, E2, E3 of other processes, together with the sequence of behaviors obtained by successively interchanging each R step with a following E step (moving the R step right) and each L step with a preceding E step (moving the L step left), ending with a behavior in which the steps of RCL are contiguous. Thin downward arrows join states unchanged by an interchange; a thick diagonal arrow marks the one state each interchange changes.]
in the next behavior except for the one state across which the actions are
interchanged.
The arrows in the picture are drawn according to the following rules.
There is a (thin) downward pointing arrow from each non-C state that is
unchanged by the interchange that yields the next behavior. From the one
state in each behavior that is changed by the interchange, there is a (thick)
diagonal arrow. If that state satisfies R (is before the C action), then the
arrow points one state to the left of the changed state. If the state satisfies
L, then the arrow points one state to the right.
These arrows define a unique path from every non-C state si of the original behavior to a state in the reduced behavior. Define φ(si) to be that state in the reduced behavior. For the example in Figure 8.2, φ(s45) = u47 because the sequence of states in the path from s45 in the top behavior to the bottom behavior is:

(8.3) s45 → s45 → s45 → r46 → r46 → u47
Figure 8.3 contains an arrow pointing from each non-C state si in the original behavior to the state φ(si) in the reduced behavior. Observe that for every non-C state si, the state φ(si) is a state in which operation RCL is not being executed, that is, a state satisfying ¬(R ∨ L).
We define φ(s) for the C states so that if the C step is si → si+1, then φ(si) is the first state to the left of si for which ¬(R ∨ L) is true, and φ(si+1) is the first state to the right of si+1 for which ¬(R ∨ L) is true. In other words, φ(si) and φ(si+1) are the states of the reduced behavior in which the
[Figures 8.3 and 8.4: The original behavior drawn above the reduced behavior. Figure 8.3 has an arrow from each non-C state si of the original behavior to the state φ(si) of the reduced behavior; Figure 8.4 also includes arrows for the C states.]
φ3. If sk → sk+1 is a C step, then φ(sk) and φ(sk+1) are the first and last states of an execution of operation RCL in the reduced behavior Φ(σ), which is an execution with no interleaved steps of other process actions.
satisfies R1a, this will show that it satisfies R1 with T equal to 23¬L. The
assumption 23¬L is discussed in Section 8.1.4.
Following the path from s to φ(s) backwards for states s in which R is true similarly leads to the following statement, where R equals R1 ∨ . . . ∨ Rk.²

²The action R bears no relation to the superscript R in S R. It is traditional to name the action R for Right-mover because of the way the reduced behavior is constructed; and there seems to be no better superscript than R to signify reduced.
Finally, Figure 8.4 shows that for any state s of the original behavior, φ(s)
is always a state in the reduced behavior in which the RCL operation is not
being executed, so ¬(R ∨ L) is true for φ(s). Because the rule for drawing
the arrows in Figure 8.2 creates a leftward pointing arrow whenever an R
step is moved to the right and a rightward pointing arrow whenever an L
step is moved to the left, this is true in general. Therefore, we have:
φ7. For any state s, the state predicate ¬(R ∨ L) is true in state φ(s).
Statements φ4–φ7 give us relations between s and φ(s) for all states s of a
behavior of our example program. We now have to turn them into relations
between the values of the variables x and the variables X in any state s in
a behavior of S ⊗S R .
It’s easy to do this for φ4. The values of the variables X in state s are
the values of x in φ(s). Since φ(s) = s if s satisfies ¬(R ∨ L), this means
that x = X is true for any reachable state of S ⊗S R satisfying ¬(R ∨ L). In
other words:
so formula φ6 implies:

(8.5) R ⇒ ([R]⁺x awith x ← X, x′ ← x) is an invariant of S ⊗ S R
S ≜ Init ∧ 2[E ∨ M]x

We can therefore define S R as follows, where Init R and E R are Init and E with the variables x replaced by the variables X:

S R ≜ Init R ∧ 2[E R ∨ M R]X
Theorem 8.1  Assume Init, L, and R are state predicates, M and E are actions, x is a list of all variables appearing in these formulas, and X is a list of the same number of variables different from the variables x. Define

S ≜ Init ∧ 2[E ∨ M]x
R ≜ M ∧ R′        L ≜ L ∧ M
Init R ≜ Init with x ← X        E R ≜ E with x ← X
M R ≜ (¬(R ∨ L) ∧ M⁺ ∧ ¬(R ∨ L)′) with x ← X
S R ≜ Init R ∧ 2[E R ∨ M R]X
I R ≜ ∧ ¬(R ∨ L) ⇒ (X = x)
      ∧ R ⇒ ([R]⁺x awith x ← X, x′ ← x)
      ∧ L ⇒ ([L]⁺x awith x′ ← X)
      ∧ ¬(R ∨ L) with x ← X
and assume:

(1) |= Init ⇒ ¬(R ∨ L)
(2) |= S ⇒ 2[ ∧ E ⇒ (R′ ≡ R) ∧ (L′ ≡ L)
              ∧ ¬(L ∧ M ∧ R′)
              ∧ ¬(R ∧ L)
              ∧ R · E ⇒ E · R
              ∧ E · L ⇒ L · E ]x

Then |= S ∧ 23¬L ⇒ ∃ X : S R ∧ 2I R.
Assumption (1) and the first three conjuncts in the action of assumption (2) are the conditions M1–M4, which assert that an execution of the operation described by the action M consists of a sequence of R steps followed by a C step followed by a sequence of L steps. The final two conjuncts in the action of assumption (2) are the assumptions that R right-commutes with E and L left-commutes with E.
In practice, R, L, and E will be defined to be the disjunction of sub-
actions. This allows us to decompose the proofs of those commutativity
conditions by using the following theorem that is proved in the Appendix.
|= (∀ i ∈ I, j ∈ J : Ai · Bj ⇒ Bj · Ai) ⇒ (A · B ⇒ B · A)
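If we view actions as relations on a finite set of states, "·" is relational composition and implication is set containment, so this theorem can be checked directly. A hedged Python sketch:

    def compose(A, B):
        # The action A · B: a step of A followed by a step of B.
        return {(s, u) for (s, t1) in A for (t2, u) in B if t1 == t2}

    def implies(A, B):
        # |= A => B for actions viewed as sets of state pairs.
        return A <= B

    def commutes_by_parts(As, Bs):
        """If every Ai right-commutes with every Bj, then the disjunction
        of the As right-commutes with the disjunction of the Bs."""
        if all(implies(compose(Ai, Bj), compose(Bj, Ai))
               for Ai in As for Bj in Bs):
            A = set().union(*As)
            B = set().union(*Bs)
            assert implies(compose(A, B), compose(B, A))
            return True
        return False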
implies |= S ⇒ P .
(8.9) |= S ⊗S R ∧ F ∧ 23¬L ⇒ S R ∧ 2I R ∧ F R
XFx (A) ≡ (2
32
3 EhAix ⇒ 23hAix )
XFX (AR ) ≡ (2
32
3 EhAR iX ⇒ 23hAR iX )
where 2 323 means 32 if XF is WF and 23 if XF is SF. These formulas and
a little temporal logic imply that to prove (8.10) it suffices to prove these
two theorems:
Since hAR iX equals hAix and formulas EhAix and EhAix contain only the
variables X, (8.14a) is equivalent to:
(8.15) |= S R ⇒ 2( EhAix ⇒ EhAix )
If E were not a weird operator (see Section 6.4.4.3), EhAix would be equiv-
alent to EhAix ; and we expect that equivalence to be true for most actions
hAix . However, because it is not always true, we have to add (8.15) as an
assumption.
To see what is required to make (8.14b) true, we consider what assump-
tion is required to ensure that P ⇒ P is true for an arbitrary state predicate
P with free variables x. The free variables of P are X, and the relation be-
tween the values of x and X is described by the invariant I R of S ⊗S R .
Let’s review what we have shown. We can deduce (8.10) from (8.11) and
(8.12). If (8.13) is true, then we can choose S R of Theorem 8.1 to make
(8.11) true for a single subaction A of E . We can deduce (8.12) from (8.14a)
and (8.14b). We can deduce (8.14a) from (8.15). And finally, we can deduce
(8.14b) from the conditions obtained above for proving P ⇒ P , substituting
EhAix for P . Putting all this together, we have shown that the program
S R of Theorem 8.1 can be chosen to make (8.10) true, for a single subaction
A of E , if the following two conditions are satisfied:
|= S ⇒ 2[ (⟨A⟩x)ρ ⇒ (x′ ≠ x) ]x
There is seldom any reason for a program’s next-state action to allow stuttering steps, and modifying it to disallow stuttering steps does not change the program. An A step of the program will usually be an ⟨A⟩x step; and if it isn’t, A can be replaced by ⟨A⟩x. So for simplicity, we strengthen this assumption to:
This may seem wrong because we have EhAρ ix in (8.20a) and EhAρ ix
in (8.20b) when the two formulas should be equal. However, the following
reasoning shows that they are equal. The definition of Aρ and conditions E3
and E4 of Section 6.4.4.2 imply that EhAρ ix equals ¬(R ∨ L) ∧ EhAρ ix .
The invariant I R implies that ¬(R ∨ L) ⇒ (x = X) and ¬(R ∨ L) are
true, so S ⊗S R implies that EhAρ ix always equals EhAρ ix . We make S R
implying (8.20a) one of our requirements for deducing that WFX (AR ) is
satisfied. We now consider (8.20b).
By (3.33b) of Section 3.4.2.8 and the tautology |= ¬⟨A⟩x ≡ [¬A]x, to prove (8.20b) it suffices to prove:
We have seen that to deduce WFX (AR ) from SFx (A), it suffices to show
(8.18) and:
For the other three possible pairs of fairness conditions on AR and A, the
same argument shows that we can deduce SFX (AR ) instead of WFX (AR )
by replacing 32 with 23 in (8.23b); and we can assume WFx (A) instead
of SFx (A) by replacing 23 with 32 in (8.23b).
Theorem 8.4  With the definitions and assumptions (1) and (2) of Theorem 8.1, let C ≜ (¬L) ∧ M ∧ (¬R′) and let

|= F ⇒ ∀ i ∈ I : YFi x(Ai)        F R ≜ ∀ i ∈ I : ZFi X(AR i)

where I is a countable set and YFi and ZFi are WF or SF for each i ∈ I; and assume either:

• Ai is a subaction of E, AR i ≜ Ai, YFi equals ZFi,

• AR i ≜ Aρ i,
  |= S ⇒ 2[Aρ i ⇒ (x′ ≠ x)]x ,
  |= S R ⇒ 2( E⟨AR i⟩X ⇒ E⟨Aρ i⟩x ) , and
  |= S ∧ F ⇒ (32/23 Z E⟨Aρ i⟩x ∧ 2[¬Ai]x ⇝ 32/23 Y E⟨Ai⟩x)
  where for Q either Y or Z, 32/23 Q is 32 if QF is WF, and it is 23 if QF is SF.³

Then |= S ∧ F ∧ 23¬L ⇒ ∃ X : S R ∧ 2I R ∧ F R.
unchanged formulas assert that all program variables other than sem and pc are left unchanged:

Pp ≜ ∧ pc(p) = . . .
     ∧ (sem = 1) ∧ (sem′ = 0)
     ∧ pc′ = (pc except p ↦ . . .)
     ∧ unchanged . . .

Vp ≜ ∧ pc(p) = . . .
     ∧ sem′ = 1
     ∧ pc′ = (pc except p ↦ . . .)
     ∧ unchanged . . .
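Operationally, Pp is enabled only when sem = 1 and atomically takes the semaphore, while Vp returns it. A Python sketch of these guarded atomic actions, with the pc tests and the elided "…" parts omitted:

    def P(state):
        # Enabled iff sem = 1; atomically takes the semaphore.
        if state["sem"] != 1:
            return None          # not enabled in this state
        return {**state, "sem": 0}

    def V(state):
        # Releases the semaphore.
        return {**state, "sem": 1}

    s = {"sem": 1}
    s = P(s)                     # some process enters its critical section
    assert P(s) is None          # P is now disabled for every process
    s = V(s)                     # the process leaves; P is enabled again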
CS q,j :  CS q,j · V p and CS q,j · CS p,i both equal false because a CS q,j step leaves process q inside its critical section, which by the mutual exclusion algorithm implies process p is outside its critical section, so neither CS p,i nor V p is enabled.
|= S R ⇒ 2( E⟨M R⟩X ⇒ E⟨M ρ⟩x )
obtain a partial result that it appends to the end of a fifo queue. Process 2
removes the partial result from the head of the queue and completes the
computation. Process 1 can therefore get ahead of process 2, performing
its part of the i th computation while process 2 is still performing its part
of the j th computation, for i > j . The reduced program S R replaces these
two processes with a single process that performs each computation as a
single atomic action. The property we want to prove by reduction presum-
ably involves how the computed values are used after they are computed,
when they have the same values in the original and reduced programs, so
condition R3 is satisfied.
We describe steps of process 1 by an action Cmp1 ∨ Send , where that
process’s part of a computation consists of a finite sequence of Cmp1 steps
followed by a single Send step that appends the partial result to the tail
of the queue. Steps of process 2 are described by an action Rcv ∨ Cmp2,
where that process’s part of the computation consists of a single Rcv step
that removes the partial result from the head of the queue followed by a
finite sequence of Cmp2 steps. The contents of the queue are described by
the variable qBar , which is accessed only by the Send and Rcv actions. We
assume that the two processes communicate only through the fifo qBar ,
an assumption expressed by these conditions: Cmp1 commutes with the
process 2 actions Rcv and Cmp2, and Cmp2 commutes with the process 1
actions Cmp1 and Send . Since qBar is the only shared variable accessed
by Rcv and Send , it doesn’t matter in which order these two actions are
executed in a state where the queue is nonempty. Thus, we have:
The program may contain other processes that can interact in some way
with processes 1 and 2. For example, process 1 may obtain its input from a
third process and process 2 may send its output to a fourth process.
The program’s next-state action is M ∨ O, where M describes processes
1 and 2 and O describes any other processes. We rewrite M in the form
∃ n ∈ N+ : M n , where N+ is the set of positive integers and M n is an ac-
tion whose steps describe a complete execution of the n th computation. To
do this, we assume state functions snum and rnum whose values are the
numbers of Send and Rcv actions, respectively, that have been executed.
Initially, snum = rnum = 0. The Send action increments snum by 1 and
the Rcv action increments rnum by 1. We can then define:
(8.25) M n ≜ ∨ (snum = n − 1) ∧ (Cmp1 ∨ Send)
             ∨ ((rnum = n − 1) ∧ Rcv) ∨ ((rnum = n) ∧ Cmp2)
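A Python sketch of the pipeline’s bookkeeping may help; the counters snum and rnum determine, as in (8.25), which computation a step belongs to. The computations themselves are stand-ins:

    from collections import deque

    queue = deque()       # the fifo queue of partial results (qBar)
    snum = 0              # number of Send steps executed
    rnum = 0              # number of Rcv steps executed

    def send(partial):    # process 1 finishes its part of computation snum+1
        global snum
        snum += 1
        queue.append(partial)

    def rcv():            # process 2 starts its part of computation rnum+1
        global rnum
        rnum += 1
        return queue.popleft()

    def computation_number(action):
        # Which M_n a step belongs to, following (8.25).
        if action in ("Cmp1", "Send"):
            return snum + 1
        if action == "Rcv":
            return rnum + 1
        return rnum            # action == "Cmp2"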
Again, with multiple reductions we let the reduced program have the same
variables as the original program, so the n th reduction replaces the action
M n with M ρn .
The remaining action E for this reduction is the disjunction of these
actions: the action O describing the other processes, the already reduced
actions M ρk for k < n, and the subactions of M k for k > n. To apply
Theorem 8.1, we must show that R n right-commutes with these actions and
Ln left-commutes with them.
That R n right-commutes and Ln left-commutes with O must be as-
sumed. The commutativity relations hold for M ρk with k < n because an
R n step is enabled only after an M ρk step, which implies R n · M ρk equals
false (so R n right commutes with M ρk ), and which also implies that Ln
cannot be enabled immediately after an M ρk step, so M ρk · Ln also equals
false.
What remains to be shown is that Cmp1n (the action R n) right commutes with M k, and that Rcv n and Cmp2n (whose disjunction equals Ln) left commute with M k, for k > n. For that, we have to show that each of the four actions whose disjunction equals M k satisfies those commutativ-
ity conditions. We will use the commutativity relations we assumed above:
that Cmp1 commutes with Cmp2 and Rcv , and that Cmp2 commutes with
Send . The assumption that Cmp2 commutes with Send implies that Cmp2i
commutes with Send j for all i and j . This follows from the definitions of
Cmp2i and Send j , because Cmp2 does not depend on or modify snum and
Send j does not depend on or modify rnum. Similarly, Cmp1i commutes
with Rcv j and Cmp2j for all i and j . These assumptions are called the
commutativity assumptions in the following proof sketches of the required
commutativity relations. Recall that we are assuming k > n.
other processes, to ensure that the operation will complete once process 1’s
Send action occurs. The obvious fairness condition we want the reduced
program to satisfy is fairness of M ρ . If an M ρn action is enabled, then no
M ρi action with i ≠ n can be enabled until an M ρn step occurs. This implies
that (weak or strong) fairness of M ρ is equivalent to fairness of M ρn for all
n. For each n, ensuring fairness of M ρn is the second case in Theorem 8.4,
with Ai equal to C n , which equals Send n . The assumption |= S ∧ F ⇒ . . .
in that case of the theorem will have to be implied by fairness conditions on
subactions of Cmp1.
steps of those three parts. But math provides many ways to structure a
proof, and deciding in advance to structure it by decomposition might rule
out better proofs.
The one good reason to decompose the verification of a program in this
way is that it may make it easier to use a tool to verify correctness. For
example, a model checker might be able to verify correctness of individual
components but not correctness of the complete program. Decomposition
would allow using model checking to perform part of the verification, and
then using the results presented here to prove that correctness of the com-
ponents implies correctness of the entire program. This approach has been
applied to a nontrivial example [25], but I don’t know of any case in which
it has been used in industry.
Composition is useful if an engineer wants to verify correctness of a pro-
gram that describes a system built using an existing component whose be-
havior is specified by an abstract program. Up until now, we have described
a program by a formula that is satisfied by behaviors in which the program
to be implemented, which I will here call the actual program, and its envi-
ronment are both acting correctly. There was no need for the mathematical
description to separate the actual program and its environment, since it
makes no difference if an execution is incorrect because the programmer
didn’t understand what the code would do or what the environment would
do. However, if a program is implemented using a component purchased
elsewhere, it is important to know if an incorrect behavior is due to an in-
correct implementation of the actual program or of the component, which
is part of the environment.
For composition, we therefore describe a program with two formulas, a formula M describing the correct behavior of the actual program and a formula E describing correct behavior of its environment. These formulas are combined into a single formula, written E −+→ M, that can be thought of as being true of a behavior iff M is true as long as E is (so M is always true if E is always true). Formula E −+→ M is what is called a rely/guarantee description of the program [23].
Currently, implementing actual programs with precisely specified exist-
ing components seems likely to arise in practice only for components that
are traditional programs that perform a computation and stop; and where
execution of the component can be considered to be a single step of the
complete program. In that case, there is no need for TLA. As explained in
Appendix Section A.5, the safety property of the component can be speci-
fied by a Hoare triple; and termination is the only required liveness property.
Composition in TLA is needed only if the existing component interacts with
its environment in a more complex way that must be described with a more
general abstract program. Such reusable, precisely specified components do
not seem to exist now. Perhaps someday they will.
The results presented here come from a single paper [2]. The reader is
referred to that paper for the proofs. To make reading it easier, much of
the notation used here—including the identifiers in formulas—is taken from
that paper.
Init M ≜ Init a ∧ Init b
Next M ≜ Next a ∨ Next b
L M ≜ L a ∧ L b

Formula M is equivalent to the conjunction of M a and M b, defined by:

M a ≜ Init a ∧ 2[Next a]a ∧ L a
M b ≜ Init b ∧ 2[Next b]b ∧ L b

This result follows from the equivalence of 2(F ∧ G) and 2F ∧ 2G, for any formulas F and G, and from
Theorem 8.5  If m1, . . . , mn are each lists of variables, with all the variables in all the lists distinct, N ≜ 1 . . n, and

m ≜ m1, . . . , mn
M i ≜ Init i ∧ 2[Next i]⟨m i⟩ ∧ L i
M ≜ ∀ i ∈ N : M i
|= M ⇒ 2[ ∀ i, j ∈ N : Next i ∧ (i ≠ j) ⇒ (⟨m j⟩′ = ⟨m j⟩) ]m

then
|= M lc ∧ M ld ⇒ M c and |= M lc ∧ M ld ⇒ M d
but that doesn’t reduce the amount of work involved. However, suppose that
correctness of M lc doesn’t depend on its environment being the component
M ld , but just requires its environment to satisfy the correctness condition
M d of that component, and similarly correctness of M ld just requires that
the other component satisfies M c . We would then like to reduce verification
of (8.27) to verifying:
(8.28) |= M d ∧ M lc ⇒ M c and |= M c ∧ M ld ⇒ M d
This would reduce the amount of work because M c and M d are probably
significantly simpler than M lc and M ld . Can we do that?
Let’s consider the following trivial example, where each component ini-
tializes its variable to 0 and keeps setting its variable’s value to the value of
the other component’s variable.
      ∧ WFc((c′ = d) ∧ (d′ = d))
M ld ≜ (d = 0) ∧ 2[(d′ = c) ∧ (c′ = c)]d

M c ≜ 3(c = 1)        M d ≜ 3(d = 1)
while keeping M lc and M ld the same. Condition (8.28) is still satisfied be-
cause each component eventually sets its variable to 1 if the other component
sets its variable to 1. However, (8.27) is not satisfied. Changing the cor-
rectness conditions doesn’t change the behavior of the program, which is to
take nothing but stuttering steps.
We might ask why we can’t deduce (8.27) from (8.28) in this example.
However, the real question is why we can deduce it in the first example. De-
ducing (8.27) from (8.28) is deducing, from the assumption that correctness
of each component implies correctness of the other, that both components
are correct. This is circular reasoning, and letting M c = M d = false shows
that it allows us to deduce that any program implies false, from which we
can deduce that the program satisfies any property.
So, why does (8.28) imply (8.27) in the first case? Why can we deduce
that both components leave their variables equal to 0 from the assumption
that each component leaves its variable equal to 0 if the other process leaves
its variable equal to 0? The reason is that neither process can set its variable
to a value other than 0 until the other one does. Stated more generally,
we can deduce that both components in a two-component program satisfy
their correctness properties if neither component can violate its correctness
property until after the other does. So we want to replace (8.28) by:
(8.30) |= ∀ k ∈ N :
(M d true through state k − 1)
∧ (M lc true through state k ) ⇒ (M c true through state k )
plus the same condition with c and d interchanged, where F true through
state −1 is taken to be true for any property F .
To express (8.30) precisely, we have to say what it means for a property
F to be true through state k . If F is a safety property, it means that F is
true of the finite behavior σ(0) → . . . → σ(k ), which means it’s true of the
(infinite) behavior obtained by repeating the state σ(k ) forever. It follows
from Theorems 4.4 and 4.5 that any property F equals C(F ) ∧ L where L is
a liveness property such that hC(F ), Li is machine closed. By the definition
of machine closure (in Section 4.2.2.2), any finite behavior that satisfies
C(F ) can be completed to a behavior satisfying C(F ) ∧ L, which equals F .
Therefore, the only way a behavior can fail to satisfy F through state k is
for it not to satisfy C(F ) through state k , so F is true through state k means
that C(F ) is true through state k . We should therefore replace M d , M lc ,
and M c by C(M d), C(M lc), and C(M c) in (8.30). For a safety property, true
through state k means true if all states i with i > k equal state k , so we
(8.31) |= ∀ k ∈ N :
(every state after state k equals state k ) ⇒
( (C(M d ) true through state k − 1) ∧ C(M lc ) ⇒ C(M c ) )
Next, let v be the tuple of all variables in these formulas. We can then
replace the assertion “every state . . . state k ” in (8.31) with “v 0 = v from
state k on”. By predicate logic, if k does not appear in R or S , then
(∀ k : P ⇒ (Q ∧ R ⇒ S )) ≡ ((∃ k : P ∧ Q) ∧ R ⇒ S )
we can then write (8.32) and the condition obtained from it by interchanging
c and d as:
be the formula that we have been calling F+v. We now define F+v to equal F+v^old ∨ F. With this definition, (8.33) implies its two conditions also hold with the “+v” removed. If F is a safety property, then F+v is a safety property but F+v^old usually isn’t. In fact, if F is a safety property then F+v equals C(F+v^old). In practice, the change should seldom make a difference in (8.34) because we don’t expect liveness properties to be useful for proving safety properties, so we wouldn’t expect F ∧ G ⇒ H to be true for safety properties G and H without C(F) ∧ G ⇒ H also being true.
The formula F+v has been defined semantically. However, to verify (8.33)
directly, we have to write C(F )+v as a formula for a given formula F . It’s
easy to write C(F ) if F has the usual form Init ∧ 2[Next]w ∧ L , where w is
the tuple of variables in the formulas and L is the conjunction of fairness
properties of subactions of Next. In that case, the definition of machine
closure (Section 4.2.2.2) and Theorem 4.7 (Section 4.2.7) imply C(F ) equals
Init ∧ 2[Next]w . We can then write F+v as follows:
F+v ≜ ∃ h : Înit ∧ 2[N̂ext]w◦v◦⟨h⟩

where Înit ≜ (Init ∧ (h = 0)) ∨ (h = 1)

      N̂ext ≜ ∨ (h = 0) ∧ ∨ (h′ = 0) ∧ [Next]w
                          ∨ h′ = 1
             ∨ (h = 1) ∧ (h′ = h) ∧ (v′ = v)
While writing F+v is easy enough, we usually don’t have to for the same
reason that we didn’t have to use the +v subscripts in (8.27). Our example
has one feature that we didn’t use in our generalization—namely, that no
single program step can make both M c and M d false. Here’s how to use that
feature in general. For safety properties F and G, define F ⊥ G to be true of
a behavior σ iff for every k ∈ N, if F ∧ G is true of σ(0) → . . . → σ(k ) then
F ∨ G is true of σ(0) → . . . → σ(k + 1). Understanding why the following
theorem is true is a good test that you understand the definition of F+v .
• No step is both a c step and a d step. This condition means that a step
in a behavior of the program consists of a step of a single component.
1. |= ∀ j ∈ 1 . . n : C(M j ) ⇒ E i
then |= (∀ i ∈ 1 . . n : M li ) ⇒ (∀ i ∈ 1 . . n : M i )
• Instead of using the conjunction of C(M j ) for all the components’ cor-
rectness properties M j in hypothesis 2, it allows E i to be any property
implied by that conjunction that is strong enough to satisfy hypothe-
sis 2. I expect E i will always be a safety property, but it’s conceivable
that it might not be.
The theorem does not make any assumption about v . That’s because if w
is the tuple of all variables appearing in the formulas (including in v ), then
F+w implies F+v . Thus, if hypothesis 2(a) is satisfied for any state function
v , then it’s satisfied with v equal to the tuple of all variables in the formulas.
Letting v equal that tuple produces the weakest (hence easiest to satisfy)
hypothesis.
when run in an environment that satisfies 2(y = 0). So, we decide to write
our program as M lc ∧ M ld where:
M lc ≜ (M lx with x ← c, y ← d)
M ld ≜ (M lx with x ← d, y ← c)
This silly example captures the most important aspect of specifying com-
ponents: No real device will satisfy a specification such as 2(c = 0) when
executed in an arbitrary environment. For example, a process will not be
able to compute the GCD of two numbers if other processes can at any time
arbitrarily change the values of its variables.
We want to deduce that M lc ∧ M ld implies 2(c = 0) ∧ 2(d = 0)
from the properties that components c and d satisfy, without knowing
what M lc and M ld are. The property that the c component satisfies is
that if its environment satisfies 2(d = 0) then the component satisfies
2(c = 0); and d satisfies the same condition with d and c interchanged.
The obvious way to express these two properties is 2(d = 0) ⇒ 2(c = 0)
and 2(c = 0) ⇒ 2(d = 0), but those two properties obviously don’t imply
2(c = 0) ∧ 2(d = 0). We need to find the right way to express mathemati-
cally the condition that a component satisfies the property M if its environ-
ment satisfies the property E. We do this by assuming that the condition is expressed by a formula E −+→ M and figuring out what the definition of −+→ should be, given the assumption that the definition should make this true:

(8.36) |= 2(d = 0) −+→ 2(c = 0) and |= 2(c = 0) −+→ 2(d = 0)
       implies |= 2(c = 0) ∧ 2(d = 0)
It’s instructive to compare Theorems 8.7 and 8.8. They both make no
assumption about v , since letting it equal the tuple of all variables in the
formulas yields the weakest hypothesis 2(a). Hypothesis 1 differs only in
Theorem 8.8 having the additional conjunct C(E ). This conjunct (which
weakens the hypothesis) is expected because, if M is the conjunction of the
M i, then the M in the conclusion of Theorem 8.7 is replaced in Theorem 8.8 by E −+→ M.
As we observed for Theorem 8.7, hypothesis 1 of Theorem 8.8 pretty
much requires the E i to be safety properties. However, when applying The-
orem 8.8, we can choose to make them safety properties by moving the
liveness property of E i into the liveness property of M i . More precisely,
suppose we write E i as E Si ∧ E Li, where E Si is a safety property and E Li a liveness property such that ⟨E Si, E Li⟩ is machine closed; and we similarly write M i as M Si ∧ M Li. We can then replace E i by E Si and M i by M Si ∧ (E Li ⇒ M Li).⁵ This replaces the property E i −+→ M i by the stronger property:

(8.39) E Si −+→ (M Si ∧ (E Li ⇒ M Li))

It is stronger because if the environment doesn’t satisfy its liveness property E Li, then E i −+→ M i is satisfied no matter what the component does; but in

⁵By definition of machine closure, ⟨M Si, M Li⟩ machine closed implies ⟨M Si, E Li ⇒ M Li⟩ is also machine closed, because M Li implies E Li ⇒ M Li.
that case, (8.39) still requires the component to satisfy its safety property
M Si if the environment satisfies its safety property E Si . The two formulas
should be equivalent in practice because machine closure of ⟨E Si, E Li⟩ implies
that, as long as the environment satisfies its safety property, the component
can’t know that the environment’s entire infinite behavior will violate its
liveness property.
Theorem 8.8 has been explained in terms of M i being the property sat-
isfied by a component whose description M li we don’t know, with M a
property we want the composition of the components to satisfy. It can also
be applied by letting M i be the actual component M li and letting M be
the composition ∀ i ∈ 1 . . n : M li of those components. The theorem then
tells us under what environment assumption E the composition will behave
properly if each M li behaves properly under the environment assumption
E i . However, there is a problem when using it in this way. To explain
the problem, we return to our two components c and d whose composition
satisfies 2(c = 0) ∧ 2(d = 0).
The definitions M lc and M ld in (8.29) were written for components c and
d intended to be composed with one another. They were not written to de-
scribe a component that satisfies its desired property only if the environment
satisfies its property. We now want to define them and their environment
assumptions E c and E d so that:
|= (E c −+→ M lc) ⇒ 2(c = 0)
|= (E d −+→ M ld) ⇒ 2(d = 0)
The definition of M lc asserts that the value of d cannot change when the value of c changes (because of the conjunct d′ = d in the next-state relation) and d cannot change when c doesn’t change (because of the subscript ⟨c, d⟩). That’s a property of its environment. If we want d to satisfy that property, we should state it in E c, not inside the definition of M lc. So, the definition of M lc should be

M lc ≜ (c = 0) ∧ 2[c′ = d]c ∧ WFc(c′ = d)

[ ∨ (c′ = d) ∧ (d′ = d)
  ∨ (d′ = c) ∧ (c′ = c)
  ∨ (c′ = d) ∧ (d′ = c) ]⟨c,d⟩

⁶An interleaving description is often taken to mean any description of a program’s executions as sequences of states and/or events, so by that meaning all TLA program descriptions are interleaving descriptions.
Appendix A
Miscellany
A.1.4 Sets [Section 2.5, Math II, Math V, Math VI, Math XI]
v ∈ S Equals true iff v is an element of the set S .
{exp1 , . . . , expn } The set for which v ∈ S equals true iff v equals one (or
more) of the expressions exp i .
Sets of Numbers
R The set of all real numbers.
I The set of all integers.
N The set of all non-negative integers (natural numbers).
m . . n The set of all integers i satisfying m ≤ i ≤ n.
Then ASqrt(4) might equal 2 and ASqrt(9) might equal −3. Since this is
math, |= ASqrt(4) = ASqrt(4) is true. The value of ASqrt(4) may be 2 or
−2. But whichever value it equals, like every mathematical expression with
no free variable, it always equals the same value.
Formally, choose is defined by the following rules:
If there is more than one value of x for which F equals true, then
choose x : F can equal any of those values. But it always equals the same
value.
No matter how often I repeat that the choose operator always chooses
the same value, there are engineers who think that choose is nondeter-
ministic, possibly choosing a different value each time it’s evaluated; and
they try to use it to describe nondeterminism in a program. I’ve also heard
computer scientists talk about “nondeterministic functions”.1 There’s no
such thing. There’s no nondeterminism in mathematics. Nondeterminism
is important in concurrent programs, and Section 3.3 shows that it’s easy
to describe mathematically. Adding nondeterminism to math for describing
nondeterminism in a program makes as much sense as adding water to math
for describing fluid dynamics.
An expression choose v : F is most often used when there is only a single choice of v that makes F true, as in the definition of √r above. Sometimes, it appears within an expression whose value doesn’t depend on which value of v satisfying F is chosen.
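One way to internalize the determinism of choose is to picture an implementation over a finite universe that always examines candidates in a fixed order. This Python sketch is only an analogy; TLA’s choose is not restricted to finite universes:

    def choose(universe, F):
        # Examine candidates in a fixed order, so the result is a function
        # of (universe, F): evaluating the same expression twice cannot
        # yield different values.
        for v in sorted(universe):
            if F(v):
                return v
        raise ValueError("no value satisfies F")  # choose is then unspecified

    a = choose(range(-10, 11), lambda v: v * v == 4)
    assert a == choose(range(-10, 11), lambda v: v * v == 4)  # same value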
v ∈ S ↦ exp  The function f with domain S such that f(v) = exp for all values of the variable v in the set S.
An infinite ordinal sequence is a function with domain the set of all positive
integers.
A finite cardinal sequence σ of length n is a function σ with domain
0 . . (n − 1). We can write such a function σ as σ(0) → . . . → σ(n − 1) , if
n > 0. An infinite cardinal sequence is a function with domain the set N of
all natural numbers.
Except for Append , all the following operators that take sequences as
arguments are defined for both ordinal and cardinal sequences.
Len(σ) The length of the sequence σ.
σ ◦ τ The concatenation of the finite sequence σ and the finite or infinite
sequence τ , where the sequences are both ordinal or both cardinal.
Head (σ) The first element (σ(1) or σ(0)) of the nonempty (positive-length)
sequence σ.
Tail (σ) The sequence obtained by removing the first element of the nonempty
sequence σ.
Append(σ, exp)  The sequence σ ◦ ⟨exp⟩ for an ordinal sequence σ.
Seq(S ) The set of all ordinal sequences σ such that σ(i ) ∈ S for all i ∈
domain(σ).
S1 × . . . × Sn The set of all n-tuples σ such that σ(i ) ∈ S i for all i in 1 . . n,
for any integer n > 1.
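Since an ordinal sequence is just a function with domain 1 . . n, these operators have direct renderings. A Python sketch using dicts as functions (an illustration, not a definition):

    def seq(*elems):
        # The ordinal sequence with the given elements, domain 1..n.
        return {i + 1: e for i, e in enumerate(elems)}

    def Len(s):
        return len(s)

    def Head(s):
        return s[1]

    def Tail(s):
        return {i: s[i + 1] for i in range(1, len(s))}

    def Append(s, e):
        return {**s, len(s) + 1: e}

    def concat(s, t):
        # s ◦ t for finite ordinal sequences.
        return {**s, **{len(s) + i: t[i] for i in t}}

    s = seq("a", "b")
    assert Head(s) == "a" and Tail(s) == seq("b")
    assert concat(s, seq("c")) == Append(s, "c") == seq("a", "b", "c")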
(If there are no assumptions F i , then the “Assume:” and “Prove:” are
omitted.) Suppose that the current goal of this statement is H . Statement
(A.6) asserts that to prove H , it suffices to prove G under the additional
assumptions F 1 , . . . F n . In other words, the statement asserts the formula
A ⇒ H where A is the assertion made by the Assume/Prove. The context
of the proof of (A.6) is the same as for the statement:
Assume: A Prove: H
The context of the following step consists of the context of the Suffices
step with the added assumptions F i . This means that F i is assumed true
if it is a formula, and if it is new v ∈ S then the context contains the
declaration of v and the assumption that v ∈ S is true. The current goal of
the following step is G.
Finally, there is a terminal proof. It specifies which formulas and defi-
nitions in the proof’s context are used to prove its current goal. A formula
Define Russell to be the mapping on the collection of all sets that are mappings such that Russell(S) ≜ choose U : U ≠ M(S)(S).
3. Russell is a mapping such that Russell(S) ≠ M(S)(S) for all sets S that are mappings.
   Proof: The value of any syntactically correct formula is a set, even if its elements are unspecified. Therefore, M(S)(S) is a set, and for any set T there exists a set U such that U ≠ T. Thus, Russell is a mapping such that Russell(S) ≠ M(S)(S) for every mapping S.
4. Russell(S) ≠ S(S) for all sets S that are mappings.
   Proof: Substituting S for U in step 2 shows S(S) equals M(S)(S), which by step 3 is unequal to Russell(S).
5. Q.E.D.
   Proof: Since Russell is a mapping, and all mappings are assumed to be sets, substituting Russell for S in step 4 proves Russell(Russell) ≠ Russell(Russell), which equals false.
It apparently defines F(3) to equal x‴. It doesn’t. To see why not, let’s simplify things by defining F to be a function with domain N:

F ≜ choose f : f = (n ∈ N ↦ if n = 0 then x else f(n − 1)′)
There are also rules for deriving a Hoare triple for a program from Hoare
triples of its components. Here are three such rules:
Such rules decompose the proof of a Hoare triple for any program to proofs of
Hoare triples for elementary statements of the language, such as assignment
statements.
It was quickly realized that pre- and postconditions are not adequate to
describe what a program should do. For example, suppose S is a program
to sort an array x of numbers. The obvious Hoare triple for it to satisfy has
a precondition asserting that x is an array of numbers and a postcondition
asserting that x is sorted. But this Hoare triple is true of a program that
simply sets all the elements of the array x to 0. A postcondition needs to
be able to state a relation between the final values of the variables and their
initial values. Various ways were proposed for doing this, one of them being
to allow formulas P and Q to contain constants whose values are the same
in the initial and final states. For example, the precondition for a sorting
program could assert that the constant x₀ equals x, and the postcondition could assert that the elements of the array x are a sorted permutation of the elements of x₀.
Viewing a program as a relation between initial and final states means that it can be described mathematically as a formula of the Logic of Actions. If we represent the program S as an LA formula, then {P}S{Q} is the assertion |= P ∧ S ⇒ Q′; and the Hoare logic rules follow from rules of LA. For example, the program S ; T is represented in LA as S · T, where “·” is the action composition operator defined in Section 3.4.1.4. The Hoare Logic rule (A.9) is equivalent to this LA rule:

|= P ∧ S ⇒ R′ and |= R ∧ T ⇒ Q′ imply |= P ∧ (S · T) ⇒ Q′
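For finite state spaces this rule can be checked by brute force. A Python sketch in which predicates are Boolean functions on states and programs are sets of state pairs:

    def hoare(P, S, Q):
        # {P} S {Q}: every S step from a state satisfying P ends in a
        # state satisfying Q, i.e. |= P ∧ S ⇒ Q′.
        return all(Q(t) for (s, t) in S if P(s))

    def compose(S, T):
        # The action composition S · T.
        return {(s, u) for (s, t1) in S for (t2, u) in T if t1 == t2}

    def sequencing_rule(P, R, Q, S, T):
        # Rule (A.9): {P} S {R} and {R} T {Q} imply {P} S;T {Q}.
        if hoare(P, S, R) and hoare(R, T, Q):
            assert hoare(P, compose(S, T), Q)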
The program if R then S else T end if is represented by the LA formula (R ∧ S) ∨ (¬R ∧ T), and rule (A.10) becomes the propositional-logic tautology:

|= (P ∧ R ∧ S ⇒ Q′) ∧ (P ∧ ¬R ∧ T ⇒ Q′) ⇒
   (P ∧ ((R ∧ S) ∨ (¬R ∧ T)) ⇒ Q′)

Hoare’s rule (A.8) for assignment statements is obtained from LA by representing the statement x := exp as (x′ = exp) ∧ ((v x̃)′ = v x̃), where v x̃ is the tuple of all program variables other than x. It is valid because |= P ∧ (x′ = exp) ∧ ((v x̃)′ = v x̃) ⇒ Q′ equals |= (P with x ← exp) ⇒ Q if v x̃ is a tuple containing all variables other than x that appear in P or exp.

Rule (A.11) is a bit tricky because, when executed in a state in which R equals false, the while statement leaves all variables unchanged. We can represent that while statement by

((R ∧ S)⁺ ∧ ¬R′) ∨ (¬R ∧ (v′ = v))
where v is the tuple of all program variables and (. . .)⁺ is defined in Section 3.4.1.4. With this representation of the while statement, (A.11) can be derived from the following rule of LA, where I is any state predicate and A any action:

(A.12) |= I ∧ A ⇒ I′ implies |= I ∧ A⁺ ⇒ I′
The LA definition of a Hoare triple implies that the validity of rule (A.11)
is proved by the following theorem:
4. Q.E.D.
Proof: By the step 1 assumption, steps 2 and 3 cover all possibilities.
Do you see why these conditions imply δ(p, q) ≥ 0 for all p and q in M ?
The set R of real numbers is a metric space with δ(p, q) equal to |p − q|,
where |r| is the absolute value of the number r, defined by

|r| ≜ if r ≥ 0 then r else −r
in S and there are elements q of S such that δ(⟨4, 7⟩, q) is arbitrarily close to 1.
For any metric space M and subset S of M, if p ∈ S then δ̂(p, S) = 0 because condition M1 implies δ(p, p) = 0. In general, δ̂(p, S) = 0 for p ∈ M iff for every e > 0 there exists q ∈ S such that δ(p, q) < e.
The closure operation C on subsets of a metric space M is defined by letting C(S) be the set {p ∈ M : δ̂(p, S) = 0} of all elements of M that are a distance 0 from S. For example, if M is the plane, let OD and CD be the open and closed disks of radius 1 centered at the origin, defined by:

OD ≜ {p ∈ M : δ(p, ⟨0, 0⟩) < 1}
CD ≜ {p ∈ M : δ(p, ⟨0, 0⟩) ≤ 1}
Theorem A.2 For any subset S of a metric space, S ⊆ C(S ) and C(S ) =
C(C(S )).
Proof: The definition of C and property M1 imply S ⊆ C(S ) for any set S ,
which implies C(S ) ⊆ C(C(S )) for any S . Therefore, to show C(S ) = C(C(S )),
it suffices to assume p ∈ C(C(S)) and show p ∈ C(S). By definition of C and δ̂, we do this by assuming e > 0 and showing there exists q ∈ S with
δ(q, p) < e. Because p ∈ C(C(S )), there exists u ∈ C(S ) with δ(p, u) < e/2;
and u ∈ C(S ) implies there exists q ∈ S with δ(q, u) < e/2. By M2 and M3,
this implies δ(p, q) < e. End Proof
As you will have guessed by its name, the operator C on behavior predicates
is a special case of the closure operator C on metric spaces. But for now,
forget about behavior predicates and just think about metric spaces.
A set S that, like CD, equals its closure is said to be closed. The following
result shows that for any set S , its closure C(S ) is the smallest closed set
that contains S .
1 centered at the origin. For any metric space M and S ⊆ M, any element p of M with δ̂(p, S) = 0 that is not in S must be in M \ S and therefore must satisfy δ̂(p, M \ S) = 0. This shows that the closure of any set S is the union of S and the boundary of S.
A subset S of a metric space M is said to be dense iff C(S ) = M . A
dense set is one that, for any element p of M , contains p or elements of M
arbitrarily close to p. As an example, let’s call a finite-digit real number one
that can be written in decimal notation with a finite number of digits—for
example, 123.5432. The set of all pairs of finite-digit numbers is dense in
the plane because any real number can be approximated arbitrarily closely
with a finite-digit number. Thus, for any pair of real numbers hx , y i and
any e > 0, we can find a pair of finite-digit numbers hp, q i within a distance
e of hx ,√
y i by choosing p and q such that |x − p| and |y − q| are both less
than e/ 2.
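A short Python illustration of this approximation, rounding each coordinate to enough decimal digits (it assumes e > 0):

    import math

    def finite_digit_approx(x, y, e):
        # Round to enough digits that each coordinate's error is
        # below e / sqrt(2).
        digits = 0
        while 10.0 ** -digits >= e / math.sqrt(2):
            digits += 1
        p, q = round(x, digits), round(y, digits)
        assert math.hypot(x - p, y - q) < e
        return p, q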
Theorem A.4 Any subset S of a metric space equals C(S ) ∩ D for a dense
set D.
Proof: Let M be the metric space and let D equal S ∪ (M \ C(S )). The set
D consists of all elements of M except those elements in the boundary of S
that are not in S . It follows from this that C(S ) ∩ D = S . Since elements
in the boundary of S are a distance 0 from S , which is a subset of D, they
are a distance 0 from D. Therefore all elements in M are a distance 0 from
D, so D is dense. End Proof
What we’re interested in is not the distance function δ, but the closure op-
erator C. Imagine that the plane was an infinite sheet of rubber that was
then stretched and shrunk unevenly in some way. Define the distance be-
tween two points on the original plane to be the distance between them
after the plane was deformed. For example, if the plane was stretched to make everything twice as far apart in the y direction but the same distance apart in the x direction, then δ(⟨x1, y1⟩, ⟨x2, y2⟩) would equal √((x1 − x2)² + (2 ∗ (y1 − y2))²). As long as the stretching and shrinking is
continuous, meaning that the rubber sheet is not torn, the boundary of a
set S in the plane after it is deformed is the set obtained by deforming the
boundary of S . This implies that the new distance function produces the
same closure operator as the ordinary distance function on the plane.
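For this particular deformation, the new distance function (call it δs; the name is mine) satisfies δ ≤ δs ≤ 2 ∗ δ, so a point is a distance 0 from a set under one of them iff it is under the other, which is why both generate the same closure operator. A small Python sketch checking those bounds on random pairs of points:

    import math
    import random

    def delta(p, q):     # the ordinary distance on the plane
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def delta_s(p, q):   # the distance after doubling the y direction
        return math.hypot(p[0] - q[0], 2.0 * (p[1] - q[1]))

    for _ in range(1000):
        p = (random.uniform(-5, 5), random.uniform(-5, 5))
        q = (random.uniform(-5, 5), random.uniform(-5, 5))
        d, ds = delta(p, q), delta_s(p, q)
        assert d <= ds + 1e-12 and ds <= 2.0 * d + 1e-12
    print("delta <= delta_s <= 2 * delta on all sampled pairs")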
Topology is the study of properties of objects that depend only on a closure operation, which need not be generated by a metric space. But we are interested in a closure operator that is generated by a particular kind of distance function on behaviors. Identifying a behavior predicate with the set of behaviors that satisfy it, the operators of propositional logic correspond to operations on sets:

   ⊨ F ∨ G = (F ∪ G)        ⊨ F ⇒ G = (F ⊆ G)
   ⊨ F ∧ G = (F ∩ G)        ⊨ F ≡ G = (F = G)

We're interested in the closure operator on sets of behaviors, which can be the same for many different distance functions.
The property of the distance function that provides the closure operator we want is that behaviors with a long prefix in common are close together. More precisely, for behaviors σ and τ, define o(σ, τ) to be the largest n such that σ and τ have the same prefix of length n—that is, the largest value n such that ∀ i ∈ 0 . . (n − 1) : σ(i) = τ(i). Thus, o(σ, τ) equals 1 iff σ(0) = τ(0) and σ(1) ≠ τ(1). (There is no largest such n iff σ = τ, in which case we let o(σ, τ) = ∞, where ∞ > i for all i ∈ N.) We get the right closure operator on sets of behaviors if δ satisfies this property: the larger o(σ, τ) is, the smaller δ(σ, τ) must be, and δ(σ, τ) must approach 0 as o(σ, τ) grows without bound.
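Here is a Python sketch of o together with one distance function having this property. Modeling behaviors as functions on the naturals, searching only up to a finite horizon, and the concrete choice δ(σ, τ) = 2^−o(σ,τ) are all assumptions of the sketch, not definitions from the text.

    def o(sigma, tau, horizon=10_000):
        # Largest n such that sigma(i) = tau(i) for all i in 0..(n-1);
        # equivalently, the index of the first disagreement. This sketch
        # searches only up to a finite horizon.
        for i in range(horizon):
            if sigma(i) != tau(i):
                return i
        return float("inf")   # no disagreement found: treat the behaviors as equal

    def delta(sigma, tau):
        # One distance making behaviors with a long common prefix close:
        # delta = 2**(-o), with 2**(-inf) = 0.0.
        return 2.0 ** (-o(sigma, tau))

    sigma = lambda i: i                   # states 0 -> 1 -> 2 -> ...
    tau = lambda i: i if i < 3 else 99    # agrees with sigma on states 0..2
    print(o(sigma, tau))                  # 3
    print(delta(sigma, tau))              # 0.125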
Proofs
Most of the proofs here are structured proofs. To understand them, you
should first read Appendix Section A.2.
If the program were described in TLA instead of RTLA, the disjunct Stutter would be removed from the definition of Next; and Next in the theorem would be replaced by [Next]_v, where v is the tuple ⟨x, t, pc⟩ of variables. The proof of the theorem would be essentially the same, the only difference being that the action Stutter would be replaced everywhere by its second conjunct, which is v′ = v.
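The role of the stuttering disjunct can be sketched in Python by modeling actions as predicates on pairs of states; the modeling, including the trivial Next used as an example, is mine:

    V = ("x", "t", "pc")   # the tuple v of variables

    def v(state):
        # the tuple <x, t, pc> of a state
        return tuple(state[name] for name in V)

    def box_action(next_action):
        # [Next]_v: either a Next step or a step leaving v unchanged
        return lambda s, s1: next_action(s, s1) or v(s1) == v(s)

    # a trivial Next that increments x and changes nothing else
    next_action = lambda s, s1: (s1["x"] == s["x"] + 1
                                 and s1["t"] == s["t"] and s1["pc"] == s["pc"])
    step = box_action(next_action)
    s = {"x": 0, "t": 0, "pc": "a"}
    print(step(s, {"x": 1, "t": 0, "pc": "a"}))   # True: a Next step
    print(step(s, {"x": 0, "t": 0, "pc": "a"}))   # True: a stuttering step
    print(step(s, {"x": 5, "t": 0, "pc": "a"}))   # False: neither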
The proof of the theorem is decomposed hierarchically. The first two levels are determined by the logical structure of the theorem. There are two standard ways to decompose the proof of a formula of the form F ⇒ G. The decomposition here also uses the predicate-logic equivalence:

   ⊨ (∃ v ∈ S : F ∨ G) ≡ (∃ v ∈ S : F) ∨ (∃ v ∈ S : G)
Steps 3 and 4 are simple enough that there is no need to decompose their
proofs. You should try to understand why these steps, and the others whose
proofs are given here, follow from the facts and definitions mentioned in
their proofs. To help you, a little bit of explanation has been added to some
of the proofs.
We now have to prove steps 1 and 2. They can both be decomposed using
the definition of Inv as a conjunction. We consider the proof of step 1. Here
is the first level of its decomposition.
1.1. TypeOK′
1.2. ∀ i ∈ Procs : (pc′(i) = b) ⇒ (t′(i) ≤ NumberDone′)
1.3. x′ ≤ NumberDone′
1.4. Q.E.D.
Proof: By steps 1.1–1.3 and the definition of Inv.
Step 1.2 is the most difficult one to prove, so we examine its proof. The standard way to prove a formula of this form is to assume i ∈ Procs and pc′(i) = b and prove t′(i) ≤ NumberDone′. So, the first step of the proof should be a Suffices step asserting that it suffices to make those assumptions and prove t′(i) ≤ NumberDone′. Thus far, we have used only the logical structure of the formulas, without thinking about what the formulas mean. We can go no further that way. To write the rest of the proof of step 1.2, we have to ask ourselves why an aStep(p) step starting in a state with Inv true produces a state with t′(i) ≤ NumberDone′ true.
When I asked myself that question, I realized that the answer depends on whether or not i is the process p executing the step. That suggested proving the two cases i ≠ p and i = p separately, asserting them as Case statements. In figuring out how to write those two proofs, I found that both of them required proving NumberDone′ = NumberDone. Moreover, this was true for the same reason in both cases—namely, that an aStep step of any process leaves NumberDone unchanged. Therefore, I could prove it once in a single step that precedes the two Case statements. This produced the following level-3 proof:
1.2.1. Suffices: Assume: (i ∈ Procs) ∧ (pc′(i) = b)
       Prove: t′(i) ≤ NumberDone′
1.2.2. NumberDone′ = NumberDone
1.2.3. Case: i = p
1.2.4. Case: i ≠ p
1.2.5. Q.E.D.
Proof: By steps 1.2.3 and 1.2.4.
This leaves three steps to prove. Here is the proof of step 1.2.4, which I
think is the most interesting one.
1. Assume: F is a property.
   Prove: C(F) is a property.
1.1. Suffices: Assume: σ is a behavior.
     Prove: σ satisfies C(F) iff ♮σ does.
Proof: By definition of a property, it suffices to show that C(F) is SI. By definition of SI, it suffices to assume σ is a behavior and show σ satisfies C(F) iff ♮σ does.
1.2. Assume: σ satisfies C(F).
     Prove: ♮σ satisfies C(F).
Proof: By definition of C (Section 4.1.3), it suffices to assume ρ is a nonempty finite prefix of ♮σ and show it is a prefix of a behavior satisfying F. Since ρ is a prefix of ♮σ, it equals ♮τ for some prefix τ of σ, so σ satisfies C(F) implies τ ◦ ν satisfies F for some behavior ν. Since F is SI and ρ is obtained from τ by removing stuttering steps, ρ ◦ ν also satisfies F, so ρ is a prefix of a behavior satisfying F.
1.3. Assume: ♮σ satisfies C(F).
     Prove: σ satisfies C(F).
Proof: By definition of C, it suffices to show that any finite nonempty prefix ρ of σ is the prefix of a behavior satisfying F. Since ♮ρ is a prefix of ♮σ, by hypothesis there is a behavior τ such that (♮ρ) ◦ τ satisfies F. Since F is SI and ρ ◦ τ differs from (♮ρ) ◦ τ only by stuttering steps, ρ ◦ τ too satisfies F. Thus ρ is the prefix of a behavior satisfying F.
1.4. Q.E.D.
Proof: By steps 1.1–1.3.
2. C(F ) is a safety predicate.
2.1. Assume: ρ is a prefix of a behavior that satisfies C(F ).
Prove: ρ↑ satisfies C(F ).
2.1.1. Let σ be a behavior such that ρ ◦ σ satisfies F; let φ(n) be the sequence of states consisting of n copies of the final state of ρ, for any n ∈ N; and let τ(n) equal ρ ◦ φ(n) ◦ σ. Then τ(n) satisfies F for all n ∈ N.
Proof: A behavior σ such that ρ ◦ σ satisfies F exists by the step 2.1 assumption and the definition of C. That τ(n) satisfies F follows from: (i) ♮τ(n) equals ♮(ρ ◦ σ) by definition of φ(n) and τ(n), (ii) ρ ◦ σ satisfies F, and (iii) F is SI.
2.1.2. Every finite prefix of ρ↑ is a finite prefix of τ (n), for some n.
1. F is equivalent to C(F ) ∧ L .
Proof: By ⊨ (F ⇒ C(F)) (from Theorem 4.3) and propositional logic.
2. L is a liveness property.
2.1. Suffices: Assume: ρ is a finite behavior.
Prove: ρ is a prefix of a behavior τ satisfying L.
Proof: By definition of liveness, since L is a property because the
operators of propositional logic preserve stuttering insensitivity, and
C(F ) is a property by Theorem 4.3.
2.2. Case: ρ is the prefix of a behavior τ satisfying F .
Proof: By definition of L, if τ satisfies F then it satisfies L.
2.3. Case: ρ is not the prefix of any behavior satisfying F .
Proof: By definition of C(F ), if ρ were the prefix of a behavior satis-
fying C(F ), then it would be the prefix of a behavior satisfying F . The
case assumption therefore implies that any behavior τ having ρ as a
prefix does not satisfy C(F ), so it satisfies ¬C(F ) and therefore satisfies
L by definition of L.
2.4. Q.E.D.
Proof: Steps 2.2 and 2.3 cover all possibilities.
3. Q.E.D.
Proof: By steps 1 and 2.
which by □◇E⟨A_i⟩_v and the definition of XF implies ◇⟨A_i⟩_v, which by ⊨ A_i ⇒ Q implies ◇⟨Q⟩_v.
3.3. Q.E.D.
Proof: All assumptions in effect at step 3.2 are □ formulas so, as explained in Section 4.2.4, we can deduce □(E⟨Q⟩_v ⇒ ◇⟨Q⟩_v) from 3.2. This and the temporal logic tautology

   ⊨ □(F ⇒ ◇G) ⇒ (□◇F ⇒ □◇G)

imply □◇E⟨Q⟩_v ⇒ □◇Q, which by the step 3.1 assumption □◇E⟨Q⟩_v implies the step 3.1 goal □◇Q.
4. Q.E.D.
Proof: By steps 1–3.
Proof sketch: For any behavior σ, let σ|x be the infinite sequence of n-tuples of values such that σ|x(i) equals the value of ⟨x⟩ in state σ(i). The basic idea is to define S so that the value of y in any state i of a behavior of S always equals (σ|x)+i for some behavior σ satisfying F, and x always equals y(0). (Remember that τ is the infinite sequence τ(0) → τ(1) → ···, and τ+i equals τ(i) → τ(i + 1) → ···.)
To do this, for any infinite sequence τ of n-tuples of values, we define F̃(τ) to equal F(σ) for any behavior σ such that σ|x equals τ. This uniquely defines F̃ because, by hypothesis, the value of F(σ) depends only on the values of the variables x in the behavior σ. Define IsTupleSeq to be the mapping such that IsTupleSeq(τ) is true iff τ is an infinite sequence of n-tuples of arbitrary values. We then define S by letting:

   Init ≜ ∃ τ : ∧ IsTupleSeq(τ) ∧ F̃(τ)
                ∧ (y = τ) ∧ (⟨x⟩ = τ(0))
   Next ≜ (y′ = Tail(y)) ∧ (⟨x⟩′ = y′(0))
With this definition, F(σ) equals true for a behavior σ iff there is a behavior satisfying S in which the initial value of y is σ|x. Notice that σ is a halting behavior iff σ|x ends with an infinite sequence of identical n-tuples. Once y equals that repeating sequence, Tail(y) = y, so ⟨Next⟩_⟨x,y⟩ equals false and S allows only stuttering steps from that point on.
Eliminating the conjunct WF_⟨x,y⟩(Next) allows S to halt even if y initially equals σ|x for a non-halting behavior σ that satisfies F. That makes no difference if F is a safety property, since in that case every finite prefix of σ also satisfies F. End Proof Sketch
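The construction can be animated directly. In this Python sketch (mine), the infinite sequence y is modeled as a function from the naturals to n-tuples, and each Next step sets y to Tail(y) and ⟨x⟩ to the new head:

    def init_state(tau):
        # Init: y holds the whole tuple sequence tau and <x> equals tau(0)
        return {"x": tau(0), "y": tau}

    def next_step(state):
        # Next: y' = Tail(y) and <x>' = y'(0)
        y_old = state["y"]
        y_new = lambda i: y_old(i + 1)    # Tail(y)
        return {"x": y_new(0), "y": y_new}

    tau = lambda i: (i,)                  # the sequence <0> -> <1> -> <2> -> ...
    s = init_state(tau)
    for _ in range(3):
        s = next_step(s)
    print(s["x"])                         # (3,): after three steps x is tau(3)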
1. Assume: i ∈ I
   Prove: ⟨B^h_i⟩_vh ≡ ⟨B_i ∧ (h′ = exp_i)⟩_v
1.1. ⟨B^h_i⟩_vh ≡ B_i ∧ (v′ ≠ v) ∧ (h′ = exp_i) ∧ (vh′ ≠ vh)
Proof: By the definitions of B^h_i, Next_i, and ⟨…⟩_… .
1.2. ⟨B^h_i⟩_vh ≡ B_i ∧ (v′ ≠ v) ∧ (h′ = exp_i)
Proof: By step 1.1, since vh = v ◦ ⟨h⟩ implies (v′ ≠ v) ∧ (vh′ ≠ vh) ≡ (v′ ≠ v).
1.3. Q.E.D.
Proof: By step 1.2 and the definition of ⟨…⟩_v.
2. Assume: i ∈ I
   Prove: E⟨B^h_i⟩_vh ≡ E⟨B_i⟩_v
Proof: By step 1 because exp is assumed not to contain h′, so rules E3 and E5 of Section 6.4.4.2 imply E⟨B_i ∧ (h′ = exp_i)⟩_v equals E⟨B_i⟩_v.
Define ◇□_i to equal ◇□ if XF_i is WF and to equal □◇ if XF_i is SF.
3. Assume: T^h ∧ ∀ i ∈ I : XF_i,vh(B^h_i)
   Prove: T ∧ ∀ i ∈ I : XF_i,v(B_i)
3.1. Suffices: Assume: (i ∈ I)
     Prove: XF_i,v(B_i)
Proof: By Theorem 7.1, T^h implies T. Therefore, it suffices to prove ∀ i ∈ I : XF_i,v(B_i) to prove step 3.
3.2. Suffices: Assume: ◇□_i E⟨B_i⟩_v
     Prove: □◇⟨B_i⟩_v
Proof: By (4.14) and (4.23).
3.3. ◇□_i E⟨B^h_i⟩_vh
Proof: By the step 3.2 assumption and step 2. (Since step 2 is not in the scope of any assumptions, it implies □(E⟨B^h_i⟩_vh ≡ E⟨B_i⟩_v).)
3.4. □◇⟨B^h_i⟩_vh
Proof: The step 3 assumption implies XF_i,vh(B^h_i), which by step 3.3, (4.14), and (4.23) implies □◇⟨B^h_i⟩_vh.
3.5. Q.E.D.
Proof: Step 3.4 and step 1 imply □◇⟨B_i ∧ (h′ = exp_i)⟩_v, which implies the goal introduced in step 3.2.
4. Assume: T ∧ ∀ i ∈ I : XF_i,v(B_i)
   Prove: ∃ h : T^h ∧ ∀ i ∈ I : XF_i,vh(B^h_i)
4.1. Suffices: Assume: T^h ∧ (i ∈ I)
     Prove: XF_i,vh(B^h_i)
Proof: Theorem 7.1 shows that T implies ∃ h : T^h. This implies that to prove T ∧ F implies ∃ h : (T^h ∧ G) for any F and G, it suffices to prove that T ∧ F ∧ T^h implies G. (To understand this reasoning, convince yourself that it is sound for formulas of ordinary math, not temporal logic, when the temporal quantifier ∃ is replaced by the ordinary quantifier ∃.) Thus, the step 4 assumption shows that to prove the step 4 goal, it suffices to prove that T^h implies ∀ i ∈ I : XF_i,vh(B^h_i), which is asserted by this step's Assume/Prove.
4.2. Suffices: Assume: ◇□_i E⟨B^h_i⟩_vh
     Prove: □◇⟨B^h_i⟩_vh
Proof: By (4.14) and (4.23).
4.3. ◇□_i E⟨B_i⟩_v
Proof: By the step 4.2 assumption and step 2.
4.4. □◇⟨B_i⟩_v
Proof: The step 4 assumption implies XF_i,v(B_i), which by step 4.3, (4.14), and (4.23) implies □◇⟨B_i⟩_v.
state of ρ. Every state of τ is the last state of some finite prefix of τ, and the safety property F is true of τ iff it is true of every finite prefix of τ, so F is true of τ iff I_F is true of every state of τ. This proves that I_F is an invariant of T^h iff T^h satisfies F; and T satisfies F iff T^h does, because F depends only on the variables x. End Proof Sketch
   A_i ≜ (i = ⟨x, z, t⟩′) ∧ Next^th

A Next^thp step removes the first element from p and appends that element to the end of h. Therefore, the value of h ◦ p remains unchanged throughout any behavior that satisfies T^thp. The value of h ◦ p during a behavior σ satisfying T^thp equals the sequence of values of ⟨x, z, t⟩ in the entire behavior σ, except that σ may have additional (stuttering) steps that leave ⟨x, z, t⟩ unchanged.
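That h ◦ p never changes is easy to check in a sketch, with h and p modeled as Python lists (the modeling is mine):

    def next_thp_step(h, p):
        # one step in the style of Next^thp: remove the first element of p
        # and append it to the end of h
        return h + [p[0]], p[1:]

    h = []
    p = [("x0", "z0", "t0"), ("x1", "z1", "t1"), ("x2", "z2", "t2")]
    concatenation = h + p
    while p:
        h, p = next_thp_step(h, p)
        assert h + p == concatenation   # h o p is unchanged by every step
    print(h)   # the whole original sequence has migrated from p to h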
In any state of a behavior satisfying T^thp, the value of (h ◦ p)(Len(h) − 1) (the last element in the sequence h of m-tuples of values) is the current value of ⟨x, z, t⟩. The variables h and p, together with the mapping Φ, contain all the information needed to define a refinement mapping under which T^thp implements IS. To see how this is done, we need some notation.
For any behavior σ and state expression exp, define σ|exp to be the infinite sequence of values such that (σ|exp)(i) equals the value of exp in state σ(i), for all i ∈ N. Thus σ|⟨x,z,t⟩ is the sequence of m-tuples of values of ⟨x, z, t⟩ in the states of σ. Define the mapping Φ̃ from sequences of m-tuples of values to behaviors so that Φ̃(ρ) equals Φ(σ) for some behavior σ such that σ|⟨x,z,t⟩ = ρ. (It doesn't matter what values the states of σ assign to variables other than those in x, z, and t, since they don't affect whether or not σ satisfies T.) We are assuming that Φ(σ) satisfies IS and Φ(σ) ∼_y σ. Therefore, for any behavior satisfying T^thp, for the value of h ◦ p in any state of that behavior, Φ̃(h ◦ p) satisfies IS and Φ̃(h ◦ p) ∼_y σ for some behavior σ such that σ|⟨x,z,t⟩ = h ◦ p.
To understand how to construct the needed refinement mapping, we consider a simpler version of the theorem that would be true if we were using RTLA rather than TLA, so we didn't have the complication introduced by stuttering insensitivity. In that case, Φ̃(h ◦ p) would satisfy Φ̃(h ◦ p) ≃_y σ instead of Φ̃(h ◦ p) ∼_y σ. This means that the behavior Φ(σ) satisfying IS is constructed from a behavior σ with σ|⟨x,z,t⟩ equal to h ◦ p by just changing the value of the variables y in each state of σ (without adding or removing stuttering steps). For any state s, let s_y be the list of values of the variables y in that state. For σ to satisfy IS, the values of y in any state σ(i) of the behavior σ should equal the values of Φ(σ)(i)_y, the values of y in the corresponding state of Φ(σ).
action of T^thps that have occurred so far. We can define k in terms of the values of f and s, or we can simply add k as a history variable to T^thps. However we define it, we can express the corrected version of (B.1) that handles stuttering insensitivity by:

(B.2)  ⊨ (∃ i ∈ I : A_i) · (∃ j ∈ J : B_j) ≡ (∃ i ∈ I, j ∈ J : A_i · B_j)

   ⊨ (∀ i ∈ I, j ∈ J : A_i · B_j ⇒ B_j · A_i) ⇒ (A · B ⇒ B · A)

   A · B ≡ (∃ i ∈ I, j ∈ J : A_i · B_j)    by (B.2)
        ⇒ (∃ i ∈ I, j ∈ J : B_j · A_i)     since we assume A_i · B_j ⇒ B_j · A_i for all i and j
        ≡ B · A                            by (B.2), substituting I ← J, J ← I, A_i ← B_j, and B_j ← A_i

End Proof
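If actions are modeled as relations, that is, as sets of state pairs (a simplification of mine), then · is relational composition, and the commutation hypothesis and conclusion can be checked on small examples:

    def compose(A, B):
        # A . B holds of (s, t) iff some u has (s, u) in A and (u, t) in B
        return {(s, t) for (s, u) in A for (u2, t) in B if u == u2}

    N = 10
    A = {(s, (s + 1) % N) for s in range(N)}   # add 1 (mod N)
    B = {(s, (s + 2) % N) for s in range(N)}   # add 2 (mod N)

    print(compose(A, B) == compose(B, A))      # True: A and B commute
    print(compose(A, B) <= compose(B, A))      # so A . B => B . A, as in the theorem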
implies ⊨ S ⇒ P.
3. Q.E.D.
Proof: By step 2, the assumption that P is a safety property, and Theorem 4.3.