Principles of Concurrent and Distributed Programming
Second Edition
M. Ben-Ari

Principles of Concurrent and
Distributed Programming
Visit the Principles of Concurrent and Distributed Programming, Second
Edition Companion Website at www.pearsoned.co.uk/ben-ari to find
valuable student learning material including:
• Source code for all the algorithms in the book
• Links to sites where software for studying concurrency may be downloaded

We work with leading authors to develop the
strongest educational materials in computing,
bringing cutting-edge thinking and best learning
practice to a global market.
Under a range of well-known imprints, including
Addison-Wesley, we craft high quality print and
electronic publications which help readers to
understand and apply their content, whether
studying or at work.
To find out more about the complete range of our
publishing, please visit us on the World Wide Web
www.pearsoned.co.uk

Principles of Concurrent and
Distributed Programming
Second Edition
M. Ben-Ari
ADDISON-WESLEY
Harlow, England • London • New York • Boston • San Francisco •
Tokyo • Seoul • Taipei • New Delhi • Cape Town • Madrid • Mexico City • Amsterdam

Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE,
England
and Associated Companies throughout the world
Visit us on the World Wide Web at:
www.pearsoned.co.uk
First published 1990
Second edition 2006
© Prentice Hall Europe, 1990
© Mordechai Ben-Ari, 2006
The right of Mordechai Ben-Ari to be identified as author of this work has
been asserted by him in accordance with the Copyright, Designs and Patents Act 1988
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical,
photocopying, recording or otherwise, without either the prior written permission of the
publisher or a licence permitting restricted copying in the United Kingdom issued by the
Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP.
All trademarks used herein are the property of their respective owners. The use of any
trademark in this text does not vest in the author or publisher any trademark ownership
rights in such trademarks, nor does the use of such trademarks imply any affiliation with
or endorsement of this book by such owners.
ISBN-13: 978-0-321-31283-9
ISBN-10: 0-321-31283-X
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
10 9 8 7 6 5 4 3 2 1
10 09 08 07 06
Printed and bound by Henry Ling Ltd, at the Dorset Press, Dorchester, Dorset
The publisher's policy is to use paper manufactured from sustainable forests.

Contents
Preface

1  What is Concurrent Programming?
   1.1   Introduction
   1.2   Concurrency as abstract parallelism
   1.3   Multitasking
   1.4   The terminology of concurrency
   1.5   Multiple computers
   1.6   The challenge of concurrent programming

2  The Concurrent Programming Abstraction
   2.1   The role of abstraction
   2.2   Concurrent execution as interleaving of atomic statements
   2.3   Justification of the abstraction
   2.4   Arbitrary interleaving
   2.5   Atomic statements
   2.6   Correctness
   2.7   Fairness
   2.8   Machine-code instructions
   2.9   Volatile and non-atomic variables
   2.10  The BACI concurrency simulator
   2.11  Concurrency in Ada
   2.12  Concurrency in Java
   2.13  Writing concurrent programs in Promela
   2.14  Supplement: the state diagram for the frog puzzle

3  The Critical Section Problem
   3.1   Introduction
   3.2   The definition of the problem
   3.3   First attempt
   3.4   Proving correctness with state diagrams
   3.5   Correctness of the first attempt
   3.6   Second attempt
   3.7   Third attempt
   3.8   Fourth attempt
   3.9   Dekker's algorithm
   3.10  Complex atomic statements

4  Verification of Concurrent Programs
   4.1   Logical specification of correctness properties
   4.2   Inductive proofs of invariants
   4.3   Basic concepts of temporal logic
   4.4   Advanced concepts of temporal logic
   4.5   A deductive proof of Dekker's algorithm
   4.6   Model checking
   4.7   Spin and the Promela modeling language
   4.8   Correctness specifications in Spin
   4.9   Choosing a verification technique

5  Advanced Algorithms for the Critical Section Problem
   5.1   The bakery algorithm
   5.2   The bakery algorithm for N processes
   5.3   Less restrictive models of concurrency
   5.4   Fast algorithms
   5.5   Implementations in Promela

6  Semaphores
   6.1   Process states
   6.2   Definition of the semaphore type
   6.3   The critical section problem for two processes
   6.4   Semaphore invariants
   6.5   The critical section problem for N processes
   6.6   Order of execution problems
   6.7   The producer-consumer problem
   6.8   Definitions of semaphores
   6.9   The problem of the dining philosophers
   6.10  Barz's simulation of general semaphores
   6.11  Udding's starvation-free algorithm
   6.12  Semaphores in BACI
   6.13  Semaphores in Ada
   6.14  Semaphores in Java
   6.15  Semaphores in Promela

7  Monitors
   7.1   Introduction
   7.2   Declaring and using monitors
   7.3   Condition variables
   7.4   The producer-consumer problem
   7.5   The immediate resumption requirement
   7.6   The problem of the readers and writers
   7.7   Correctness of the readers and writers algorithm
   7.8   A monitor solution for the dining philosophers
   7.9   Monitors in BACI
   7.10  Protected objects
   7.11  Monitors in Java
   7.12  Simulating monitors in Promela

8  Channels
   8.1   Models for communications
   8.2   Channels
   8.3   Parallel matrix multiplication
   8.4   The dining philosophers with channels
   8.5   Channels in Promela
   8.6   Rendezvous
   8.7   Remote procedure calls

9  Spaces
   9.1   The Linda model
   9.2   Expressiveness of the Linda model
   9.3   Formal parameters
   9.4   The master-worker paradigm
   9.5   Implementations of spaces

10 Distributed Algorithms
   10.1  The distributed systems model
   10.2  Implementations
   10.3  Distributed mutual exclusion
   10.4  Correctness of the Ricart-Agrawala algorithm
   10.5  The RA algorithm in Promela
   10.6  Token-passing algorithms
   10.7  Tokens in virtual trees

11 Global Properties
   11.1  Distributed termination
   11.2  The Dijkstra-Scholten algorithm
   11.3  Credit-recovery algorithms
   11.4  Snapshots

12 Consensus
   12.1  Introduction
   12.2  The problem statement
   12.3  A one-round algorithm
   12.4  The Byzantine Generals algorithm
   12.5  Crash failures
   12.6  Knowledge trees
   12.7  Byzantine failures with three generals
   12.8  Byzantine failures with four generals
   12.9  The flooding algorithm
   12.10 The King algorithm
   12.11 Impossibility with three generals

13 Real-Time Systems
   13.1  Introduction
   13.2  Definitions
   13.3  Reliability and repeatability
   13.4  Synchronous systems
   13.5  Asynchronous systems
   13.6  Interrupt-driven systems
   13.7  Priority inversion and priority inheritance
   13.8  The Mars Pathfinder in Spin
   13.9  Simpson's four-slot algorithm
   13.10 The Ravenscar profile
   13.11 UPPAAL
   13.12 Scheduling algorithms for real-time systems

A  The Pseudocode Notation

B  Review of Mathematical Logic
   B.1   The propositional calculus
   B.2   Induction
   B.3   Proof methods
   B.4   Correctness of sequential programs

C  Concurrent Programming Problems

D  Software Tools
   D.1   BACI and jBACI
   D.2   Spin and jSpin
   D.3   DAJ

E  Further Reading

Bibliography

Index
Supporting Resources
Visit www.pearsoned.co.uk/ben-ari to find valuable online resources

Companion Website for students
• Source code for all the algorithms in the book
• Links to sites where software for studying concurrency may be downloaded

For instructors
• PDF slides of all diagrams, algorithms and scenarios (with LaTeX source)
• Answers to exercises

For more information please contact your local Pearson Education sales
representative or visit www.pearsoned.co.uk/ben-ari

Preface
Concurrent and distributed programming are no longer the esoteric subjects for
graduate students that they were years ago. Programs today are inherently concur-
rent or distributed, from event-based implementations of graphical user interfaces
to operating and real-time systems to Internet applications like multiuser games,
chats and ecommerce. Modern programming languages and systems (including
Java, the system most widely used in education) support concurrent and distributed
programming within their standard libraries. These subjects certainly deserve a
central place in computer science education.
What has not changed over time is that concurrent and distributed programs cannot
be “hacked.” Formal methods must be used in their specification and verifica-
tion, making the subject an ideal vehicle to introduce students to formal methods.
Precisely for this reason I find concurrency still intriguing even after forty years’
experience writing programs; I hope you will too.
I have been very gratified by the favorable response to my previous books Princi-
ples of Concurrent Programming and the first edition of Principles of Concurrent
and Distributed Programming. Several developments have made it advisable to
write a new edition. Surprisingly, the main reason is not any revolution in the prin-
ciples of this subject. While the superficial technology may change, basic concepts
like interleaving, mutual exclusion, safety and liveness remain with us, as have
the basic constructs used to write concurrent programs like semaphores, monitors,
channels and messages. The central problems we try to solve have also remained
with us: critical section, producer-consumer, readers and writers and consensus.
What has changed is that concurrent programming has become ubiquitous, and this
has affected the choice of language and software technology.
Language: I see no point in presenting the details of any particular language or
system, details that in any case are likely to obscure the principles. For that reason,
I have decided not to translate the Ada programs from the first edition into Java
programs, but instead to present the algorithms in pseudocode. I believe that the
high-level pseudocode makes it easier to study the algorithms. For example, in the
Byzantine Generals algorithm, the pseudocode line:
for all other nodes
is much easier to understand than the Java lines:
for (int i = 0; i < numberOfNodes; i++)
if (i != myID)
and yet no precision is lost.
In addition, I am returning to the concept of Principles of Concurrent Program-
ming, where concurrency simulators, not concurrent programming languages, are
the preferred tool for teaching and learning. There is simply no way that extreme
scenarios—like the one you are asked to construct in Exercise 2.3—can be demon-
strated without using a simulator.
Along with the language-independent development of models and algorithms, ex-
planations have been provided on concurrency in five languages: the Pascal and
C dialects supported by the BACI concurrency simulator, Ada¹ and Java, and
Promela, the language of the model checker Spin. Language-dependent sections
are marked by +. Implementations of algorithms in these languages are supplied in
the accompanying software archive.
A word on the Ada language that was used in the first edition of this book. I
believe that—despite being overwhelmed by languages like C++ and Java—Ada is
still the best language for developing complex systems. Its support for concurrent
and real-time programming is excellent, in particular when compared with the trials
and tribulations associated with the concurrency constructs in Java. Certainly, the
protected object and rendezvous are elegant constructs for concurrency, and I have
explained them in the language-independent pseudocode.
Model checking: A truly new development in concurrency that justifies writing
a revised edition is the widespread use of model checkers for verifying concurrent
and distributed programs. The concept of a state diagram and its use in checking
correctness claims is explained from the very start. Deductive proofs continue to
be used, but receive less emphasis than in the first edition. A central place has
been given to the Spin model checker, and I warmly recommend that you use it in
your study of concurrency. Nevertheless, I have refrained from using Spin and its
language Promela exclusively because I realize that many instructors may prefer to
use a mainstream programming language.
¹All references to Ada in this book are to Ada 95.
I have chosen to present the Spin model checker because, on the one hand, it is
a widely-used industrial-strength tool that students are likely to encounter as soft-
ware engineers, but on the other hand, it is very "friendly." The installation is triv-
ial and programs are written in a simple programming language that can be easily
learned. I have made a point of using Spin to verify all the algorithms in the book,
and I have found this to be extremely effective in increasing my understanding of
the algorithms.
An outline of the book: After an introductory chapter, Chapter 2 describes the
abstraction that is used: the interleaved execution of atomic statements, where the
simplest atomic statement is a single access to a memory location. Short introduc-
tions are given to the various possibilities for studying concurrent programming:
using a concurrency simulator, writing programs in languages that directly support
concurrency, and working with a model checker. Chapter 3 is the core of an in-
troduction to concurrent programming. The critical-section problem is the central
problem in concurrent programming, and algorithms designed to solve the problem
demonstrate in detail the wide range of pathological behaviors that a concurrent
program can exhibit. The chapter also presents elementary verification techniques
that are used to prove correctness.
More advanced material on verification and on algorithms for the critical-section
problem can be found in Chapters 4 and 5, respectively. For Dekker’s algorithm,
we give a proof of freedom from starvation as an example of deductive reasoning
with temporal logic (Section 4.5). Assertional proofs of Lamport’s fast mutual ex-
clusion algorithm (Section 5.4), and Barz’s simulation of general semaphores by
binary semaphores (Section 6.10) are given in full detail: Lamport gave a proof that
is partially operational and Barz’s is fully operational and difficult to follow. Study-
ing assertional proofs is a good way for students to appreciate the care required to
develop concurrent algorithms.
Chapter 6 on semaphores and Chapter 7 on monitors discuss these classical con-
current programming primitives. The chapter on monitors carefully compares the
original construct with similar constructs that are implemented in the programming
languages Ada and Java.
Chapter 8 presents synchronous communication by channels, and generalizations
to rendezvous and remote procedure calls. An entirely different approach discussed
in Chapter 9 uses logically-global data structures called spaces; this was pioneered
in the Linda model, and implemented within Java by JavaSpaces.
The chapters on distributed systems focus on algorithms: the critical-section prob-
lem (Chapter 10), determining the global properties of termination and snapshots
(Chapter 11), and achieving consensus (Chapter 12). The final Chapter 13 gives
an overview of concurrency in real-time systems. Integrated within this chapter
are descriptions of software defects in spacecraft caused by problems with concur-
rency. They are particularly instructive because they emphasize that some software
really does demand precise specification and formal verification.
A summary of the pseudocode notation is given in Appendix A. Appendix B re-
views the elementary mathematical logic needed for verification of concurrent pro-
grams. Appendix C gives a list of well-known problems for concurrent program-
ming. Appendix D describes tools that can be used for studying concurrency: the
BACI concurrency simulator; Spin, a model checker for simulating and verifying
concurrent programs; and DAJ, a tool for constructing scenarios of distributed al-
gorithms. Appendix E contains pointers to more advanced textbooks, as well as
references to books and articles on specialized topics and systems; it also contains
a list of websites for locating the languages and systems discussed in the book.
Audience: The intended audience includes advanced undergraduate and begin-
ning graduate students, as well as practicing software engineers interested in ob-
taining a scientific background in this field. We have taught concurrency suc-
cessfully to high-school students, and the subject is particularly suited to non-
specialists because the basic principles can be explained by adding a very few
constructs to a simple language and running programs on a concurrency simulator.
While there are no specific prerequisites and the book is reasonably self-contained,
a student should be fluent in one or more programming languages and have a basic
knowledge of data structures and computer architecture or operating systems.
Advanced topics are marked as such; these include material that requires a degree of
mathematical maturity.
Chapters 1 through 3 and the non-advanced parts of Chapter 4 form the introductory core
that should be part of any course. I would also expect a course in concurrency
to present semaphores and monitors (Chapters 6 and 7); monitors are particularly
important because concurrency constructs in modern programming languages are
based upon the monitor concept. The other chapters can be studied more or less
independently of each other.
Exercises: The exercises following each chapter are technical exercises intended
to clarify and expand on the models, the algorithms and their proofs. Classical
problems with names like the sleeping barber and the cigarette smoker appear in
Appendix C because they need not be associated with any particular construct for
synchronization.
Supporting material: The companion website contains an archive with the
source code in various languages for the algorithms appearing in the book. Its
address is:
http://www.pearsoned.co.uk/ben-ari
Lecturers will find slides of the algorithms, diagrams and scenarios, both in ready-
to-display PDF files and in LaTeX source for modification. The site includes in-
structions for obtaining the answers to the exercises.
Acknowledgements: I would like to thank:
• Yifat Ben-David Kolikant for six years of collaboration during which we
  learned together how to really teach concurrency;
• Pieter Hartel for translating the examples of the first edition into Promela,
  eventually tempting me into learning Spin and emphasizing it in the new
  edition;
• Pieter Hartel again and Hans Henrik Lovengreen for their comprehensive
  reviews of the manuscript;
• Gerard Holzmann for patiently answering innumerable queries on Spin dur-
  ing my development of jSpin and the writing of the book;
• Bill Bynum, Tracy Camp and David Strite for their help during my work on
  jBACI;
• Shmuel Schwarz for showing me how the frog puzzle can be used to teach
  state diagrams;
• The Helsinki University of Technology for inviting me for a sabbatical dur-
  ing which this book was completed.
M. Ben-Ari
Rehovot and Espoo, 2005

1 What is Concurrent
Programming?
1.1 Introduction
An “ordinary” program consists of data declarations and assignment and control-
flow statements in a programming language. Modern languages include structures
such as procedures and modules for organizing large software systems through ab-
straction and encapsulation, but the statements that are actually executed are still
the elementary statements that compute expressions, move data and change the
flow of control. In fact, these are precisely the instructions that appear in the ma-
chine code that results from compilation. These machine instructions are executed
sequentially on a computer and access data stored in the main or secondary mem-
ories.
A concurrent program is a set of sequential programs that can be executed in paral-
lel. We use the word process for the sequential programs that comprise a concurrent
program and save the term program for this set of processes.
Traditionally, the word parallel is used for systems in which the executions of
several programs overlap in time by running them on separate processors. The
word concurrent is reserved for potential parallelism, in which the executions may,
but need not, overlap; instead, the parallelism may only be apparent since it may
be implemented by sharing the resources of a small number of processors, often
only one. Concurrency is an extremely useful abstraction because we can better
understand such a program by pretending that all processes are being executed in
parallel. Conversely, even if the processes of a concurrent program are actually
executed in parallel on several processors, understanding its behavior is greatly
facilitated if we impose an order on the instructions that is compatible with shared
execution on a single processor. Like any abstraction, concurrent programming is
important because the behavior of a wide range of real systems can be modeled
and studied without unnecessary detail.
In this book we will define formal models of concurrent programs and study algo-
rithms written in these formalisms. Because the processes that comprise a concur-
rent program may interact, it is exceedingly difficult to write a correct program for
even the simplest problem. New tools are needed to specify, program and verify
these programs. Unless these are understood, a programmer used to writing and
testing sequential programs will be totally mystified by the bizarre behavior that a
concurrent program can exhibit.
Concurrent programming arose from problems encountered in creating real sys-
tems. To motivate the concurrency abstraction, we present a series of examples of
real-world concurrency.
1.2 Concurrency as abstract parallelism
It is difficult to intuitively grasp the speed of electronic devices. The fingers of
a fast typist seem to fly across the keyboard, to say nothing of the impression of
speed given by a printer that is capable of producing a page with thousands of
characters every few seconds. Yet these rates are extremely slow compared to the
time required by a computer to process each character.
As I write, the clock speed of the central processing unit (CPU) of a personal
computer is of the order of magnitude of one gigahertz (one billion times a second).
That is, every nanosecond (one-billionth of a second), the hardware clock ticks
and the circuitry of the CPU performs some operation. Let us roughly estimate
that it takes ten clock ticks to execute one machine language instruction, and ten
instructions to process a character, so the computer can process the character you
typed in one hundred nanoseconds, that is 0.0000001 of a second:

[Timeline in nanoseconds from 0 to 500, with the 100 ns needed to process the character marked.]
To get an intuitive idea of how much effort is required on the part of the CPU,
let us pretend that we are processing the character by hand. Clearly, we do not
consciously perform operations on the scale of nanoseconds, so we will multiply
the time scale by one billion so that every clock tick becomes a second:

[Timeline in seconds from 0 to 500, with the 100 seconds of work marked.]
Thus we need to perform 100 seconds of work out of every billion seconds. How
much is a billion seconds? Since there are 60 × 60 × 24 = 86,400 seconds in a
day, a billion seconds is 1,000,000,000/86,400 = 11,574 days or about 32 years.
You would have to invest 100 seconds every 32 years to process a character, and a
3,000-character page would require only (3,000 × 100)/(60 × 60) ≈ 83 hours over
half a lifetime. This is hardly a strenuous job!
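These figures are easy to check. The following Java fragment (our own illustration; the class and variable names are invented for this sketch) redoes the arithmetic:

    // Back-of-envelope check of the scaled-up processing times.
    public class ProcessingEstimate {
        public static void main(String[] args) {
            double secondsPerDay = 60 * 60 * 24;                 // 86,400
            double days = 1_000_000_000.0 / secondsPerDay;       // about 11,574 days
            double years = days / 365;                           // about 32 years
            double hoursPerPage = 3_000 * 100 / (60.0 * 60.0);   // about 83 hours
            System.out.printf("%.0f days, %.1f years, %.1f hours per page%n",
                              days, years, hoursPerPage);
        }
    }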
The tremendous gap between the speeds of human and mechanical processing on
the one hand and the speed of electronic devices on the other led to the develop-
ment of operating systems which allow I/O operations to proceed “in parallel” with
computation. On a single CPU, like the one in a personal computer, the processing
required for each character typed on a keyboard cannot really be done in parallel
with another computation, but it is possible to “steal” from the other computation
the fraction of a microsecond needed to process the character. As can be seen from
the numbers in the previous paragraph, the degradation in performance will not be
noticeable, even when the overhead of switching between the two computations is
included.
What is the connection between concurrency and operating systems that overlap
I/O with other computations? It would theoretically be possible for every program
to include code that would periodically sample the keyboard and the printer to see
if they need to be serviced, but this would be an intolerable burden on programmers
by forcing them to be fully conversant with the details of the operating system. In-
stead, I/O devices are designed to interrupt the CPU, causing it to jump to the code
to process a character. Although the processing is sequential, it is conceptually
simpler to work with an abstraction in which the I/O processing performed as the
result of the interrupt is a separate process, executed concurrently with a process
doing another computation. The following diagram shows the assignment of the
CPU to the two processes for computation and I/O.

[Diagram: a time line showing the CPU switching between the computation process and the I/O process, from the start of the I/O operation to its end.]
1.3 Multitasking
Multitasking is a simple generalization from the concept of overlapping I/O with a
computation to overlapping the computation of one program with that of another.
Multitasking is the central function of the kernel of all modern operating systems.
A scheduler program is run by the operating system to determine which process
should be allowed to run for the next interval of time. The scheduler can take into
account priority considerations, and usually implements time-slicing, where com-
putations are periodically interrupted to allow a fair sharing of the computational
resources, in particular, of the CPU. You are intimately familiar with multitask-
ing; it enables you to write a document on a word processor while printing another
document and simultaneously downloading a file.
Multitasking has become so useful that modern programming languages support
it within programs by providing constructs for multithreading. Threads enable
the programmer to write concurrent (conceptually parallel) computations within a
single program. For example, interactive programs contain a separate thread for
handling events associated with the user interface that is run concurrently with the
main thread of the computation. It is multithreading that enables you to move the
mouse cursor while a program is performing a computation.
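As a minimal Java sketch of multithreading (our own example; the class and the thread body are invented for this illustration), a long computation can run in its own thread while the main thread remains free for other work:

    // The computation runs concurrently with the main thread.
    public class TwoThreads {
        public static void main(String[] args) throws InterruptedException {
            Thread computation = new Thread(() -> {
                long sum = 0;
                for (long i = 0; i < 1_000_000_000L; i++)
                    sum += i;                                    // long-running computation
                System.out.println("computation finished: " + sum);
            });
            computation.start();      // start the computation thread
            System.out.println("main thread can do other work here");
            computation.join();       // wait for the computation to finish
        }
    }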
1.4 The terminology of concurrency
The term process is used in the theory of concurrency, while the term thread is
commonly used in programming languages. A technical distinction is often made
between the two terms: a process runs in its own address space managed by the
operating system, while a thread runs within the address space of a single process
and may be managed by a multithreading kernel within the process. The term
thread was popularized by pthreads (POSIX threads), a specification of concur-
rency constructs that has been widely implemented, especially on UNIX systems.
The differences between processes and threads are not relevant for the study of
the synchronization constructs and algorithms, so the term process will be used
throughout, except when discussing threads in the Java language.
The term task is used in the Ada language for what we call a process, and we will
use that term in discussions of the language. The term is also used to denote small
units of work; this usage appears in Chapter 9, as well as in Chapter 13 on real-
time systems where task is the preferred term to denote units of work that are to be
scheduled.
1.5 Multiple computers
The days of one large computer serving an entire organization are long gone. To-
day, computers hide in unforeseen places like automobiles and cameras. In fact,
your personal “computer” (in the singular) contains more than one processor: the
graphics processor is a computer specialized for the task of taking information from
the computer's memory and rendering it on the display screen. I/O and commu-
nications interfaces are also likely to have their own specialized processors. Thus,
in addition to the multitasking performed by the operating system's kernel, parallel
processing is being carried out by these specialized processors.
The use of multiple computers is also essential when the computational task re-
quires more processing than is possible on one computer. Perhaps you have seen
pictures of the “server farms” containing tens or hundreds of computers that are
used by Internet companies to provide service to millions of customers. In fact, the
entire Internet can be considered to be one distributed system working to dissemi-
nate information in the form of email and web pages.
Somewhat less familiar than distributed systems are multiprocessors, which are
systems designed to bring the computing power of several processors to work in
concert on a single computationally-intensive problem. Multiprocessors are exten-
sively used in scientific and engineering simulation, for example, in simulating the
atmosphere for weather forecasting and studying climate.
1.6 The challenge of concurrent programming
The challenge in concurrent programming comes from the need to synchronize
the execution of different processes and to enable them to communicate. If the
processes were totally independent, the implementation of concurrency would only
require a simple scheduler to allocate resources among them. But if an I/O process
accepts a character typed on a keyboard, it must somehow communicate it to the
process running the word processor, and if there are multiple windows on a display,
processes must somehow synchronize access to the display so that images are sent
to the window with the current focus.
It turns out to be extremely difficult to implement safe and efficient synchronization
and communication. When your personal computer “freezes up” or when using one
application causes another application to “crash,” the cause is generally an error in
synchronization or communication. Since such problems are time- and situation-
dependent, they are difficult to reproduce, diagnose and correct.
The aim of this book is to introduce you to the constructs, algorithms and systems
that are used to obtain correct behavior of concurrent and distributed programs.
The choice of construct, algorithm or system depends critically on assumptions
concerning the requirements of the software being developed and the architecture
of the system that will be used to execute it. This book presents a survey of the
main ideas that have been proposed over the years; we hope that it will enable you
to analyze, evaluate and employ specific tools that you will encounter in the future.
Transition
We have defined concurrent programming informally, based upon your experience
with computer systems. Our goal is to study concurrency abstractly, rather than
a particular implementation in a specific programming language or operating sys-
tem. We have to carefully specify the abstractions that describe the allowable data
structures and operations. In the next chapter, we will define the concurrent pro-
gramming abstraction and justify its relevance. We will also survey languages and
systems that can be used to write concurrent programs.

2 The Concurrent Programming
Abstraction
2.1 The role of abstraction
Scientific descriptions of the world are based on abstractions. A living animal is
a system constructed of organs, bones and so on. These organs are composed of
cells, which in turn are composed of molecules, which in turn are composed of
atoms, which in turn are composed of elementary particles. Scientists find it con-
venient (and in fact necessary) to limit their investigations to one level, or maybe
two levels, and to “abstract away” from lower levels. Thus your physician will
listen to your heart or look into your eyes, but he will not generally think about the
molecules from which they are composed. There are other specialists, pharmacolo-
gists and biochemists, who study that level of abstraction, in turn abstracting away
from the quantum theory that describes the structure and behavior of the molecules.
In computer science, abstractions are just as important. Software engineers gener-
ally deal with at most three levels of abstraction:
Systems and libraries Operating systems and libraries—often called Application
Program Interfaces (API)—define computational resources that are available
to the programmer. You can open a file or send a message by invoking the
proper procedure or function call, without knowing how the resource is im-
plemented.
Programming languages A programming language enables you to employ the
computational power of a computer, while abstracting away from the details
of specific architectures.
Instruction sets Most computer manufacturers design and build families of CPUs
which execute the same instruction set as seen by the assembly language
programmer or compiler writer. The members of a family may be imple-
mented in totally different ways—emulating some instructions in software
or using memory for registers—but a programmer can write a compiler for
that instruction set without knowing the details of the implementation.
Of course, the list of abstractions can be continued to include logic gates and their
implementation by semiconductors, but software engineers rarely, if ever, need to
work at those levels. Certainly, you would never describe the semantics of an
assignment statement like x←y+z in terms of the behavior of the electrons within
the chip implementing the instruction set into which the statement was compiled.
Two of the most important tools for software abstraction are encapsulation and
concurrency.
Encapsulation achieves abstraction by dividing a software module into a public
specification and a hidden implementation. The specification describes the avail-
able operations on a data structure or real-world model. The detailed implemen-
tation of the structure or model is written within a separate module that is not
accessible from the outside. Thus changes in the internal data representation and
algorithm can be made without affecting the programming of the rest of the system.
Modern programming languages directly support encapsulation.
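A small Java sketch of encapsulation (the class is our own illustration, not taken from the book): the public methods are the specification, and the hidden field is the implementation, which can be changed without affecting the rest of the system.

    // The representation of the counter is hidden behind a public specification.
    public class Counter {
        private int value = 0;        // hidden implementation detail

        public void increment() {     // part of the public specification
            value++;
        }

        public int current() {
            return value;
        }
    }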
Concurrency is an abstraction that is designed to make it possible to reason about
the dynamic behavior of programs. This abstraction will be carefully explained
in the rest of this chapter. First we will define the abstraction and then show
how to relate it to various computer architectures. For readers who are famil-
iar with machine-language programming, Sections 2.8-2.9 relate the abstraction
to machine instructions; the conclusion is that there are no important concepts of
concurrency that cannot be explained at the higher level of abstraction, so these
sections can be skipped if desired. The chapter concludes with an introduction
to concurrent programming in various languages and a supplemental section on a
puzzle that may help you understand the concept of state and state diagram.
2.2 Concurrent execution as interleaving of atomic statements
We now define the concurrent programming abstraction that we will study in this
textbook. The abstraction is based upon the concept of a (sequential) process,
which we will not formally define. Consider it as a “normal” program fragment
written in a programming language. You will not be misled if you think of a process
as a fancy name for a procedure or method in an ordinary programming language.
Definition 2.1. A concurrent program consists of a finite set of (sequential) pro-
cesses. The processes are written using a finite set of atomic statements. The
execution of a concurrent program proceeds by executing a sequence of the atomic
statements obtained by arbitrarily interleaving the atomic statements from the pro-
cesses. A computation is an execution sequence that can occur as a result of the
interleaving. Computations are also called scenarios.
Definition 2.2 During a computation the control pointer of a process indicates the
next statement that can be executed by that process.¹ Each process has its own
control pointer.
Computations are created by interleaving, which merges several statement streams.
At each step during the execution of the concurrent program, the next statement to be
executed will be "chosen" from the statements pointed to by the control pointers
cp of the processes.

[Diagram: the control pointers cp of processes p and q point to the next statements in their statement streams p1, p2, ... and q1, q2, ....]
Suppose that we have two processes, p composed of statements p1 followed by p2
and q composed of statements q1 followed by q2, and that the execution is started
with the control pointers of the two processes pointing to p1 and q1. Assuming that
the statements are assignment statements that do not transfer control, the possible
scenarios are:
pl+ql+p2+42,
pl+qlq2—p2,
pl+p2+q1+42,
ql+pl+q2+p2,
qlpl+p2+q2,
gl+q2p1p2.
Note that p2→p1→q1→q2 is not a scenario, because we respect the sequential
execution of each individual process, so that p2 cannot be executed before p1.
¹Alternate terms for this concept are instruction pointer and location counter.
We will present concurrent programs in a language-independent form, because the
concepts are universal, whether they are implemented as operating systems calls,
directly in programming languages like Ada or Java (the word Java will be used as an abbreviation for the Java programming language), or in a model specification
language like Promela. The notation is demonstrated by the following trivial two-
process concurrent algorithm:
Algorithm 2.1: Trivial concurrent program
        integer n ← 0
             p                          q
     integer k1 ← 1              integer k2 ← 2
p1:  n ← k1                 q1:  n ← k2
The program is given a title, followed by declarations of global variables, followed
by two columns, one for each of the two processes, which by convention are named
process p and process q. Each process may have declarations of local variables,
followed by the statements of the process. We use the following convention:
Each labeled line represents an atomic statement.
A description of the pseudocode is given in Appendix A.
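For comparison with a real language, here is a hedged Java rendition of Algorithm 2.1 (class and variable names are ours; two Java threads play the roles of processes p and q):

    // Algorithm 2.1 rendered with two Java threads; the final value of n
    // depends on which assignment happens to execute last.
    public class Trivial {
        static int n = 0;

        public static void main(String[] args) throws InterruptedException {
            Thread p = new Thread(() -> { int k1 = 1; n = k1; });   // p1: n <- k1
            Thread q = new Thread(() -> { int k2 = 2; n = k2; });   // q1: n <- k2
            p.start(); q.start();
            p.join(); q.join();
            System.out.println("n = " + n);   // prints 1 or 2, depending on the interleaving
        }
    }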
States
The execution of a concurrent program is defined by states and transitions between
states. Let us first look at these concepts in a sequential version of the above
algorithm:
Algorithm 2.2: Trivial sequential program
        integer n ← 0
        integer k1 ← 1
        integer k2 ← 2
p1:  n ← k1
p2:  n ← k2
At any time during the execution of this program, it must be in a state defined by
the value of the control pointer and the values of the three variables. Executing a
statement corresponds to making a transition from one state to another. It is clear
that this program can be in one of three states: an initial state and two other states
obtained by executing the two statements. This is shown in the following diagram,
where a node represents a state, arrows represent the transitions, and the initial
state is pointed to by the short arrow on the left:
[State diagram: three states in a row; in each, k1 = 1 and k2 = 2, while n is 0 in the initial state, 1 after executing p1, and 2 in the final (end) state.]
Consider now the trivial concurrent program Algorithm 2.1. There are two pro-
cesses, so the state must include the control pointers of both processes. Further-
more, in the initial state there is a choice as to which statement to execute, so there
are two transitions from the initial state.
[State diagram for Algorithm 2.1, showing the two transitions from the initial state.]
... The statement executed
must be one of those pointed to by a control pointer in s1.
Definition 2.5 A state diagram is a graph defined inductively. The initial state
diagram contains a single node labeled with the initial state. If state s1 labels a
node in the state diagram, and if there is a transition from s1 to s2, then there is a
node labeled s2 in the state diagram and a directed edge from s1 to s2.
For each state, there is only one node labeled with that state.
The set of reachable states is the set of states in a state diagram.
It follows from the definitions that a computation (scenario) of a concurrent pro-
gram is represented by a directed path through the state diagram starting from the
initial state, and that all computations can be so represented. Cycles in the state
diagram represent the possibility of infinite computations in a finite graph.
The state diagram for Algorithm 2.1 shows that there are two different scenarios,
each of which contains three of the five reachable states.
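The inductive construction of the state diagram can be carried out mechanically. The following Java sketch (our own illustration; all names are invented) explores the states of Algorithm 2.1 from the initial state and counts the reachable ones:

    import java.util.*;

    // Build the state diagram of Algorithm 2.1 by exploring all transitions
    // from the initial state.  A state is (control pointer of p, control pointer of q, n).
    public class StateDiagram {
        record State(String p, String q, int n) {}

        static List<State> successors(State s) {
            List<State> next = new ArrayList<>();
            if (s.p().equals("p1")) next.add(new State("(end)", s.q(), 1));  // execute p1: n <- k1
            if (s.q().equals("q1")) next.add(new State(s.p(), "(end)", 2));  // execute q1: n <- k2
            return next;
        }

        public static void main(String[] args) {
            Set<State> reachable = new LinkedHashSet<>();
            Deque<State> frontier = new ArrayDeque<>();
            State initial = new State("p1", "q1", 0);
            frontier.add(initial);
            reachable.add(initial);
            while (!frontier.isEmpty()) {
                State s = frontier.remove();
                for (State t : successors(s))
                    if (reachable.add(t))      // explore each state only once
                        frontier.add(t);
            }
            reachable.forEach(System.out::println);
            System.out.println(reachable.size() + " reachable states");  // 5 for Algorithm 2.1
        }
    }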
Before proceeding, you may wish to read the supplementary Section 2.14, which
describes the state diagram for an interesting puzzle.
Scenarios
A scenario is defined by a sequence of states. Since diagrams can be hard to draw,
especially for large programs, it is convenient to use a tabular representation of
scenarios. This is done simply by listing the sequence of states in a table; the
columns for the control pointers are labeled with the processes and the columns for
the variable values with the variable names. The following table shows the scenario
of Algorithm 2.1 corresponding to the lefthand path:
Process p          Process q          n    k1   k2
p1: n←k1           q1: n←k2           0    1    2
(end)              q1: n←k2           1    1    2
(end)              (end)              2    1    2
In a state, there may be more than one statement that can be executed. We use bold
font to denote the statement that was executed to get to the state in the following
row.
Rows represent states. If the statement executed is an assignment
statement, the new value that is assigned to the variable is a component of
the next state in the scenario, which is found in the next row.
At first this may be confusing, but you will soon get used to it.
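Scenarios can also be generated mechanically. The following Java sketch (our own illustration, with invented names) lists all interleavings of two statement sequences; for Algorithm 2.1 it produces exactly the six scenarios shown earlier:

    import java.util.*;

    // Enumerate all interleavings of two statement sequences,
    // preserving the internal order of each sequence.
    public class Interleavings {
        static void interleave(List<String> p, int i, List<String> q, int j, String prefix) {
            if (i == p.size() && j == q.size()) {
                System.out.println(prefix);
                return;
            }
            if (i < p.size())   // choose the next statement of process p
                interleave(p, i + 1, q, j, prefix.isEmpty() ? p.get(i) : prefix + "->" + p.get(i));
            if (j < q.size())   // choose the next statement of process q
                interleave(p, i, q, j + 1, prefix.isEmpty() ? q.get(j) : prefix + "->" + q.get(j));
        }

        public static void main(String[] args) {
            interleave(List.of("p1", "p2"), 0, List.of("q1", "q2"), 0, "");
            // prints the six scenarios, e.g. p1->q1->p2->q2
        }
    }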
2.3 Justification of the abstraction
Clearly, it doesn’t make sense to talk of the global state of a computer system,
or of coordination between computers at the level of individual instructions. The
electrical signals in a computer travel at the speed of light, about 2 × 10⁸ m/sec,³
and the clock frequencies of modern CPUs are at least one gigahertz, so information
cannot travel more than 2 × 10⁸ · 10⁻⁹ = 0.2 m during a clock cycle of a CPU.
There is simply not enough time to coordinate individual instructions of more than
one CPU.
Nevertheless, that is precisely the abstraction that we will use! We will assume
that we have a “bird’s-eye” view of the global state of the system, and that a state-
ment of one process executes by itself and to completion, before the execution of a
statement of another process commences.
It is a convenient fiction to regard the execution of a concurrent program as being
carried out by a global entity who at each step selects the process from which the
next statement will be executed. The term interleaving comes from this image: just
as you might interleave the cards from several decks of playing cards by selecting
cards one by one from the decks, so we regard this entity as interleaving statements
by selecting them one by one from the processes. The interleaving is arbitrary,
that is—with one exception to be discussed in Section 2.7—we do not restrict the
choice of the process from which the next statement is taken.
The abstraction defined is highly artificial, so we will spend some time justifying
it for various possible computer architectures.
³The speed of light in a metal like copper is much less than it is in a vacuum.
Multitasking systems
Consider the case of a concurrent program that is being executed by multitasking,
that is, by sharing the resources of one computer. Obviously, with a single CPU
there is no question of the simultaneous execution of several instructions. The se-
lection of the next instruction to execute is carried out by the CPU and the operating
system. Normally, the next instruction is taken from the same process from which
the current instruction was executed; occasionally, interrupts from I/O devices or
internal timers will cause the execution to be interrupted. A new process called an
interrupt handler will be executed, and upon its completion, an operating system
function called the scheduler may be invoked to select a new process to execute.
This mechanism is called a context switch. The diagram below shows the memory
divided into five segments, one for the operating system code and data, and four
for the code and data of the programs that are running concurrently:
[Diagram: memory divided into five segments, one for the operating system and one for each of Programs 1 through 4; each segment includes a register save area, and the CPU with its registers is shown alongside.]
When the execution is interrupted, the registers in the CPU (not only the registers
used for computation, but also the control pointer and other registers that point to
the memory segment used by the program) are saved into a prespecified area in
the program's memory. Then the register contents required to execute the interrupt
handler are loaded into the CPU. At the conclusion of the interrupt processing,
the symmetric context switch is performed, storing the interrupt handler registers
and loading the registers for the program. The end of interrupt processing is a
convenient time to invoke the operating system scheduler, which may decide to
perform the context switch with another program, not the one that was interrupted.
In a multitasking system, the non-intuitive aspect of the abstraction is not the in-
terleaving of atomic statements (that actually occurs), but the requirement that any
arbitrary interleaving is acceptable. After all, the operating system scheduler may
only be called every few milliseconds, so many thousands of instructions will be
executed from each process before any instructions are interleaved from another.
We defer a discussion of this important point to Section 2.4.
Multiprocessor computers
A multiprocessor computer is a computer with more than one CPU. The memory
is physically divided into banks of local memory, each of which can be accessed
only by one CPU, and global memory, which can be accessed by all CPUs:
[Diagram: several CPUs, each with its own local memory, all connected to a shared global memory.]
If we have a sufficient number of CPUs, we can assign each process to its own
CPU. The interleaving assumption no longer corresponds to reality, since each
CPU is executing its instructions independently. Nevertheless, the abstraction is
useful here.
As long as there is no contention, that is, as long as two CPUs do not attempt
to access the same resource (in this case, the global memory), the computations
defined by interleaving will be indistinguishable from those of truly parallel exe-
cution. With contention, however, there is a potential problem. The memory of
a computer is divided into a large number of cells that store data which is read
and written by the CPU. Eight-bit cells are called bytes and larger cells are called
words, but the size of a cell is not important for our purposes. We want to ask what
might happen if two processors try to read or write a cell simultaneously so that
the operations overlap. The following diagram indicates the problem:
[Diagram: two cells of local memory, one holding 0000000000000001 and the other 0000000000000010, are both written to the same cell of global memory, which ends up holding 0000000000000011.]
It shows 16-bit cells of local memory associated with two processors; one cell
contains the value 0···01 = 1 and one contains 0···10 = 2. If both processors write
to the cell of global memory at the same time, the value might be undefined; for
example, it might be the value 0···11 = 3 obtained by or'ing together the bit
representations of 1 and 2.
In practice, this problem does not occur because memory hardware is designed
so that (for some size memory cell) one access completes before the other com-
mences. Therefore, we can assume that if two CPUs attempt to read or write the
same cell in global memory, the result is the same as if the two instructions were
executed in either order. In effect, atomicity and interleaving are performed by the
hardware.
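Programming languages give similar guarantees for simple variables; in Java, for example, reads and writes of int variables are atomic, so in the following sketch (our own example) the shared variable always ends up holding one of the two written values and never a mixture of their bits:

    // Two threads write different values to a shared int; because int accesses are
    // atomic, the final value is always one of the written values, never a mixture.
    public class AtomicWrites {
        static int shared = 0;

        public static void main(String[] args) throws InterruptedException {
            Thread a = new Thread(() -> shared = 1);
            Thread b = new Thread(() -> shared = 2);
            a.start(); b.start();
            a.join(); b.join();
            System.out.println("shared = " + shared);   // always 1 or 2, never 3
        }
    }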
Other less restrictive abstractions have been studied; we will give one example of
an algorithm that works under the assumption that if a read of a memory cell over-
laps a write of the same cell, the read may return an arbitrary value (Section 5.3).
The requirement to allow arbitrary interleaving makes a lot of sense in the case of
a multiprocessor; because there is no central scheduler, any computation resulting
from interleaving may certainly occur.
Distributed systems
A distributed system is composed of several computers that have no global re-
sources; instead, they are connected by communications channels enabling them
to send messages to each other. The language of graph theory is used in discussing
distributed systems; each computer is a node and the nodes are connected by (di-
rected) edges. The following diagram shows two possible schemes for intercon-
necting nodes: on the left, the nodes are fully connected while on the right they are
connected in a ring:
[Diagram: on the left, four nodes fully connected to one another; on the right, four nodes connected in a ring.]
In a distributed system, the abstraction of interleaving is, of course, totally false,
since it is impossible to coordinate each node in a geographically distributed sys-
tem. Nevertheless, interleaving is a very useful fiction, because as far as each node
is concerned, it only sees discrete events: it is either executing one of its own
statements, sending a message or receiving a message. Any interleaving of all the
events of all the nodes can be used for reasoning about the system, as long as the
interleaving is consistent with the statement sequences of each individual node and
with the requirement that a message be sent before it is received.
Distributed systems are considered to be distinct from concurrent systems. In a
concurrent system implemented by multitasking or multiprocessing, the global
memory is accessible to all processes and each one can access the memory effi-
ciently. In a distributed system, the nodes may be geographically distant from each
other, so we cannot assume that each node can send a message directly to all other
nodes. In other words, we have to consider the topology or connectedness of the
system, and the quality of an algorithm (its simplicity or efficiency) may be de-
pendent on a specific topology. A fully connected topology is extremely efficient
in that any node can send a message directly to any other node, but it is extremely
expensive, because for n nodes, we need n·(n−1) ≈ n² communications channels.
The ring topology has minimal cost in that any node has only one communications
line associated with it, but it is inefficient, because to send a message from one
arbitrary node to another we may need to have it relayed through up to n − 2 other
nodes.
A further difference between concurrent and distributed systems is that the behavior
of systems in the presence of faults is usually studied within distributed systems.
In a multitasking system, hardware failure is usually catastrophic since it affects
all processes, while a software failure may be relatively innocuous (if the process
simply stops working), though it can be catastrophic (if it gets stuck in an infinite
loop at high priority). In a distributed system, while failures can be catastrophic
for single nodes, it is usually possible to diagnose and work around a faulty node,
because messages may be relayed through alternate communication paths. In fact,
the success of the Internet can be attributed to the robustness of its protocols when
individual nodes or communications channels fail.
2.4 Arbitrary interleaving
We have to justify the use of arbitrary interleavings in the abstraction. What this
means, in effect, is that we ignore time in our analysis of concurrent programs.
For example, the hardware of our system may be such that an interrupt can occur
only once every millisecond. Therefore, we are tempted to assume that several
thousand statements are executed from a single process before any statements are
executed from another. Instead, we are going to assume that after the execution
of any statement, the next statement may come from any process. What is the
justification for this abstraction?
The abstraction of arbitrary interleaving makes concurrent programs amenable to
formal analysis, and as we shall see, formal analysis is necessary to ensure the
correctness of concurrent programs. Arbitrary interleaving ensures that we only
have to deal with finite or countable sequences of statements a1, a2, a3, ..., and
need not analyze the actual time intervals between the statements. The only relation
between the statements is that ai precedes or follows (or immediately precedes or
follows) aj. Remember that we did not specify what the atomic statements are,
so you can choose the atomic statements to be as coarse-grained or as fine-grained
as you wish. You can initially write an algorithm and prove its correctness under
the assumption that each function call is atomic, and then refine the algorithm to
assume only that each statement is atomic.
The second reason for using the arbitrary interleaving abstraction is that it enables
us to build systems that are robust to modification of their hardware and software.
Systems are always being upgraded with faster components and faster algorithms.
If the correctness of a concurrent program depended on assumptions about time
of execution, every modification to the hardware or software would require that
the system be rechecked for correctness (see [62] for an example). For example,
suppose that an operating system had been proved correct under the assumption
that characters are being typed in at no more than 10 characters per terminal per
second. That is a conservative assumption for a human typist, but it would become
invalidated if the input were changed to come from a communications channel.
The third reason is that it is difficult, if not impossible, to precisely repeat the ex-
ecution of a concurrent program. This is certainly true in the case of systems that
accept input from humans, but even in a fully automated system, there will al-
ways be some jitter, that is some unevenness in the timing of events. A concurrent
program cannot be “debugged” in the familiar sense of diagnosing a problem, cor-
recting the source code, recompiling and rerunning the program to check if the bug
still exists. Rerunning the program may just cause it to execute a different scenario
than the one where the bug occurred. The solution is to develop programming and
verification techniques that ensure that a program is correct under all interleavings.
2.5 Atomic statements
The concurrent programming abstraction has been defined in terms of the inter-
leaving of atomic statements. What this means is that an atomic statement is exe-
cuted to completion without the possibility of interleaving statements from another
process. An important property of atomic statements is that if two are executed
“simultaneously,” the result is the same as if they had been executed sequentially
(in either order). The inconsistent memory store shown on page 15 will not occur.
It is important to specify the atomic statements precisely, because the correctness
of an algorithm depends on this specification. We start with a demonstration of the
effect of atomicity on correctness, and then present the specification used in this
book.
Recall that in our algorithms, each labeled line represents an atomic statement.
Consider the following trivial algorithm:
Algorithm 2.3: Atomic assignment statements
        integer n ← 0
             p                          q
p1:  n ← n + 1              q1:  n ← n + 1
There are two possible scenarios:
Process p        Process q        n        Process p        Process q        n
p1: n←n+1        q1: n←n+1        0        p1: n←n+1        q1: n←n+1        0
(end)            q1: n←n+1        1        p1: n←n+1        (end)            1
(end)            (end)            2        (end)            (end)            2
In both scenarios, the final value of the global variable n is 2, and the algorithm is
a correct concurrent algorithm with respect to the postcondition n = 2.
Now consider a modification of the algorithm, in which each atomic statement
references the global variable n at most once:
Algorithm 2.4: Assignment statements with one global reference
        integer n ← 0
             p                          q
        integer temp               integer temp
p1:  temp
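Algorithm 2.4 uses a local variable temp so that each atomic statement references n only once. A hedged Java sketch of an increment split in the same way (our own rendering; the names are illustrative) shows why the final value now depends on the interleaving:

    // Each thread increments the shared variable with separate load and store steps,
    // so an update can be lost when the two threads interleave between the steps.
    public class NonAtomicIncrement {
        static volatile int n = 0;

        public static void main(String[] args) throws InterruptedException {
            Runnable increment = () -> {
                int temp = n;       // read the shared variable into a local
                n = temp + 1;       // write back the incremented value
            };
            Thread p = new Thread(increment);
            Thread q = new Thread(increment);
            p.start(); q.start();
            p.join(); q.join();
            System.out.println("n = " + n);   // usually 2, but there is a scenario in which n = 1
        }
    }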