Dorigo, Marco, and Marco Colombetti, Robot Shaping: An Experiment in Behavior Engineering
Arkin, Ronald C., Behavior-Based Robotics
Stone, Peter, Layered Learning in Multiagent Systems: A Winning Approach to Robotic Soccer
Wooldridge, Michael, Reasoning About Rational Agents
Murphy, Robin R., An Introduction to AI Robotics
Mason, Matthew T., Mechanics of Robotic Manipulation
Kraus, Sarit, Strategic Negotiation in Multiagent Environments
Nolfi, Stefano, and Dario Floreano, Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines
Siegwart, Roland, and Illah R. Nourbakhsh, Introduction to Autonomous Mobile Robots
Breazeal, Cynthia L., Designing Sociable Robots
Bekey, George A., Autonomous Robots: From Biological Inspiration to Implementation and Control
Choset, Howie, Kevin M. Lynch, Seth Hutchinson, George Kantor, Wolfram Burgard, Lydia E. Kavraki, and Sebastian Thrun, Principles of Robot Motion: Theory, Algorithms, and Implementations
Thrun, Sebastian, Wolfram Burgard, and Dieter Fox, Probabilistic Robotics
Mataric, Maja J., The Robotics Primer
Wellman, Michael P., Amy Greenwald, and Peter Stone, Autonomous Bidding Agents: Strategies and Lessons from the Trading Agent Competition
Floreano, Dario, and Claudio Mattiussi, Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies
Sterling, Leon S., and Kuldar Taveter, The Art of Agent-Oriented Modeling
Stoy, Kasper, David Brandt, and David J. Christensen, An Introduction to Self-Reconfigurable Robots
Lin, Patrick, Keith Abney, and George A. Bekey, editors, Robot Ethics: The Ethical and Social Implications of Robotics
Weiss, Gerhard, editor, Multiagent Systems, second edition
Multiagent Systems
second edition
All rights reserved. No part of this book may be reproduced in any form by any electronic
or mechanical means (including photocopying, recording, or information storage and re-
trieval) without permission in writing from the publisher.
MIT Press books may be purchased at special quantity discounts for business or sales
promotional use. For information, please email [email protected] or write
to Special Sales Department, The MIT Press, 55 Hayward Street, Cambridge, MA 02142.
This book was set in Times Roman by the editor. Printed and bound in the United States
of America.
10 9 8 7 6 5 4 3 2 1
For Sofie, Alina, and Tina
—G. W.
Contents in Brief
Preface xxxv
The Subject of This Book • Main Features of This Book •
Readership and Prerequisites • Changes from the First Edition •
Structure and Chapters • The Exercises • How to Use This Book •
Slides and More – The Website of the Book • Acknowledgments xxxv
2 Multiagent Organizations 51
Virginia Dignum and Julian Padget
Part II Communication 99
3 Agent Communication 101
Amit K. Chopra and Munindar P. Singh
Contents
Preface xxxv
The Subject of This Book • Main Features of This Book •
Readership and Prerequisites • Changes from the First Edition •
Structure and Chapters • The Exercises • How to Use This Book •
Slides and More – The Website of the Book • Acknowledgments xxxv
3.4.1 TouringMachines . . . . . . . . . . . . . . . . 38
3.4.2 InteRRaP . . . . . . . . . . . . . . . . . . . . . 40
3.4.3 Sources and Further Reading . . . . . . . . . . 42
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2 Multiagent Organizations 51
Virginia Dignum and Julian Padget
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.1 From Intelligent Agents to Multiagent Systems . . . . . . 53
2.2 From Multiagent Systems to Multiagent Organizations . . 55
2.3 Sources of Inspiration . . . . . . . . . . . . . . . . . . . 56
2.3.1 Organization as Structure . . . . . . . . . . . . 56
2.3.2 Organization as Institution . . . . . . . . . . . . 58
2.3.3 Organization as Agent . . . . . . . . . . . . . . 59
2.4 Autonomy and Regulation . . . . . . . . . . . . . . . . . 60
2.5 Example Scenario . . . . . . . . . . . . . . . . . . . . . 62
3 Multiagent Organizations . . . . . . . . . . . . . . . . . . . . . . 62
3.1 Organization Concepts . . . . . . . . . . . . . . . . . . . 64
3.2 Example of Organization Modeling: The OperA Framework 65
3.2.1 The Social Structure . . . . . . . . . . . . . . . 68
3.2.2 The Interaction Structure . . . . . . . . . . . . 70
3.2.3 The Normative Structure . . . . . . . . . . . . 71
3.2.4 The Communication Structure . . . . . . . . . 72
4 Institutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.1 Organizations, Institutions, and Norms . . . . . . . . . . 73
4.2 Events and States . . . . . . . . . . . . . . . . . . . . . . 75
4.3 Obligations, Permission, and Power . . . . . . . . . . . . 77
4.4 Example of Institutional Modeling: InstAL . . . . . . . . 78
4.4.1 The Formal Model . . . . . . . . . . . . . . . . 78
4.4.2 The Conference Scenario . . . . . . . . . . . . 78
5 Agents in Organizations . . . . . . . . . . . . . . . . . . . . . . . 82
6 Evolution of Organizations . . . . . . . . . . . . . . . . . . . . . 85
6.1 Organizational Adaptation . . . . . . . . . . . . . . . . . 86
6.2 Emergent Organizations . . . . . . . . . . . . . . . . . . 87
7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Part II Communication 99
3 Agent Communication 101
Amit K. Chopra and Munindar P. Singh
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
1.1 Autonomy and Its Implications . . . . . . . . . . . . . . . 102
1.2 Criteria for Evaluation . . . . . . . . . . . . . . . . . . . 105
2 Conceptual Foundations of Communication in MAS . . . . . . . 106
2.1 Communicative Acts . . . . . . . . . . . . . . . . . . . . 106
2.2 Agent Communication Primitives . . . . . . . . . . . . . 107
3 Traditional Software Engineering Approaches . . . . . . . . . . . 108
3.1 Choreographies . . . . . . . . . . . . . . . . . . . . . . . 110
3.2 Sequence Diagrams . . . . . . . . . . . . . . . . . . . . . 111
3.3 State Machines . . . . . . . . . . . . . . . . . . . . . . . 112
3.4 Evaluation with Respect to MAS . . . . . . . . . . . . . . 113
4 Traditional AI Approaches . . . . . . . . . . . . . . . . . . . . . 114
4.1 KQML . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.2 FIPA ACL . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.3 Evaluation with Respect to MAS . . . . . . . . . . . . . . 117
5 Commitment-Based Multiagent Approaches . . . . . . . . . . . . 118
5.1 Commitments . . . . . . . . . . . . . . . . . . . . . . . . 118
5.2 Commitment Protocol Specification . . . . . . . . . . . . 119
5.3 Evaluation with Respect to MAS . . . . . . . . . . . . . . 121
6 Engineering with Agent Communication . . . . . . . . . . . . . . 122
6.1 Programming with Communications . . . . . . . . . . . . 123
6.2 Modeling Communications . . . . . . . . . . . . . . . . . 124
6.2.1 Business Patterns . . . . . . . . . . . . . . . . 125
6.2.2 Enactment Patterns . . . . . . . . . . . . . . . 125
6.2.3 Semantic Antipatterns . . . . . . . . . . . . . . 126
6.3 Communication-Based Methodologies . . . . . . . . . . . 127
7 Advanced Topics and Challenges . . . . . . . . . . . . . . . . . . 128
7.1 Primacy of Meaning . . . . . . . . . . . . . . . . . . . . 128
7.2 Verifying Compliance . . . . . . . . . . . . . . . . . . . 129
7.3 Protocol Refinement and Aggregation . . . . . . . . . . . 129
7.4 Role Conformance . . . . . . . . . . . . . . . . . . . . . 130
8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 800
List of Figures
14.1 Two robots and a carriage: a schematic view (left) and a transition system M0 that models the scenario (right). 647
14.2 Two robots and a carriage: a refined version of our example and a concurrent game structure (CGS). 651
14.3 Automata-theoretic view of model checking. 661
List of Tables
14.1 Overview of the complexity results: most are completeness results. 673
Gerhard Weiss
              attribute                                   range

agents        number                                      from two upward
              uniformity                                  homogeneous . . . heterogeneous
              goals                                       contradicting . . . complementary
              flexibility                                 purely reactive . . . purely deliberative
              abilities (sensors, effectors, cognition)   simple . . . advanced
              autonomy                                    low . . . high

interaction   frequency                                   low . . . high
              persistence                                 short-term . . . long-term
              level                                       signal-passing . . . knowledge-intensive
              language                                    elementary . . . semantically rich
              pattern (flow of data and control)          decentralized . . . hierarchical
              variability                                 fixed . . . changeable
              purpose                                     competitive . . . cooperative

environment   predictability                              foreseeable . . . unforeseeable
              accessibility and knowability               unlimited . . . limited
              dynamics                                    fixed . . . variable
              diversity                                   poor . . . rich
              availability of resources                   restricted . . . ample
with single-agent systems) but also from mathematics, logics, game theory, and
other areas. The multiagent systems field is multidisciplinary in nature. Examples
of disciplines to which the field is related are cognitive psychology, sociology,
organization science, economics, and philosophy.
A main reason for the vast interest and attention multiagent systems are receiv-
ing is that they are seen as an enabling technology for applications that rely on dis-
tributed and parallel processing of data, information, and knowledge in complex –
networked, open, and large-scale – computing environments. With advancing
technological progress in interconnectivity and interoperability of computers and
software, such applications are becoming standard in a variety of domains such
as e-commerce, logistics, supply chain management, telecommunication, health
care, and manufacturing. More generally, such applications are characteristic of
several widely recognized computing paradigms known as grid computing, peer-to-peer computing, pervasive computing, ubiquitous computing, and autonomic computing.
As far as possible, the chapters are written so they can be understood without
advanced prior knowledge. The main prerequisite for making the most of the book
and for understanding its contents in detail is familiarity with basic concepts of
computer science (especially algorithms and programming) and mathematics (es-
pecially logics and game theory) at the freshman level. Some useful background
in logics and game theory is supplied in Part VI of this book.
The Exercises
At the end of each chapter, exercises of varying difficulty are provided, which
concern relevant theoretical and practical aspects of multiagent systems. The fol-
lowing four levels of difficulty are distinguished to roughly indicate the amount
of effort required for solving the exercises:
• Level 1 Simple test of comprehension or slightly more subtle problem,
solvable within a few hours or days. (Appropriate for Bachelor-level education)
Slides and More – The Website of the Book
The website of the book is located at:
• http://mitpress.mit.edu/multiagentsystems
The site is intended to provide useful teaching material for students and teachers.
The website starts with lecture slides for the chapters (prepared by the respective
chapter authors) and some other resources such as copies of the figures and a list
of exercises in the book. I hope to extend the supplementary material once this
book is in use.
Teachers, students, and industrial professionals are invited and encouraged to
contribute additional resources based on the contents of the book.
Teachers using the book are asked to notify me of their courses (with URLs).
I will maintain an online list of these courses. The material I receive will be
made available on the website so that all readers and the multiagent community
as a whole can benefit from it. Additional resources and related questions can be
mailed to [email protected]
Acknowledgments
This book has been in development for nearly two years (taking into account that the
17 chapters were prepared in parallel by different authors, this amounts to a
total development period of 34 years invested in this book!). During that time
many people have contributed to the book project. I cannot thank all of them, and
so below I mention those to whom I am particularly indebted. Please also see the
acknowledgments at the end of the individual chapters.
I am most grateful to the contributing authors for their engagement and enthu-
siasm. They not only provided the chapters and chapter slides, but also gave many
useful comments and suggestions on how the overall quality of the book could be
further improved. In particular, they all put tremendous effort into carefully coordinating the contents of their chapters. Developing a textbook like this, with 31
leading experts in the field being involved as the chapter authors, is an endeavor
that is thrilling and appealing on the one hand but can easily fail for many reasons
on the other. Obviously this book project did not fail, and in large part this is due
to the professionalism and commitment of the involved authors. I greatly enjoyed
working with them on the new edition. Here is my advice to all who think about
editing a comparable multi-author textbook: do it, but only if you have as strong
an author team behind you as I had.
My special thanks also go to the authors of the first edition: their excellent
chapters laid the foundation for this second edition.
I want to thank Tuomas Sandholm for valuable discussions in the early phase
of this book project on possibilities for covering certain key topics in the best
possible way.
At MIT Press, I am grateful to Marc Lowenthal, James DeWolf and Virginia
Crossman for providing professional guidance, assistance, and support during this
book project whenever necessary. Ada Brunstein provided great support in the
start-up phase of the project. Thanks also to the anonymous reviewers provided
by MIT Press for their useful remarks and comments.
I “misused” numerous family-time evenings and weekends to work on this
book project – my warmest thanks go to Tina, Alina, and Sofie for their patience
and understanding.
Contributing Authors
Rafael H. Bordini
Faculty of Informatics
PUCRS - Pontifical Catholic University of Rio Grande do Sul
Porto Alegre, RS, Brazil
⇒ Chapter 13
Felix Brandt
Institut für Informatik
Technische Universität München
Munich, Germany
⇒ Chapter 6
Amit K. Chopra
Department of Information Engineering and Computer Science
University of Trento
Trento, Italy
⇒ Chapter 3
Vincent Conitzer
Department of Computer Science
Duke University
Durham, NC, USA
⇒ Chapter 6
Virginia Dignum
Faculty of Technology, Policy and Management
Delft University of Technology
Delft, The Netherlands
⇒ Chapter 2
Jürgen Dix
Department of Computer Science
Clausthal University of Technology
Clausthal, Germany
⇒ Chapters 13 and 14
Ed Durfee
Computer Science and Engineering Division
University of Michigan
Ann Arbor, MI, USA
⇒ Chapter 11
Edith Elkind
Division of Mathematical Sciences
School of Physical and Mathematical Sciences
Nanyang Technological University
Singapore
⇒ Chapters 8 and 17
Ulle Endriss
Institute for Logic, Language and Computation
University of Amsterdam
Amsterdam, The Netherlands
⇒ Chapter 6
Alessandro Farinelli
Computer Science Department
University of Verona
Verona, Italy
⇒ Chapter 12
Shaheen Fatima
Department of Computer Science
Loughborough University
Loughborough, United Kingdom
⇒ Chapter 4
Michael Fisher
Department of Computer Science
University of Liverpool
Liverpool, United Kingdom
⇒ Chapter 14
Nicholas R. Jennings
Electronics and Computer Science
University of Southampton
Southampton, United Kingdom
⇒ Chapters 8 and 12
Kevin Leyton-Brown
Department of Computer Science
University of British Columbia
Vancouver, BC, Canada
⇒ Chapter 7
Evangelos Markakis
Athens University of Economics and Business
Department of Informatics
Athens, Greece
⇒ Chapter 17
Julian Padget
Department of Computer Science
University of Bath
Bath, United Kingdom
⇒ Chapter 2
Lin Padgham
School of Computer Science and Information Technology
RMIT University
Melbourne, Australia
⇒ Chapter 15
Iyad Rahwan
Computing & Information Science Program
Masdar Institute of Science and Technology, Abu Dhabi, UAE, and
Technology & Development Program
Massachusetts Institute of Technology, Cambridge, MA, USA
⇒ Chapters 4 and 5
Talal Rahwan
Electronics and Computer Science
University of Southampton
Southampton, United Kingdom
⇒ Chapter 8
Alex Rogers
Electronics and Computer Science
University of Southampton
Southampton, United Kingdom
⇒ Chapter 12
Jordi Sabater-Mir
IIIA - Artificial Intelligence Research Institute
CSIC - Spanish National Research Council
Catalonia, Spain
⇒ Chapter 9
Yoav Shoham
Computer Science Department
Stanford University
Stanford, CA, USA
⇒ Chapter 7
Munindar P. Singh
Department of Computer Science
North Carolina State University
Raleigh, NC, USA
⇒ Chapter 3
Kagan Tumer
School of MIME
Oregon State University
Corvallis, OR, USA
⇒ Chapter 10
Karl Tuyls
Department of Knowledge Engineering
Maastricht University
Maastricht, The Netherlands
⇒ Chapter 10
Laurent Vercouter
LITIS Laboratory
INSA de Rouen
Saint-Etienne du Rouvray, France
⇒ Chapter 9
Meritxell Vinyals
Computer Science Department
University of Verona
Verona, Italy
⇒ Chapter 12
Michael Winikoff
Department of Information Science
University of Otago
Dunedin, New Zealand
⇒ Chapter 15
Michael Wooldridge
Department of Computer Science
University of Oxford
Oxford, England
⇒ Chapters 1 and 16
Shlomo Zilberstein
Department of Computer Science
University of Massachusetts
Amherst, MA, USA
⇒ Chapter 11
Intelligent Agents
Michael Wooldridge
1 Introduction
Computers are not very good at knowing what to do: every action a computer per-
forms must be explicitly anticipated, planned for, and coded by a programmer. If a
computer program ever encounters a situation that its designer did not anticipate,
then the result is ugly – a system crash at best, loss of life at worst. This mun-
dane fact is at the heart of our relationship with computers. It is so self-evident to
the computer literate that it is rarely mentioned. And yet it comes as a complete
surprise to those programming computers for the first time.
For the most part, we are happy to accept computers as obedient, literal,
unimaginative servants. For many applications, it is entirely acceptable. How-
ever, for an increasingly large number of applications, we require systems that
can decide for themselves what they need to do in order to achieve the objectives
that we delegate to them. Such computer systems are known as agents. Agents
that must operate robustly in rapidly changing, unpredictable, or open environ-
ments, where there is a significant possibility that actions can fail, are known as
intelligent agents, or sometimes autonomous agents. Here are some examples of
recent application areas for intelligent agents:
• When a space probe makes its long flight from earth to the outer planets, a
ground crew is usually required to continually track its progress, and decide
how to deal with unexpected eventualities. This is costly, and if decisions
are required quickly, it is simply not practicable. For these reasons, orga-
nizations such as NASA and the European Space Agency are interested in
the possibility of making probes more autonomous – giving them richer
onboard decision-making capabilities and responsibilities.
• Searching the Internet for the answer to a specific query can be a long and
tedious process. So, why not allow a computer program – an agent – to do
searches for us? The agent would typically be given a query that would
require synthesizing pieces of information from various different Internet
information sources. Failure would occur when a particular resource was
unavailable (perhaps due to network failure), or when results could not be
obtained.
This chapter is about intelligent agents. Specifically, it aims to give you an in-
troduction to the main issues associated with the design and implementation of
intelligent agents. After reading it, you will understand:
• what intelligent agents are (and are not), and how agents relate to other
software paradigms – in particular, expert systems and object-oriented pro-
gramming; and
• some of the main approaches that have been advocated for designing and
implementing intelligent agents, the issues surrounding these approaches,
their relative merits, and the challenges that face the agent implementor.
Figure 1.1: An agent in its environment. The agent takes sensory input in the
form of percepts from the environment, and produces as output actions that affect
it. The interaction is usually an ongoing, non-terminating one.
There are several points to note about this definition. First, the definition refers to
“agents” and not “intelligent agents.” The distinction is deliberate: it is discussed
in more detail below. Second, the definition does not say anything about what type
of environment an agent occupies. Again, this is deliberate: agents can occupy
many different types of environment, as we shall see below. Third, we have not
defined autonomy. Like agency itself, autonomy is a somewhat tricky concept to
tie down precisely. In this chapter, it is used to mean that agents are able to act
without the intervention of humans or other systems: they have control both over
their own internal state and over their behavior. In Section 2.3, we will contrast
agents with the objects of object-oriented programming, and we will elaborate this
point there. In particular, we will see how agents embody a much stronger sense
of autonomy than objects do.
Figure 1.1 gives an abstract, top-level view of an agent. In this diagram, we can
see the action output generated by the agent in order to affect its environment. In
most domains of reasonable complexity, an agent will not have complete control
over its environment. It will have at best partial control, in that it can influence
it. From the point of view of the agent, this means that the same action performed
twice in apparently identical circumstances might appear to have entirely different
effects, and in particular, it may fail to have the desired effect. Thus agents in
all but the most trivial of environments must be prepared for the possibility of
failure. We can sum this situation up formally by saying that environments are
non-deterministic.
Normally, an agent will have a repertoire of actions available to it. This set of
possible actions represents the agent’s effectoric capability: its ability to modify
its environment. Note that not all actions can be performed in all situations. For
example, an action “lift table” is only applicable in situations where the weight
of the table is sufficiently small that the agent can lift it. Similarly, the action
“purchase a Ferrari” will fail if insufficient funds are available to do so. Actions
therefore have preconditions associated with them, which define the possible sit-
uations in which they can be applied.
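To make the notion of a precondition concrete, here is a small illustrative sketch (the names and data structures are invented for this example, not taken from the chapter):

from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]   # a hypothetical environment state: a bag of properties

@dataclass
class Action:
    name: str
    precondition: Callable[[State], bool]   # when the action is applicable
    effect: Callable[[State], State]        # how it nominally changes the state

def applicable(action: Action, state: State) -> bool:
    # An action may only be selected in states satisfying its precondition.
    return action.precondition(state)

# Example: "lift table" is applicable only if the table is light enough.
lift_table = Action(
    name="lift table",
    precondition=lambda s: s["table_weight_kg"] <= s["lift_capacity_kg"],
    effect=lambda s: {**s, "table_lifted": True},
)

state = {"table_weight_kg": 12, "lift_capacity_kg": 20, "table_lifted": False}
assert applicable(lift_table, state)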
The key problem facing an agent is that of deciding which of its actions it
should perform in order to best satisfy its delegated objectives. Agent architec-
tures, of which we shall see several examples later in this chapter, are software
architectures for decision-making systems that are embedded in an environment.
The complexity of the action selection process can be affected by a number
of different environmental properties. Russell and Norvig suggest the following
classification of environment properties [55, p. 46]:
perform based only on the current episode – it need not reason about the
interactions between this and future episodes.
The most complex general class of environments are those that are inaccessible,
non-deterministic, non-episodic, dynamic, and continuous.
To summarize, agents are simply computer systems that are capable of au-
tonomous action in some environment in order to meet objectives that are del-
egated to them by us. An agent will typically sense its environment (by physical
sensors in the case of agents situated in part of the real world, or by software sen-
sors in the case of software agents), and will have available a repertoire of actions
that can be executed to modify the environment, which may appear to respond
non-deterministically to the execution of these actions.
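This picture can be sketched in a few lines of code (purely illustrative; the function names and the way non-determinism is modeled here are assumptions of the sketch, not part of the chapter):

import random
from typing import Callable

def run_agent(initial_state,
              see: Callable,      # sensing: state -> percept
              decide: Callable,   # the agent's decision function: percept -> action
              step: Callable,     # environment: (state, action) -> possible next states
              horizon: int = 10):
    # The environment is non-deterministic: an action yields one of several
    # possible successor states, so the same action in the same state may
    # have different effects on different runs.
    state = initial_state
    for _ in range(horizon):
        percept = see(state)
        action = decide(percept)
        state = random.choice(step(state, action))
    return state

# A toy thermostat agent: heat when it perceives "cold"; the room may or may not warm up.
final_state = run_agent("cold",
                        see=lambda s: s,
                        decide=lambda p: "heat" if p == "cold" else "wait",
                        step=lambda s, a: ["warm", "cold"] if a == "heat" else [s])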
• reactivity: intelligent agents are able to perceive their environment, and re-
spond in a timely fashion to changes that occur in it in order to satisfy their
delegated objectives;
• social ability: intelligent agents are capable of interacting with other agents
(and possibly humans) in order to satisfy their design objectives.
These properties are more demanding than they might at first appear. To see why,
let us consider them in turn. First, consider proactiveness: goal-directed behavior.
It is not hard to build a system that exhibits goal-directed behavior – we do it every
attempt to achieve a goal either when it is clear that the procedure will not work,
or when the goal is for some reason no longer valid. In such circumstances, we
want our agent to be able to react to the new situation, in time for the reaction to
be of some use. However, we do not want our agent to be continually reacting,
and hence never focusing on a goal long enough to actually achieve it.
On reflection, it should come as little surprise that achieving a good balance
between goal-directed and reactive behavior is hard. After all, it is comparatively
rare to find humans that do this very well. How many of us have had a manager
who stayed blindly focused on some project long after the relevance of the project
was passed, or it was clear that the project plan was doomed to failure? Sim-
ilarly, how many have encountered managers who seem unable to stay focused
at all, who flit from one project to another without ever managing to pursue a
goal long enough to achieve anything? This problem — of effectively integrating
goal-directed and reactive behavior – is one of the key problems facing the agent
designer. As we shall see, a great many proposals have been made for how to
build agents that can do this.
Finally, let us say something about social ability, the final component of flex-
ible autonomous action as defined here. In one sense, social ability is trivial:
every day, millions of computers across the world routinely exchange information
with both humans and other computers. But the ability to exchange bit streams
is not really social ability. Consider that in the human world, comparatively few
of our meaningful goals can be achieved without the cooperation of other people,
who cannot be assumed to share our goals – in other words, they are themselves
autonomous, with their own agenda to pursue. To achieve our goals in such sit-
uations, we must negotiate and cooperate with others. We may be required to
understand and reason about the goals of others, and to perform actions (such as
paying them money) that we would not otherwise choose to perform in order to
get them to cooperate with us, and achieve our goals. This type of social ability is
much more complex, and much less well understood, in computational terms than
simply the ability to exchange binary information. Social ability in general (and
topics such as negotiation and cooperation in particular) are dealt with elsewhere
in this book, and will not therefore be considered here. In this chapter, we will
be concerned with the decision making of individual intelligent agents in environ-
ments that may be dynamic, unpredictable, and uncertain, but do not contain other
agents.
stops to consider the relative properties of agents and objects, this is perhaps not
surprising.
Objects are defined as computational entities that encapsulate some state, are
able to perform actions, or methods, on this state, and communicate by message-
passing. While there are obvious similarities between agents and objects, there
are also significant differences. The first is in the degree to which agents and
objects are autonomous. Recall that the defining characteristic of object-oriented
programming is the principle of encapsulation – the idea that objects can have
control over their own internal state. In programming languages like JAVA, we can
declare instance variables (and methods) to be private, meaning they are only
accessible from within the object. (We can of course also declare them public,
meaning that they can be accessed from anywhere, and indeed we must do this
for methods so that they can be used by other objects. But the use of public
instance variables is usually considered poor programming style.) In this way, an
object can be thought of as exhibiting autonomy over its state: it has control over
it. But an object does not exhibit control over its behavior. That is, if a method m
is made available for other objects to invoke, then they can do so whenever they
wish – once an object has made a method public, then it subsequently has no
control over whether or not that method is executed. Of course, an object must
make methods available to other objects, or else we would be unable to build a
system out of them. This is not normally an issue, because if we build a system,
then we design the objects that go in it, and they can thus be assumed to share a
“common goal.” But in many types of multiagent systems (in particular, those that
contain agents built by different organizations or individuals), no such common
goal can be assumed. It cannot be taken for granted that an agent i will execute
an action (method) a just because another agent j wants it to – a may not be in
the best interests of i. We thus do not think of agents as invoking methods upon
one another, but rather as requesting actions to be performed. If j requests i to
perform a, then i may perform the action or it may not. The locus of control with
respect to the decision about whether to execute an action is thus different in agent
and object systems. In the object-oriented case, the decision lies with the object
that invokes the method. In the agent case, the decision lies with the agent that
receives the request.
Note that there is nothing to stop us from implementing agents using object-
oriented techniques. For example, we can build some kind of decision making
about whether to execute a method into the method itself, and in this way achieve
a stronger kind of autonomy for our objects. The point is that autonomy of this
kind is not a component of the basic object-oriented model.
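As a rough sketch of this idea (an invented example, not from the chapter), one can fold a decision about whether to honor a request into the method that receives it, which is essentially what an agent does:

class PrinterAgent:
    # An "agent-like" object: it receives requests but decides for itself
    # whether to act on them, instead of executing every invoked method.
    def __init__(self, owner: str):
        self.owner = owner
        self.queue = []

    def request_print(self, requester: str, document: str) -> bool:
        # The locus of control lies with the receiver: the request is only
        # carried out if doing so suits this agent (a deliberately trivial policy).
        if requester != self.owner and len(self.queue) > 3:
            return False          # decline the request
        self.queue.append(document)
        return True               # accept and (eventually) perform the action

In a plain object, any caller that can see a public method can force it to run; here the decision about whether the action is performed rests with the receiver.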
The second important distinction between object and agent systems is with re-
spect to the notions of reactive, proactive, social, and autonomous behavior. The
standard object model has nothing whatsoever to say about how to build systems
that integrate these types of behavior. Again, one could object that we can build
object-oriented programs that do integrate these types of behavior. But this ar-
gument misses the point, which is that the standard object-oriented programming
model has nothing to do with these types of behavior.
The third important distinction between the standard object model and our
view of agent systems is that agents are each considered to have their own thread
of control – in the standard object model, there is a single thread of control in
the system. Of course, a lot of work has recently been devoted to concurrency in
object-oriented programming. For example, the JAVA language provides built-in
constructs for multithreaded programming. There are also many programming
languages available that were specifically designed to allow concurrent object-
based programming. But such languages do not capture the idea we have of agents
as autonomous entities. Perhaps the closest that the object-oriented community
comes is in the idea of active objects:
Thus active objects are essentially agents that do not necessarily have the ability
to exhibit flexible autonomous behavior. Objects in the standard object-oriented
sense are simple passive service providers.
To summarize, the traditional view of an object and our view of an agent have
at least three distinctions:
• agents are capable of reactive, proactive, social behavior, and the standard
object model has nothing to say about such types of behavior; and
In what follows, we will use a little light notation to help explain the main ideas.
We use A = {a, a , . . .} to denote the set of possible actions that the agent can
perform, and S = {s, s , . . .} to denote the set of states that the environment can be
in.
Open(valve221)
Temperature(reactor4726, 321)
Pressure(tank776, 28)
It is not difficult to see how formulae such as these can be used to represent en-
vironment properties. The database is the information that the agent has about
its environment. An agent’s database plays a somewhat analogous role to that of
belief in humans. Thus a person might have a belief that valve 221 is open – the
agent might have the predicate Open(valve221) in its database. Of course, just
like humans, agents can be wrong. Thus I might believe that valve 221 is open
when it is in fact closed; the fact that an agent has Open(valve221) in its database
does not mean that valve 221 (or indeed any valve) is open. The agent’s sensors
may be faulty, its reasoning may be faulty, the information may be out of date, or
the interpretation of the formula Open(valve221) intended by the agent’s designer
may be something entirely different.
Let L be the set of sentences of classical first-order logic. The internal state
of a deliberate agent – the agent’s “beliefs” – is then a subset of L, i.e., a set of
formulae of first-order logic. We write Δ, Δ1 , . . . to denote such belief databases.
An agent’s decision-making process is modeled through a set of deduction rules,
ρ. These are simply rules of inference for the logic. We write Δ ⊢ρ ϕ if the first-
order formula ϕ can be proved from the database Δ using only the deduction rules
ρ.
The pseudo-code definition of the action selection process for a deliberate
agent is then given in Figure 1.2. This function action(· · · ) takes as input the be-
liefs of the agent (Δ) and deduction rules (ρ) and returns as output either an action
(in which case this is the action selected for execution) or else null (indicating that
no action can be found).
The idea is that the agent programmer will encode the deduction rules ρ and
database Δ in such a way that if a formula Do(a) can be derived, where a is a
term that denotes an action, then a is the best action to perform. Thus, in the first
part of the function (lines (3)–(7)), the agent takes each of its possible actions a
in turn, and attempts to prove the formula Do(a) from its database (passed as a
parameter to the function) using its deduction rules ρ. If the agent succeeds in
proving Do(a), then a is returned as the action to be performed.
What happens if the agent fails to prove Do(a), for all actions a ∈ A? In this
case, it attempts to find an action that is consistent with the rules and database, i.e.,
one that is not explicitly forbidden. In lines (8)–(12), therefore, the agent attempts
to find an action a ∈ A such that ¬Do(a) cannot be derived from its database
using its deduction rules. If it can find such an action, then this is returned as the
action to be performed. If, however, the agent fails to find an action that is at least
consistent, then it returns a special action null (or noop), indicating that no action
has been selected.
In this way, the agent’s behavior is determined by the agent’s deduction rules
(its “program”) and its current database (representing the information the agent
has about its environment).
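Figure 1.2 is not reproduced here, but the two-pass selection scheme just described can be sketched as follows (an illustrative sketch; the prove parameter stands in for the ⊢ρ relation, and formulae are represented as plain strings):

from typing import Callable, Iterable, Optional, Set

Formula = str   # e.g., "Do(forward)" -- a stand-in for first-order formulae

def action(beliefs: Set[Formula],
           rules: Iterable[Formula],
           actions: Iterable[str],
           prove: Callable[[Set[Formula], Iterable[Formula], Formula], bool]
           ) -> Optional[str]:
    # First pass: look for an action that is explicitly prescribed,
    # i.e., one for which Do(a) can be proved from the database.
    for a in actions:
        if prove(beliefs, rules, f"Do({a})"):
            return a
    # Second pass: fall back to an action that is at least consistent,
    # i.e., one for which ¬Do(a) cannot be proved.
    for a in actions:
        if not prove(beliefs, rules, f"¬Do({a})"):
            return a
    # No prescribed or consistent action: return the special null action.
    return None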
To illustrate these ideas, let us consider a small example (based on the vacuum
cleaning world example of [55, p. 51]). The idea is that we have a small robotic
agent that will clean up a house. The robot is equipped with a sensor that will tell
it whether it is over any dirt, and a vacuum cleaner that can be used to suck up dirt.
In addition, the robot always has a definite orientation (one of north, south, east,
or west). In addition to being able to suck up dirt, the agent can move forward
one “step” or turn right 90◦ . The agent moves around a room, which is divided
grid-like into a number of equally sized squares (conveniently corresponding to
the unit of movement of the agent). We will assume that our agent does nothing
but clean – it never leaves the room, and, further, we will assume in the interests of
simplicity that the room is a 3 × 3 grid, and the agent always starts in grid square
(0, 0) facing north.
To summarize, our agent can receive a percept dirt (signifying that there is dirt
beneath it), or null (indicating no special information). It can perform any one of
three possible actions: forward, suck, or turn. The goal is to traverse the room,
continually searching for and removing dirt. See Figure 1.3 for an illustration of
the vacuum world.
Figure 1.3: The 3 × 3 vacuum world grid, with dirt in two of the squares.
First, note that we make use of three simple domain predicates in this exercise:
Now we can move on to the rules ρ that govern our agent’s behavior. The rules
we use have the form
ϕ(. . .) −→ ψ(. . .)
where ϕ and ψ are predicates over some arbitrary list of constants and variables.
The idea being that if ϕ matches against the agent’s database, then ψ can be con-
cluded, with any variables in ψ instantiated.
The first rule deals with the basic cleaning action of the agent: this rule will
take priority over all other possible behaviors of the agent (such as navigation).
reaches (2, 2), it must head back to (0, 0). The rules dealing with the traversal up
to (0, 2) are very simple.
Notice that in each rule, we must explicitly check whether the antecedent of rule
(1.1) fires. This is to ensure that we only ever prescribe one action via the Do(. . .)
predicate. Similar rules can easily be generated that will get the agent to (2, 2),
and once at (2, 2) back to (0, 0).
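The rules themselves are not reproduced above, but a fragment in the spirit of this description might look as follows (an illustrative sketch only; the predicate names In, Dirt, and Facing are assumptions of this sketch rather than quotations from the text):

In(x, y) ∧ Dirt(x, y) −→ Do(suck)                              (cleaning; cf. rule (1.1))
In(0, 0) ∧ Facing(north) ∧ ¬Dirt(0, 0) −→ Do(forward)
In(0, 1) ∧ Facing(north) ∧ ¬Dirt(0, 1) −→ Do(forward)
In(0, 2) ∧ Facing(north) ∧ ¬Dirt(0, 2) −→ Do(turn)             (at the top of the column, turn)

Each traversal rule carries the ¬Dirt(. . .) conjunct precisely so that it cannot fire when the antecedent of the cleaning rule does.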
At this point, let us step back and examine the pragmatics of this logic-based
approach to building agents. Probably the most important point to make is that
a literal, naive attempt to build agents in this way would be more or less entirely
impractical. To see why, suppose we have designed our agent’s rule set ρ such that
for any database Δ, if we can prove Do(a), then a is an optimal action – that is, a
is the best action that could be performed when the environment is as described in
Δ. Then imagine we start running our agent. At time t1 , the agent has generated
some database Δ1 , and begins to apply its rules ρ in order to find which action
to perform. Some time later, at time t2, it manages to establish Δ1 ⊢ρ Do(a) for
some a ∈ A, and so a is the optimal action that the agent could perform at time t1 .
But if the environment has changed between t1 and t2 , then there is no guarantee
that a will still be optimal. It could be far from optimal, particularly if much
time has elapsed between t1 and t2 . If t2 − t1 is infinitesimal – that is, if decision
making is effectively instantaneous – then we could safely disregard this problem.
But in fact, we know that reasoning of the kind our logic-based agents use will
be anything but instantaneous. (If our agent uses classical first-order predicate
logic to represent the environment, and its rules are sound and complete, then
there is no guarantee that the decision-making procedure will even terminate.)
An agent is said to enjoy the property of calculative rationality if and only if
its decision-making apparatus will suggest an action that was optimal when the
decision-making process began. Calculative rationality is clearly not acceptable
in environments that change faster than the agent can make decisions – we shall
return to this point later.
One might argue that this problem is an artifact of the pure logic-based ap-
proach adopted here. There is an element of truth in this. By moving away from
strictly logical representation languages and complete sets of deduction rules, one
can build agents that enjoy respectable performance. But one also loses what is ar-
guably the greatest advantage that the logical approach brings: a simple, elegant,
logical semantics.
There are several other problems associated with the logical approach to
agency. First, there is the problem of “translating” raw data provided by the
agent’s sensors into an internal symbolic form. For many environments, it is
not obvious how the mapping from environment to symbolic form might be re-
alized. For example, the problem of transforming an image to a set of declarative
statements representing that image has been the object of study in AI for decades,
and is still essentially open. Another problem is that actually representing prop-
erties of dynamic, real-world environments is extremely hard. As an example,
representing and reasoning about temporal information – how a situation changes
over time – turns out to be extraordinarily difficult. Finally, as the simple vacuum
world example illustrates, representing even rather simple procedural knowledge
(i.e., knowledge about “what to do”) in traditional logic can be rather unintuitive
and cumbersome.
To summarize, in logic-based approaches to building agents, decision-making
is viewed as deduction. An agent’s “program” – that is, its decision-making strat-
egy – is encoded as a logical theory, and the process of selecting an action reduces
to a problem of proof. Logic-based approaches are elegant, and have a clean (log-
ical) semantics – wherein lies much of their long-lived appeal. But logic-based
approaches have many disadvantages. In particular, the inherent computational
complexity of theorem proving makes it questionable whether agents as theorem
provers can operate effectively in time-constrained environments. Decision mak-
ing in such agents is predicated on the assumption of calculative rationality – the
assumption that the world will not change in any significant way while the agent
is deciding what to do, and that an action that is rational when decision making
begins will be rational when it concludes. The problems associated with repre-
senting and reasoning about complex, dynamic, possibly physical environments
are also essentially unsolved.
If we take this approach to agent-building, then agents are essentially theorem provers,
employing explicit symbolic reasoning (theorem proving) in order to make deci-
sions. But just because we find logic a useful tool for conceptualizing or spec-
ifying agents, this does not mean that we must view decision making as logical
manipulation. An alternative is to compile the logical specification of an agent into
a form more amenable to efficient decision making. The difference is rather like
the distinction between interpreted and compiled programming languages. The
best-known example of this work is the situated automata paradigm of Rosen-
schein and Kaelbling [54]. A review of the role of logic in intelligent agents may
be found in [59]. Finally, for a detailed discussion of calculative rationality and
the way that it has affected thinking in AI, see [56].
• the idea that intelligent, rational behavior is seen as innately linked to the
environment an agent occupies – intelligent behavior is not disembodied,
but is a product of the interaction the agent maintains with its environment;
• the idea that intelligent behavior emerges from the interaction of various
simpler behaviors.
situation −→ action
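As an illustration of such rules (a minimal sketch with an invented behavior set, not taken from the chapter), behaviors can be kept in priority order, and the first one whose situation test matches the current percept determines the action:

from typing import Callable, List, Optional, Tuple

Behaviour = Tuple[Callable[[dict], bool], str]   # (situation test, action)

# Behaviours listed from highest to lowest priority.
behaviours: List[Behaviour] = [
    (lambda p: p.get("obstacle_ahead", False), "turn away"),
    (lambda p: p.get("battery_low", False), "return to charger"),
    (lambda p: True, "wander"),                  # default behaviour
]

def select_action(percept: dict) -> Optional[str]:
    # Fire the highest-priority behaviour whose situation matches the percept.
    for fires, act in behaviours:
        if fires(percept):
            return act
    return None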
The problem we are faced with is that of building an agent control architecture for
each vehicle, so that they will cooperate to collect rock samples from the planet
surface as efficiently as possible. Luc Steels argues that logic-based agents, of the
type we described above, are “entirely unrealistic” for this problem [58]. Instead,
he proposes a solution using the subsumption architecture.
The solution makes use of two mechanisms introduced by Steels. The first
is a gradient field. In order that agents can know in which direction the mother-
ship lies, the mothership generates a radio signal. Now this signal will obviously
weaken as distance to the source increases – to find the direction of the mother-
ship, an agent need therefore only travel “up the gradient” of signal strength. The
signal need not carry any information – it need only exist.
The second mechanism enables agents to communicate with one another. The
characteristics of the terrain prevent direct communication (such as message-
passing), so Steels adopted an indirect communication method. The idea is that
agents will carry “radioactive crumbs,” which can be dropped, picked up, and
detected by passing robots. Thus if an agent drops some of these crumbs in a
particular location, then later, another agent happening upon this location will be
able to detect them. This simple mechanism enables a quite sophisticated form of
cooperation.
The behavior of an individual agent is then built up from a number of behav-
iors, as we indicated above. First, we will see how agents can be programmed to
individually collect samples. We will then see how agents can be programmed to
generate a cooperative solution.
For individual (non-cooperative) agents, the lowest-level behavior (and hence
the behavior with the highest “priority”) is obstacle avoidance. This behavior can
be represented in the rule:
will not return with samples to reinforce it. After a few agents have followed the
trail to find no sample at the end of it, the trail will in fact have been removed.
The modified behaviors for this example are as follows. Obstacle avoidance
(1.6) remains unchanged. However, the two rules determining what to do if car-
rying a sample are modified as follows.
if sense crumbs then pick up 1 crumb and travel down gradient. (1.13)
Finally, the random movement behavior (1.10) remains unchanged. These behav-
iors are then arranged into the following subsumption hierarchy:
• If agents do not employ models of their environment, then they must have
sufficient information available in their local environment for them to de-
termine an acceptable action.
Figure 1.4: The value iteration algorithm for Markov decision processes.
d*(s) = arg max_{a∈A} [ r(s, a) + λ ∑_{s'∈S} p(s' | s, a) v*(s') ]

Now, since at this stage we will have the values for r(s, a), p(s' | s, a), and v*(s'),
this equation can be directly applied to find the optimal policy d*(s).
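The value iteration algorithm of Figure 1.4 is not reproduced above; the following sketch (illustrative only, assuming dictionary-based representations of r and p) computes v* by repeated Bellman backups and then extracts the optimal policy via the equation above:

from typing import Dict, Hashable

def value_iteration(S, A, r, p, lam=0.9, eps=1e-6) -> Dict[Hashable, float]:
    # r[(s, a)]     : immediate reward for performing a in s
    # p[(s2, s, a)] : probability of reaching s2 when a is performed in s
    # lam           : discount factor (written as lambda in the text)
    v = {s: 0.0 for s in S}
    while True:
        new_v = {s: max(r[(s, a)] + lam * sum(p[(s2, s, a)] * v[s2] for s2 in S)
                        for a in A)
                 for s in S}
        if max(abs(new_v[s] - v[s]) for s in S) < eps:
            return new_v
        v = new_v

def optimal_policy(S, A, r, p, v, lam=0.9):
    # d*(s) = arg max_a [ r(s, a) + lam * sum_{s'} p(s' | s, a) v*(s') ]
    return {s: max(A, key=lambda a: r[(s, a)] +
                   lam * sum(p[(s2, s, a)] * v[s2] for s2 in S))
            for s in S}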
• the agent network architecture developed by Pattie Maes [35, 37, 38];
• Schoppers’ universal plans – which are essentially decision trees that can
be used to efficiently determine an appropriate action in any situation [57];
Kaelbling [28] gives a good discussion of the issues associated with developing
resource-bounded rational agents, and proposes an agent architecture somewhat
similar to that developed by Brooks.
Markov decision processes are a huge research topic in computer science and
operations research, going much further than the simple framework and algo-
rithms we have described here. The definitive reference to MDP s and their endless
variations is [44]; a good introduction to Markov models in AI is [30].
that intention — to try to achieve it. For example, you might expect me to apply
to various PhD programs. You would expect me to make a reasonable attempt to
achieve the intention. Thus you would expect me to carry out some course of
action that I believed would best satisfy the intention. Moreover, if a course of
action fails to achieve the intention, then you would expect me to try again –
you would not expect me to simply give up. For example, if my first application
for a PhD program is rejected, then you might expect me to apply to alternative
universities.
In addition, once I have adopted an intention, then the very fact of having this
intention will constrain my future practical reasoning. For example, while I hold
some particular intention, I will not entertain options that are inconsistent with
that intention. Intending to become an academic, for example, would preclude the
option of partying every night: the two are mutually exclusive.
Next, intentions persist. If I adopt an intention to become an academic, then I
should persist with this intention and attempt to achieve it. For if I immediately
drop my intentions without devoting resources to achieving them, then I will never
achieve anything. However, I should not persist with my intention for too long —
if it becomes clear to me that I will never become an academic, then it is only
rational to drop my intention to do so. Similarly, if the reason for having an
intention goes away, then it is rational of me to drop the intention. For example,
if I adopted the intention to become an academic because I believed it would be
an easy life, but then discover that I would be expected to actually teach, then the
justification for the intention is no longer present, and I should drop the intention.
Finally, intentions are closely related to beliefs about the future. For example,
if I intend to become an academic, then I should believe that I will indeed become
an academic. For if I truly believe that I will never be an academic, it would be
nonsensical of me to have an intention to become one. Thus if I intend to become
an academic, I should at least believe that there is a good chance I will indeed
become one.
From this discussion, we can see that intentions play a number of important
roles in practical reasoning:
• Intentions persist.
I will not usually give up on my intentions without good reason – they will
persist, typically until either I believe I have successfully achieved them, I
believe I cannot achieve them, or else because the purpose for the intention
is no longer present.
• an agent that does not stop to reconsider sufficiently often will continue
attempting to achieve its intentions even after it is clear that they cannot be
achieved, or that there is no longer any reason for achieving them;
a BDI agent framework called dMARS [32]. They investigate how bold agents
(those that never stop to reconsider) and cautious agents (those that are constantly
stopping to reconsider) perform in a variety of different environments. The most
important parameter in these experiments was the rate of world change, γ. The
key results of Kinny and Georgeff were as follows.
• If γ is low (i.e., the environment does not change quickly) then bold agents
do well compared to cautious ones, because cautious ones waste time recon-
sidering their commitments while bold agents are busy working towards –
and achieving – their goals.
The lesson is that different types of environments require different types of de-
cision strategies. In static, unchanging environments, purely proactive, goal-
directed behavior is adequate. But in more dynamic environments, the ability
to react to changes by modifying intentions becomes more important.
The process of practical reasoning in a BDI agent is summarized in Figure 1.5.
As this figure illustrates, there are seven main components to a BDI agent:
• a set of current beliefs, representing information the agent has about its
current environment;
• a belief revision function (brf), which takes a perceptual input and the
agent’s current beliefs, and on the basis of these, determines a new set of
beliefs;
Figure 1.5: The control flow of a BDI agent – sensor input feeds belief revision; beliefs generate options (desires); the filter selects intentions; intentions produce the action output.
It is straightforward to formally define these components. First, let Bel be the set
of all possible beliefs, Des be the set of all possible desires, and Int be the set
of all possible intentions. For the purposes of this chapter, the content of these
sets is not important. (Often, beliefs, desires, and intentions are represented as
logical formulae, perhaps of first-order logic.) Whatever the content of these sets,
it is worth noting that they should have some notion of consistency defined upon
them, so that one can answer the question of, for example, whether having an
intention to achieve x is consistent with the belief that y. Representing beliefs,
brf : 2^Bel × P → 2^Bel
which on the basis of the current percept and current beliefs determines a new
set of beliefs. Belief revision is out of the scope of this chapter (and indeed this
book), and so we shall say no more about it here.
The option generation function, options, maps a set of beliefs and a set of
intentions to a set of desires.
achieving them exceeds the expected gain associated with successfully achieving
them. Second, it should retain intentions that are not achieved, and that are still
expected to have a positive overall benefit. Finally, it should adopt new intentions,
either to achieve existing intentions, or to exploit new opportunities.
Notice that we do not expect this function to introduce intentions from
nowhere. Thus f ilter should satisfy the following constraint:
execute : 2^Int → A
The action selection function action of a BDI agent is then a function that takes
as input a percept (i.e., some raw sensor data, denoted by p), and returns an action;
it is defined by the following pseudo-code.
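The pseudo-code itself is not reproduced above; a minimal sketch of the control loop it describes, wired together from the component functions brf, options, filter, and execute introduced in the text, might look like this:

class BDIAgent:
    def __init__(self, beliefs, intentions, brf, options, filter_fn, execute):
        self.B = beliefs          # current beliefs
        self.I = intentions       # current intentions
        self.brf = brf            # belief revision: (B, percept) -> B'
        self.options = options    # option generation: (B, I) -> desires D
        self.filter = filter_fn   # deliberation: (B, D, I) -> I'
        self.execute = execute    # maps the chosen intentions to an action

    def action(self, percept):
        # One cycle of practical reasoning, from percept to action.
        self.B = self.brf(self.B, percept)        # revise beliefs
        D = self.options(self.B, self.I)          # generate options (desires)
        self.I = self.filter(self.B, D, self.I)   # deliberate: commit to intentions
        return self.execute(self.I)               # act on the chosen intentions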
to have – i.e., deciding what to do) and means-ends reasoning (deciding how to do
it). Intentions play a central role in the BDI model: they provide stability for deci-
sion making, and act to focus the agent’s practical reasoning. A major issue in BDI
architectures is the problem of striking a balance between being committed to and
overcommitted to one’s intentions: the deliberation process must be finely tuned
to its environment, ensuring that in more dynamic, highly unpredictable domains,
it reconsiders its intentions relatively frequently – in more static environments,
less frequent reconsideration is necessary.
The BDI model is attractive for several reasons. First, it is intuitive – we all
recognize the processes of deciding what to do and then how to do it, and we
all have an informal understanding of the notions of belief, desire, and intention.
Second, it gives us a clear functional decomposition, which indicates what sorts
of subsystems might be required to build an agent. But the main difficulty, as ever,
is knowing how to efficiently implement these functions.
developed a range of BDI logics, which they use to axiomatize properties of BDI-
based practical reasoning agents [46, 47, 48, 49, 50, 51].
• Horizontal layering
In horizontally layered architectures (Figure 1.6(a)), the software layers are
each directly connected to the sensory input and action output. In effect,
each layer itself acts like an agent, producing suggestions as to what action
to perform.
• Vertical layering
In vertically layered architectures (Figure 1.6(b) and 1.6(c)), sensory input
and action output are each dealt with by at most one layer.
Figure 1.6: Information and control flows in three types of layered agent architec-
tures (Source: [41, p. 263]).
The introduction of a central control system also introduces a bottleneck into the
agent’s decision making.
In order for a vertically layered architecture to make a decision, control must pass
between each different layer. This is not fault tolerant: failures in any one layer
are likely to have serious consequences for agent performance.
In the remainder of this section, we will consider two examples of layered
architectures: Innes Ferguson’s TOURINGMACHINES, and Jörg Müller’s INTER -
RAP. The former is an example of a horizontally layered architecture; the latter is
a (two-pass) vertically layered architecture.
3.4.1 TouringMachines
The TOURINGMACHINES architecture is illustrated in Figure 1.7. As this figure
shows, TOURINGMACHINES consists of three activity-producing layers. That is,
each layer continually produces “suggestions” for what actions the agent should
perform. The reactive layer provides a more or less immediate response to
changes that occur in the environment. It is implemented as a set of situation-
action rules, like the behaviors in Brooks’s subsumption architecture (Section 3.2).
These rules map sensor input directly to effector output. The original demonstra-
tion scenario for TOURINGMACHINES was that of autonomous vehicles driving
between locations through streets populated by other similar agents. In this sce-
nario, reactive rules typically deal with functions like obstacle avoidance. For
example, here is a reactive rule for avoiding the kerb (from [15, p. 59]):

rule-1: kerb-avoidance
    if
        is-in-front(Kerb, Observer) and
        speed(Observer) > 0 and
        separation(Kerb, Observer) < KerbThreshHold
    then
        change-orientation(KerbAvoidanceAngle)
The layers in TOURINGMACHINES are mediated by a supervisory control framework,
whose control rules can censor the information passed to individual layers. For example,
the following censor rule removes perceptions of a particular obstacle from the input to
the reactive layer:

censor-rule-1:
    if
        entity(obstacle-6) in perception-buffer
    then
        remove-sensory-record(layer-R, entity(obstacle-6))
This rule prevents the reactive layer from ever knowing about whether
obstacle-6 has been perceived. The intuition is that although the reactive
layer will in general be the most appropriate layer for dealing with obstacle avoid-
ance, there are certain obstacles for which other layers are more appropriate. This
rule ensures that the reactive layer never comes to know about these obstacles.
3.4.2 InteRRaP
INTERRAP is an example of a vertically layered two-pass agent architecture – see
Figure 1.8.
As Figure 1.8 shows, INTERRAP contains three control layers, as in TOURING-
MACHINES. Moreover, the purpose of each INTERRAP layer appears to be rather
similar to the purpose of each corresponding TOURINGMACHINES layer. Thus
the lowest (behavior-based) layer deals with reactive behavior; the middle (local
planning) layer deals with everyday planning to achieve the agent’s goals, and the
uppermost (cooperative planning) layer deals with social interactions. Each layer
has associated with it a knowledge base, i.e., a representation of the world ap-
propriate for that layer. These different knowledge bases represent the agent and
its environment at different levels of abstraction. Thus the highest level knowl-
edge base represents the plans and actions of other agents in the environment;
the middle-level knowledge base represents the plans and actions of the agent it-
self; and the lowest level knowledge base represents “raw” information about the
environment. The explicit introduction of these knowledge bases distinguishes
INTERRAP from TOURINGMACHINES.
The way the different layers in INTERRAP conspire to produce behavior is
also quite different from TOURINGMACHINES. The main difference is in the way
the layers interact with the environment. In TOURINGMACHINES, each layer was
directly coupled to perceptual input and action output. This necessitated the in-
troduction of a supervisory control framework, to deal with conflicts or problems
between layers. In INTERRAP, layers interact with each other to achieve the same
end. The two main types of interaction between layers are bottom-up activation
and top-down execution. Bottom-up activation occurs when a lower layer passes
control to a higher layer because it is not competent to deal with the current situ-
ation. Top-down execution occurs when a higher layer makes use of the facilities
provided by a lower layer to achieve one of its goals. The basic flow of control in
INTERRAP begins when perceptual input arrives at the lowest layer in the archi-
tecture. If the reactive layer can deal with this input, then it will do so; otherwise,
bottom-up activation will occur, and control will be passed to the local planning
layer. If the local planning layer can handle the situation, then it will do so, typ-
ically by making use of top-down execution. Otherwise, it will use bottom-up
activation to pass control to the highest layer. In this way, control in INTERRAP
will flow from the lowest layer to higher layers of the architecture, and then back
down again.
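A simplified Python sketch of this two-pass flow follows; the competence tests and
action names are invented for illustration and are not part of INTERRAP itself.

    # Sketch of INTERRAP-style control: perception enters at the lowest layer;
    # a layer that is not competent passes control upward (bottom-up
    # activation), and the competent layer's result flows back down as the
    # action to execute (top-down execution is abstracted into the result).
    def behavior_layer(percept):
        return "reactive_response" if percept.get("routine") else None

    def local_planning_layer(percept):
        return "execute_local_plan" if percept.get("plannable") else None

    def cooperative_planning_layer(percept):
        return "negotiate_joint_plan"

    LAYERS = [behavior_layer, local_planning_layer, cooperative_planning_layer]

    def interrap_control(percept):
        for layer in LAYERS:                  # bottom-up activation
            result = layer(percept)
            if result is not None:
                return result                 # control flows back down
        return "no_action"

    print(interrap_control({"plannable": True}))   # -> "execute_local_plan"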
The internals of each layer are not important for the purposes of this chapter.
However, it is worth noting that each layer implements two general functions. The
first of these is a situation recognition and goal activation function. This function
acts rather like the options function in a BDI architecture (see Section 3.3). It
maps a knowledge base (one of the three layers) and current goals to a new set
of goals. The second function is responsible for planning and scheduling – it is
responsible for selecting which plans to execute, based on the current plans, goals,
and knowledge base of that layer.
Layered architectures are currently the most popular general class of agent ar-
chitecture available. Layering represents a natural decomposition of functionality:
it is easy to see how reactive, proactive, and social behavior can be generated by the
reactive, proactive, and social layers in an architecture. The main problem with
layered architectures is that while they are arguably a pragmatic solution, they
lack the conceptual and semantic clarity of unlayered approaches. In particular,
while logic-based approaches have a clear logical semantics, it is difficult to see
how such a semantics could be devised for a layered architecture. Another issue
is that of interactions between layers. If each layer is an independent activity pro-
ducing process (as in TOURINGMACHINES), then it is necessary to consider all
possible ways that the layers can interact with one another. This problem is partly
alleviated in two-pass vertically layered architectures such as INTERRAP.
4 Conclusions
I hope that after reading this chapter, you understand what agents are and why
they are considered to be an important area of research and development. The
requirement for systems that can operate autonomously is very common. The re-
quirement for systems capable of flexible autonomous action, in the sense that I
have described in this chapter, is similarly common. This leads me to conclude
that intelligent agents have the potential to play a significant role in the future
of software engineering. Intelligent agent research is about the theory, design,
construction, and application of such systems. This chapter has focused on the
design of intelligent agents. It has presented a high-level, abstract view of intel-
ligent agents, and described the sort of properties that one would expect such an
agent to enjoy. It went on to show how this view of an agent could be refined
into various different types of agent architectures — purely logical agents, purely
reactive/behavioral agents, BDI agents, and layered agent architectures.
5 Exercises
1. Level 1 Give other examples of agents (not necessarily intelligent) that you
know of. For each, define as precisely as possible:
(a) the environment that the agent occupies (physical, software, . . . ), the
states that this environment can be in, and whether the environment is:
accessible or inaccessible; deterministic or non-deterministic; episodic
or non-episodic; static or dynamic; discrete or continuous.
(b) the action repertoire available to the agent, and any preconditions as-
sociated with these actions.
(c) the goal, or design objectives, of the agent – what it is intended to
achieve.
2. Level 2 The following few questions refer to the vacuum world example.
Give the full definition (using pseudo-code if desired) of the new function,
which defines the predicates to add to the agent’s database.
3. Level 2 Complete the vacuum world example by filling in the missing rules.
How intuitive do you think the solution is? How elegant is it? How compact
is it?
5. Level 2 If you are familiar with PROLOG, try encoding the vacuum world
example in this language and running it with randomly placed dirt. Make
use of the assert and retract meta-level predicates provided by PROLOG
to simplify your system (allowing the program itself to achieve much
of the operation of the next function).
6. Level 2 Develop a solution to the vacuum world example using the sub-
sumption architecture. How does it compare to the logic-based example?
8. Level 3 Suppose that the vacuum world could also contain obstacles, which
the agent needs to avoid. (Imagine it is equipped with a sensor to detect
such obstacles.) Try to adapt the example to deal with obstacle detection
and avoidance. Again, compare a logic-based solution to one implemented
in a traditional (imperative) programming language.
10. Level 3 Try developing a solution to the “distant planet exploration” exam-
ple (see page 21) using the logic-based approach. How does it compare to
the reactive solution?
11. Level 3 In the programming language of your choice, implement the “dis-
tant planet exploration” example using the subsumption architecture. (To do
this, you may find it useful to implement a simple subsumption architecture
“shell” for programming different behaviors.) Investigate the performance
of the two approaches described, and see if you can do better.
12. Level 3 Using the simulator implemented for the preceding question, see
what happens as you increase the number of agents. Eventually, you should
see that overcrowding leads to a suboptimal solution – agents spend too
much time getting out of each other’s way to get any work done. Try to get
around this problem by allowing agents to pass samples to each other, thus
implementing chains. (See the description in [14, p. 305].)
13. Level 3 Read about traditional control theory, and compare the problems
and techniques of control theory to what we are trying to accomplish in
building intelligent agents. How are the techniques and problems of tradi-
tional control theory similar to those of intelligent agent work, and how do
they differ?
References
[1] P. Agre and D. Chapman. PENGI: An implementation of a theory of activity. In
Proceedings of the Sixth National Conference on Artificial Intelligence (AAAI-87),
pages 268–272, Seattle, WA, 1987.
[7] M. E. Bratman. Intention, Plans, and Practical Reason. Harvard University Press:
Cambridge, MA, 1987.
[9] R. A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of
Robotics and Automation, 2(1):14–23, 1986.
[10] R. A. Brooks. Elephants don’t play chess. In P. Maes, editor, Designing Autonomous
Agents, pages 3–15. The MIT Press: Cambridge, MA, 1990.
[17] I. A. Ferguson. Integrated control and coordinated behaviour: A case for agent
models. In M. Wooldridge and N. R. Jennings, editors, Intelligent Agents: Theories,
Architectures, and Languages (LNAI Volume 890), pages 203–218. Springer-Verlag:
Berlin, Germany, January 1995.
[20] M. Fisher. A survey of Concurrent METATEM — the language and its applications.
In D. M. Gabbay and H. J. Ohlbach, editors, Temporal Logic — Proceedings of
the First International Conference (LNAI Volume 827), pages 480–505. Springer-
Verlag: Berlin, Germany, July 1994.
[24] M. P. Georgeff and A. S. Rao. A profile of the Australian AI Institute. IEEE Expert,
11(6):89–92, December 1996.
[33] K. Konolige. A Deduction Model of Belief. Pitman Publishing: London and Morgan
Kaufmann: San Mateo, CA, 1986.
[35] P. Maes. The dynamics of action selection. In Proceedings of the Eleventh In-
ternational Joint Conference on Artificial Intelligence (IJCAI-89), pages 991–997,
Detroit, MI, 1989.
[36] P. Maes, editor. Designing Autonomous Agents. The MIT Press: Cambridge, MA,
1990.
[37] P. Maes. Situated agents can have goals. In P. Maes, editor, Designing Autonomous
Agents, pages 49–70. The MIT Press: Cambridge, MA, 1990.
[38] P. Maes. The agent network architecture (ANA). SIGART Bulletin, 2(4):115–120,
1991.
[39] J. McCarthy and P. J. Hayes. Some philosophical problems from the standpoint of
artificial intelligence. In B. Meltzer and D. Michie, editors, Machine Intelligence 4,
pages 463–502. Edinburgh University Press, 1969.
[40] J. P. Müller. The Design of Intelligent Agents (LNAI Volume 1177). Springer-Verlag:
Berlin, Germany, 1997.
[42] J. P. Müller, M. Wooldridge, and N. R. Jennings, editors. Intelligent Agents III (LNAI
Volume 1193). Springer-Verlag: Berlin, Germany, 1997.
[43] N. J. Nilsson. Towards agent programs with circuit semantics. Technical Report
STAN–CS–92–1412, Computer Science Department, Stanford University, Stanford,
CA 94305, January 1992.
[44] M. L. Puterman. Markov Decision Processes. John Wiley & Sons, 1994.
[45] A. S. Rao. AgentSpeak(L): BDI agents speak out in a logical computable language.
In W. Van de Velde and J. W. Perram, editors, Agents Breaking Away: Proceedings
of the Seventh European Workshop on Modelling Autonomous Agents in a Multi-
Agent World, (LNAI Volume 1038), pages 42–55. Springer-Verlag: Berlin, Germany,
1996.
[47] A. S. Rao and M. P. Georgeff. Asymmetry thesis and side-effect problems in linear
time and branching time intention logics. In Proceedings of the Twelfth International
Joint Conference on Artificial Intelligence (IJCAI-91), pages 498–504, Sydney, Aus-
tralia, 1991.
[49] A. S. Rao and M. P. Georgeff. An abstract architecture for rational agents. In C. Rich,
W. Swartout, and B. Nebel, editors, Proceedings of Knowledge Representation and
Reasoning (KR&R-92), pages 439–449, 1992.
[52] R. Reiter. Knowledge in Action. The MIT Press: Cambridge, MA, 2001.
[53] S. Rosenschein and L. P. Kaelbling. The synthesis of digital machines with provable
epistemic properties. In J. Y. Halpern, editor, Proceedings of the 1986 Conference
on Theoretical Aspects of Reasoning About Knowledge, pages 83–98. Morgan Kauf-
mann Publishers: San Mateo, CA, 1986.
[60] M. Wooldridge and N. R. Jennings. Intelligent agents: Theory and practice. The
Knowledge Engineering Review, 10(2):115–152, 1995.
Chapter 2
Multiagent Organizations
1 Introduction
The previous chapter discusses the design of intelligent agents. Each agent is an
individual entity capable of independent action. However, many applications re-
quire the interaction between several individuals in order to realize a certain goal.
Think, for instance, of a logistics system that coordinates transport and storage
of different goods belonging to different owners, using different transportation
forms. Another example is a network of sensors that monitors traffic in a busy
intersection. If we think of each sensor as an intelligent agent, then it is clear
that they should be able to coordinate their activities. Such systems, composed of
several agents, are called Multiagent Systems (MAS).
This chapter builds on the concept of agent put forward in Chapter 1, in which
the agent is seen as situated in an environment in which it may sense and upon
which it may act, and consequently a multiagent system is seen as a collection of
agents that are concurrently sensing and acting on an environment. The premise
of multiagent systems is that an agent can be more effective in the context of
others because it can concentrate on tasks within its competence, delegate other
tasks, and use its ability to communicate, coordinate, and negotiate to achieve
its goals. But how can a collection of agents achieve the desired level of mutual
coordination? Moreover, how can MAS account for global goals, which are not
necessarily assumed by any of the agents?
A multiagent system often cannot be fully described by the sum of the descriptions
of the agents in it. Not only can many problems not be easily described in terms of
individual mental states, but also, in many cases, situations are better described in
terms of the activities and constraints that characterize the externally observable
behavior of the whole population. Think, for example, of resources that
are collectively owned or shared between or among communities, such as a shared
Internet connection. If each agent pursues its own goal of having as much and as
frequent access to the Internet as possible, the quality of each agent’s connection will
soon be very low, because the agents will all be limiting each other’s bandwidth.
This example demonstrates the need to somehow “organize” these agents so that
all benefit (possible solutions are to divide access time equally between them, to
use a pay-per-minute system, to draw up a roster, etc.). Sometimes, such “orga-
nization” will emerge from the interactions between the agents, but in many
cases the organization is an entity in itself, with its own goals, that regulates the
agents in the environment.
Since their origin in the 1980s, Multiagent Systems (MAS) have often been
defined as organizations or societies of agents, i.e., as a set of agents that interact
together to coordinate their behavior and often cooperate to achieve some col-
lective goal [28]. The term agent organization has become commonplace within
the MAS community, but is often used to refer to different, even incompatible,
concepts. In short, some take organization as the process of organizing a set of
individuals, whereas others see organization as an entity in itself, with its own
requirements and objectives. These differences are in large part due to the diverse
worldviews and backgrounds of different research fields, namely sociology and
organization theory (OT), on the one hand, and distributed artificial intelligence
(DAI), on the other hand. From a sociological perspective, agent organization is
specified independently of its participants and relates the structure of a (complex)
system to its externally observable global behavior. The artificial intelligence view
on MAS is mostly directed to the study of the mental state of individual agents
and their relation to the overall behavior of the system.
The main focus of this chapter is on the idea of organization as an entity itself,
which is different from the agents in it. Taking an example from human organiza-
tions, think for instance of a university. As an entity, it clearly has its own goals
(e.g., being a place of learning), its plans (e.g., admitting students or producing
research), and its design (e.g., departments, positions, and activities). However,
in itself, the university cannot do anything! In order to achieve its goals, the uni-
versity is dependent on the individuals it will attract to fulfill its positions (i.e.,
lecturers, students, and so on). On the other hand, an agent with the goal of get-
ting a degree in computer science is dependent on the existence of a university
that awards it. So, both agents and organizations are dependent on each other.
This chapter will show that organizations are an essential aspect of multiagent
systems, because they can complement the concept of agents in that they allow
for simplified agent models through reducing uncertainty. This chapter discusses
comprehensive frameworks to build multiagent organizations, based on notions
from human societies, in which we identify roles, assign responsibilities, and
specify permissions – among other things. We will discuss structural and insti-
tutional approaches to multiagent organizations, where the former is mostly con-
cerned with the specification and enactment of organization goals, and the latter
sees organizations as regulative instruments for interaction.
The remainder of this chapter is organized as follows. In Section 2 we set
out the broader context of multiagent systems and organizations, identifying
the drivers for bringing organizational issues into MAS; describe the different
views on organization; and introduce the conference management scenario used
throughout this chapter to illustrate different concepts. In Section 3 we exam-
ine the elements and key properties of organizations, concluding with an exam-
ple of organizational modeling using the OperA framework. This perspective is
complemented in Section 4 by a presentation of institutions and their formal un-
derpinnings, followed by an example of institutional modeling using InstAL. In
Section 5 we move to the agent perspective to consider how organizations appear
from the position of agents. Finally, in Section 6 we examine motivations and
mechanisms for organizational change.
2 Background
In this section, we describe the evolution of the organization perspective in MAS,
and discuss how different sources of inspiration have led to different approaches
to organization in MAS.
Despite these limitations, sensor networks are able to provide some global ser-
vices.
Research in MAS is concerned with the modeling, analysis, and construction
of a collection of possibly pre-existing, autonomous agents that interact with each
other and their environments. Agents are considered to be autonomous entities,
such as software programs or robots. In MAS, the study of such systems goes
beyond the study of individual intelligence to consider, in addition, problem solv-
ing that has social components. Interactions in MAS can be either cooperative
or selfish. That is, the agents can share a common goal (e.g., an ant colony), or
they can pursue their own interests (as in a free market economy). In coopera-
tive situations, agents collaborate to achieve a common goal, shared between the
agents or, alternatively, the goal of a central designer who is designing the various
agents [63]. Interaction between selfish agents usually uses coordination tech-
niques based on auctions or other resource sharing mechanisms. The definition
of coordination mechanisms between different agent types is a major part of MAS
research, resulting in implementations where social aspects, goals, and behaviors are
part of the architecture of the specific agents [55].
From an engineering perspective, the development of MAS is not straightfor-
ward. Even though it is commonly accepted that “some way of structuring the
society is typically needed to reduce the system’s complexity, to increase the sys-
tem’s efficiency, and to more accurately model the problem being tackled” [39],
in many situations, individuals do not share nor necessarily pursue the same aims
and requirements as the global system or society to which they belong. In these
cases, the view of coordination and control needs to be expanded to consider not
only an agent-centric perspective but also a societal one. However, many
approaches assume a predefined agent type or class when designing MAS. A dis-
advantage is that systems are then closed to agents that are not able to use the
same type of coordination and behavior, and that all global characteristics and
requirements are implemented in the individual agents and not outside them.
The pursuit of open solutions to MAS, which are independent of the architec-
tures of a particular set of agents, has led to the concept of organization-based ap-
proaches to MAS. These can be broadly divided into two main categories: struc-
tural and institutional. Structural approaches see organizations mainly as ways
to provide the means for coordination, which enable the achievement of global
goals. Institutions are mechanisms used to regulate social action by defining and
upholding norms. According to Ostrom, institutions are rules-in-use that structure
organizations and activities [52].
So far, we have discussed organizations, primarily from a design perspective,
as a mechanism that can be crafted to enable coordination between agents. A
complementary view of organizations stems from emergence, which starts from the
assumption that MAS require no predefined structure or patterns and that the outcomes
of
interaction are unpredictable. From an emergent perspective, MAS are viewed as
a population of (simple) agents interacting locally with one another and with their
environment. In these systems, organization is not designed but is taken as an
externally observable result of the collective behavior of agents. Communication
is often based on modification of the environment (stigmergy). There is no cen-
tralized control structure dictating how individual agents should behave, meaning
that local interactions between such agents are expected to lead to the emergence
of global behavior. Emergent agent systems are mostly used in the domains of
social simulation, adaptive planning, logistics, and artificial life.
In this chapter, we mostly follow the design view on organization, but will also
discuss the emergence view whenever relevant.
A variety of organizational structures are used in MAS, such as hierarchies, coalitions,
teams, federations, markets, and societies. The survey in [37] gives some insight into
how these structures can be used and generated, and compares their strengths and
weaknesses. The use of organizational structures in MAS is explored further in Section 3.
The first requirement relates to the fact that since an open organization allows
the participation of multiple heterogeneous entities – the number, characteristics,
and architecture of which are unknown to the designer – the design of the organi-
zation cannot be dependent on their individual designs. The second requirement
highlights the fundamental tension between the goals of the organization and the
autonomy of the participating entities: the more detailed the behavior prescribed by
the organization, the less autonomy is left to the individual agents.
3 Multiagent Organizations
In this section, we present in more detail the structural approach to multiagent
organizations, as introduced in Section 2.3.1.
Organization structures define the formal lines of communication, allocation
of information processing tasks, distribution of decision-making authority, and the
provision of incentives. That is, organizations describe objectives, roles, interac-
tions, and rules in an environment without considering the particular character-
istics of the individuals involved. Organizational objectives are not necessarily
shared by any of the individual participants, but can only be achieved through
their combined action. In order to achieve its goals, it is thus necessary that an
organization employs the relevant agents, and structures their interactions and re-
sponsibilities such that organizational objectives can be realized. The performance
of an organization is therefore determined both by its interaction structures, and
by the individual characteristics of the participating agents.
Agents associated with the persons involved in the process play roles in this
organization, and therefore their behavior can be influenced by those persons.
For example, an author could attempt to review his or her own paper or a PC
member could try to deal with fewer papers than he or she should. As such,
the organization must define the norms of behavior that hold, which can, for
instance, include the following: (a) papers must be submitted before the dead-
line; (b) reviewers must submit their evaluations on time; (c) all papers must
be written in English; (d) reviewing must follow a single-blind (or double-
blind) process.
Like human organizations, agent organizations describe how agents interact with each
other and with the environment.
Implicit in the definition of organizations as instruments of purpose are the
ideas that organizations have goals, or objectives, to be realized and, therefore,
the shape, size, and characteristics of the organization affect its behavior [37].
Objectives of an organization are achieved through the action of the individuals in
the organization, which means that an organization should make sure to employ
the relevant actors, so that it can “enforce” the realization of its desires. Note that
here an explicit distinction is made between the organization
position, or role, and the actor, or agent. In this way, separation of concerns is
possible according to the requirements described in Section 2.4. Furthermore,
one of the main reasons for creating organizations is efficiency, that is, to provide
the means for coordination that enables the achievement of global goals in an
efficient manner. This means that the actors in the organization need to coordinate
their activities in order to efficiently achieve those objectives.
The design and validation of OperA organization models (OMs) can be done using the Op-
erettA toolset [3]. OperettA is a combination of tools based on the Eclipse Model-
ing Framework (EMF) and the Graphical Modeling Framework (GMF), integrated
into a single editor. Developed as an Eclipse plug-in, OperettA is fully open source
and follows the model-driven engineering (MDE) principles of tool development.
OperettA is available open source at http://www.operettatool.nl.
The OM provides the overall organization design that fulfills the stakehold-
ers’ requirements. Objectives of an organization are achieved through the action
of agents, which means that, at each moment, an organization should employ the
relevant agents that can make its objectives happen. However, the OM does not
allow for specifying the individual agents. The Social Model (SM) maps orga-
nizational roles to (existing) agents and describes agreements concerning the role
enactment and other conditions in enactment contracts. Finally, the Interaction
Model (IM) describes the run-time interactions between role-enacting agents. The
overall development process is depicted in Figure 2.3.
Figure 2.3: The OperA development process: design of the organization model, dynamic
instantiation of the social model by agents, and run-time deployment of the interaction
model.
Table 2.2: Role description of the PC_member role.
  Id              PC_member
  Objectives      paper_reviewed(Paper, Report)
  Sub-objectives  {read(P), report_written(P, Rep), review_received(Org, P, Rep)}
  Rights          access_confmanagement_system(me)
  Norms & Rules   PC_member OBLIGED understand_english
                  PC_member OBLIGED review_paper BEFORE deadline
                  IF paper_by_colleague THEN PC_member FORBIDDEN review_paper
In this chapter, we focus on the OM that specifies the structure and global char-
acteristics of a domain from an organizational perspective, e.g., how a conference
should be organized, its program, submissions, etc. That is, the OM describes
the means to achieve global objectives. Components of the OM are the social
and interaction structures, in which global goals are specified in terms of roles
and interactions. Moreover, organization specification should include the descrip-
tion of concepts holding in the domain, and of expected or required behaviors.
Therefore, these structures should be linked with the norms, defined in the normative
structure, and with the ontologies and communication languages defined in the
communication structure.
The social structure describes the roles and dependencies holding in the organi-
zation. It consists of a list of role definitions, Roles (including their objectives,
rights, and requirements), a list of group definitions, Groups, and a role depen-
dency graph. Examples of roles in the conference scenario are PC member,
program chair, author, etc.
Global objectives form the basis for the definition of the objectives of roles.
From the organization perspective, role descriptions should identify the activities
and services necessary to achieve the organizational objectives and also to make
it possible to abstract from the individuals that will eventually perform the role.
From the agent perspective, roles specify the expectations of the society with re-
spect to the agent’s activity in the society. In OperA, the definition of a role
consists of an identifier, a set of role objectives, possibly sets of sub-objectives
per objective, a set of role rights, a set of norms, and the type of role. An example
of a role description for a PC member in the conference scenario is depicted in
Table 2.2.
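As a rough illustration of how such a role description could be represented in software,
the following Python sketch mirrors the fields of Table 2.2; it is a hypothetical data
structure, not the OperettA data model.

    # Hypothetical representation of an OperA-style role description,
    # mirroring the fields of Table 2.2.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Role:
        identifier: str
        objectives: List[str]
        sub_objectives: Dict[str, List[str]] = field(default_factory=dict)
        rights: List[str] = field(default_factory=list)
        norms: List[str] = field(default_factory=list)
        role_type: str = "institutional"

    pc_member = Role(
        identifier="PC_member",
        objectives=["paper_reviewed(Paper, Report)"],
        sub_objectives={"paper_reviewed(Paper, Report)": [
            "read(P)", "report_written(P, Rep)", "review_received(Org, P, Rep)"]},
        rights=["access_confmanagement_system(me)"],
        norms=["PC_member OBLIGED understand_english",
               "PC_member OBLIGED review_paper BEFORE deadline",
               "IF paper_by_colleague THEN PC_member FORBIDDEN review_paper"],
    )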
Groups provide the means to collectively refer to a set of roles and are used to
specify norms that hold for all roles in the group. Groups are defined by means of
an identifier, a non-empty set of roles, and group norms. An example of a group
in the conference scenario is the organizing team, consisting of the roles program
chair, local organizer, and general chair.
The distribution of objectives in roles is defined by means of the role hierar-
chy. Different criteria can guide the definition of role hierarchy. In particular,
a role can be refined by decomposing it into sub-roles that, together, fulfill the
objectives of the given role.
This refinement of roles defines role dependencies. A dependency graph rep-
resents the dependency relations between roles. Nodes in the graph are roles in
the society. Arcs are labeled with the objectives for which the parent role de-
pends on the child role. Part of the dependency graph for the conference society
is displayed in Figure 2.4.
Figure 2.4: Part of the role dependency graph for the conference society, showing the
organizer, author, PC member, session chair, and presenter roles, with arcs labeled by
objectives such as conference_organized, paper_submitted, paper_reviewed,
session_organized, and paper_presented.
For example, the arc between nodes PC-Chair and PC-member represents
the dependency between PC-Chair and PC-member concerning paper-reviewed
(PC_Chair paper_reviewed PC_Member). The way an objective g, for which role r1
depends on role r2 in a dependency relation, is actually passed to r2 depends on the
coordination type of the society, defined in the architectural templates. In OperA,
three types of role dependencies are identified: bidding, request, and delegation. These
dependency types result in three different interaction possibilities (a small sketch of
these coordination types in code follows the list):
Bidding defines market- or auction-like interactions, where the dependent (ini-
tiator) of the dependency asks for proposals from the dependees. Typically,
the best proposal is selected for the achievement of the objective.
Request leads to networks, where roles interact cooperatively toward the achieve-
ment of an objective;
Delegation gives rise to hierarchies, where the dependent of the dependency dele-
gates the responsibility of the achievement of the objective to the dependees
(i.e., subordinates).
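The following Python sketch illustrates how the coordination type might determine the
way an objective is passed along a dependency; the Dependee class and its methods are
invented purely for the example.

    # Toy sketch: the same role dependency (r1 depends on r2 for objective g)
    # is resolved differently depending on the coordination type, mirroring
    # the bidding, request, and delegation styles described above.
    class Dependee:
        def __init__(self, name, bid=0.0, willing=True):
            self.name, self.bid, self.willing = name, bid, willing
        def propose(self, objective):
            return self.bid                   # quality of the offered proposal
        def accepts(self, objective):
            return self.willing

    def pass_objective(objective, dependees, coordination):
        if coordination == "bidding":         # market/auction: best proposal wins
            return max(dependees, key=lambda d: d.propose(objective))
        if coordination == "request":         # network: first dependee that accepts
            return next(d for d in dependees if d.accepts(objective))
        if coordination == "delegation":      # hierarchy: assign to a subordinate
            return dependees[0]
        raise ValueError(f"unknown coordination type: {coordination}")

    pc_members = [Dependee("PC1", bid=0.7), Dependee("PC2", bid=0.9)]
    print(pass_objective("paper_reviewed", pc_members, "bidding").name)   # -> PC2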
Figure 2.5: An interaction scene for the review process: papers are assigned to PC
members (PC1, PC2) and reviews are received, between the assignment and review
deadlines.
Figure 2.6: The interaction structure of the conference society, with scenes such as
SendCallforPapers, PaperSubmission, FormPC, Review, PaperAcceptance,
SendCallforParticipation, Registration, onsite registration, ConferenceSessions, and
Workshops, connected by transitions from start to end; M and N bound the number of
simultaneous scene instances.
The ordering of scenes is defined in the interaction structure (Figure 2.6). In this
diagram, transitions describe a partial ordering of the scenes, plus any
synchronization constraints. Note that, at run-time, several scenes can
be happening at the same time and one agent can participate in different scenes
simultaneously. Transitions also describe the conditions for the creation of a new
instance of the scene, and specify the maximum number of scene instances that
are allowed simultaneously. Furthermore, the enactment of a role in a scene may
have consequences in following scenes. Role evolution relations describe the con-
straints that hold for the role-enacting agents as they move from scene to scene; for
example, in the transition between paper acceptance and conference registration,
authors will become participants.
At the highest level of abstraction, norms are the values of a society, in the sense
that they define the concepts that are used to determine the value or utility of sit-
uations. For the conference organization scenario, the desire to share information
and uphold scientific quality can be seen as values. However, values do not specify
how, when, or under which conditions individuals should behave appropriately in any
given social setup. In OperA, these aspects are defined in the normative structure.
In OperA, norms are specified using a deontic logic that is temporal, rela-
tivized (in terms of roles and groups), and conditional [23]. For instance, the
following norm might hold: “The authors should submit their contributions
before the submission deadline” – which can be formalized as, for example,
O_author(submit(paper) ≤ Submission_deadline).
Furthermore, in order to check norms and act on possible violations of the
norms by the agents within an organization, abstract norms have to be translated
into actions and concepts that can be handled within such organizations. To do so,
the abstract norms are iteratively concretized into more concrete
norms, and then translated into specific rules, violations, and sanctions. Concrete
norms are related to abstract norms through a mapping function, based on the
“counts-as” operator as developed in [2]. Norms are further discussed in Section 4.
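As a rough illustration of this concretization step (not OperA’s actual machinery; the
predicate and sanction names are invented), an abstract norm such as the one above
might be operationalized as a concrete check that detects a violation and triggers a
sanction:

    # Illustrative sketch: turning the abstract norm "authors submit before the
    # deadline" into a concrete, checkable rule with a violation and a sanction.
    from datetime import datetime

    SUBMISSION_DEADLINE = datetime(2024, 3, 1)      # hypothetical deadline

    def submitted_on_time(submission_time: datetime) -> bool:
        # concrete counterpart of O_author(submit(paper) <= Submission_deadline)
        return submission_time <= SUBMISSION_DEADLINE

    def check_norm(author: str, submission_time: datetime) -> str:
        if submitted_on_time(submission_time):
            return f"{author}: norm fulfilled"
        # violation detected: apply the sanction attached to the concrete norm
        return f"{author}: violation - submission rejected"

    print(check_norm("virginia", datetime(2024, 2, 28)))
    print(check_norm("frank", datetime(2024, 3, 2)))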
4 Institutions
The concept of institution – and its realization in multiagent systems – has largely
been inspired by the economic [49] and social [36, 51] perspectives, as discussed
in Section 2.3.2. In contrast, the concept of organization has drawn more on the lit-
erature of organizational structure, management, and business [47]. While the ter-
minology, emphasis, and tools are different, there is substantial overlap in goals –
indeed institutions can be seen to underpin organizations, and it is these connec-
tions that we aim to illustrate in this section.
Institutions are closely tied to the concept of norm – in both its implicit, so-
cial manifestation and its explicit, legal form – and reiterating North [49], they
constitute “the rules of the game in a society, or more formally, the humanly de-
vised constraints that shape social interaction.” Harré and Secord [36] emphasize
the importance of roles in institutions as “that part of the act-action structure pro-
duced by the subset of the rules followed by a particular category of individual,”
and state that “Role is a normative concept, focusing on what is proper for a per-
son in a particular category to do.” These views are also echoed by Ostrom [52],
who describes institutions as “the prescriptions that humans use to organize all
forms of repetitive and structured interaction ... at all scales” and, just as impor-
tantly, observes that individuals “face choices regarding the actions and strategies
they take, leading to consequences for themselves and for others.” All of which
are underlined by the dictionary [53] definitions:
INSTITUTION An established law, custom, usage, practice, organi-
zation, or other element in the political or social life of a people; a
regulative principle or convention subservient to the needs of an or-
ganized community or the general ends of civilization.
NORM A standard or pattern of social behavior that is accepted in or
expected of a group.
This has led to the view in multiagent systems research [12, 54, 59, 74] that an
institution is a set of norms, while still encompassing the rich variety of usage
surrounding the term “norm.”
In this section, we first address the relationship between institutions and or-
ganizations, then examine how individual actions affect institutional state through
the concept of conventional generation. Consequently it becomes possible to rec-
ognize particular configurations of the institution, in order to monitor both individ-
ual and collective action, including norm violation and subsequent application of
sanction. We conclude with an illustration of one approach to normative modeling,
taking the conference scenario, using the institutional action language InstAL.
Existing institutional specifications may also be cloned and
refined, and in due course this may lead to the synthesis of new institutions.
In practice, an agent is likely to be subject to the governance of more than
one institution, certainly concurrently, perhaps even simultaneously – consider
for example a job submitted from one country to a cloud computing facility in
another and the question of which legislation governs the security of which stages
of the computation. It would be surprising, given such a scenario, if the norms
of one institution do not sometimes conflict with another: in the worst case, an
agent may take a norm-compliant action in one institution, only to violate a norm
in another, or vice versa, so that whichever action the agent takes, it may suffer
sanctions. This may seem a pathological case, but neither can it be ruled out: given
that institutions may be developed independently of one another, without knowing
how they may be combined in the future, such conflicts are inevitable, which is
why agents need to be able to find out the potential consequences of actions
before taking them.
Figure 2.8: Conventional generation: a trace of states and events in the environment
(lower part) generates corresponding institutional states, events, and facts in the
institution (upper part).
The institution is characterized by the events that may be brought about by an
agent, and by the (institutional) state transformer function that takes a set of insti-
tutional facts and an event and produces the consequent institutional state. If the
event is of no relevance for the institution, the institutional state does not change.
If it is relevant, the state does change. For example, the program chair may only
close submission if it was previously opened. The notion of interpreting an action
in the physical world as an institutional action is called conventional generation,
deriving from speech acts [60], action theory [32], and the unification of these
ideas [18], as reflected in the upper part of Figure 2.8.
The attraction of an event-based approach is that it offers a fine-grained view
of the action, enabling a (relatively) forensic focus on details and ordering that
may be critical to system state evolution. In contrast, the attraction of a situation-
based approach is that it abstracts away from the detail of individual events and
orderings (that may be irrelevant), enabling a focus on significant phases in the
evolution of the system state. Both are feasible within a common formal and
computational framework, as described in [43].
The point of such a formalization, from an organizational modeling perspec-
tive, is that the institutional effect of an action can be determined precisely: an action
brought about by some principal only has its intended institutional consequences
if the institutional action is permitted and empowered for that principal in the cur-
rent state. Power is represented in the institutional state, associating an action
and some principal, and may be added and removed consequent to institutional
actions. In the conference scenario, the program chair is empowered to open
and close submission, for example, whereas a program committee member is not.
But the empowerment to close submission may only be added after submission is
opened.
A normative (institutional) specification is a structure N := ⟨E, F, G, C, Δ⟩, where:
• E = E_ex ∪ E_inst, with E_inst = E_act ∪ E_viol: the events, comprising exogenous
  events, (normative) actions, and (normative) violations.
• F = W ∪ P ∪ O ∪ D: the normative facts (fluents), i.e., power, permission,
  obligation, and domain-specific facts.
• X = 2^(F ∪ ¬F): a state formula, i.e., the set of positive and negative fluents
  comprising the current normative state.
• G : X × E → 2^E_inst: the generation relation, which maps a state and an event to
  a set of (institutional) events.
• C : X × E → 2^F × 2^F: the consequence relation, which maps a state and an event
  to a pair (additions, deletions) of sets of fluents, where C(ϕ, e) = (C↑(ϕ, e), C↓(ϕ, e)),
  with C↑(ϕ, e) the fluents initiated and C↓(ϕ, e) the fluents terminated by event e in
  state ϕ.
• Δ: the initial set of fluents.
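A minimal Python sketch of how such a specification can be animated is given below;
the concrete generation and consequence relations are toy examples for the conference
scenario, and the fluent names are invented (this is not InstAL itself).

    # One step of a normative system N = (E, F, G, C, Delta): an observed event
    # is expanded via the generation relation into institutional events, and the
    # consequence relation adds and removes fluents from the normative state.
    def generate(state, event):
        if event == "closeSubmission" and "pow(closeSubmission)" in state:
            return {"icloseSubmission"}
        return set()

    def consequences(state, event):
        if event == "icloseSubmission":
            return {"phase(review)"}, {"phase(submission)", "pow(closeSubmission)"}
        return set(), set()

    def step(state, exogenous_event):
        events = {exogenous_event} | generate(state, exogenous_event)
        for e in events:
            initiated, terminated = consequences(state, e)
            state = (state - terminated) | initiated
        return state

    delta = {"phase(submission)", "pow(closeSubmission)"}   # initial fluents
    print(step(delta, "closeSubmission"))                   # -> {'phase(review)'}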
The first step in constructing a model is to identify the physical-world events that are
relevant to the institution and, subsequently, the institutional events. For example, there is the open-
ing of submission, the closing, the start of reviewing, the end of reviewing, and so
forth. For the sake of the example, we assume the following declarations:
• Person: frank, gerhard, julian, virginia
• Paper: paper01
• Review: review02
where Person, Paper, and Review are types, as shown in Figure 2.10.
All of the physical world events bring about institutional events and begin to
lay the foundations of the model. Once such a set of events has been identified,
two issues become apparent: (i) that order matters: opening comes before closing,
submission comes before review, etc., and (ii) that the identity of actors matters:
actors play roles and only certain roles should cause certain events. For example,
only the conference chair can declare that the conference is open for submissions;
and likewise, paper assignments can only be made by the chair, and then only
subject to certain conditions, such as the reviewer not being a listed author of the
paper. Institutions make explicit this separation of concerns and enable reasoning
about such matters. The keys to dealing with the issues described above are the
twin concepts of permission and power. Physical world events are always em-
powered, but within the institution, power can be given and taken away in order
to ensure that events have (or not) their intended institutional consequences; thus,
for example, closing submission might only be empowered after the opening of
submission. The PC chair can close submission because he or she has the permis-
sion, but the PC chair only has the power if sufficient time has elapsed since the
opening, and he or she may only do it (say) once, as shown by the fragment in
Figure 2.11.
Obligations are used to express that certain events have to take place, e.g.,
the reviewing process requires that a review is delivered before the review period closes.
To emphasize that the evolution of the model state is controlled by external events, no
dates and the like are encoded in the model. Instead we use exogenous events that
act as deadlines, i.e., the obligation has to be satisfied before the deadline occurs.
This deadline event can be generated by an agent acting as a timekeeper, as above,
where the program chair declares the review period closed, so any reviews not sent
before that event give rise to violations of the corresponding obligations.
A trace for the conference scenario: the physical-world events registerPaper(paper01,
virginia), submitPaper(paper01, virginia), assignReviewer(paper01, gerhard, frank), and
sendReview(paper01, gerhard, review02) generate, at successive instants, the institutional
events iregisterPaper, isubmitPaper, iassignReviewer, and isendReview.
5 Agents in Organizations
An important challenge in agent organizations is the specification of mechanisms
through which agents can evaluate the characteristics and objectives of organi-
zational roles, in order to decide about participation. In particular, an agent has
to reason about whether it wants to play a role and whether it has the capabili-
ties to behave as the role requires [4]. This is a complex issue and an open area
of research. To counter this problem, in many situations, agents are designed
from scratch so that their behavior complies with that of the organization. In
such systems, the organizational model is often implicit in the specification of the
agents. However, comprehensive approaches to organizations cannot assume that
all agents are known at design time, but require organization-aware agents, that
is, agents that are able to reason about their own objectives and desires and thus
decide and negotiate their participation in an organization [71].
By specifying the way interactions can occur in an environment, multiagent
organizations are able to specify global organizational goals independently of the
design of the agents (cf. Section 2.4). A role description, as provided by an organi-
zation model, identifies a “position” to be filled by a player [50], which contributes
to some part of the organizational objectives and interaction rules. By consider-
ing role descriptions, global goals can be verified independently of the agents that
will act in the system. From the perspective of the organization it often does not
matter whether agent A or agent B takes a role, as long as they both have sufficient
capabilities. However, the ways in which each agent, A or B, will enact the role
will probably differ, leading to different global results. This is because agents are
assumed to have their own goals, which may be different from those of the orga-
nization, and will use their own reasoning capabilities to decide on the enactment
of one or another organizational role, and to determine which protocol available
to them is the most appropriate to achieve the objectives of the organizational
positions assigned to them. The ability to dynamically bind different players to
roles gives the organization a degree of adaptability in meeting changing goals
and environments [57].
Existing approaches for programming role enactment focus mainly on the pro-
cess of role enactment through communication with the organization [5], and on
the result of role enactment (for example, the adoption of the objectives of a role
as agents’ own goals) [16, 17]. In [16], compatibility between agent goals and
those of the role is investigated and taken as a prerequisite for enacting the role.
Moreover, these approaches assume that (1) organizational specification is explicit
and available to the agents, and (2) an agent is able to interpret that specification
and reason about whether it has the required capabilities to play a role in order to
decide on participating. However, given the heterogeneity of agents in open envi-
ronments, such a level of organization-awareness cannot be assumed for all agents.
Role-enacting agents must be able to perform the assigned role(s). The required ca-
pabilities include the following [14] (a small sketch of a capability check follows the list):
• the execution of the functions defined by the role or imposed by role rela-
tionships, including the ability to use resources available to the role.
• the ability to communicate, as a proxy for its role, with players of other
roles.
• the ability to reason about which of its plans and activities can be used to
achieve role objectives.
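A toy Python sketch of such a capability check is given below; the attribute names of
the agent and the role are invented for the example.

    # Sketch: an organization-aware agent decides whether it can enact a role by
    # comparing its capabilities and plans with the role's functions and objectives.
    def can_enact(agent_capabilities: set, agent_plans: dict, role: dict) -> bool:
        has_functions = role["required_functions"] <= agent_capabilities
        can_communicate = "communicate" in agent_capabilities
        covers_objectives = all(obj in agent_plans for obj in role["objectives"])
        return has_functions and can_communicate and covers_objectives

    pc_member_role = {"required_functions": {"read_paper", "write_report"},
                      "objectives": ["paper_reviewed"]}
    agent_capabilities = {"read_paper", "write_report", "communicate"}
    agent_plans = {"paper_reviewed": ["read", "write_report", "send_review"]}
    print(can_enact(agent_capabilities, agent_plans, pc_member_role))   # -> True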
A possible way to deal with these issues, as proposed in [72], is to equip agents
with an interface to the organization (called a governor). This interface prevents
any action not allowed by the role definition and therefore ensures organizational
norms are met. However, it is not flexible enough to accommodate different agents’
enactment styles, capabilities, and requirements. In fact, it makes the agent itself
“invisible” to the society, and only its enactment of the role behavior is apparent
in the organization. Moreover, the interface determines exactly which actions are
allowed, while differences between agents are invisible to the organization.
From the perspective of an agent, the role description provides a more or less
abstract definition of the organizational knowledge and skills required to perform
the role adequately. Depending on the level of detail of an organization specifi-
cation, more or less interpretation is required from agents. Detailed organization
models support agent designers in developing agents whose behavior complies with
the behavior described by the role(s) they will take up in the society. However,
such a solution is not applicable to open systems, where it is assumed that hetero-
geneous agents are designed to run independently of the organization.
From the perspective of the organization, the concern is the effect of agents’
attitudes on the performance of roles. The agent literature discusses
extensively different types of social attitudes of agents: selfish, altruistic, honest,
dishonest, etc. [11, 21, 45, 48, 64]. Different agents result in different role perfor-
mances, because the way an agent will plan its goals, which is dependent on its
social attitude, influences the realization of its role objectives and the fulfillment
of the role norms. For instance, some agents will only attempt to achieve the goals
of their adopted roles and forget their own private goals, while others will only at-
tempt to achieve the goals of the role after all their own goals have been satisfied.
Furthermore, the relations between agent plans and role objectives, and of agent
goals and role sub-objectives must be considered, as well as the influence of the
role norms on the behavior of agents.
The participation of agents in an organization assumes that there is some ben-
efit to be gained, preferably both by the agent and by the organization. Depending
on how the agent will enact its role(s), different behaviors can be distinguished
[16]. Agents that follow a social enactment strategy will attempt first to realize
their organizational objectives (obtained from the roles they enact) before they
consider their own goals. In selfish enactment strategies, the situation is reversed.
Many other situations are also possible. In the same way, the effect of agent plans
on role objectives, and of role objectives on agent goals leads to different types
of role enactment strategies [64], in which either the role or the individual plans
can be enriched. Organizational norms also affect the behavior of agents in the
organization, as those can limit or alter the achievement of individual goals.
In summary, most agent organization models assume that the effective engi-
neering of organizations requires agents that are able to reason about the organizational
specification and about their own goals and capabilities.
6 Evolution of Organizations
One of the main reasons for having organizations is to achieve stability. However,
organizations and their environments are never static. They change, disappear, or
grow. Agents can migrate, organizational objectives can change, or the environ-
ment can evolve, all of which require adaptation of organizations. Reorganization
is the response to two different stimuli: a reaction to (local) changes in the envi-
ronment, and a means to implement modified overall intentions or strategies.
Multiagent organization models must therefore not only enable the adaptation
of individual agents, but also be able to adapt organizations’ structures dynami-
cally in response to changes in the environment. Depending on the type of organi-
zation and on the perceived impact of the changes in the environment, adaptation
can be achieved by behavioral changes at the agent level, modification of interac-
tion agreements, or the adoption of a new social structure.
Organizational evolution is the process by which organizations change and
adapt over time to meet new requirements and changes in the deployment envi-
ronment. There is a growing recognition that in organizations, a combination of
regulation and autonomy is often necessary. Most human organizations follow
ordered structures and stated goals, but develop an informal structure that reflects
the spontaneous efforts of individuals and subgroups to control and adapt to the
environment. The autonomy to develop informal structures is indispensable to
the process of organizational control and stability [61]. Also in the area of MAS,
Wellman noted in 1993 that "combining individual rationality with laws of so-
cial interaction provides perhaps the most natural approach to [...] distributed
computations" [77]. However, although each change may itself be justified, often
emergent patterns become the norm and, with time, become part of the structures and
rules fixed in the organization. Figure 2.14 shows this cycle, common in human
interactions, moving from explicit rules to implicit practical structures, and back.
Figure 2.14: The cycle between explicit and implicit organization, alternating between
evolution and revolution.
7 Conclusions
This chapter gives an organization-oriented perspective on multiagent systems.
Assuming MAS to be an organization, or society of agents, makes it possible to
describe the set of agents interacting to coordinate their behavior as well as the
cooperation requirements that can lead to the achievement of some collective goal
[28]. The stance taken in this chapter is that the conceptualization of multiagent
systems is better served by an explicit separation of concerns about organizational
and individual issues. Whereas the individual aspect should concern the mental
states and capabilities of an agent, organizational issues can better describe ac-
tivities and constraints that characterize the externally observable behavior of a
whole agent population. This chapter further presents the two most common per-
spectives on organization, that of organization as structure (cf. Section 3) and that
of organization as institution (cf. Section 4).
An important ongoing research issue is that of organization-aware agents, as
discussed in Section 5 [70]. Agents who want to enter and play roles in an orga-
nization are expected to understand and reason about the organizational specifica-
tion, if they are to operate effectively and flexibly in the organization. The broader
aim of this line of research is the development of languages and techniques for
programming organization-aware agents. Such agents should be able to reason
about role enactment, about whether they want to play a role and whether they
have the capabilities to behave as the role requires.
Another open research issue concerns the interaction between human and ar-
tificial agents in organizations. What happens when human and artificial agents
interact in organizations? Such cooperation is increasingly happening, mostly in
situations where reaction speed is important (such as emergency response), where
knowledge is diffuse, where a high level of connectivity is necessary, or where
operation in constantly changing environments is needed. As yet, the reach and
consequences of coordinated activity between people and artificial agents working
in close and continuous interaction are not well understood. Planning technologies
for intelligent systems often take an autonomy-centered approach, with represen-
Chapter 2 89
tations, mechanisms, and algorithms that have been designed to accept a set of
goals, and to generate and execute a complete plan in the most efficient and sound
fashion possible. The teamwork-centered autonomy approach takes as a premise
that people are working in parallel alongside one or more autonomous systems,
and hence adopts the stance that the processes of understanding, problem solving,
and task execution are necessarily incremental, subject to negotiation, and forever
tentative [7]. That is, autonomy in teams requires close alignment to the current
work of other team members and the perception of the team’s goals.
Another major open issue is that of organizational adaptation and evolution.
Reorganization is needed in order to enable systems to enforce or adapt to changes
in the environment. This issue has been discussed by many researchers in both or-
ganizational theory and distributed systems, resulting mostly in domain-oriented
empiric solutions. The lack, in most cases, of a formal basis makes it difficult to
develop theories about reorganization, prevents the comparison of approaches and
results, and makes it difficult to adapt models to other domains or situations.
The view of agent organizations presented in this chapter posits that agent
organizations demand (i) the integration of organizational and individual perspec-
tives, (ii) the dynamic adaptation of models to organizational and environmental
changes, and (iii) a significant reliance on the notions of openness and heterogene-
ity in MAS. Practical applications of agents to organizational modeling are being
widely developed but formal theories are needed to describe interaction and or-
ganizational structure. Furthermore, it is necessary to get a closer look at the
applicability of insights and theories from organization sciences to the develop-
ment of agent organizations. There is a need for a theoretic model to describe
organizations and their environments that enables the formal analysis of the fit
between organizational design and environment characteristics. This enables the
a priori comparison of designs and their consequences and therefore supports the
decision-making process on the choice of design.
Acknowledgments
We are grateful to Huib Aldewereld, Tina Balke, Frank Dignum, and Marina De
Vos for their assistance in writing this chapter.
8 Exercises
1. Level 1 You have been asked to design an organization model for an on-
line bookstore. The system must be able to handle both the selling and buying of
books by individuals, as well as act as a front-end for a bookstore. Take
References
[1] Huib Aldewereld. Autonomy vs. Conformity – An Institutional Perspective on Norms
and Protocols. PhD thesis, Universiteit Utrecht, 2007.
[2] Huib Aldewereld, Sergio Alvarez-Napagao, Frank Dignum, and Javier Vázquez-
Salceda. Engineering social reality with inheritance relations. In Proceedings of
the 10th International Workshop on Engineering Societies in the Agents World X,
ESAW ’09, pages 116–131, Berlin, Heidelberg, 2009. Springer-Verlag.
[4] Huib Aldewereld, Virginia Dignum, Catholijn Jonker, and M. van Riemsdijk. Agree-
ing on role adoption in open organisations. KI - Künstliche Intelligenz, pages 1–9,
2011. 10.1007/s13218-011-0152-5.
[5] Matteo Baldoni, Guido Boella, Valerio Genovese, Roberto Grenna, and Leendert van der
Torre. How to program organizations and roles in the JADE framework. In Proceed-
ings of the 6th German Conference on Multiagent System Technologies, MATES ’08,
pages 25–36, Berlin, Heidelberg, 2008. Springer-Verlag.
[6] Tina Balke. Towards the Governance of Open Distributed Systems: A Case Study
in Wireless Mobile Grids. PhD thesis, University of Bayreuth, November 2011.
Available via http://opus.ub.uni-bayreuth.de/volltexte/2011/
929/. Retrieved 20120109. Also available as ISBN-13: 978-1466420090, pub-
lished by Createspace.
[7] Jeffrey M. Bradshaw, Paul J. Feltovich, Matthew Johnson, Maggie R. Breedy, Larry
Bunch, Thomas C. Eskridge, Hyuckchul Jung, James Lott, Andrzej Uszok, and Ju-
rriaan van Diggelen. From tools to teammates: Joint activity in human-agent-robot
teams. In Masaaki Kurosu, editor, HCI (10), volume 5619 of Lecture Notes in Com-
puter Science, pages 935–944. Springer, 2009.
[8] Paolo Bresciani, Paolo Giorgini, Fausto Giunchiglia, John Mylopoulos, and Anna
Perini. Tropos: An agent-oriented software development methodology. Journal of
Autonomous Agents and Multi-Agent Systems, 8:203–236, 2004.
[12] Owen Cliffe. Specifying and Analysing Institutions in Multi-Agent Systems Using
Answer Set Programming. PhD thesis, University of Bath, 2007.
[13] Owen Cliffe, Marina De Vos, and Julian Padget. Specifying and reasoning about
multiple institutions. In COIN, volume 4386 of LNAI, pages 67–85. Springer Berlin
/ Heidelberg, 2007.
[14] A. Colman and J. Han. Roles, players and adaptive organisations. Applied Ontol-
ogy: An Interdisciplinary Journal of Ontological Analysis and Conceptual Model-
ing, 2(2):105–126, 2007.
[15] L. Coutinho, J. Sichman, and O. Boissier. Modelling dimensions for agent organi-
zations. In V. Dignum, editor, Handbook of Research on Multi-Agent Systems: Se-
mantics and Dynamics of Organizational Models. Information Science Reference,
2009.
[16] M. Dastani, V. Dignum, and F. Dignum. Role assignment in open agent societies.
In AAMAS-03. ACM Press, July 2003.
[17] Mehdi Dastani, M. Birna van Riemsdijk, Joris Hulstijn, Frank Dignum, and J-J.
Meyer. Enacting and deacting roles in agent programming. In Proceedings of the
5th International Workshop on Agent-Oriented Software Engineering (AOSE2004),
volume LNCS 3382. Springer, 2004.
[18] Steven Davis. Speech acts and action theory. Journal of Pragmatics, 8(4):469 – 487,
1984.
[19] Francien Dechesne and Virginia Dignum. No smoking here: Compliance differences
between deontic and social norms. In Proceedings of AAMAS 2011. IFAAMAS.org,
2011.
[20] F. Dignum. Autonomous agents with norms. AI & Law, 7(12):69–79, 1999.
[23] V. Dignum, J.J. Meyer, F. Dignum, and H. Weigand. Formal specification of inter-
action in agent societies. In Formal Approaches to Agent-Based Systems (FAABS),
volume F2699 of LNAI. Springer, 2003.
[25] R. Duncan. What is the right organizational structure: Decision tree analysis pro-
vides the answer. Organizational Dynamics, Winter:59–80, 1979.
[26] E. Durfee and J. Rosenschein. Distributed problem solving and multi-agent systems:
Comparisons and examples. In Proc. 13th Int. Distributed Artificial Intelligence
Workshop, pages 94–104, 1994.
[30] Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and
disjunctive databases. New Generation Computing, 9(3-4):365–386, 1991.
[34] Davide Grossi, Frank Dignum, Mehdi Dastani, and Lambèr Royakkers. Foundations
of organizational structures in multiagent systems. In AAMAS ’05: Proceedings of
the Fourth International Joint Conference on Autonomous Agents and Multiagent
Systems, pages 690–697, New York, NY, USA, 2005. ACM.
[36] R. Harré and P.F. Secord. The Explanation of Social Behaviour. Blackwells, 1972.
ISBN 0-631-14220-7.
[37] Bryan Horling and Victor Lesser. A survey of multi-agent organizational paradigms.
The Knowledge Engineering Review, 19(4):281–316, 2004.
[41] Rosine Kitio, Olivier Boissier, Jomi Fred Hübner, and Alessandro Ricci. Organi-
sational artifacts and agents for open multi-agent organisations: Giving the power
back to the agents. In Proceedings of the 2007 International Conference on Co-
ordination, Organizations, Institutions, and Norms in Agent Systems III, COIN’07,
pages 171–186. Springer, 2008.
[42] S. Kumar, M. Huber, P. Cohen, and D. McGee. Towards a formalism for conver-
sation protocols using joint intention theory. Computational Intelligence Journal,
18(2), 2002.
[43] H.J. Levesque, F. Pirri, and R. Reiter. Foundations for the situation calculus. Elec-
tronic Transactions on Artificial Intelligence, 2(3–4):159–178, 1998. Retrieved
20110728 from http://www.ep.liu.se/ej/etai/1998/005/.
[44] R. Malyankar. A pattern template for intelligent agent systems. In Agents’99 Work-
shop on Agent-Based Decision Support for Managing the Internet-Enabled Supply
Chain, 1999.
[45] Maria Miceli, Amedo Cesta, and Paola Rizzo. Distributed artificial intelligence
from a socio-cognitive standpoint: Looking at reasons for interaction. AI & Society,
9:287–320, 1995.
[46] S. Miles, M. Joy, and M. Luck. Towards a methodology for coordination mechanism
selection in open systems. In P. Petta, R.Tolksdorf, and F. Zambonelli, editors,
Engineering Societies in the Agents World III, LNAI 2577. Springer-Verlag, 2003.
[49] D.C. North. Institutions, Institutional Change and Economic Performance. Cam-
bridge University Press, 1991.
[50] J. Odell, M. Nodine, and R. Levy. A metamodel for agents, roles, and groups. In
J. Odell, P. Giorgini, and J. Müller, editors, AOSE IV, LNCS, forthcoming. Springer,
2005.
[51] E. Ostrom. Governing the Commons: The Evolution of Institutions for Collective
Action. Cambridge University Press, Cambridge., 1990.
[54] P. Noriega. Agent Mediated Auctions: The Fishmarket Metaphor. PhD thesis, Uni-
versitat Autonoma de Barcelona, 1997.
[55] L. Padgham and M. Winikoff. Developing Intelligent Agent Systems. Wiley, 2004.
[57] Loris Penserini, Virginia Dignum, Athanasios Staikopoulos, Huib Aldewereld, and
Frank Dignum. Balancing organizational regulation and agent autonomy: An MDE-
based approach. In Huib Aldewereld, Virginia Dignum, and Gauthier Picard, edi-
tors, Engineering Societies in the Agents World X, volume 5881 of Lecture Notes in
Computer Science, pages 197–212. Springer Berlin / Heidelberg, 2009.
[58] A. Rice, editor. The Enterprise and Its Environment: A System Theory of Manage-
ment Organization. Routledge, 2001.
[60] John R. Searle. Speech Acts: An Essay in the Philosophy of Language. Cambridge
University Press, 1969.
[61] P. Selznick. TVA and the Grass Roots: A Study of Politics and Organization. Uni-
versity of California Press, 1953.
[62] Y. Shoham and M. Tennenholtz. On social laws for artificial agent societies: off-line
design. Artif. Intell., 73(1-2):231–252, 1995.
[63] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-
Theoretic, and Logical Foundations. Cambridge University Press, Cambridge, UK,
2009.
[64] J. Sichman and R. Conte. On personal and role mental attitude: A preliminary
dependency-based analysis. In Advances in AI, LNAI 1515. Springer, 1998.
[65] Herbert A. Simon. The Sciences of the Artificial. MIT Press, Cambridge, MA, USA,
3rd edition, 1996.
[69] G. Valetto, G. Kaiser, and Gaurav S. Kc. A mobile agent approach to process-based
dynamic adaptation of complex software systems. In 8th European Workshop on
Software Process Technology, pages 102–116, 2001.
[70] M. Birna van Riemsdijk, Koen V. Hindriks, and Catholijn M. Jonker. Programming
organization-aware agents: A research agenda. In Proceedings of the Tenth Interna-
tional Workshop on Engineering Societies in the Agents’ World (ESAW’09), volume
5881 of LNAI, pages 98–112. Springer, 2009.
[71] M.B. van Riemsdijk, K. Hindriks, and C.M. Jonker. Programming organization-
aware agents. In Huib Aldewereld, Virginia Dignum, and Gauthier Picard, editors,
Engineering Societies in the Agents World X, volume 5881 of Lecture Notes in Com-
puter Science, pages 98–112. Springer Berlin / Heidelberg, 2009.
[74] Javier Vázquez-Salceda. The Role of Norms and Electronic Institutions in Multi-
Agent Systems Applied to Complex Domains. The HARMONIA Framework. PhD
thesis, Universitat Politècnica de Catalunya, 2003.
[78] O. E. Williamson. Why law, economics, and organization? Annual Review of Law
and Social Science, 1:369–396, 2005.
Communication
Chapter 3
Agent Communication
1 Introduction
Multiagent systems are distributed systems. Engineering a multiagent system
means rigorously specifying the communications among the agents by way of
interaction protocols. What makes specifying the protocols for agent interaction
especially interesting and challenging is that agents are autonomous and heteroge-
neous entities. These properties of agents have profound implications on the na-
ture of protocol specifications. As we shall see, protocols for multiagent systems
turn out to be fundamentally different from those for other kinds of distributed
systems such as computer networks and distributed databases.
We conceptualize all distributed systems in architectural terms – as consist-
ing of components and connectors between the components. The components of
the Internet are all nodes with IP addresses. The main connector is the Internet
protocol, which routes packets between the nodes. The components of the web
are the clients (such as browsers) and servers and the connector is the HTTP pro-
tocol. The components in a distributed database are the client databases and the
coordinator and a connector is the two-phase commit protocol. We can discern a
pattern here: the connectors are nothing but the interaction protocols among the
components. Further, we can associate each protocol with the application it facilitates.
For example, the Internet protocol facilitates routing; HTTP facilitates access to
a distributed database of resources; and the two-phase commit protocol facilitates
distributed transactions.
The same applies for multiagent systems except that the components are au-
tonomous and heterogeneous agents, and applications are typically higher-level –
for example, auctions, banking, shipping, and so on. Each application would have
its own set of requirements and therefore we would normally find different proto-
cols for each application. Below, the term traditional distributed systems refers to
non-multiagent distributed systems such as the Internet, the web, and so on.
The importance of protocols is not lost upon industry. Communities of prac-
tice are increasingly interested in specifying standard protocols for their respective
domains. RosettaNet [40] (e-business), TWIST [53] (foreign exchange transac-
tions), GDSN [33] (supply chains), and HITSP [34] and HL7 [31] (health care)
are just a few examples.
Our objectives in this chapter are to help the reader develop a clear sense of
the conceptual underpinnings of agent communication and to help the reader learn
to apply the concepts to the extent possible using available software. The chapter
is broadly structured according to the following sub-objectives.
Protocol specification approaches There are many diverse approaches for spec-
ifying protocols. We evaluate some approaches widely practiced in soft-
ware engineering and some historically significant ones from artificial in-
telligence. We also study an approach that is particularly promising.
Directions in agent communication research The last fifteen years have seen
some exciting developments in agent communication. However, many prac-
tical concerns remain to be addressed. We discuss these briefly.
roles in the protocol, they will be able to work together no matter how they are
implemented. Interoperation makes great engineering sense because it means that
the components are loosely coupled with each other; that is, we can potentially
replace a component by another conformant one and the modified system would
continue to function. You would have noticed that web browsers and servers often
advertise the versions of the HTTP standard with which they are conformant.
The same concepts and concerns apply to multiagent systems. However,
agents are not ordinary components. They are components that are autonomous
and heterogeneous. Below, we discuss exactly what we mean by these terms, and
how autonomy and heterogeneity naturally lead to requirements for agent interac-
tion protocols that go beyond protocols for traditional distributed systems.
Each agent is an autonomous entity in the sense that it itself is a domain of
control: other agents have no direct control over its actions (including its com-
munications). For instance, consider online auctions as they are conducted on
websites such as eBay. Sellers, bidders, and auctioneers are all agents, and none
of them exercises any control over the others. If an auctioneer had control over
bidders, then (if it chose to) it could force any of the bidders to bid any amount
by simply invoking the appropriate method. Such a setting would lack any resem-
blance to real life.
There is a subtle tension between the idea of a protocol and autonomy. With
protocols, we seek to somehow constrain the interaction among agents so that
they would be interoperable. Autonomy means that the agents are free to interact
as they please (more precisely, each agent acts according to the rationale of its
principal). From this observation follows our first requirement. We must design
protocols so that they do not overconstrain an agent’s interactions.
In traditional distributed systems, interoperation is achieved via low-level co-
ordination. The protocols there would specify the flow of messages between the
participants. In the case of the two-phase commit protocol, the controller co-
ordinates the commit outcome of a distributed transaction. In the first phase, a
controller component collects votes from individual databases about whether they
are each ready to commit their respective subtransactions. If they unanimously re-
spond positively, the controller, in the second phase, instructs each to commit its
respective subtransaction; otherwise, it instructs each to abort its subtransaction.
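To make the contrast with agent protocols concrete, the following is a minimal sketch of the two-phase commit flow just described, written in Python; the class and function names are purely illustrative and not taken from any particular library.

# Illustrative sketch of two-phase commit: a coordinator collects votes from
# participant databases; only a unanimous "yes" leads to a global commit,
# otherwise every participant is instructed to abort.

class Participant:
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit  # whether this subtransaction is ready

    def vote(self):
        # Phase 1: reply to the coordinator's prepare request.
        return self.can_commit

    def commit(self):
        print(self.name, "commits its subtransaction")

    def abort(self):
        print(self.name, "aborts its subtransaction")


def two_phase_commit(participants):
    votes = [p.vote() for p in participants]   # phase 1: collect votes
    if all(votes):                             # phase 2: unanimous yes, so commit
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:                     # phase 2: otherwise abort everywhere
        p.abort()
    return "aborted"


dbs = [Participant("db1", True), Participant("db2", True), Participant("db3", False)]
print(two_phase_commit(dbs))  # -> "aborted", because db3 voted no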
The above discussion of autonomy implies the following.
from you about the things you want, the maximum prices you are willing
to pay, and the reputation thresholds of the sellers and auctioneers you are
willing to deal with. On the other hand, you can design a sophisticated
bidding agent that mines your communications to discover the items you
desire and what you are willing to pay for them and can figure out on its
own which auctions to bid in on your behalf. From the agent communication
perspective, however, the latter’s sophistication does not matter – they are
both autonomous agents.
Logical versus physical distribution Because of their autonomy, agents are the
logical units of distribution: they can neither be aggregated nor decomposed
into processes. Whenever an application involves two or more agents, there
simply is no recourse but to consider their interactions. Constructs such
as processes, by contrast, are physical units of distribution. The choice of
whether an application is implemented as a single process or multiple ones
is often driven by physical considerations such as geographical distribution,
throughput, redundancy, number of available processors and cores, and so
on. An agent itself may be implemented via multiple physical units of dis-
tribution; that choice, however, is immaterial from a multiagent systems
perspective.
social commitment (more on social commitments later). Thus when the seller
offers some book to the buyer for some price, it would mean that the seller is
socially committed to the buyer for the offer. Consequently, updating an offer,
for instance, by raising the price of the book, counts as updating the commitment.
Specifically, it means that the old commitment is canceled and in its place a new
one is created. Clearly, a protocol that specifies only the flow of messages, such
as the one in Figure 3.1, does not capture such subtleties of meaning.
If the meanings of messages are not public, that would potentially make the
agent non-interoperable. For example, this would happen if the buyer interprets
the seller’s offer as a commitment, but the seller does not. Their interaction would
potentially break down. Accommodating semantic heterogeneity presupposes that
we make the meanings of messages public as part of the protocol specification.
In practice, many multiagent protocols are specified as flows without reference
to the message meanings. And they seem to work fairly well. In such cases, the
designers of the agents agree off-line on how to interpret and process the messages
and build this interpretation into the agents, thereby tightly coupling the agents.
Austin argued that all communications could be phrased in the above declar-
ative form through the use of appropriate performative verbs. Thus a simple in-
formative such as “the shipment will arrive on Wednesday” can be treated as if it
were “I inform you that the shipment will arrive on Wednesday.” A directive such
as “send me the goods” can be treated as if it were “I request that you send me
the goods” or “I demand that you send me the goods” or other such variations. A
commissive such as “I’ll pay you $5” can be treated as if it were “I promise that
I’ll pay you $5.”
The above stylized construction has an important ramification for us as stu-
dents of multiagent systems. It emphasizes that although what is being informed,
requested, or promised may or may not be within the control of the informer, re-
quester, or promiser, the fact that the agent chooses to inform, request, or promise
another agent is entirely within its control. The above construction thus coheres
with our multiagent systems thinking about autonomy and reflects the essence of
the autonomous nature of communication as we explained above.
The above stylized construction has another more practical and arguably more
nefarious ramification. Specifically, this is the idea that we can use the perfor-
mative verb in the above to identify the main purpose or illocutionary point of
a communication, separately from the propositional content of the communica-
tion. The underlying intuition is that the same propositional content could be
coupled with different illocutionary points to instantiate distinct communicative
acts. In computer science terms, the illocutionary points map to message types,
and may be thought of as being the value of a message header. Following the
shipment example above, we would associate the proposition “the shipment will
arrive on Wednesday” with different message types, for example, inform, request,
and query.
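A small sketch may help make this separation concrete; the Message structure and its field names below are our own illustration, not part of any agent communication standard.

# Illustrative sketch: the same propositional content combined with different
# performatives (message types) yields distinct communicative acts.

from dataclasses import dataclass

@dataclass
class Message:
    performative: str  # illocutionary point, e.g., "inform", "request", "query"
    sender: str
    receiver: str
    content: str       # propositional content

proposition = "the shipment will arrive on Wednesday"

acts = [
    Message("inform", "merchant", "customer", proposition),
    Message("request", "customer", "merchant", proposition),
    Message("query", "customer", "merchant", proposition),
]

for m in acts:
    print(m.performative, ":", m.content)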
3.1 Choreographies
The service-oriented computing literature includes studies of the notion of a
choreography. A choreography is a specification of the message flow among the
participants. Typically, a choreography is specified in terms of roles rather than
the participants themselves. Involving roles promotes reusability of the chore-
ography specification. Participants adopt roles, that is, bind to the roles, in the
choreography.
A choreography is a description of an interaction from a shared or, more prop-
erly, a neutral perspective. In this manner, a choreography is distinguished from a
specification of a workflow, wherein one party drives all of the other parties. The
latter approach is called an orchestration in the services literature.
An advantage of adopting a neutral perspective, as in a choreography, is that
it better applies in settings where the participants retain their autonomy: thus
it is important to state what each might expect from the others and what each
might offer to the others. Doing so promotes loose coupling of the components:
centralized approaches could in principle be equally loosely coupled but there
is a tendency associated with the power wielded by the central party to make
the other partners fit its mold. Also, the existence of the central party and the
resulting regimentation of interactions leads to implicit dependencies and thus
tight coupling among the parties.
A neutral perspective yields a further advantage: the overall computation
becomes naturally distributed, and no single party is involved in mediating all
information flows. A choreography is thus a way of specifying and building
distributed systems that, among the conventional approaches, agrees most closely
with the multiagent systems’ way of thinking. But important distinctions remain,
which we discuss below.
WS-CDL [57] and ebBP [25] are the leading industry supported choreography
standardization efforts. WS-CDL specifies choreographies as message exchanges
among partners. WS-CDL is based on the pi-calculus, so it has a formal oper-
ational semantics. However, WS-CDL does not satisfy important criteria for an
agent communication formalism. First, WS-CDL lacks a theory of the mean-
ings of the message exchanges. Second, when two or more messages are per-
formed within a given WS-CDL choreography, they are handled sequentially by
default, as in an MSC. Third, WS-CDL places into a choreography actions that
The most natural way to specify a protocol is through a message sequence chart
(MSC), formalized as part of UML as sequence diagrams [28]. The roles of a
protocol correspond to the lifelines of an MSC; each edge connecting two life-
lines indicates a message from a sender to a receiver. Time flows downward by
convention and the ordering of the messages is apparent from the chart. MSCs
support primitives for grouping messages into blocks. Additional primitives include
alternative, parallel, and iterative blocks. Although we do not use
MSCs extensively, they provide a simple way to specify agent communication
protocols.
FIPA (Foundation for Intelligent Physical Agents) is a standards body, now part
of the IEEE Computer Society, which has formulated agent communication stan-
dards. FIPA defines a number of interaction protocols. These protocols involve
messages of the standard types in FIPA. Each FIPA protocol specifies the possi-
ble ordering and occurrence constraints on messages as a UML sequence diagram
supplemented with some informal documentation.
Figure 3.2 shows the FIPA request interaction protocol in FIPA’s variant of the
UML sequence diagram notation [26]. This protocol involves two roles, an INITIATOR
and a PARTICIPANT. The INITIATOR sends a request to the PARTICIPANT,
who either responds with a refuse or an agree. In the latter case, it follows up with
a detailed response, which could be a failure, an inform-done, or an inform-result.
The PARTICIPANT may omit the agree message unless the INITIATOR asked for a
notification.
The FIPA request protocol deals with the operational details of when certain
messages may or must be sent. It does not address the meanings of the messages
themselves. Thus it is perfectly conventional in this regard. Where it deviates
from traditional distributed computing is in the semantics it assigns to the mes-
sages themselves, which we return to below. However, the benefit of having a
protocol is apparent even in this simple example: it identifies the roles and their
mutual expectations and thus decouples the implementations of the associated
agents from one another.
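As a rough illustration (and not part of the FIPA specification itself), the sketch below encodes the message orderings just described as a small finite-state machine; the state names and the transition table are our own simplification.

# Illustrative finite-state reading of the FIPA request protocol: the Initiator
# sends a request; the Participant refuses or agrees (the agree may be omitted),
# and then follows up with failure, inform-done, or inform-result.

TRANSITIONS = {
    ("start", "request"): "requested",
    ("requested", "refuse"): "done",
    ("requested", "agree"): "agreed",
    # When no notification was requested, the agree may be skipped:
    ("requested", "failure"): "done",
    ("requested", "inform-done"): "done",
    ("requested", "inform-result"): "done",
    ("agreed", "failure"): "done",
    ("agreed", "inform-done"): "done",
    ("agreed", "inform-result"): "done",
}

def run(messages):
    state = "start"
    for msg in messages:
        if (state, msg) not in TRANSITIONS:
            raise ValueError("message %r not allowed in state %r" % (msg, state))
        state = TRANSITIONS[(state, msg)]
    return state

print(run(["request", "agree", "inform-result"]))  # -> done
print(run(["request", "refuse"]))                  # -> done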
[Figure content: lifelines Initiator and Participant; messages Request, then alternatively Refuse [REFUSED] or Agree [AGREED and NOTIFICATION], followed by alternatively Fail, Inform-done, or Inform-result.]
Figure 3.2: FIPA request interaction protocol, from the FIPA specification [26],
expressed as a UML sequence diagram.
tional executions that are not supported by the state machine in Figure 3.3. The
executions depict the scenarios where the customer sends the payment upon re-
ceiving an offer and after sending an accept, respectively. These additional execu-
tions are just as sensible as the original ones. However, in the context of the state
machine in Figure 3.3, these executions are trivially noncompliant. The reason
is that checking compliance with choreographies is purely syntactical – the mes-
sages have to flow between the participants exactly as prescribed. Clearly, this
curbs the participants’ autonomy and flexibility.
We can attempt to ameliorate the situation by producing ever larger FSMs that
include more and more paths. However, doing so complicates the implementation
of agents and the task of comprehending and maintaining protocols, while not
supporting any real run-time flexibility. Further, any selection of paths will remain
arbitrary.
Software engineering Because the protocols specify the set of possible enact-
ments at a low level of abstraction, any but the most trivial are difficult
to design and maintain. It is difficult to map the business requirements of
stakeholders to the protocols produced.
Flexibility Agents have little flexibility at runtime; the protocols essentially dic-
tate agent skeletons. Any deviation from a protocol by an agent, no matter
how sensible from a business perspective, is a violation. Further, to enable
interoperation, the protocols are specified so that they produce lock-step
synchronization among agents, which also limits flexibility.
4 Traditional AI Approaches
The traditional AI approaches to agent communication begin from the opposite
extreme. These approaches presume that the agents are constructed based on cog-
nitive concepts, especially, beliefs, goals, and intentions. Then they specify the
communication of such agents in terms of how the communication relates to their
cognitive representations.
The AI approaches came from two related starting points, which have greatly
affected how they were shaped. The first starting point was human-computer
interaction broadly and natural language understanding specifically. The latter
includes the themes of discourse understanding from text or speech, and speech
understanding. What these approaches had in common was that they were geared
toward developing a tool that would assist a user in obtaining information from
a database or performing simple transactions such as booking a train ticket. A
key functionality of such tools was to infer what task the user needed to perform
and to help the user accordingly. These tools maintained a user model and were
configured with a domain model upon which they reasoned via heuristics to de-
termine how best to respond to their user’s request, and potentially to anticipate
the user’s request.
Such a tool was obviously cooperative: its raison d’être was to assist its user
and failure to be cooperative would be simply unacceptable. Further, it was an
appropriate engineering assumption that the user was cooperative as well. That is,
the tool could be based on the idea that the user was not purposefully misleading
it, because a user would gain nothing in normal circumstances by lying about its
needs and obtaining useless responses in return.
As the tools became more proactive they began to be thought of as agents.
Further, in some cases the agents of different users could communicate with one
another, not only with their users. The agents would maintain their models of
their users and others based on the communications exchanged. They could make
strong inferences regarding the beliefs and intentions of one another, and act and
communicate accordingly. These approaches worked for their target setting. To
AI researchers, the approaches these agents used for communicating with users
and other agents appeared to be applicable for agent communication in general.
The second body of work in AI that related to agent communication came
from the idea of building distributed knowledge-based systems (really just ex-
pert systems with an ability to communicate with each other). The idea was that
each agent would include a reasoner and a knowledge representation and com-
munication was merely a means to share such knowledge. Here, too, we see the
same two assumptions as for the human interaction work. First, that the member
agents were constructed with the same knowledge representations. Second, that
the agents were largely cooperative with each other.
4.1 KQML
Agent communication languages began to emerge in the 1980s. These were usu-
ally specific to the projects in which they arose, and typically relied on the specific
internal representations used within the agents in those projects.
Somewhat along the same lines, but with some improved generality, arose the
Knowledge Query and Manipulation Language or KQML. KQML was created
by the DARPA Knowledge Sharing Effort, and was meant to be an adjunct to the
other work on knowledge representation technologies, such as ontologies. KQML
sought to take advantage of a knowledge representation based on the construct of a
knowledge base, such as had become prevalent in the 1980s. Instead of a specific
internal representation, KQML assumes that each agent maintains a knowledge
base described in terms of knowledge (more accurately, belief) assertions.
KQML proposed a small number of important primitives, such as query and
tell. The idea was that each primitive could be given a semantics based on the
effect it had on the knowledge bases of the communicating agents. Specifically,
an agent would send a tell for some content only if it believed the content, that is,
the content belonged in its knowledge base. And, an agent who received a tell for
some content would insert that content into its knowledge base, that is, it would
begin believing what it was told.
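The following sketch illustrates this knowledge-base reading of tell; it shows only the intended effect on the agents' belief stores, not actual KQML syntax, and all names are our own.

# Illustrative sketch of the knowledge-base semantics of tell: a sender tells
# only what it believes, and a receiver comes to believe what it is told;
# this is the cooperative assumption that KQML relies on.

class KqmlStyleAgent:
    def __init__(self, name, beliefs=None):
        self.name = name
        self.kb = set(beliefs or [])  # the agent's knowledge (belief) base

    def tell(self, other, content):
        # Send a tell only if the content is in the sender's own KB.
        if content not in self.kb:
            raise ValueError(self.name + " does not believe " + content)
        other.receive_tell(content)

    def receive_tell(self, content):
        # The receiver simply starts believing what it was told.
        self.kb.add(content)

a = KqmlStyleAgent("a", {"price(book, 12)"})
b = KqmlStyleAgent("b")
a.tell(b, "price(book, 12)")
print(b.kb)  # {'price(book, 12)'}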
Even though KQML uses knowledge as a layer of abstraction over the detailed
data structures of the internal implementation of agents, it turns out to be overly
restricted in several ways. The main assumption of KQML is that the commu-
nicating agents are cooperative and designed by the same designers. Thus the
designers would make sure that an agent sent a message, such as a tell, only under
the correct circumstances and an agent who received such a message could imme-
diately accept its contents. When the agents are autonomous, they may generate
spurious messages – and not necessarily due to malice.
KQML did not provide a clear basis for agent designers to choose which of the
message types to use and how to specify their contents. As a result, designers all
too often resorted to using a single message type, typically tell, with all meanings
encoded (usually in some ad hoc manner) in the contents of the messages. That
is, the approach is to use different tell messages with arbitrary expressions placed
within the contents of the messages.
The above challenges complicated interoperability: it was in general difficult,
if not impossible, for agents developed by different teams to communicate
successfully with one another.
Like the KQML semantics, the FIPA ACL semantics is mentalist, although it has
a stronger basis in logic. The FIPA ACL semantics is based on a formalization of
the cognitive concepts such as the beliefs and intentions of agents.
Beliefs and intentions are suitable abstractions for designing and implement-
ing agents. However, they are highly unsuitable as a basis for an agent communi-
cation language. A communication language supports the interoperation of two or
more agents. Thus it must provide a basis for one agent to compute an abstraction
of the local state of another agent. The cognitive concepts provide no such basis in
a general way. They cause the internal implementations of the interacting agents
to be coupled with each other. The main reason for this is that the cognitive con-
cepts are definitionally internal to an agent. For example, consider the case where
a merchant tells a customer that a shipment will arrive on Wednesday. When the
shipment fails to arrive on Wednesday, would it be any consolation to the cus-
tomer that the merchant sincerely believed that it was going to? The merchant
could equally well have been lying. The customer would never know without an
audit of the merchant’s databases. In certain legal situations, such audits can be
performed but they are far from the norm in business encounters.
One might hope that it would be possible to infer the beliefs and intentions of
another party, but it is easy to see with some additional reflection that no unique
characterization of the beliefs and intentions of an agent is possible. In the above
example, maybe the merchant had a sincere but false belief; or, maybe the mer-
chant did not have the belief it reported; or, maybe the merchant was simply un-
sure but decided to report a belief because the merchant also had an intention to
consummate a deal with the customer.
It is true that if one developer implements all the interacting agents correctly,
the developer can be assured that an agent would send a particular message only
in a particular internal state (set of beliefs and intentions). However such a multi-
agent system would be logically centralized and would be of severely limited
value.
It is worth pointing out that the FIPA specifications have ended up with a split
personality. FIPA provides the semiformal specification of an agent management
system, which underlies the well-regarded JADE system [7]. FIPA also provides
definitions for several interaction protocols (discussed in Section 3.2), which are
also useful and used in practice, despite their limitations. FIPA provides a formal
semantics for agent communication primitives based on cognitive concepts, which
gives a veneer of rigor, but is never used in multiagent systems.
5.1 Commitments
A commitment is an expression of the form C(debtor, creditor, antecedent, consequent),
where debtor and creditor are agents, and antecedent and consequent are propo-
sitions. A commitment C(x, y, r, u) means that x is committed to y that if r holds,
then it will bring about u. If r holds, then C(x, y, r, u) is detached, and the com-
mitment C(x, y, ⊤, u) holds (⊤ being the constant for truth). If u holds, then the
commitment is discharged and does not hold any longer. All commitments are
conditional; an unconditional commitment is merely a special case where the an-
tecedent equals ⊤. Examples 3.1–3.3 illustrate these concepts. (In the examples
below, EBook is a bookseller and Alice is a customer.)
Example 3.1 (Commitment) C(EBook, Alice, $12, BNW) means that EBook com-
mits to Alice that if she pays $12, then EBook will send her the book Brave New
World.
Example 3.2 (Detach) If Alice makes the payment, that is, if $12 holds, then
C(EBook, Alice, $12, BNW) is detached. In other words, C(EBook, Alice, $12,
BNW) ∧ $12 ⇒ C(EBook, Alice, ⊤, BNW).
Example 3.3 (Discharge) If EBook sends the book, that is, if BNW holds, then
both C(EBook, Alice, $12, BNW) and C(EBook, Alice, ⊤, BNW) are discharged. In
other words, BNW ⇒ ¬C(EBook, Alice, $12, BNW) ∧ ¬C(EBook, Alice, ⊤, BNW).
Importantly, commitments can be manipulated, which supports flexibility. The
commitment operations [45] are listed below; CREATE, CANCEL, and RELEASE
are two-party operations, whereas DELEGATE and ASSIGN are three-party opera-
tions.
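A minimal sketch, assuming a simple string representation of propositions, of how the detach and discharge rules from Examples 3.1–3.3 might be coded; the commitment operations listed above would be additional methods and are omitted here for brevity.

# Illustrative sketch of C(debtor, creditor, antecedent, consequent) with the
# detach and discharge rules described above; the string "T" stands for ⊤.

TRUE = "T"

class Commitment:
    def __init__(self, debtor, creditor, antecedent, consequent):
        self.debtor, self.creditor = debtor, creditor
        self.antecedent, self.consequent = antecedent, consequent
        self.active = True

    def on_fact(self, proposition):
        # Detach: once the antecedent holds, the commitment becomes unconditional.
        if self.active and proposition == self.antecedent:
            self.antecedent = TRUE
        # Discharge: once the consequent holds, the commitment no longer holds.
        if self.active and proposition == self.consequent:
            self.active = False

    def __repr__(self):
        status = "active" if self.active else "discharged"
        return "C(%s, %s, %s, %s) [%s]" % (
            self.debtor, self.creditor, self.antecedent, self.consequent, status)

cB = Commitment("EBook", "Alice", "$12", "BNW")
cB.on_fact("$12")   # detach: C(EBook, Alice, T, BNW)
print(cB)
cB.on_fact("BNW")   # discharge
print(cB)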
Figure 3.5: Distinguishing message syntax and meaning: two views of the same
enactment.
merchant and customer. For instance, the message Offer(mer, cus, price, item)
means the creation of the commitment C(mer, cus, price, item), meaning
the merchant commits to delivering the item if the customer pays the
price; Reject(cus, mer, price, item) means a release of the commitment;
Deliver(mer, cus, item) means that the proposition item holds.
Figure 3.5 (left) shows an execution of the protocol and Figure 3.5 (right) its
meaning in terms of commitments. (The figures depicting executions use a nota-
tion similar to UML interaction diagrams. The vertical lines are agent lifelines;
time flows downward along the lifelines; the arrows depict messages between the
agents; and any point where an agent sends or receives a message is annotated
with the commitments that hold at that point. In the figures, instead of writing
CREATE , we write Create. We say that the Create message realizes the CREATE
operation. Likewise, for other operations and DECLARE.) In the figure, the mer-
chant and customer roles are played by EBook and Alice, respectively; cB and cUB
are the commitments C(EBook, Alice, $12, BNW) and C(EBook, Alice, ⊤, BNW),
respectively.
Figure 3.6 shows some of the possible enactments based on the proto-
col in Table 3.1. The labels cA and cUA are C(Alice, EBook, BNW, $12) and
C(Alice, EBook, ⊤, $12), respectively. Figure 3.6(B) shows the enactment where
the book and payment are exchanged in Figure 3.3. Figures 3.6(A) and (C) show
the additional executions supported in Figure 3.4; Figure 3.6(D) reflects a new
execution that we had not considered before, one where Alice sends an Accept
even before receiving an offer. All these executions are compliant executions in
terms of commitments, and are thus supported by the protocol in Table 3.1.
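Since Table 3.1 itself is not reproduced here, the sketch below only follows the prose reading given earlier (Offer creates a commitment, Reject releases it, Deliver brings about the item proposition); the commitment store and the extra Pay message are our own illustrative additions.

# Hedged sketch: each message counts as an operation on a store of commitments,
# each represented as a tuple (debtor, creditor, antecedent, consequent).

commitments = set()

def offer(mer, cus, price, item):
    # Offer(mer, cus, price, item) creates C(mer, cus, price, item).
    commitments.add((mer, cus, price, item))

def reject(cus, mer, price, item):
    # Reject(cus, mer, price, item) releases the merchant from the commitment.
    commitments.discard((mer, cus, price, item))

def pay(cus, mer, price, item):
    # Illustrative addition: payment detaches C(mer, cus, price, item)
    # into the unconditional C(mer, cus, T, item).
    if (mer, cus, price, item) in commitments:
        commitments.remove((mer, cus, price, item))
        commitments.add((mer, cus, "T", item))

def deliver(mer, cus, item):
    # Deliver(mer, cus, item) makes the proposition item hold, discharging any
    # of the merchant's commitments toward the customer whose consequent is item.
    for c in list(commitments):
        if c[0] == mer and c[1] == cus and c[3] == item:
            commitments.discard(c)

offer("EBook", "Alice", "$12", "BNW")
pay("Alice", "EBook", "$12", "BNW")
print(commitments)  # {('EBook', 'Alice', 'T', 'BNW')}
deliver("EBook", "Alice", "BNW")
print(commitments)  # set()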
Table 3.2 summarizes the three approaches.
implementing the agents who would participate in the given protocol. Role spec-
ifications are sometimes termed role skeletons or endpoints, and the associated
problem is called role generation and endpoint projection.
The above motivation of implementing the agents according to the roles sug-
gests an important quality criterion. We would like the role specifications to be
such that agents who correctly implement the roles can interoperate successfully
without the benefit of any messages other than those included in the proto-
col and featuring in the individual role specifications. In other words, we
would like the agents implementing the roles to only be concerned with satisfying
the needs of their respective roles without regard to the other roles: the overall
computation would automatically turn out to be correct.
Role generation is straightforward for two-party protocols. Any message ex-
change involves two agents (neglecting multicast across roles): the sender and
the receiver. Any message sent by the sender is received by the receiver. Thus it
is easy to ensure their joint computations generate correct outcomes. In systems
with three or more roles, however, whenever a message exchange occurs, one or
more of the other roles would be left unaware of what has transpired. As a result,
no suitable role skeletons may exist for a protocol involving three or more par-
ties. We take this non-existence to mean that the protocol in question is causally
ill-formed and cannot be executed in a fully distributed manner. Such a protocol
must be corrected, usually through the insertion of messages that make sure that
the right information flows to the right parties and that potential race conditions
are avoided.
In a practical setting, then, the role skeletons are mapped to a simple set of
method stubs. An agent implementing a role – in this metaphor, by fleshing out
its skeleton – provides methods to process each incoming message and attempts to
send only those messages allowed by the protocol. Role skeletons do not consider
the contents of the messages. As a result, they can be expressed in a finite-state
machine too. Notice this machine is different from a state machine that specifies a
protocol. A role’s specification is very much focused on the perspective of the role
whereas the machine of a protocol describes the progress of a protocol enactment
from a neutral perspective.
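The following sketch illustrates a role skeleton as a set of method stubs that an agent developer fleshes out; the role name, message names, and decision logic are all hypothetical.

# Illustrative role skeleton: one handler stub per incoming message and one
# helper per outgoing message the protocol allows for this role.

class CustomerRole:
    # Incoming messages: stubs to be overridden by a concrete agent.
    def on_offer(self, merchant, price, item):
        raise NotImplementedError("decide how to react to an offer")

    def on_deliver(self, merchant, item):
        raise NotImplementedError("handle delivery of the item")

    # Outgoing messages allowed by the protocol for this role.
    def send_accept(self, merchant, price, item):
        print("Accept(%s, %s, %s)" % (merchant, price, item))

    def send_reject(self, merchant, price, item):
        print("Reject(%s, %s, %s)" % (merchant, price, item))


class Alice(CustomerRole):
    # A concrete agent "fleshes out" the skeleton with its own decision logic.
    def on_offer(self, merchant, price, item):
        if price <= 12:
            self.send_accept(merchant, price, item)
        else:
            self.send_reject(merchant, price, item)

    def on_deliver(self, merchant, item):
        print("received", item, "from", merchant)


Alice().on_offer("EBook", 12, "BNW")  # -> Accept(EBook, 12, BNW)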
Explicit meanings The meaning ought to be made public, not hidden within
agent implementations.
Motivation It is not known in advance whether a party will fulfill its commit-
ments; compensation commitments provide some assurance to the creditor
in case of violations.
Example Compensate(mer, cus, price, item, discount); it means that the merchant
will offer the customer a discount on the next purchase if the item is paid
for but not delivered.
Intent One party makes an offer to another, who responds with a modified offer
of its own.
Example Let’s say C(EBook, Alice, $12, BNW) holds. Alice can make the coun-
teroffer C(Alice, EBook, BNW ∧ Dune, $12), meaning that she wants Dune
in addition to BNW for the same price.
The above are some examples of patterns. For a more exhaustive list of pat-
terns, see [16].
• Identify the roles involved in the interaction. Let’s say the roles identified
are customer, merchant, shipper, and banker.
• Identify how each message would affect the commitments of its sender and
receiver. For example, the Offer message could be given a meaning similar
to the one in Table 3.1. The customer’s payment to the bank would effec-
tively discharge his or her commitment to pay the merchant. Similarly, the
delivery of the goods by the shipper would effectively discharge the mer-
chant’s commitment to deliver, and so on.
8 Conclusions
It should be no surprise to anyone that communication is at the heart of multiagent
systems, not only in our implementations but also in our conception of what a
multiagent system is and what an agent is.
To our thinking, an agent is inherently autonomous. Yet, autonomous, hetero-
geneously constructed agents must also be interdependent on each other if they
are to exhibit complex behaviors and sustain important real-world applications.
A multiagent system, if it is any good, must be loosely coupled and communica-
tion is the highly elastic glue that keeps it together. Specifically, communication,
understood in terms of agents and based on high-level abstractions such as those
we explained above, provides the quintessential basis for the arms-length relation-
ships desired in all modern software engineering as it addresses the challenges of
large decentralized systems.
The foregoing provided an overview of agent communication, identifying
the main historical and current ideas in the field. This chapter has only scratched
the surface of this rich and exciting area. We invite the reader to delve deeper
and to consider many of the fundamental research problems that arise in this area.
An important side benefit is that, faced with the challenges of open systems such
as those on the web, in social media, in mobile computing, and in cyberphysical systems,
traditional computer science is now beginning to appreciate the importance and
value of the abstractions of agent communication. Thus progress on the problems
of agent communication can have significant impact on much of computer science.
Further Reading
Agent communication is one of the most interesting topics in multiagent systems,
not only because of its importance to the field but also because of the large number
of disciplines that it relates to. In particular, it touches upon ideas in philosophy,
linguistics, social science (especially organizations and institutions), software en-
gineering, and distributed computing. The readings below will take the reader
deeper into these subjects.
their participants. Artikis, Jones, Pitt, and Sergot have developed formal-
izations of norms that are worth studying as influential papers [5, 37]. Jones
and Parent [36] formalize conventions as a basis for communication.
Singh proposed the notion of social commitments [43, 45] as an important
normative concept to be used for understanding social relationships. He
proposed commitments as a basis for a social semantics for communica-
tion [46]. A related idea has been developed by Colombetti [17]. A formal
semantics for commitments [48] and the proper reasoning about commit-
ments in situations with asynchronous communication among decoupled
agents [15] are significant to practice and promising as points of departure
for important research in this area.
Acknowledgments
We have benefited from valuable discussions about agent communication with
several colleagues, in particular, our coauthors on papers relating to agent com-
munication: Matteo Baldoni, Cristina Baroglio, Nirmit Desai, Scott Gerard, Elisa
Marengo, Viviana Patti, and Pınar Yolum.
Some parts of this chapter have appeared in previous works by the authors
[16, 38].
Amit Chopra was supported by a Marie Curie Trentino award. Munindar
Singh’s effort was partly supported by the National Science Foundation under
grant 0910868. His thinking on this subject has benefited from participation in
the OOI Cyberinfrastructure program, which is funded by NSF contract OCE-
0418967 with the Consortium for Ocean Leadership via the Joint Oceanographic
Institutions.
9 Exercises
1. Level 1 Which of the following statements are true?
3. Level 1 Which of the following statements are true about interaction and
communication?
4. Level 1 Identify all of the following statements that are true about commit-
ments and commitment protocols.
(a) e0, e1, e5
(b) e1, e0, e2
(c) e1, e0, e3
(d) e1, e0, e2
(e) e0, e1, e2
6. Level 2 Examine Figure 3.1. Now create an FSM for the commitment com-
pensate pattern discussed in the chapter.
7. Level 2 Examine Figure 3.1. Now specify a commitment pattern that cap-
tures the idea of updating commitments.
10. Level 3 Consider the following outline of a process for buying books. A
merchant offers an online catalog of books with price and availability infor-
mation. A customer can browse the catalog and purchase particular books
from the catalog or the merchant may contact the customer directly with of-
fers for particular books. However, the customer must arrange for shipment
on his or her own: in other words, the customer must arrange for a shipper
to pick up the books from the merchant’s store and deliver them to him or
her. All payments – to the merchant for the books and to the shipper for
delivery – are carried out via a payment agency (such as PayPal).
(a) List the roles and messages involved in the protocol underlying the
above business process.
(b) Specify the messages in terms of communicative acts.
(c) Specify the protocol in three different ways: (1) as an FSM with messages
as the transitions, (2) as an MSC, and (3) as a commitment protocol.
(d) Show a simplified MSC representing one possible enactment where
the books have been delivered and the payments have been made.
(e) Based on the commitment protocol you specified above, annotate
points in the above-described enactment with commitments that hold
at those points.
11. Level 4 Suppose the business process described in Question 10 above also
supported returns and refunds for customers.
12. Level 3 Specify role skeletons for the purchase process with returns and
refunds.
14. Level 3 Compare the FSM and MSC from Question 13 to the commitment
protocol specification of Table 3.1 with respect to compliance, ease of cre-
ation, and ease of change.
15. Level 4 Implement the logic for practical commitments described in [48].
References
[1] Marco Alberti, Federico Chesani, Marco Gavanelli, Evelina Lamma, Paola Mello,
Marco Montali, and Paolo Torroni. Web service contracting: Specification and rea-
soning with SCIFF. In Proceedings of the 4th European Semantic Web Conference,
pages 68–83, 2007.
[2] Marco Alberti, Marco Gavanelli, Evelina Lamma, Paola Mello, and Paolo Torroni.
Modeling interactions using social integrity constraints: A resource sharing case
study. In Proceedings of the International Workshop on Declarative Agent Lan-
guages and Technologies (DALT), volume 2990 of LNAI, pages 243–262. Springer,
2004.
[3] Huib Aldewereld, Sergio Álvarez-Napagao, Frank Dignum, and Javier Vázquez-
Salceda. Making norms concrete. In Proceedings of the 9th International Con-
ference on Autonomous Agents and Multiagent Systems (AAMAS), pages 807–814,
Toronto, 2010. IFAAMAS.
[4] Alexander Artikis, Marek J. Sergot, and Jeremy Pitt. An executable specification of
a formal argumentation protocol. Artificial Intelligence, 171(10–15):776–804, 2007.
[5] Alexander Artikis, Marek J. Sergot, and Jeremy V. Pitt. Specifying norm-governed
computational societies. ACM Transactions on Computational Logic, 10(1), 2009.
[6] John L. Austin. How to Do Things with Words. Clarendon Press, Oxford, 1962.
[7] Fabio Luigi Bellifemine, Giovanni Caire, and Dominic Greenwood. Developing
Multi-Agent Systems with JADE. Wiley-Blackwell, 2007.
[8] Boualem Benatallah, Fabio Casati, and Farouk Toumani. Analysis and management
of web service protocols. In Conceptual Modeling ER 2004, volume 3288 of LNCS,
pages 524–541. Springer, 2004.
[9] Carlos Canal, Lidia Fuentes, Ernesto Pimentel, José M. Troya, and Antonio Valle-
cillo. Adding roles to CORBA objects. IEEE Transactions on Software Engineering,
29(3):242–260, 2003.
[10] Christopher Cheong and Michael P. Winikoff. Hermes: Designing flexible and ro-
bust agent interactions. In Virginia Dignum, editor, Handbook of Research on Multi-
Agent Systems: Semantics and Dynamics of Organizational Models, chapter 5, pages
105–139. IGI Global, Hershey, PA, 2009.
[12] Amit K. Chopra, Alexander Artikis, Jamal Bentahar, Marco Colombetti, Frank
Dignum, Nicoletta Fornara, Andrew J. I. Jones, Munindar P. Singh, and Pınar
Yolum. Research directions in agent communication. ACM Transactions on In-
telligent Systems and Technology (TIST), 2011.
[15] Amit K. Chopra and Munindar P. Singh. Multiagent commitment alignment. In Pro-
ceedings of the 8th International Conference on Autonomous Agents and MultiAgent
Systems (AAMAS), pages 937–944, Budapest, May 2009. IFAAMAS.
[16] Amit K. Chopra and Munindar P. Singh. Specifying and applying commitment-
based business patterns. In Proceedings of the 10th International Conference on
Autonomous Agents and MultiAgent Systems (AAMAS), Taipei, May 2011. IFAA-
MAS.
[17] Marco Colombetti. A commitment-based approach to agent speech acts and conver-
sations. In Proceedings of the Autonomous Agents Workshop on Agent Languages
and Communication Policies, pages 21–29, May 2000.
[18] R. Cost, Y. Chen, T. Finin, Y. Labrou, and Y. Peng. Modeling agent conversations
with colored Petri nets. In Working Notes of the Workshop on Specifying and Imple-
menting Conversation Policies, pages 59–66, Seattle, Washington, May 1999.
[19] Nirmit Desai, Amit K. Chopra, Matthew Arrott, Bill Specht, and Munindar P. Singh.
Engineering foreign exchange processes via commitment protocols. In Proceedings
of the 4th IEEE International Conference on Services Computing, pages 514–521,
Los Alamitos, 2007. IEEE Computer Society Press.
[20] Nirmit Desai, Amit K. Chopra, and Munindar P. Singh. Representing and reasoning
about commitments in business processes. In Proceedings of the 22nd Conference
on Artificial Intelligence, pages 1328–1333, 2007.
[21] Nirmit Desai, Amit K. Chopra, and Munindar P. Singh. Amoeba: A methodol-
ogy for modeling and evolution of cross-organizational business processes. ACM
Transactions on Software Engineering and Methodology (TOSEM), 19(2):6:1–6:45,
October 2009.
[22] Nirmit Desai, Ashok U. Mallya, Amit K. Chopra, and Munindar P. Singh. Interac-
tion protocols as design abstractions for business processes. IEEE Transactions on
Software Engineering, 31(12):1015–1027, December 2005.
[23] Nirmit Desai and Munindar P. Singh. On the enactability of business protocols. In
Proceedings of the 23rd Conference on Artificial Intelligence (AAAI), pages 1126–
1131, Chicago, July 2008. AAAI Press.
[25] ebBP. Electronic business extensible markup language business process specifica-
tion schema v2.0.4, December 2006. docs.oasis-open.org/ebxml-bp/2.0.4/OS/.
[26] FIPA. FIPA interaction protocol specifications, 2003. FIPA: The Foundation for
Intelligent Physical Agents, http://www.fipa.org/repository/ips.html.
[27] Nicoletta Fornara, Francesco Viganò, Mario Verdicchio, and Marco Colombetti. Ar-
tificial institutions: A model of institutional reality for open multiagent systems.
Artificial Intelligence and Law, 16(1):89–105, March 2008.
[28] Martin Fowler. UML Distilled: A Brief Guide to the Standard Object Modeling
Language. Addison-Wesley, Reading, MA, 3rd edition, 2003.
[29] Juan C. Garcia-Ojeda, Scott A. DeLoach, Robby, Walamitien H. Oyenan, and Jorge
Valenzuela. O-MaSE: A customizable approach to developing multiagent processes.
In Proceedings of the 8th International Workshop on Agent Oriented Software En-
gineering (AOSE), 2007.
[30] Scott N. Gerard and Munindar P. Singh. Formalizing and verifying protocol refine-
ments. ACM Transactions on Intelligent Systems and Technology (TIST), 2011.
[31] HL7 reference information model, version 1.19. www.hl7.org/Library/data-model/RIM/C30119/Graphics/RIM_billboard.pdf, 2002.
[32] Kohei Honda, Nobuko Yoshida, and Marco Carbone. Multiparty asynchronous ses-
sion types. In Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Prin-
ciples of Programming Languages (POPL), pages 273–284. ACM, 2008.
[35] Marc-Philippe Huget and James Odell. Representing agent interaction protocols
with agent UML. In Agent-Oriented Software Engineering V, volume 3382 of LNCS,
pages 16–30. Springer, 2005.
[36] Andrew J. I. Jones and Xavier Parent. A convention-based approach to agent com-
munication languages. Group Decision and Negotiation, 16(2):101–141, March
2007.
[38] Elisa Marengo, Matteo Baldoni, Amit K. Chopra, Cristina Baroglio, Viviana Patti,
and Munindar P. Singh. Commitments with regulations: Reasoning about safety
and control in Regula. In Proceedings of the 10th International Conference on
Autonomous Agents and MultiAgent Systems (AAMAS), Taipei, May 2011. IFAA-
MAS.
[41] John R. Searle. Speech Acts. Cambridge University Press, Cambridge, UK, 1969.
[42] John R. Searle. The Construction of Social Reality. Free Press, New York, 1995.
[46] Munindar P. Singh. A social semantics for agent communication languages. In Pro-
ceedings of the 1999 IJCAI Workshop on Agent Communication Languages, volume
1916 of Lecture Notes in Artificial Intelligence, pages 31–45, Berlin, 2000. Springer.
[50] Munindar P. Singh. LoST: Local state transfer—An architectural style for the dis-
tributed enactment of business protocols. In Proceedings of the 7th IEEE Interna-
tional Conference on Web Services (ICWS), pages 57–64, Washington, DC, 2011.
IEEE Computer Society.
[51] Munindar P. Singh, Amit K. Chopra, and Nirmit Desai. Commitment-based service-
oriented architecture. IEEE Computer, 42(11):72–79, November 2009.
[52] Munindar P. Singh, Amit K. Chopra, Nirmit Desai, and Ashok U. Mallya. Protocols
for processes: Programming in the large for open systems. ACM SIGPLAN Notices,
39(12):73–83, December 2004.
[54] Wil M. P. van der Aalst and Maja Pesic. DecSerFlow: Towards a truly declarative
service flow language. In Proceedings of the 3rd International Workshop on Web
Services and Formal Methods, volume 4184 of LNCS, pages 1–23. Springer, 2006.
[55] Javier Vázquez-Salceda, Virginia Dignum, and Frank Dignum. Organizing multi-
agent systems. Autonomous Agents and Multi-Agent Systems, 11(3):307–360, 2005.
[56] Michael Winikoff, Wei Liu, and James Harland. Enhancing commitment machines.
In Proceedings of the 2nd International Workshop on Declarative Agent Languages
and Technologies (DALT), volume 3476 of LNAI, pages 198–220, Berlin, 2005.
Springer-Verlag.
[57] WS-CDL. Web services choreography description language version 1.0, November
2005. www.w3.org/TR/ws-cdl-10/.
[58] Daniel M. Yellin and Robert E. Strom. Protocol specifications and component adap-
tors. ACM Transactions on Programming Languages and Systems, 19(2):292–333,
1997.
[59] Pınar Yolum. Design time analysis of multiagent protocols. Data and Knowledge
Engineering Journal, 63:137–154, 2007.
[60] Pınar Yolum and Munindar P. Singh. Flexible protocol specification and execution:
Applying event calculus planning using commitments. In Proceedings of the 1st In-
ternational Joint Conference on Autonomous Agents and MultiAgent Systems, pages
527–534. ACM Press, July 2002.
Chapter 4
1 Introduction
Negotiation is a form of interaction in which a group of agents with conflicting
interests try to come to a mutually acceptable agreement over some outcome. The
outcome is typically represented in terms of the allocation of resources (commodi-
ties, services, time, money, CPU cycles, etc.). Agents’ interests are conflicting in
the sense that they cannot be simultaneously satisfied, either partially or fully.
Since there are usually many different possible outcomes, negotiation can be seen
as a “distributed search through a space of potential agreements” [36].
Negotiation is fundamental to distributed computing and multiagent systems.
This is because agents often cannot fulfill their design objectives on their own,
but instead need to exchange resources with others. After an informal discussion
of the different aspects of negotiation problems (Section 2), we turn to the use
of game theory to analyze strategic interaction in simple single-issue negotiation
(Section 3). Next, we talk about game-theoretic analysis of multi-issue negotiation
(Section 4).
After covering game-theoretic approaches, we describe various heuristic ap-
proaches for bilateral negotiation (Section 5). These approaches are necessary
when negotiation involves solving computationally hard problems, or when as-
sumptions underlying the game-theoretic approaches are violated. We also ex-
plore recent developments in agent-human negotiation (Section 6) and work on
logic-based argumentation in negotiation.
We note that this chapter is concerned with the study of bilateral negotiation
in multiagent systems; that is, negotiation involving two agents. Multiparty negotiation is often conducted under the banner of auctions and is outside the scope of this chapter.
2 Aspects of Negotiation
Any negotiation problem requires defining the following main ingredients: (i)
the set of possible outcomes; (ii) the agents conducting the negotiation; (iii) the
protocol according to which agents search for a specific agreement in this space;
and (iv) the individual strategies that determine the agents’ behavior, in light of
their preferences over the outcomes.
The first ingredient in a negotiation scenario is the negotiation object, which
defines the set of possible outcomes. Abstractly, we can simply think of a space O
of possible outcomes (or deals or agreements), which can be defined in arbitrary
ways. There are many ways to define the set of possible outcomes concretely. The
simplest is a single-issue negotiation scenario, in which outcomes are members of a single discrete or continuous set. For example,
two agents may be negotiating over how to divide a tank of petrol (or some other
resource), and the question is who gets how much petrol. The set of possible
outcomes may be represented by the interval [0, 1], where each number
represents the proportion that goes to the first agent, with the rest going to the
second agent. Petrol is an example of a continuous issue. Alternatively, the issue
may be defined in terms of a set of discrete outcomes, such as a set of 8 time slots
to allocate to multiple teachers.
Single-issue negotiation contrasts with multi-issue negotiation, in which the
set of outcomes is defined in terms of multiple (possibly independent) issues. For
example, two people may negotiate over both the time and place of dinner. Each
of these represents one of the issues under negotiation, and an outcome is defined
in terms of combinations of choices over these issues (i.e., a specific restaurant
at a specific time). In general, given a set of issues (or attributes) A1, . . . , An,
each ranging over a (discrete or continuous) domain Dom(Ai), the space of
possible outcomes is the Cartesian product Dom(A1) × · · · × Dom(An).
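As a rough illustration (not from the text; the issue names and domains below are made up), the outcome space of a multi-issue negotiation can be enumerated directly as the Cartesian product of the issue domains. A minimal Python sketch:

from itertools import product

# Illustrative issue domains for a dinner negotiation (two discrete issues).
domains = {
    "restaurant": ["italian", "sushi", "tapas"],
    "time": ["18:00", "19:00", "20:00"],
}

# The space of possible outcomes is the Cartesian product of the issue domains.
outcomes = [dict(zip(domains, combo)) for combo in product(*domains.values())]

print(len(outcomes))   # 9 possible outcomes
print(outcomes[0])     # {'restaurant': 'italian', 'time': '18:00'}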
There are other approaches to defining the space of possible outcomes of ne-
gotiation. In their classic book, Rosenschein and Zlotkin distinguished between
three different types of domains in terms of the nature of the negotiation object
[66]:
ent task allocations; each agent tries to minimize the cost of the tasks it has
to execute.
In general, it may be possible to use any suitable approach to define the space of
possible outcomes, including the use of expressive logical languages for describ-
ing combinatorial structures of negotiation objects.
The very nature of competition over resources means that different agents pre-
fer different allocations of the resources in question. Hence, we need to capture
the individual agent's preferences over the set Ψ of possible deals. The preferences of
agent i can be captured using a binary preference relation ⪰i over Ψ, and we denote
by o1 ⪰i o2 that, for agent i, outcome o1 is at least as good as outcome o2.
It is also common to write o1 ≻i o2 to denote that o1 ⪰i o2 and it is not the case
that o2 ⪰i o1 (this is called strict preference). Economists and decision theorists
consider a preference relation to be rational if it is both transitive and complete
[53].
It is worth noting that agents may already have a particular allocation of re-
sources before they begin negotiation. Negotiation becomes an attempt to real-
locate the resources in order to reach a new allocation that is more preferable to
both. In this case, the conflict deal (also known as the no negotiation alternative)
refers to the situation in which agents do not reach an agreement in negotiation.
One way to define the preference relation of agent i is in terms of a utility
function U i : O → R+ , which assigns a real number to each possible outcome.
The utility function U i(·) represents the relation ⪰i if we have U i(o1) ≥ U i(o2) if
and only if o1 ⪰i o2.
In a sense, the utility function (and corresponding preference relation) cap-
tures the level of satisfaction of an agent with a particular deal. A rational agent
attempts to reach a deal that maximizes the utility it receives.
In the case of multi-issue negotiation, it may be possible to define a multi-
attribute utility function, U i : A1 × · · · × An → R+ which maps a vector of at-
tribute values to a real number. And if the attributes are preferentially independent,
then the utility function can be defined using a linear combination of sub-utility
functions over the individual attributes. In other words, the utility of outcome
Table 4.1: The Prisoner's Dilemma game.

                        a
                deny        confess
  b   deny     −1, −1       −3, 0
      confess   0, −3       −2, −2
to deviate from an agreement in order to improve its utility. Thus the outcome
of a game when agreements are binding may be different from the outcome of
the same game when they are not [7]. This difference can be illustrated with
the Prisoner’s Dilemma game given in Table 4.1. Assume that this game is non-
cooperative. Then the dominant strategy for both players will be to confess. The
equilibrium outcome would be (−2, −2) which is not Pareto optimal. In contrast,
if the same game was played as a cooperative game, and the players agreed not
to confess, then both players would benefit. The agreement (deny, deny) would
be binding and the resulting outcome (−1, −1) would be Pareto optimal. This
outcome would also be better from each individual player’s perspective as it would
increase each player’s utility from −2 to −1.
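The reasoning above can be checked mechanically. The following sketch (payoffs taken from Table 4.1; the encoding and helper names are ours) computes each player's best responses and confirms that (confess, confess) is the only equilibrium of the non-cooperative game:

# Payoffs from Table 4.1, encoded here as payoffs[(b_action, a_action)] = (b_payoff, a_payoff).
ACTIONS = ("deny", "confess")
payoffs = {
    ("deny", "deny"): (-1, -1), ("deny", "confess"): (-3, 0),
    ("confess", "deny"): (0, -3), ("confess", "confess"): (-2, -2),
}

def best_responses(player, other_action):
    """Actions that maximize `player`'s payoff against a fixed action of the opponent."""
    def util(action):
        if player == "b":                            # b is the row player
            return payoffs[(action, other_action)][0]
        return payoffs[(other_action, action)][1]    # a is the column player
    best = max(util(a) for a in ACTIONS)
    return {a for a in ACTIONS if util(a) == best}

equilibria = [(b, a) for b in ACTIONS for a in ACTIONS
              if b in best_responses("b", a) and a in best_responses("a", b)]
print(equilibria)   # [('confess', 'confess')] -> the (-2, -2) outcome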
Note that the outcome of a game with binding agreements need not neces-
sarily be better than the outcome for the same game with agreements that are
not binding. The former are aimed at reaching agreements that are a reasonable
compromise. However, from an individual player’s perspective, the latter may be
better than the former. In more detail, cooperative and non-cooperative bargaining
is modeled as follows.
Nash defined a solution without modeling the details of the negotiation process.
In this approach, there is a set of possible or feasible outcomes, some of which
are acceptable or reasonable outcomes. The problem then is to find a bargaining
function that maps the set of possible outcomes to the set of acceptable ones.
Nash idealized the bargaining problem by assuming that the two individuals
are perfectly rational, that each can accurately compare its preferences for the
possible outcomes, that they are equal in bargaining skill, and that each has com-
plete knowledge of the preferences of the other. Under these assumptions, Nash
formed a mathematical model of the situation. In this model, he employed numer-
ical utilities to express the preferences of each individual, and each individual’s
desire to maximize its own gain.
More formally, Nash defined a two-person bargaining problem as follows.
There are two players (say a and b) who want to come to an agreement over
the alternatives in an arbitrary set A. Failure to reach an agreement, i.e., disagree-
ment, is represented by a designated outcome denoted D. Agent i ∈ {a, b} has
a von Neumann-Morgenstern utility function U^i defined as follows:

U^i : A ∪ {D} → R   for i = a, b

The set of all utility pairs that result from an agreement is called the bargaining
set. This set is denoted S, where

S = {(U^a(z), U^b(z)) | z ∈ A},
and the utility pair that results from disagreement is denoted d = (d a , d b ), where
d i = U i (D). The point d ∈ R2 is called the disagreement point or threat point.
Thus, if the players reach an agreement z ∈ A, then a gets a utility of U a (z) and b
gets U b (z). But if they do not reach an agreement, then the game ends in the dis-
agreement point d, where a gets utility d a and b gets d b . Given this, the bargaining
problem is defined as follows: a two-person bargaining problem is a pair (S, d), where
S ⊆ R2 is convex and closed, and the set {x ∈ S | x ≥ d}
is non-empty and bounded. The assumption that S is convex is the same as as-
suming that players can agree on jointly randomized strategies, such that, if the
utility allocations x = (xa , xb ) and y = (ya , yb ) are feasible and 0 ≤ θ ≤ 1, then the
expected utility allocation θx + (1 − θ)y can be achieved by planning to imple-
ment x with probability θ and y otherwise. Closure of S is a natural topological
requirement. The non-emptiness and boundedness condition means that not all
feasible allocations are worse than disagreement for both players, and unbounded
gains over the disagreement point are impossible.
The bargaining problem is solved by stating general properties (or axioms)
that a reasonable solution should possess. By specifying enough such properties
one can exclude all but one solution. For example, a reasonable solution must
be individual rational, i.e., it must give each player at least as much utility as it
would get in the event of no agreement. So individual rationality is an axiom. The
term “reasonable solution” has no standard definition. Different axiomatic mod-
els define this term differently [68]. Nash’s [55] idea of a reasonable solution is
based on the assumption that when two players negotiate or an impartial arbitrator
arbitrates, the payoff allocations that the two players ultimately get should depend
only on the following two factors:
1. the set of payoff allocations that are jointly feasible for the two players in
the process of negotiation or arbitration, and

2. the payoff allocation that would result if the two players failed to reach an
agreement, i.e., the disagreement point d.
Axiom 1 (Individual Rationality) This axiom asserts that the bargaining solu-
tion should give neither player less than what it would get from disagree-
ment, i.e., f (S, d) ≥ d.
Axiom 3 (Strong Efficiency) This axiom asserts that the bargaining solution
should be feasible and Pareto optimal.
Axiom 4 (Invariance) According to this axiom, the solution should not change
as a result of linear changes to the utility of either player. So, for example,
if a player's utility function is multiplied by 2, this should not change the
solution; the player will simply value what it gets twice as much.
Nash proved that the bargaining solution that satisfies the above five axioms is
given by:
f(S, d) ∈ argmax_{x ∈ S, x ≥ d} (x^a − d^a)(x^b − d^b)
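For intuition, the Nash product can also be maximized numerically over a finite sample of the bargaining set. The following sketch is purely illustrative (the sampled utility pairs and the function name are ours):

def nash_solution(S, d):
    """Return the utility pair in S that maximizes the Nash product (x_a - d_a)(x_b - d_b)
    among the pairs that weakly dominate the disagreement point d."""
    da, db = d
    candidates = [(xa, xb) for (xa, xb) in S if xa >= da and xb >= db]
    return max(candidates, key=lambda pair: (pair[0] - da) * (pair[1] - db))

# Feasible utility pairs from splitting a unit surplus (a coarse discretization of S).
S = [(i / 100, 1 - i / 100) for i in range(101)]
print(nash_solution(S, d=(0.0, 0.0)))   # (0.5, 0.5): the symmetric split maximizes the product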
Unlike the Nash fixed threat game, the Nash variable threat game does not, in general,
guarantee existence of a solution.
Following Nash’s work, several other bargaining solution concepts were pro-
posed using other systems of axioms [68]. Nash’s work was also extended to
bargaining with incomplete information [31, 76]. In general, for these axiomatic
models of bargaining, the solution depends only on two factors: the set of possi-
ble agreements and the disagreement point. However, in many practical scenarios,
the outcome of bargaining depends on other factors, such as the tactics employed
by the bargainers, the procedure through which negotiation is conducted, and the
players’ information. Non-cooperative models of bargaining [69, 70, 77] incorpo-
rate these factors.
U^a = x^a δ_a^{t−1}   and   U^b = x^b δ_b^{t−1}.
If this discounted game is played infinitely over time, then Rubinstein [69] showed
that there is a unique perfect equilibrium outcome in which the players’ immedi-
ately reach an agreement on the following shares:
x^a = (1 − δ_b) / (1 − δ_a δ_b)   and   x^b = (δ_b − δ_a δ_b) / (1 − δ_a δ_b)
where player a is the first mover. The properties of uniqueness and immediate
agreement are especially desirable in the context of agents [43, 45]. However, this
infinite horizon model may not be immediately applicable to multiagent systems
since agents are typically constrained by a deadline (we will deal with deadlines
shortly).
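The equilibrium shares above are simple to compute once the discount factors are fixed; a short sketch (function and variable names ours):

def rubinstein_shares(delta_a, delta_b):
    """Equilibrium split of the unit pie in Rubinstein's infinite-horizon
    alternating offers game, with a as the first mover."""
    x_a = (1 - delta_b) / (1 - delta_a * delta_b)
    return x_a, 1 - x_a

print(rubinstein_shares(0.9, 0.9))   # about (0.526, 0.474): a slight first-mover advantage
print(rubinstein_shares(0.5, 0.9))   # about (0.182, 0.818): the impatient player gets less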
Although Rubinstein’s model may not be directly applicable to the design of
automated negotiating agents, it provides two key intuitive insights. First, in fric-
tionless1 bargaining, there is nothing to prevent the players from haggling for as
long as they wish. It seems intuitive that the cost of haggling serves as an incen-
tive for the players to reach an agreement. Second, a player’s bargaining power
depends on the relative magnitude of the players’ respective costs of haggling.
The absolute magnitudes of these costs are irrelevant to the bargaining outcome.
It is now clear how the discount factor can influence negotiation. Apart from
the discount factor, a deadline can also impact on negotiation. Deadlines are
important because, in many applications [37], the agents must reach an agreement
within a time limit. So let us now study negotiations that are constrained by both
a discount factor and a deadline.
Work on negotiation with deadlines and discount factors includes [18, 25, 73].
While Fatima et al. [18] and Gatti et al. [25] use the alternating offers protocol,
Sandholm et al. [73] use a simultaneous offers protocol. Also, while [18] con-
siders negotiation over a pie, [25] considers the players’ reserve prices. Since the
1 If it does not cost the players anything to make offers and counteroffers, the bargaining process is said to be frictionless.
where UA(t) (UB(t)) denotes a’s (b’s) equilibrium utility for time t. An agree-
ment takes place at t = 1.
The above model was also analyzed in [18] in an incomplete information set-
ting with uncertainty about utility functions. In contrast, [25, 73] consider uncer-
tainty over the negotiation deadline. Also, as we mentioned earlier, a key differ-
ence between [18] and [73] is that the former uses the alternating offers protocol,
while the latter uses a simultaneous offers protocol and treats time as a continuous
variable. These differences in the setting result in the following differences in the
outcomes. First, for the former, an agreement can occur in the first time period
and the entire surplus does not necessarily go to just one of the players. For the
latter, an agreement only occurs at the earlier deadline and the entire surplus goes
to the player with the later deadline (different players have different deadlines).
Second, unlike the former, the deadline effect in the latter completely overrides
the effect of the discount factor. For the former, an agent’s share of the surplus
depends on both the deadline and the discount factor. These differences show that
the protocol is a key determinant of the outcome of negotiation.
However, apart from the protocol, the parameters of negotiation (such as the
deadline and the discount factor) also influence an agent’s share. For the alter-
nating offers protocol, [19] shows how the deadline and the discount factor affect an
agent's share.
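To see how a deadline and a discount factor jointly determine the split, the following sketch backward-inducts a simple single-issue alternating offers game (our own illustrative formulation, not the exact model of [18] or [73]): a unit pie, a common discount factor δ, offers in periods t = 1, ..., n, and zero for both players if no agreement is reached by the deadline.

def spe_shares(delta, n):
    """Proposer's equilibrium share at each period t = 1..n, by backward induction.
    At the deadline the proposer keeps the whole pie; earlier, the proposer offers the
    responder just enough to match the responder's discounted continuation value."""
    share = {n: 1.0}
    for t in range(n - 1, 0, -1):
        # Accepting y at t is worth y * delta**(t-1); waiting to propose at t+1 is
        # worth share[t+1] * delta**t, so the proposer keeps 1 - delta * share[t+1].
        share[t] = 1.0 - delta * share[t + 1]
    return share

shares = spe_shares(delta=0.8, n=4)
print(round(shares[1], 3))   # 0.328: the first mover's share; agreement is immediate

With n = 4 the second mover proposes last and therefore captures the larger share here; varying n and δ shifts the split, illustrating how both parameters shape the outcome.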
Before closing this section, we will provide a brief overview of some of the
key approaches for analyzing games with incomplete information. Incomplete
information games are those where either or both players are uncertain about some
parameters of the game, such as the utility functions, the strategies available to the
players, the discount factors, etc. Furthermore, for sequential games, such as the
alternating offers game described in Section 3.2, the players may acquire new
information, and so, their information may change during the course of play.
There is a widely used approach for dealing with such incomplete information
cases. This approach was originated by Harsanyi [28, 29, 30] in the context of
simultaneous move games. In this approach, a player is assumed to have beliefs,
in the form of a random variable, about an uncertain parameter. Thus, there is a set
of possible values for the parameter, and a probability distribution over these pos-
sible values. And the uncertain parameter is determined by the realization of this
random variable. Although the random variable’s actual realization is observed
only by the player, its ex-ante probability distribution is assumed to be common
knowledge to the players. Such a game is called a Bayesian game and the related
equilibrium notion is Bayesian Nash equilibrium.
Although a Bayesian game deals with incomplete information, it is a simul-
taneous moves game. However, most multiagent negotiations require agents to
choose actions over time. Dynamic games [53] are a way of modeling such nego-
Having looked at both the axiomatic and the non-cooperative models, let us
now examine the similarities and differences between them. A key similarity
is that, in both cases, there is a degree of conflict between the agents' interests,
but there is also room for them to cooperate by resolving the conflict. Another
similarity is between the solutions for the Nash bargaining model and Rubinstein’s
model: as the discount factor approaches 1 (i.e., as the players become more
patient), the solution to the latter converges to the solution to the former. In this
solution, the pie is split almost equally between the players.
A key difference between these two models is the way in which players are
modeled and the way in which cooperation is enforced. In non-cooperative game
theory, the basic modeling unit is the individual, and cooperation between individ-
uals is self-enforcing. In contrast, in cooperative game theory the basic modeling
unit is the group, and players can enforce cooperation in the group through a third
party.
1. Global bargaining: Here, the bargaining agents directly tackle the global
problem in which all the issues are addressed at once. In the context of
non-cooperative theory, the global bargaining procedure is also called the
package deal procedure. In this procedure, an offer from one agent to the
other would specify how each one of the issues is to be resolved.
The dependence of the outcome of bargaining on the procedure has been rec-
ognized in the context of both axiomatic and non-cooperative models. We will
describe some of the key results of the former in Section 4.1 and of the latter in
Section 4.2.
The strategic behavior for the package deal procedure was analyzed in [18].
Below, we will describe this model in the context of the complete information set-
ting (see [18] for details regarding strategic behavior in an incomplete information
setting). Here, a and b negotiate over m > 1 divisible issues. These issues are m
distinct pies and the agents want to determine how to split each of them. The set
S = {1, 2, . . . , m} denotes the set of m pies. As before, each pie is of size 1. For
both agents, the discount factor is δ for all the issues (in [18], the discount factor
is different for different issues, but for ease of discussion we will let it be the same
for all the issues). For each issue, n denotes each agent’s deadline.
In the offer for time period t (where 1 ≤ t ≤ n), agent a's (b's) share for each of
the m issues is represented as an m-element vector x^a (x^b) such that, for 1 ≤ i ≤ m,
0 ≤ x^a_i ≤ 1 and x^a_i + x^b_i = 1. Thus, if agent a's share for issue c at time t is x^a_c, then
agent b's share is x^b_c = 1 − x^a_c. The shares for a and b are together represented as
the package (x^a, x^b).
An agent’s cumulative utility is linear and additive [41]. The functions U a and
U b give the cumulative utilities for a and b respectively at time t and are defined
as follows:
U^a((x^a, x^b), t) = Σ_{c=1}^{m} w^a_c δ^{t−1} x^a_c   if t ≤ n, and 0 otherwise        (4.1)

U^b((x^a, x^b), t) = Σ_{c=1}^{m} w^b_c δ^{t−1} x^b_c   if t ≤ n, and 0 otherwise        (4.2)

where w^a is an m-element vector of weights indicating how much agent a values each issue, and w^b is
such a vector for b. These vectors indicate how the agents prefer different issues.
For example, if w^a_c > w^a_{c+1}, then agent a values issue c more than issue c + 1.
Likewise for agent b.
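Equations 4.1 and 4.2 translate directly into code. A minimal sketch (the weights, shares, and parameter values are illustrative):

def cumulative_utility(weights, shares, delta, t, n):
    """Cumulative utility for a package agreed at time t (Equations 4.1 and 4.2):
    a weighted sum of the agent's shares, discounted by delta**(t-1); zero past the deadline n."""
    if t > n:
        return 0.0
    return sum(w * (delta ** (t - 1)) * x for w, x in zip(weights, shares))

w_a, w_b = [3.0, 1.0], [1.0, 2.0]      # how much each agent values each of the two issues
x_a = [0.7, 0.2]                       # a's shares of the two pies
x_b = [1 - s for s in x_a]             # b receives the remainder of each pie
print(cumulative_utility(w_a, x_a, delta=0.9, t=2, n=3))   # a's utility, approximately 2.07
print(cumulative_utility(w_b, x_b, delta=0.9, t=2, n=3))   # b's utility, approximately 1.71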
It is clear from the above definition of utility functions that the parties may
have different preferences over the issues. So, during the process of negotiation, it
might be possible for an agent to perform trade-offs across the issues to improve
its utility. Since the utilities are linear, the problem of making trade-offs becomes
computationally tractable.4
For this setting, let us see how we can determine an equilibrium for the pack-
age deal procedure. Since there is a deadline, we can find an equilibrium using
backward induction (as was done for single-issue negotiation). However, since an
4 For a non-linear utility function, an agent's trade-off problem becomes a non-linear optimization
problem. Due to the computational complexity of such an optimization problem, a solution
can only be found using approximation methods [4, 32, 42]. Moreover, these methods are not general
in that they depend on how the cumulative utilities are actually defined. Since we use linear
utilities, the trade-off problem will be a linear optimization problem, the exact solution to which
can be found in polynomial time.
offer for the package deal must include a share for all the m issues, an agent must
now make trade-offs across the issues in order to maximize its cumulative utility.
For a time t, agent a's trade-off problem (TA(t)) is to find an allocation (x^a, x^b)
that solves the following optimization problem:

Maximize    Σ_{c=1}^{m} w^a_c x^a_c
subject to    δ^{t−1} Σ_{c=1}^{m} w^b_c (1 − x^a_c) ≥ UB(t + 1)
The problem TA(t) is nothing but the well-known fractional knapsack problem.5
Let SA(TA(t)) denote a solution to TA(t). For agent b, TB(t) and SB(TB(t)) are
analogous. Given this, Theorem 4.1 (taken from [18]) characterizes an equilib-
rium for the package deal procedure. Here, STRATA(t) (STRATB(t)) denotes a’s
(b’s) equilibrium strategy for time t.
Theorem 4.1 For the package deal procedure, the following strategies form a
subgame perfect equilibrium. The equilibrium strategy for t = n is:
One can easily verify that the outcome of the package deal procedure will be
Pareto optimal.
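Because TA(t) is a fractional knapsack, it can be solved greedily: agent a concedes first on the issues that cost it least per unit of value delivered to b, until b's continuation utility is met. A minimal sketch under the linear-utility setting above (names are ours; required_b plays the role of UB(t + 1)/δ^{t−1}, and every w^b_c is assumed positive):

def solve_tradeoff(w_a, w_b, required_b):
    """Greedy solution to agent a's trade-off problem: maximize a's utility while
    giving b at least `required_b`. Returns a's shares of the m pies."""
    m = len(w_a)
    x_a = [1.0] * m                       # start with a keeping every pie entirely
    remaining = required_b
    # Concede first where a's sacrifice per unit of b-value (w_a/w_b) is smallest.
    for c in sorted(range(m), key=lambda c: w_a[c] / w_b[c]):
        if remaining <= 0:
            break
        give = min(1.0, remaining / w_b[c])   # fraction of pie c handed over to b
        x_a[c] -= give
        remaining -= give * w_b[c]
    if remaining > 1e-9:
        raise ValueError("b's continuation utility cannot be met")
    return x_a

# a values issue 0 highly, b values issue 1 highly, so a concedes on issue 1 first.
print(solve_tradeoff(w_a=[3.0, 1.0], w_b=[1.0, 2.0], required_b=1.5))   # [1.0, 0.25]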
For the separate procedure, the m issues are negotiated independently of each
other. So the equilibrium for the individual issues will be the same as that for
single-issue negotiation.
For the sequential procedure with independent implementation, the m issues
are negotiated sequentially one after another. But since the agreement on an is-
sue goes into effect immediately, the equilibrium for the individual issues can be
obtained in the same way as that for single-issue negotiation. An issue will be
negotiated only after the previous one is settled.
For the sequential procedure with simultaneous implementation, the m issues
are negotiated sequentially one after another. But since the agreement on an issue
goes into effect only after all the m issues are settled, the basic idea for obtaining
an equilibrium for this procedure will be the same as that for the package deal
procedure.
Busch and Horstman [6] consider two issues and show how the outcome for
sequential negotiation with independent implementation can differ from the out-
come for simultaneous negotiations.
For the sequential procedure with independent implementation, a key determi-
nant of the outcome of negotiation is the agenda. In the context of this procedure,
the term agenda means the order in which the issues are settled. The importance of
the agenda was first recognized by Schelling [75]. This initiated research on agen-
das for multi-issue negotiation. The existing literature on agendas can broadly be
divided into two types: those that treat the agenda as an endogenous variable
[3, 33, 34], and those that treat it as an exogenous [20] variable.
The difference between these two types of agendas is that, for the former, an
agenda is selected during the process of negotiating over the issues. For the latter,
an agenda is decided first, and then the parties negotiate over the issues on the
agenda.
Research on endogenous agendas includes [3, 33, 34]. Bac and Raff [3] deal
with two divisible issues in the context of an incomplete information setting. Here,
the uncertainty is about the discount factor. For this setting, they study the prop-
erties of resulting equilibrium. While [3] deals with two divisible issues, Inderst
[34] deals with multiple divisible issues and potentially infinite discrete time pe-
riods. Here, the parties are allowed to make an offer on any subset of a given
set of issues but can only accept/reject a complete offer. In this sense, an agenda
is selected “endogenously.” For this setting, he showed existence and unique-
ness conditions for equilibrium under the complete information assumption. In
and Serrano [33] generalized these results to a larger class of utility functions by
considering a complete information setting. Similar work in the context of an
incomplete information setting was later dealt with by Fatima et al. in [17].
Fershtman [20] dealt with exogenous agendas. For the complete information
setting, he considered two divisible issues, and showed how the order in which
they are negotiated affects the outcome.
[Figure 4.1: Time-dependent concession strategies (Boulware, Linear, and Conceder), showing how α^a(t) varies with t/t^a_max.]
Assume that issue j represents the "price" of a product. If x^j_{a→b}(t)
denotes the price offered by agent a to agent b at time t, then we have the following equation for
computing an offer:
where max^a and min^a denote a's reserve prices. Here, α^a(t) is a function such that
0 ≤ α^a(t) ≤ 1, α^a(0) = κ^a, and α^a(t^a_max) = 1. So at the beginning of negotiation,
the offer will be a constant (κ^a), and when the deadline (t^a_max) is reached, the offer
will be a's reserve price.
A family of time-dependent strategies can be defined by varying the definition
of the function α^a(t). For instance, α^a(t) could be defined as a polynomial
function parameterized by β ∈ R+ as follows:

α^a(t) = κ^a + (1 − κ^a) (min(t, t^a_max) / t^a_max)^{1/β}
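These tactics are easy to implement. The sketch below combines the polynomial α^a(t) above with the standard negotiation-decision-function form of the offer from Faratin et al. [14], for an agent whose acceptable prices range from a most preferred value to a reserve price; that particular offer form and all parameter values are our illustrative assumptions.

def alpha(t, t_max, kappa, beta):
    """Polynomial time-dependent concession: alpha(0) = kappa and alpha(t_max) = 1.
    beta < 1 yields Boulware (concede late); beta > 1 yields Conceder (concede early)."""
    return kappa + (1 - kappa) * (min(t, t_max) / t_max) ** (1 / beta)

def offer(t, t_max, kappa, beta, best, reserve):
    """Price offered at time t, moving from the most favorable value toward the
    reserve price as the deadline approaches (a standard time-dependent tactic form)."""
    return best + alpha(t, t_max, kappa, beta) * (reserve - best)

for t in range(0, 11, 2):
    print(t, round(offer(t, t_max=10, kappa=0.1, beta=0.5, best=100, reserve=200), 1))
# Boulware (beta = 0.5): the offer concedes slowly at first (110.0, 113.6, 124.4, ...)
# and accelerates toward the reserve price of 200 only as the deadline nears.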
to make all deliveries, while minimizing transportation cost (i.e., driven mileage),
subject to the following constraints:
• Each vehicle has to begin and end its tour at the depot of its center (but
neither the pickup nor the drop-off locations of the orders need to be at the
depot).
• Each vehicle has a maximum load weight constraint. These may differ
among vehicles.
• Each vehicle has a maximum load volume constraint. These may differ
among vehicles.
choices are framed [78], are averse to inequity [5], and are willing to engage in
irrational behavior such as costly punishment [13].
Given this, one often cannot assume that humans will follow the equilib-
rium strategy, or even be utility maximizers. Instead, one must endow software
agents with both (i) predictive behavioral models of how humans negotiate, and
(ii) decision-making algorithms that take these models into account as they guide
the agent's behavior. As one can imagine, modeling human negotiation behavior is a non-trivial
endeavor, and a field of study in its own right in psychology and business studies [49].
The above challenge is further complicated by the fact that the purpose of
agents that negotiate with humans can vary. Following are some possible objec-
tives of such agents:
• Outperform human negotiators in a web-based market.
• Train people in negotiation skills to help them negotiate with other people.
Depending on the purpose of the agent, the types of behavioral and decision-
making models may differ substantially. For example, an agent that is designed
to train people in negotiation skills would need to mimic other humans, while an
agent trying to make money in an online market simply needs to maximize profit.
One of the earliest agents capable of negotiating with humans was designed
by Kraus and Lehmann to play the game Diplomacy using a variety of heuristics
[44]. Surprisingly, humans were unable to discern whether they were playing with
a human or an agent. More recently, Katz and Kraus introduced agents that use
reinforcement learning to participate in single-shot auctions or games, and have
been shown to achieve higher payoffs than humans [39]. Building on this work,
they later introduced gender-sensitive learning, which achieves even better results
[40].
Another significant line of work builds on the Colored Trails (CT) platform,
which is a software infrastructure for investigating decision making in groups
comprising people, computer agents, and a mix of these two [27]. Although the
CT framework is highly customizable, most work focused on an n × m board of
colored squares, around which individual players can move. Each player has a
designated goal square, and can move to it provided it possesses chips that match
the colors of the squares along the path. Each player is initially endowed with
different colored chips, which may or may not be sufficient for reaching its in-
dividual goal. Thus, players may need to negotiate with one another in order to
redistribute those chips. Figure 4.2 shows a screen shot of a CT game [50], dis-
playing the board, the player’s location (marked “me”), the player’s goal (marked
“G”), the negotiation counterpart (marked as a square), the chips each player has,
and a proposal panel that the player is using to prepare an offer to send to the coun-
terpart. Note that, in this case, the player cannot see the counterpart’s goal. Other
variants are possible, for example in which the players do not see each other’s
current chip endowment.
7 Argumentation-Based Negotiation
Game-theoretic and heuristics-based approaches to automated negotiation are
characterized by the exchange of offers between parties with conflicting positions
and are commonly referred to as proposal-based approaches. That is, agents ex-
change proposed agreements – in the form of bids or offers – and when proposed
deals are not accepted, the possible response is either a counterproposal or with-
drawal. Argumentation-based negotiation (ABN) approaches, on the other hand,
enable agents to exchange additional meta-information (i.e., arguments) during
negotiation [63].
Consider the case in which an agent may not be aware of some alternative
plans of achieving some goal. Exchanging this information may enable agents
to reach agreements not previously possible. This was shown through the well-
known painting/mirror hanging example presented by Parsons et al. [57]. The
example concerns two home-improvement agents – agent i trying to hang a paint-
ing, and agent j trying to hang a mirror. There is only one way to hang a painting,
using a nail and a hammer. But there are two ways of hanging a mirror: using a
nail and a hammer, or using a screw and a screw driver; however, j is only aware of the former.
Agent i possesses a screw, a screw driver, and a hammer, but needs a nail in addi-
tion to the hammer to hang the painting. On the other hand, j possesses a nail, and
believes that to hang the mirror, it needs a hammer in addition to the nail. Now,
consider the dialogue depicted in Figure 4.3 (described here in natural language)
between the two agents.
[Figure 4.3: The painting/mirror-hanging dialogue between agents i and j, including the moves: (a) i: "Can you sell me the nail?" (b) j: "No. Can you sell me the hammer?" (c) i: "Why do you need to keep the nail?" (d) j: "I need it to hang a mirror."]
As the figure shows, at first, j was not willing to give away the nail because
it needed it to achieve its goal. But after finding out the reason for rejection, i
managed to persuade j to give away the nail by providing an alternative plan for
achieving the latter’s goal.
This type of negotiation dialogue requires a communication protocol that en-
ables agents to conduct a discussion about a domain of interest using a shared
vocabulary (i.e., ontology). Furthermore, it requires the ability to present justifi-
cations of one’s position, as well as counterarguments that influence the counter-
part’s mental state (e.g., its goals, beliefs, plans) [46]. Consequently, much work
on ABN builds on logic-based argumentation protocols. For example, Parsons et
al. [57] present a framework based on the logic-based argumentation framework
of Elvang-Gøransson et al. [12]. The framework of Sadri et al. [71] uses abduc-
tive logic programming [21]. Other frameworks allow the exchange of a variety
of other information relevant to negotiation, such as threats and rewards [2, 65],
or claims about social rights and obligations [38].
There has been some work on characterizing the outcomes of argument-based
negotiation (e.g., see [62] or [1]). However, the connection between ABN for-
malisms and classical models of negotiation is still not well-developed. For exam-
ple, it is not clear whether and how a particular argumentation protocol achieves
properties like Pareto optimality or independence of irrelevant alternatives. In ad-
dition, most existing models of argumentation lack an explicit model of strategic
behavior. This is a major drawback, since agents may withhold or misreport argu-
ments in order to influence the negotiation outcome to their own advantage. For
a recent discussion on strategic behavior in argumentation, and initial attempts at
grounding it in game theory, see [60, 61].
8 Conclusions
In this chapter, we studied some of the key concepts and techniques used for mod-
eling bargaining. Specifically, we looked at the axiomatic and non-cooperative
models of bargaining. Each of these two approaches has strengths and limita-
tions. For example, a main strength of the Nash bargaining model is its simplicity
and the uniqueness of its solution. However, this approach can be criticized be-
cause it ignores the whole process of making offers and counteroffers, and the
possibility of a breakdown. The model may therefore be more relevant to bar-
gaining with arbitration. In non-cooperative bargaining, the process of making
offers and counteroffers is modeled as an alternating offers game. But the solu-
tion to this model assumes that both players are able to apply backward induc-
tion logic. This assumption has been criticized because performing backward in-
duction can sometimes require complex computations regarding events that never
actually take place [47]. Despite the differences, the axiomatic and strategic approaches
can sometimes be complementary in that each can help to justify and clarify the other.
Acknowledgments
The first author acknowledges the EPSRC grant EP/G000980/1 as some parts of
the chapter are based on results of this project. Some parts are also based on joint
work with Michael Wooldridge and Nicholas Jennings. We are very thankful to
both of them. The topic of bargaining is too broad to cover in a single chapter.
Our apologies for not being able to include a lot of important work due to space
limitations.
9 Exercises
1. Level 1 Suppose that x ∈ S, y ∈ S, xa = d a , yb = d b , and 0.5x + 0.5y is
a strongly efficient allocation in S. Find the Nash bargaining solution of
(S, d).
2. Level 1 Suppose that there are two divisible pies (x and y) of unit size. Two
players (a and b) bargain over the division of these pies using the package
deal procedure. Let the negotiation deadline be 3 and the discount factor
be δ = 0.25. Assume that a has utility function U a = 2xa + ya and b has
U b = xb + 3yb , where xa and ya denote a’s allocation of the two pies and xb
and yb denote b’s allocation. Under the complete information assumption,
when will an agreement be reached and what will each player’s equilibrium
allocation be?
3. Level 2 Develop a program that will take the number of issues, the negoti-
ation deadline, the discount factor, and the two players’ utility functions as
input, and generate the equilibrium allocation for the package deal proce-
dure under the complete information assumption.
4. Level 3 Design a software negotiating agent that can generate offers using
the negotiation decision functions described in Section 5.1. Do this first for
single-issue negotiation and then for multi-issue negotiation.
References
[1] L. Amgoud, Y. Dimopoulos, and P. Moraitis. A unified and general framework for
argumentation-based negotiation. In AAMAS ’07: Proceedings of the 6th Interna-
tional Joint Conference on Autonomous Agents and Multiagent Systems, New York,
NY, USA, 2007. ACM.
[2] L. Amgoud and H. Prade. Handling threats, rewards and explanatory arguments in
a unified setting. International Journal of Intelligent Systems, 20(12):1195–1218,
2005.
[3] M. Bac and H. Raff. Issue-by-issue negotiations: the role of information and time
preference. Games and Economic Behavior, 13:125–134, 1996.
[5] G.E. Bolton. A comparative model of bargaining: Theory and evidence. The Amer-
ican Economic Review, pages 1096–1136, 1991.
[11] S. D'souza, Y. Gal, P. Pasquier, S. Abdallah, and I. Rahwan. Reasoning about goal
revelation in human negotiation. IEEE Intelligent Systems.
[14] P. Faratin, C. Sierra, and N. R. Jennings. Negotiation decision functions for au-
tonomous agents. International Journal of Robotics and Autonomous Systems, 24(3-
4):159–182, 1998.
[15] P. Faratin, C. Sierra, and N. R. Jennings. Using similarity criteria to make trade-offs
in automated negotiations. Artificial Intelligence Journal, 142(2):205–237, 2002.
[16] S. S. Fatima and A. Kattan. Evolving optimal agendas for package deal negotiation.
In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO–
2011), pages 505–512, Dublin, Ireland, July 2011.
[20] C. Fershtman. The importance of the agenda in bargaining. Games and Economic
Behavior, 2:224–238, 1990.
[21] T. Fung and Robert Kowalski. The IFF proof procedure for abductive logic pro-
gramming. Journal of Logic Programming, 33(1):151–165, 1997.
[25] N. Gatti, F. Giunta, and S. Marino. Alternating offers bargaining with one-sided un-
certain deadlines: An efficient algorithm. Artificial Intelligence Journal, 172:1119–
1157, 2008.
[26] G. Gigerenzer and R. Selten, editors. Bounded Rationality: The Adaptive Toolbox.
Dahlem Workshop Reports. MIT Press, Cambridge MA, USA, 2002.
[27] B.J. Grosz, S. Kraus, S. Talman, B. Stossel, and M. Havlin. The influence of so-
cial dependencies on decision-making: Initial investigations with a new game. In
Proceedings of the Third International Joint Conference on Autonomous Agents and
Multiagent Systems-Volume 2, pages 782–789. IEEE Computer Society, 2004.
[31] J. C. Harsanyi and R. Selten. A generalized Nash solution for two-person bargaining
games with incomplete information. Management Science, 18(5):80–106, January
1972.
[34] R. Inderst. Multi-issue bargaining with endogenous agenda. Games and Economic
Behavior, 30:64–82, 2000.
[35] T. Ito, H. Hattori, and M. Klein. Multi-issue negotiation protocol for agents: Explor-
ing nonlinear utility spaces. In Proc. of the Twentieth Int. Joint Conf. on Artificial
Intelligence, pages 1347–1352, 2007.
[37] N.R. Jennings, P. Faratin, T.J. Norman, P. O’Brien, B. Odgers, and J. L. Alty. Imple-
menting a business process management system using ADEPT: A real-world case
study. Int. Journal of Applied Artificial Intelligence, 14(5):421–465, 2000.
[39] R. Katz and S. Kraus. Efficient agents for cliff-edge environments with a large set
of decision options. In Proceedings of the Fifth International Joint Conference on
Autonomous Agents and Multiagent Systems, pages 697–704. ACM, 2006.
[41] R. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value
Trade-offs. New York: John Wiley, 1976.
[43] S. Kraus. Strategic Negotiation in Multiagent Environments. The MIT Press, Cam-
bridge, Massachusetts, 2001.
[44] S. Kraus and D. Lehmann. Designing and building a negotiating automated agent.
Computational Intelligence, 11(1):132–171, 1995.
[45] S. Kraus, J. Wilkenfeld, and G. Zlotkin. Negotiation under time constraints. Artifi-
cial Intelligence Journal, 75(2):297–345, 1995.
[46] Sarit Kraus, Katia Sycara, and Amir Evenchik. Reaching agreements through ar-
gumentation: A logical model and implementation. Artificial Intelligence, 104(1–
2):1–69, 1998.
[47] D. M. Kreps. Game Theory and Economic Modeling. Oxford: Clarendon Press,
1993.
[49] Roy J. Lewicki, David M. Saunders, John W. Minton, and Bruce Barry. Negotiation.
McGraw-Hill/Irwin, New York NY, USA, fourth edition, 2003.
[50] R. Lin and S. Kraus. Can automated agents proficiently negotiate with humans?
Communications of the ACM, 53(1):78–88, 2010.
[51] R. Lin, S. Kraus, Y. Oshrat, Y.K. Gal, et al. Facilitating the evaluation of automated
negotiators using peer designed agents. In Proceedings of the 24th AAAI Conference on
Artificial Intelligence (AAAI-2010), 2010.
[54] R.B. Myerson. Two-person bargaining problems and comparable utility. Economet-
rica, 45(7):1631–1637, 1977.
[57] Simon Parsons, Carles Sierra, and Nick Jennings. Agents that reason and negotiate
by arguing. Journal of Logic and Computation, 8(3):261–292, 1998.
[58] C. Ponsati and J. Watson. Multiple-issue bargaining and axiomatic solutions. Inter-
national Journal of Game Theory, 26:501–524, 1997.
[60] I. Rahwan and K. Larson. Argumentation and game theory. Argumentation in Arti-
ficial Intelligence, pages 321–339, 2009.
[64] H. Raiffa. The Art and Science of Negotiation. Harvard University Press, Cambridge,
USA, 1982.
[70] A. Rubinstein. A bargaining model with incomplete information about time preferences.
Econometrica, 53:1151–1172, January 1985.
[71] F. Sadri, F. Toni, and P. Torroni. Logic agents, dialogues and negotiation: An ab-
ductive approach. In K. Stathis and M. Schroeder, editors, Proceedings of the AISB
2001 Symposium on Information Agents for E-Commerce, 2001.
[72] T.W. Sandholm and V.R. Lesser. Coalitions among computationally bounded
agents. Artificial Intelligence, 94(1-2):99–137, 1997.
[73] T. Sandholm and N. Vulkan. Bargaining with deadlines. In AAAI-99, pages 44–51,
Orlando, FL, 1999.
[74] Tuomas Sandholm. eMediator: A next generation electronic commerce server. Com-
putational Intelligence, 18(4):656–676, 2002.
[76] A. Shaked and J. Sutton. Two-person bargaining problems with incomplete infor-
mation. Econometrica, 52:461–488, 1984.
[78] A. Tversky and D. Kahneman. The framing of decisions and the psychology of
choice. Science, 211:453–458, 1981.
Chapter 5
Iyad Rahwan
1 Introduction
The theory of argumentation is a rich, interdisciplinary area of research spanning
philosophy, communication studies, linguistics, and psychology [53]. Argumen-
tation can be seen as a reasoning process consisting of the following four steps:
2 What Is an Argument?
Among argumentation theorists in philosophy, the term “argument” usually refers
to “the giving of reasons to support or criticize a claim that is questionable, or open
to doubt” [59]. This distinguishes argumentation from deductive mathematical
inference, in which the conclusions follow necessarily from the premises. Here,
I give a very brief overview of the major approaches to formalizing this notion in
the AI literature.
an extensible ontology for describing argument structures [11]. Yet another for-
malization of Walton’s scheme was presented by Gordon et al. in their Carneades
model [18].
[Figure 5.1: An argument graph over α1, ..., α5, in which α1 is defeated by α2 and α4, which are in turn defeated by α3 and α5; α3 and α5 have no defeaters.]
3 Evaluating an Argument
To evaluate whether an argument is acceptable (according to some logical se-
mantics), we need to take into account how it interacts with other arguments.
This turns out to be a non-trivial problem, and the source of much research [2]
and controversy [9]. Let S+ = {β ∈ A | α → β for some α ∈ S}. Also let
Example 5.5 In Figure 5.1, the sets ∅, {α3}, {α5}, and {α3, α5} are all admissible
simply because they do not have any defeaters. The set {α1, α3, α5} is also
admissible since it defends itself against both defeaters α2 and α4.
An admissible set S is a complete extension if and only if all arguments defended
by S are also in S (that is, if S is a fixed point of the operator F).
Example 5.6 In Figure 5.1, the admissible set {α3 , α5 } is not a complete ex-
tension, since it defends α1 but does not include α1 . Similarly, sets {α3 } and
{α5 } are not complete extensions, since F({α3 }) = {α3 , α5 } and F({α5 }) =
{α3 , α5 }. The admissible set {α1 , α3 , α5 } is the only complete extension, since
F({α1 , α3 , α5 }) = {α1 , α3 , α5 }.
Example 5.7 Consider the graph in Figure 5.2. Here, we have three complete
extensions: {α3 }, {α1 , α3 }, and {α2 , α3 }.
[Figure 5.2: An argument graph in which α1 and α2 defeat each other and α3 has no defeaters, together with its three labelings LG, L1, and L2 (legend: in, out, undec).]
– F^2(∅) = F(F^1(∅)) = {α1, α3, α5};
– F^3(∅) = F(F^2(∅)) = {α1, α3, α5} = F^2(∅).
Similarly, in Figure 5.2, the grounded extension is {α3 }, which is the minimal
complete extension w.r.t. set inclusion.
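For a finite framework, the fixed-point construction above can be implemented directly: start from the empty set and apply the characteristic function F until nothing changes. The sketch below uses one concrete defeat relation consistent with the description of Figure 5.1 (the exact pairing of defeaters is our assumption):

def grounded_extension(arguments, defeats):
    """Iterate the characteristic function F from the empty set to its least fixed point.
    `defeats` is a set of (attacker, target) pairs."""
    def acceptable(arg, s):
        # arg is acceptable w.r.t. s if every defeater of arg is itself defeated by some member of s
        return all(any((c, b) in defeats for c in s) for (b, t) in defeats if t == arg)

    extension = set()
    while True:
        new = {a for a in arguments if acceptable(a, extension)}
        if new == extension:
            return extension
        extension = new

args = {"a1", "a2", "a3", "a4", "a5"}
defeats = {("a2", "a1"), ("a4", "a1"), ("a3", "a2"), ("a5", "a4")}
print(sorted(grounded_extension(args, defeats)))   # ['a1', 'a3', 'a5'], matching the iteration above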
A preferred extension is a bolder, more committed position that cannot be ex-
tended – by accepting more arguments – without causing inconsistency. Thus
a preferred extension can be thought of as a maximal consistent set of hypothe-
ses. There may be multiple preferred extensions, and the grounded extension is
included in all of them.
Example 5.8 In Figure 5.1, {α1, α3, α5} is the only preferred extension. But in
Figure 5.2, there are two preferred extensions: {α1, α3} and {α2, α3}, which are
the maximal complete extensions w.r.t. set inclusion.
Finally, a set of arguments is a stable extension if it is a preferred extension that
defeats every argument that does not belong to it. As expected, stable extensions
may not always exist. An alternative notion is the semi-stable extension, which
satisfies the weaker condition that the set of arguments defeated is maximal.
Caminada [10] established a correspondence between properties of labelings
and the different extensions. Let AF = ⟨A, →⟩ be an argumentation framework,
and L a labeling over AF. Define in(L) = {α ∈ A | L(α) = in}, out(L) = {α ∈
A | L(α) = out}, and undec(L) = {α ∈ A | L(α) = undec}. These are summarized
in Table 5.1.
Now that the acceptability of sets of arguments is defined, we can define the
status of any individual argument.
4 Argumentation Protocols
So far, I outlined some methods for evaluating an argument given an existing col-
lection of arguments. In a multiagent system, however, the arguments are not all
available a priori. Instead, they are presented by the agents during their argumen-
tative dialogue. This raises the question of how such argumentation dialogues
are to be regulated. For example, one should not be able to make statements that
are completely irrelevant, or to contradict oneself. An argumentation protocol is,
therefore, a set of rules that govern the argumentation process.
Figure 5.3: Argumentation framework and dispute tree. (i) shows an argumentation
framework, (ii) shows the dispute tree induced by a, and (iii) shows the dispute
tree induced by a under protocol G, with the winning strategy encircled.
Example 5.9 Consider two agents arguing the argument graph shown in Figure
5.3, taken from [30]. In Figure 5.3(i), we see the underlying (usually implicit) ar-
gument graph. Figure 5.3(ii) shows the corresponding dispute tree in which PRO
presents argument a, OPP counters with argument b, PRO then counters once
with argument c and once with argument d, and so on. Note that this dispute tree
is infinite, since agents are able to repeat counterarguments due to the presence
of cycles in the argument graph. As shown in the figure, arguments in the dispute
tree can be indexed to capture repetition of the same argument from the graph.
1. The set of disputes DT in T is a non-empty and finite set such that each
d ∈ DT is finite and is won by PRO (terminates in an argument moved by
PRO).
PRO is guaranteed to win if it plays the moves described in the winning strategy
subtree. The second requirement ensures that all possible objections that can pos-
sibly be raised by OPP in one dispute can be neutralized successfully by PRO in
the same or some other dispute in the subtree.
Note that creating a dispute tree (or subtree) only requires that every argument
presented is a defeater of an argument that has already been presented. Adding
further restrictions on the moves of different players can be captured by subtrees
of the dispute tree. By adding such restrictions carefully, we can generate dia-
logue outcomes that correspond to well-known semantics. To illustrate the above,
consider the following simple protocol:
• If the dispute length is odd (next move is by OPP), then the possible next
moves are {y | y → x}.

• If the dispute length is even (next move is by PRO), then the possible next
moves are {y | y → x and y ∉ PRO(D)}.
By simply prohibiting PRO from repeating himself, we ensure that the dispute
tree is finite. A more striking consequence of this simple restriction is that PRO
can only win if the argument at the root is in the grounded extension.
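This protocol can be checked recursively: PRO wins from an argument if every OPP attack on it can be answered by some not-yet-used defeater from which PRO again wins. A compact sketch (function and variable names ours, reusing the example graph from the grounded-extension sketch above):

def pro_wins(arg, attackers, pro_used):
    """PRO has just moved `arg`. Return True if PRO has a winning strategy under the
    protocol above: OPP may repeat attacks, PRO may never repeat its own arguments."""
    for attack in attackers.get(arg, ()):                 # every possible OPP reply
        replies = [z for z in attackers.get(attack, ()) if z not in pro_used]
        if not any(pro_wins(z, attackers, pro_used | {z}) for z in replies):
            return False                                  # some OPP attack cannot be countered
    return True

# attackers[x] = the arguments that defeat x (graph of the grounded-extension sketch).
attackers = {"a1": {"a2", "a4"}, "a2": {"a3"}, "a4": {"a5"}}
print(pro_wins("a1", attackers, pro_used={"a1"}))   # True: a1 is in the grounded extension
print(pro_wins("a2", attackers, pro_used={"a2"}))   # False: a2 is defeated by the undefeated a3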
Going back to Figure 5.3, note that the grounded extension of the graph in Fig-
ure 5.3(i) is {a, c, e}. Figure 5.3(iii) shows the dispute tree induced by argument
a under protocol G, with the winning strategy {(a1 − b2 − c3 − d4 − e6 )}.
As it turns out, it is possible to use different protocol restrictions in order to
capture different semantics. Suppose we wish to create dialogues in which PRO
wins if and only if the argument in question is in at least one preferred extension
(i.e., is credulously accepted under preferred semantics, recalling Definition 5.8).
This can be achieved by a protocol in which two restrictions apply. OPP is not
allowed to repeat its arguments. And PRO is allowed to repeat its arguments but
cannot present a self-attacking argument, or an argument that conflicts (attacks or
is attacked by) with another argument it already stated in the dispute. For a more
comprehensive discussion of abstract argument games, including protocols that
implement other semantics, refer to [30].
suppose agent P has knowledge base {p, s, p ⇒r1 q, q ⇒r2 r, p ∧ s ⇒r3 r2 > r4} and
agent O has knowledge base {t, t ⇒r4 ¬r}. These knowledge bases are specified
in Prakken and Sartor's argument-based, prioritized, extended logic programming
language, in which rules are annotated, allowing for rules that support preferences
between other rules [38].2 The following dialogue is consistent with the above
protocol [37], with the target of each move indicated between square brackets:
P1 [−]: claim r
O2 [P1]: why r
P3 [O2]: r since q, q ⇒ r
O4 [P3]: why q
P5 [O4]: q since p, p ⇒ q
O6 [P5]: concede p ⇒ q
O7 [P5]: why p
Note that at this point, player P has many possible moves. It can retract its
claim or the premises of its argument, or give an argument in support of p. It was also
possible for player O to make the following move against P's original claim:
O7 [P3]: ¬r since t, t ⇒ ¬r
To this, P may respond with a priority argument, supporting the claim that rule r2
takes precedence over rule r4 , thus showing that P3 strictly defeats O7 :
P8 [O7 ]: r2 > r4 since p, s, p ∧ s ⇒ r2 > r4
At this point, P1 is in, but the dialogue may continue, with O conceding or making
further challenges, and so on.
Various other dialogue systems have been presented in the literature, such as
Walton and Krabbe’s PDD [57] and McBurney and Parsons’s Agent Dialogue
Framework [28]. A longer discussion and comparison of different dialogue sys-
tems for persuasion was presented by Prakken [35].
The above approaches are related to the earlier work on so-called game seman-
2 Note that these implications are not classical, thus they do not satisfy contraposition.
tics for logic, which was pioneered by logicians such as Paul Lorenzen [24] and
Jaakko Hintikka [20]. Although many specific instantiations of this notion have
been presented in the literature, the general idea is as follows. Given some spe-
cific logic, the truth value of a formula is determined through a special-purpose,
multi-stage dialogue game between two players, the verifier and falsifier. The for-
mula is considered true precisely when the verifier has a winning strategy, while
it will be false whenever the falsifier has the winning strategy. Similar ideas have
been used to implement dialectical proof-theories for defeasible reasoning (e.g.,
by Prakken and Sartor [38]).
It is worth mentioning that, in addition to the generic protocols presented in
the last two sections, various domain-specific protocols have appeared in the liter-
ature. These protocols are usually more complex, and are linked to argumentation
schemes from specific domains, enabling agents to argue about what action to
take [1], the risks involved in various decisions [27], or about the properties of
items involved in negotiation [29].
2. design rules of the game in such a way that self-interested agents behave in
some desirable way (e.g., tell the truth); this is called mechanism design.
Both these approaches are quite useful for the study of argumentation in multi-
agent systems. On the one hand, an agent may use game theory to analyze a given
argumentative situation in order to choose the best strategy. On the other hand,
we may use mechanism design to design the rules (e.g., argumentation protocol)
in such a way as to promote good argumentative behavior.
and thus s = (si , s−i ). We then interpret ui ((si , s−i ), θi ) to be the utility of
agent i with type θi when all agents play strategies specified by strategy profile
(si (θi ), s−i (θ−i )). Similarly, we also define:
Since the agents are all self-interested, they will try to choose strategies that
maximize their own utility. Since the strategies of other agents also play a role
in determining the outcome, the agents must take this into account. The solution
concepts in game theory determine the outcomes that will arise if all agents are
rational and strategic. The most well-known solution concept is the Nash equilib-
rium. A Nash equilibrium is a strategy profile in which each agent is following
a strategy that maximizes its own utility, given its type and the strategies of the
other agents.
mechanism where every agent reveals its true type [25]. In such a situation, we
say that the social choice function is incentive compatible.
Definition 5.16 (Incentive compatible) The social choice function f (·) is in-
centive compatible (or truthfully implementable) if the direct mechanism M =
(Θ, g(·)) has an equilibrium s∗ such that s∗i (θi ) = θi .
Theorem 5.2 (Revelation principle) If there exists some mechanism that imple-
ments social choice function f in dominant strategies, then there exists a direct
mechanism that implements f in dominant strategies and is truthful.
and g : Σ_1 × · · · × Σ_I → 2^A.
Note that in the above definition, the notion of dialogue strategy is broadly
construed and would depend on the protocol used. In a direct mechanism, how-
ever, the strategy spaces of the agents are restricted so that they can only reveal a
subset of arguments. Due to the revelation principle, this will be sufficient for the
analysis in the rest of the chapter.
The following mechanism calculates the grounded extension given the union of
all arguments revealed by agents.
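As a rough illustration of the outcome rule's main ingredient (this is only a sketch of how a grounded extension might be computed, not the chapter's formal mechanism definition), the grounded extension can be obtained by iterating the characteristic function until a fixed point is reached: start from the empty set and repeatedly add every argument all of whose defeaters are themselves defeated by the current set.

```python
def grounded_extension(arguments, defeats):
    """Compute the grounded extension of the framework (arguments, defeats).

    `defeats` is a set of pairs (a, b) meaning argument a defeats argument b.
    We iterate F(S) = {a : every defeater of a is itself defeated by some member
    of S} until a fixed point is reached.
    """
    extension = set()
    while True:
        defended = {
            a for a in arguments
            if all(any((c, b) in defeats for c in extension)
                   for (b, a2) in defeats if a2 == a)
        }
        if defended == extension:
            return extension
        extension = defended

# Illustrative framework: a defeats b, b defeats c.
args = {"a", "b", "c"}
atts = {("a", "b"), ("b", "c")}
print(grounded_extension(args, atts))   # {'a', 'c'}
```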
4 In the remainder of the chapter, I will use the term skeptical to refer to skeptical grounded.
Figure 5.4: (a) Argument graph in the case of full revelation; (b) argument graph with α1 withheld.
The following example explores incentives with mechanism M_AF^grnd.
Theorem 5.3 Let AF be an arbitrary argumentation framework, and let E_GR(AF)
denote its grounded extension. Mechanism M_AF^grnd is strategyproof for agents
with acceptability maximizing preferences if and only if AF satisfies the follow-
ing condition: ∀i ∈ I, ∀S ⊆ A_i and ∀A_{−i}, we have
|A_i ∩ E_GR(⟨A_i ∪ A_{−i}, →⟩)| ≥ |A_i ∩ E_GR(⟨(A_i \ S) ∪ A_{−i}, →⟩)|.
Note that → is over all arguments in A. Intuitively, the condition in the the-
orem states that all arguments of every agent must be conflict-free (i.e., consis-
tent), both explicitly and implicitly. Explicit consistency implies that no argument
defeats another. Implicit consistency implies that other agents cannot possibly
present a set of arguments that reveal an indirect defeat among one’s own argu-
ments. More concretely, in Example 5.10 and Figure 5.4, while agent x’s argument
set Ax = {α1 , α4 , α5 } is conflict-free, when agents y and z presented their own ar-
Figure 5.5: The full argument graph of Example 5.11.
Example 5.11 Consider the variant of Example 5.10 with the additional argu-
ment α6 and defeat (α6 , α3 ). Let the agent types be Ax = {α1 , α4 , α5 , α6 },
Ay = {α2 }, and Az = {α3 }, respectively. The full argument graph is depicted
in Figure 5.5. Under full revelation, the mechanism outcome rule produces the
outcome o = {α1 , α4 , α5 , α6 }.
Note that in Example 5.11, truth revelation is now a dominant strategy for x
(since it gets all its arguments accepted) despite the fact that α1 → α4 and α1 →
α5 . This hinges on the presence of an argument (namely α5 ) that cancels out the
negative effect of the (in)direct self-defeat among x’s own arguments.
The core AIF has two types of nodes: information nodes (or I-nodes) and
scheme nodes (or S-nodes). These are represented by two disjoint sets, NI ⊂ N and
NS ⊂ N, respectively. Information nodes are used to represent passive information
contained in an argument, such as a claim, premise, data, etc. S-nodes capture the
application of schemes (i.e., patterns of reasoning). Such schemes may be domain-
independent patterns of reasoning, which resemble rules of inference in deductive
logics but broadened to include non-deductive inference. The schemes themselves
belong to a class, S, and are classified into the types: rule of inference scheme,
conflict scheme, and preference scheme. We denote these using the disjoint sets
SR , SC , and SP , respectively. The predicate (uses : NS × S) is used to express the
fact that a particular scheme node uses (or instantiates) a particular scheme. The
AIF thus provides an ontology for expressing schemes and instances of schemes,
and constrains the latter to the domain of the former via the function uses, i.e.,
∀n ∈ NS , ∃s ∈ S such that uses(n, s).
The present ontology has three different types of scheme nodes: rule of in-
ference application nodes (or RA-nodes), preference application nodes (or PA-
nodes) and conflict application nodes (or CA-nodes). These are represented as
three disjoint sets: N_S^RA ⊆ N_S, N_S^PA ⊆ N_S, and N_S^CA ⊆ N_S, respectively. The word
"application" in each of these types was introduced in the AIF as a reminder that
these nodes function as instances, not classes, of possibly generic inference rules.
Intuitively, N_S^RA captures nodes that represent applications of (possibly non-deductive)
rules of inference, N_S^CA captures applications of criteria (declarative specifications)
defining conflict (e.g., between a proposition and its negation), and N_S^PA captures
applications of (possibly abstract) criteria of preference among evaluated nodes.
The AIF specification does not type its edges. The (informal) semantics of
edges can be inferred from the types of nodes they connect. One of the restrictions
is that no outgoing edge from an I-node can be directed directly to another I-node.
This ensures that the type of any relationship between two pieces of information
must be specified explicitly via an intermediate S-node.
Definition 5.23 (Argument network) An argument network Φ is a graph with: (i)
a set N = N_I ∪ N_S of vertices (or nodes); and (ii) a binary relation →_edge ⊆ N × N
representing edges, such that there is no (i, j) ∈ →_edge with both i ∈ N_I and j ∈ N_I.
A simple argument can be represented by linking a set of premises to a conclusion.
Definition 5.24 (Simple argument) A simple argument, in network Φ and
schemes S, is a tuple ⟨P, τ, c⟩ where: (i) P ⊆ N_I is a set of nodes denoting premises;
(ii) τ ∈ N_S^RA is a rule of inference application node; (iii) c ∈ N_I is a node denoting
the conclusion, such that τ →_edge c, uses(τ, s) where s ∈ S, and ∀p ∈ P we have
p →_edge τ.
Figure 5.6: Examples of simple arguments. S-nodes denoted with a thicker border.
Example 5.12 (Simple argument) The tuple A1 = ⟨{p, p → q}, MP1, q⟩ is a sim-
ple argument in propositional language L, where p, (p → q) ∈ N_I are nodes
representing premises, and q ∈ N_I is a node representing the conclusion. In be-
tween them, the node MP1 ∈ N_S^RA is a rule of inference application node (i.e.,
RA-node) that uses the modus ponens natural deduction scheme; formally,
uses(MP1, s_MP), where s_MP ∈ S is the scheme "for all A, B ∈ L, from A and
A → B infer B".
An attack or conflict from one information or scheme node to another infor-
mation or scheme node is captured through a CA-node, which captures the type of
conflict. The attacker is linked to the CA-node, and the CA-node is subsequently
linked to the attacked node. Note that since edges are directed, each CA-node
captures attack in one direction. Symmetric attack would require two CA-nodes,
one in each direction. The following example describes a conflict between two
simple arguments (see Figure 5.6(b)).
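To make the ontology more tangible, here is a minimal, hypothetical encoding of an argument network in Python (the class and field names are invented for this sketch and are not part of the AIF specification). It represents the simple argument of Example 5.12 together with a CA-node expressing a one-directional conflict.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    ident: str
    kind: str          # "I" for information nodes; "RA", "CA", or "PA" for scheme nodes
    scheme: str = ""   # for S-nodes: the scheme this node uses (instantiates)

@dataclass
class ArgumentNetwork:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)   # pairs (source id, target id)

    def add_edge(self, src: Node, dst: Node):
        # The AIF forbids edges leading directly from one I-node to another I-node.
        assert not (src.kind == "I" and dst.kind == "I"), "I-node to I-node edge not allowed"
        self.nodes |= {src, dst}
        self.edges.add((src.ident, dst.ident))

# Simple argument A1: premises {p, p -> q}, an RA-node applying modus ponens, conclusion q.
p, pq, q = Node("p", "I"), Node("p->q", "I"), Node("q", "I")
mp1 = Node("MP1", "RA", scheme="modus ponens")
net = ArgumentNetwork()
net.add_edge(p, mp1)
net.add_edge(pq, mp1)
net.add_edge(mp1, q)

# A CA-node expressing that "not q" attacks q (conflict in one direction only).
not_q = Node("not q", "I")
neg1 = Node("neg1", "CA", scheme="negation conflict")
net.add_edge(not_q, neg1)
net.add_edge(neg1, q)
```

A symmetric conflict would simply add a second CA-node with edges in the opposite direction, mirroring the discussion above.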
logical arguments. Moreover, the AIF was deliberately given only semi-formal
semantics, allowing for it to be adapted according to one’s need, for example with
a particular language for describing the internal contents of information nodes, or
by committing to edges with specific formal semantics. It has been shown that
the AIF can be adapted for creating ontologies using Semantic Web standards
for annotating natural language arguments using Walton-style schemes (see for
example [47]). Having said that, the AIF is still a young effort in need of further
refinement and proof-of-concept applications.
7 Conclusion
I gave an overview of the emerging field of argumentation in multiagent systems.
I introduced some basic definitions of arguments and of the relationships between
arguments. I then gave an overview of protocols that govern argumentation di-
alogues among agents, before moving to the issue of strategic argumentation. I
closed the discussion with a brief overview of efforts toward a common ontology
for enabling the exchange of arguments.
This field of study is still in its infancy. While much is understood about
the properties of argumentation semantics [2] and the termination and complexity
properties of dialogue protocols [13], strategic aspects are still underexplored.
Game-theoretic tools have proven indispensable to the understanding of other
forms of interaction in multiagent systems, such as auction and voting protocols,
as illustrated elsewhere in this book. Yet the connection between argumentation
processes and game theory still has a long way to go.
Another important challenge is understanding how computational models of
argument relate to how people actually argue. This is crucial to the task of
programming agents capable of arguing with, and successfully persuading peo-
ple [26]. The models explored in this chapter mostly overlook questions of psy-
chological plausibility [40]. There is an opportunity for cross-fertilization be-
tween computational argumentation and human reasoning research [50].
Acknowledgment
Some of the contents of this chapter are based on excerpts from my previous work.
I would like to thank all my coauthors on those articles, especially Kate Larson
and Chris Reed. I would also like to apologize for omitting much important work,
for which there was simply not enough space.
8 Exercises
1. Level 1 Consider the argument framework ⟨A, →⟩, with A = {a, b, c, d} and
→ = {(a, b), (b, a), (a, c), (b, c), (c, d), (d, c)}. Draw the argument graph,
then produce all legal labelings of this graph. List all complete extensions,
and identify which of these is grounded, preferred, stable (if it exists), and
semi-stable.
3. Level 1 Pick an argument from today’s newspaper, and try to model it using
the argument interchange format. Are there multiple ways to do this?
4. Level 2 Consider the following situation involving the couple Alice (A) and
Brian (B), who want to decide on an activity for the day. Brian thinks they
should go to a soccer match (argument α1 ) while Alice thinks they should
attend the ballet (argument α2 ). There is time for only one activity, however
(hence α1 and α2 defeat one another). Moreover, while Alice prefers the
ballet to the soccer, she would still rather go to a soccer match than stay
at home. Likewise, Brian prefers the soccer match to the ballet, but also
prefers the ballet to staying home. Formally, we can write uA (ballet) >
uA (soccer) > uA (home) and uB (soccer) > uB (ballet) > uB (home). Alice
has a strong argument which she may use against going to the soccer match,
namely by claiming that she is too sick to be outdoors (argument α3 ). Brian
simply cannot attack this argument (without compromising his marriage at
least). Likewise, Brian has an irrefutable argument against the ballet; he
could claim that his ex-wife will be there too (argument α4 ). Alice cannot
stand her! Draw the corresponding abstract argument graph, and identify
the strategic (normal-form) game being played, together with the equilibria
of this game.
5. Level 3 Recall that Definition 5.10 provides a dialogue protocol such that
the proponent wins the dialogue if and only if the argument at the root is
in the grounded set. Produce variants of this protocol corresponding to the
skeptical and credulous version of all semantics described in Section 3.
11. Level 4 Using tools from automated natural language processing, program
a system capable of annotating natural language arguments.
References
[1] K. Atkinson and T. Bench-Capon. Practical reasoning as presumptive argumentation
using action based alternating transition systems. Artificial Intelligence, 171(10-
15):855–874, 2007.
[7] Philippe Besnard and Anthony Hunter. Elements of Argumentation. MIT Press,
Cambridge MA, USA, 2008.
[8] A. Bondarenko, P.M. Dung, R.A. Kowalski, and F. Toni. An abstract, argumentation-
theoretic approach to default reasoning. Artificial Intelligence, 93(1-2):63–101,
1997.
[12] Phan Minh Dung. On the acceptability of arguments and its fundamental role in
nonmonotonic reasoning, logic programming and n-person games. Artificial Intelli-
gence, 77(2):321–358, 1995.
[15] M.A. Falappa, G. Kern-Isberner, and G.R. Simari. Explanations, belief revision and
defeasible reasoning. Artificial Intelligence, 141(1-2):1–28, 2002.
[16] John Fox, David Glasspool, Dan Grecu, Sanjay Modgil, Matthew South, and Vivek
Patkar. Argumentation-based inference and decision making – a medical perspec-
tive. IEEE Intelligent Systems, 22(6):34–41, 2007.
[17] Jacob Glazer and Ariel Rubinstein. Debates and decisions: On a rationale of argu-
mentation rules. Games and Economic Behavior, 36:158–173, 2001.
[18] Thomas F. Gordon, Henry Prakken, and Douglas Walton. The Carneades model of
argument and burden of proof. Artificial Intelligence, 171(10–15):875–896, 2007.
[20] Jaakko Hintikka and Gabriel Sandu. Game-theoretical semantics. In Johan van
Benthem and Alice ter Meulen, editors, Handbook of Logic and Language, pages
361–410. Elsevier, Amsterdam, The Netherlands, 1997.
[21] Nishan C. Karunatillake, Nicholas R. Jennings, Iyad Rahwan, and Peter McBurney.
Dialogue games that agents play within a society. Artificial Intelligence, 173(9-
10):935–981, 2009.
[22] Sarit Kraus, Katia Sycara, and Amir Evenchik. Reaching agreements through argu-
mentation: A logical model and implementation. Artificial Intelligence, 104(1-2):1–
69, 1998.
[23] F. Lin and Y. Shoham. A logic of knowledge and justified assumptions. Artificial
Intelligence, 57(2-3):271–289, 1992.
[26] Irene Mazzotta, Fiorella de Rosis, and Valeria Carofiglio. Portia: A user-adapted
persuasion system in the healthy-eating domain. IEEE Intelligent Systems, 22(6):42–
51, 2007.
[27] P. McBurney and S. Parsons. Risk agoras: Using dialectical argumentation to debate
risk. Risk Management, 2(2):17–27, 2000.
[28] P. McBurney and S. Parsons. Games that agents play: A formal framework for di-
alogues between autonomous agents. Journal of Logic, Language and Information,
11(3):315–334, 2002.
[29] Peter McBurney, Rogier M. van Eijk, Simon Parsons, and Leila Amgoud. A
dialogue-game protocol for agent purchase negotiations. Journal of Autonomous
Agents and Multi-Agent Systems, 7(3):235–273, 2003.
[30] S. Modgil and M. Caminada. Proof theories and algorithms for abstract argumenta-
tion frameworks. In Iyad Rahwan and Guillermo R. Simari, editors, Argumentation
in Artificial Intelligence, pages 105–129. Springer, 2009.
[32] Simon Parsons, Michael J. Wooldridge, and Leila Amgoud. Properties and complex-
ity of formal inter-agent dialogues. Journal of Logic and Computation, 13(3):347–
376, 2003.
[33] Chaim Perelman and Lucie Olbrechts-Tyteca. The New Rhetoric: A Treatise on
Argumentation. University of Notre Dame Press, 1969.
[37] Henry Prakken. Coherence and flexibility in dialogue games for argumentation.
Journal of Logic and Computation, 15(6):1009–1040, 2005.
[38] Henry Prakken and Giovanni Sartor. Argument-based extended logic programming
with defeasible priorities. Journal of Applied Non-classical Logics, 7:25–75, 1997.
[39] I. Rahwan and K. Larson. Argumentation and game theory. Argumentation in Arti-
ficial Intelligence, pages 321–339, 2009.
[40] I. Rahwan, M.I. Madakkatel, J.F. Bonnefon, R.N. Awan, and S. Abdallah. Behav-
ioral experiments for assessing the abstract argumentation semantics of reinstate-
ment. Cognitive Science, 34(8):1483–1502, 2010.
[41] Iyad Rahwan and Leila Amgoud. An argumentation-based approach for practical
reasoning. In Gerhard Weiss and Peter Stone, editors, 5th International Joint Con-
ference on Autonomous Agents and Multiagent Systems (AAMAS), pages 347–354,
2006.
[42] Iyad Rahwan and Kate Larson. Mechanism design for abstract argumentation. In
L. Padgham, D. Parkes, J. Mueller, and S. Parsons, editors, 7th International Joint
Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 1031–
1038, 2008.
[43] Iyad Rahwan, Kate Larson, and Fernando Tohmé. A characterisation of strategy-
proofness for grounded argumentation semantics. In Proceedings of the 21st Inter-
national Joint Conference on Artificial Intelligence (IJCAI), pages 251–256, 2009.
[44] Iyad Rahwan and Peter McBurney. Guest editors’ introduction: Argumentation
technology. IEEE Intelligent Systems, 22(6):21–23, 2007.
[45] Iyad Rahwan, Sarvapali D. Ramchurn, Nicholas R. Jennings, Peter McBurney, Si-
mon Parsons, and Liz Sonenberg. Argumentation-based negotiation. Knowledge
Engineering Review, 18(4):343–375, 2003.
[46] Iyad Rahwan and Guillermo R. Simari, editors. Argumentation in Artificial Intelli-
gence. Springer, 2009.
[47] Iyad Rahwan, Fouad Zablith, and Chris Reed. Laying the foundations for a world
wide argument web. Artificial Intelligence, 171(10–15):897–921, 2007.
[48] John Searle. Speech Acts: An Essay in the Philosophy of Language. Cambridge
University Press, New York, USA, 1969.
[49] G.R. Simari and R.P. Loui. A mathematical treatment of defeasible reasoning and
its implementation. Artificial Intelligence, 53(2-3):125–157, 1992.
[50] Keith Stenning and Michiel van Lambalgen. Human Reasoning and Cognitive Sci-
ence. MIT Press, Cambridge MA, USA, 2008.
[51] Stephen Toulmin. The Uses of Argument. Cambridge University Press, Cambridge,
UK, 1958.
[52] W. van der Hoek, M. Roberts, and M. Wooldridge. Social laws in alternating time:
Effectiveness, feasibility, and synthesis. Synthese, 156(1):1–19, 2007.
[53] Frans H. van Eemeren, Rob Grootendorst, and Francisca Snoeck Henke-
mans, editors. Fundamentals of Argumentation Theory: A Handbook of Historical
Backgrounds and Contemporary Applications. Lawrence Erlbaum Associates, Mah-
wah NJ, USA, 1996.
[55] J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior.
Princeton University Press, Princeton NJ, USA, 1944.
[58] D.N. Walton, D.C. Reed, and F. Macagno. Argumentation Schemes. Cambridge
University Press, 2008.
Basic Coordination
Chapter 6
Computational Social Choice
1 Introduction
Social choice theory concerns the design and formal analysis of methods for ag-
gregating the preferences of multiple agents. Examples of such methods include
voting procedures, which are used to aggregate the preferences of voters over a set
of candidates standing for election to determine which candidate should win the
election (or, more generally, to choose an alternative from a set of alternatives), or
protocols for deciding on a fair allocation of resources given the preferences of a
group of stakeholders over the range of bundles they might receive. Originating
in economics and political science, social choice theory has since found its place
as one of the fundamental tools for the study of multiagent systems. The reasons
for this development are clear: if we view a multiagent system as a “society” of
autonomous software agents, each of which has different objectives, is endowed
with different capabilities, and possesses different information, then we require
clearly defined and well-understood mechanisms for aggregating their views so as
to be able to make collective decisions in such a multiagent system.
Computational social choice, the subject of this chapter, adds an algorithmic
perspective to the formal approach of social choice theory. More broadly speak-
ing, computational social choice deals with the application of methods usually
associated with computer science to problems of social choice.
4 agents (the Dutch):   milk ≻ wine ≻ beer
3 agents (the Germans): beer ≻ wine ≻ milk
2 agents (the French):  wine ≻ beer ≻ milk
Now, which drink should be served based on these individual preferences? Milk
could be chosen on the grounds that it has the most agents ranking it first (the
Dutch). That is, it is the winner according to the plurality rule, which only con-
siders how often each alternative is ranked in first place. However, a majority
of agents (the Germans and the French) will be dissatisfied with this choice as
they prefer any other drink to milk. In fact, it turns out that wine is preferred to
both beer and milk by a 6:3 and a 5:4 majority of voters, respectively. An alter-
native with this property (defeating every other alternative in pairwise majority
comparisons) is called a Condorcet winner. Yet another method of determining
a collective choice would be to successively eliminate those beverages that are
ranked first by the lowest number of agents (known as Single Transferable Vote,
or STV). This would result in wine being eliminated first because only two agents
(the French) rank it first. Between the remaining two options, beer is ranked
higher by the Germans and the French, and will eventually be chosen. In sum-
mary, this example shows that collective choice is not a trivial matter, as different,
seemingly reasonable, voting rules can yield very different results.
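These outcomes are easy to verify mechanically. The following Python sketch (illustrative code, not from the chapter) computes the plurality winner and the Condorcet winner for the beverage profile; a sketch of the STV computation appears later, in Section 3.1.

```python
from collections import Counter

# The beverage profile: (number of voters, ranking from most to least preferred).
profile = [
    (4, ["milk", "wine", "beer"]),
    (3, ["beer", "wine", "milk"]),
    (2, ["wine", "beer", "milk"]),
]
alternatives = ["milk", "beer", "wine"]

# Plurality: count first places only.
plurality = Counter()
for weight, ranking in profile:
    plurality[ranking[0]] += weight
print(plurality.most_common(1))   # [('milk', 4)]

# Condorcet winner: beats every other alternative in pairwise majority comparisons.
def pairwise_margin(a, b):
    return sum(w if r.index(a) < r.index(b) else -w for w, r in profile)

condorcet = [a for a in alternatives
             if all(pairwise_margin(a, b) > 0 for b in alternatives if b != a)]
print(condorcet)   # ['wine']  (wine beats beer 6:3 and milk 5:4)
```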
Another important lesson that can be learned from this example concerns
strategic manipulation. Assume the collective choice is determined using the plu-
rality rule. Since preferences are private and each agent only knows its own pref-
erences with certainty, nobody can prevent the Germans from claiming that their
most-preferred drink is wine. This will result in a more preferable outcome to
them than reporting their preferences truthfully, because they get wine rather than
milk, their least-preferred alternative. A seminal result in social choice theory, the
1 This is based on an example used by Donald G. Saari at a conference in Rotterdam, where
only milk was served for lunch.
them first, then vote again over the remaining alternatives, and to continue in this
fashion until all alternatives have been ranked.
We shall revisit several of these ideas again later on, when we define the frame-
works for social choice outlined here in more formal detail.
for making this unwanted behavior so difficult that it can be neglected in practice.3
This groundbreaking work was followed by a small number of isolated pub-
lications throughout the 1990s. In the first few years of the twenty-first century,
as the relevance of social choice to artificial intelligence, multiagent systems, and
electronic commerce became apparent, the frequency of contributions on prob-
lems related to social choice with a computational flavor suddenly intensified. Al-
though the field was still lacking a name, by 2005 contributions in what we would
now call “computational social choice” had become a regular feature at several
of the major conferences in artificial intelligence. The first workshop specifically
dedicated to computational social choice, and the first event to explicitly use this
name, took place in 2006 [102]. Around the same time, Chevaleyre et al. [56]
attempted the first classification of research in the area by distinguishing (a) the
nature of the social choice problem addressed, and (b) the type of formal or com-
putational technique used.
1.3 Applications
Social choice theory was originally developed as an abstraction of problems that
arise in political science and economics. More generally, social choice theory
provides a useful theoretical framework for the precise mathematical study of the
normative foundations of collective decision making, in a wide range of areas,
involving not only human decision makers but also autonomous software agents.
This chapter will focus on the theoretical foundations of computational social
choice. But before we delve into the theory, let us briefly cite a few examples
of actual and potential application domains, going beyond political elections and
collective decision making in multiagent systems, where the methods we shall
cover in this chapter can be put to good use.
The first such example comes from the domain of Internet search engines.
Imagine you want to design a metasearch engine that combines the search results
of several engines. This problem has a lot in common with preference aggrega-
tion. Aggregating preferences means asking each individual agent for a ranking
over the set of alternatives and then amalgamating this information into a sin-
gle such ranking that adequately represents the preferences of the group. For the
metasearch engine, we ask each individual search engine for a ranking of its own,
say, 20 top results and then have to aggregate this information to produce our
metaranking. Of course, the problems are not exactly the same. For instance,
some website may not have been ranked at all by one search engine, but be in the
top five for another. Also, the general principles that we might want to adhere
to when performing the aggregation might differ: in preference aggregation, fair-
ness will play an important role; when aggregating search results fairness is not
a goal in itself. Nevertheless, it is clear that insights from social choice theory
can inform possible approaches for designing our metasearch engine. In fact, this
situation is rather typical in computational social choice: for many modern appli-
cations, we can rely on some of the basic insights from social choice theory, but
to actually develop an adequate solution, we do have to alter some of the classical
assumptions.
3 As we shall see, this approach of using computational complexity as a barrier against strategic
manipulation has its limitations, but conceptually this has nevertheless been an important idea that
has inspired a good deal of exciting research.
There is also a less obvious application of principles of social choice to search
engines. One way of measuring the importance of a web page is the number of
other web pages linking to it. In fact, this is a recursive notion: the importance
of our web page also depends on the importance of the pages linking to it, which
in turn depends on the importance of the pages linking to those. This idea is the
basis for the PageRank algorithm at the core of Google's search engine [170].
We may think of this as an election where the set of the voters and the set of the
candidates coincide (both are the set of all web pages). In this sense, the ranking
of the importance of web pages may be considered as a social choice problem.
This perspective has led to a deeper understanding of the problem, for instance,
by providing an axiomatic characterization of different ranking algorithms [3].
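As a rough illustration of this recursive notion of importance, the following plain power-iteration sketch (a toy implementation, not Google's actual algorithm; the link structure is made up) computes such importance scores.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to ("votes for")."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, targets in links.items():
            if not targets:            # a dangling page spreads its rank evenly
                targets = pages
            share = rank[p] / len(targets)
            for t in targets:
                new_rank[t] += damping * share
        rank = new_rank
    return rank

# Three pages: A and B link to C; C links back to A.
print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))
```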
Another example of an application domain for which the perspective of social
choice theory can provide fruitful new insights is that of recommender systems. A
recommender system is a tool for helping users choose attractive products on the
basis of choices made by other users in the past. An important technique in this
field is collaborative filtering. By reinterpreting collaborative filtering as a pro-
cess of preference aggregation, the axiomatic method developed in social choice
theory has proven helpful in assessing and comparing the quality of different col-
laborative filtering approaches [171].
Yet another example is the problem of ontology merging, which arises in the
context of the Semantic Web. Suppose different information providers on the Se-
mantic Web provide us with different ontologies describing the same set of con-
cepts. We would like to combine this information so as to arrive at the best possi-
ble ontology representing the available knowledge regarding the problem domain.
This is a difficult problem that will require a combination of different techniques.
Social choice theory can make a contribution in those cases where we have little
information regarding the reliability of the individual providers and can only re-
sort to aggregating whatever information they provide in a “fair” (and logically
consistent) manner [174].
We shall allude to further areas of application along the way. However, our
focus will be on theoretical foundations from here on.
2 Preference Aggregation
One of the most elementary questions in social choice theory is how the prefer-
ence relations of individual agents over some abstract set of alternatives can be
aggregated into one collective preference relation. Apart from voting, this ques-
tion is of broad interest in the social sciences, because it studies whether and how
a society of autonomous agents can be treated as a single rational decision maker.
As we point out in Section 2.1, results in this framework are very discouraging.
In many practical settings, however, one is merely interested in a set of socially
acceptable alternatives rather than a collective preference relation. In Section 2.2,
we discuss the relationship between both settings and present some positive results
for the latter framework.
tives (sometimes also called candidates). Each agent i entertains preferences over
the alternatives in U, which are represented by a transitive and complete pref-
erence relation ≽_i. Transitivity requires that a ≽_i b and b ≽_i c imply a ≽_i c for
all a, b, c ∈ U, and completeness requires that any pair of alternatives a, b ∈ U is
comparable, i.e., it holds that either a ≽_i b or b ≽_i a or both. In some cases, we
will assume preferences to be linear, i.e., also satisfying antisymmetry (a ≽_i b and
b ≽_i a imply that a = b), but otherwise we impose no restrictions on preference
relations. We write a ≽_i b to denote that agent i likes alternative a at least as much
as alternative b and write ≻_i for the strict part of ≽_i, i.e., a ≻_i b if a ≽_i b but
not b ≽_i a. Similarly, ∼_i denotes i's indifference relation, i.e., a ∼_i b if both a ≽_i b
and b ≽_i a. The set of all preference relations over the universal set of alterna-
tives U will be denoted by R(U). The set of preference profiles, associating one
preference relation with each individual agent, is then given by R(U)^n.
Economists often also consider cardinal (rather than ordinal) preferences,
which are usually given in the form of a utility function that assigns numerical
values to each alternative. It is easy to show that, for a finite number of alter-
natives, a preference relation can be represented by a utility function if and only
if it satisfies transitivity and completeness (see Exercise 1). Still, a utility func-
tion may encode much more information than a preference relation, such as the
intensity of preferences. In the absence of a common numeraire such as money,
the meaning of individual utility values and especially the interpersonal compar-
isons between those values is quite controversial. Therefore, the ordinal model
based on preference relations is the predominant model in abstract social choice
theory. In special domains such as fair division (see Section 5), however, cardinal
preferences are also used.
A social welfare function is a function that maps individual preference rela-
tions to a collective preference relation.
are more than two alternatives. To see this, consider the preference relations of
three voters given in Figure 6.1. A majority of voters (two out of three) prefers a
to b. Another majority prefers b to c and yet another one c to a. Clearly, the pair-
wise majority relation in this example is cyclic and therefore not a well-formed
preference relation. Hence, the majority rule does not constitute an SWF.
1 agent: a ≻ b ≻ c
1 agent: b ≻ c ≻ a
1 agent: c ≻ a ≻ b
Pairwise majority relation: a over b, b over c, and c over a (a cycle).
Figure 6.1: Condorcet’s paradox [86]. The left-hand side shows the individual
preferences of three agents such that the pairwise majority relation, depicted on
the right-hand side, is cyclic.
In what is perhaps the most influential result in social choice theory, Arrow [5]
has shown that this “difficulty in the concept of social welfare” (as he calls it) is
not specific to the majority rule, but rather applies to a very large class of SWFs.
Arrow’s theorem states that a seemingly innocuous set of desiderata cannot be
simultaneously met when aggregating preferences. These desiderata are Pareto
optimality, independence of irrelevant alternatives, and non-dictatorship; they are
defined as follows.
Theorem 6.1 (Arrow, 1951) There exists no SWF that simultaneously satisfies
IIA, Pareto optimality, and non-dictatorship whenever |U| ≥ 3.
According to Paul Samuelson, who is often considered the founding father of
modern economics, Arrow’s theorem is one of the significant intellectual achieve-
ments of the twentieth century [188]. A positive aspect of such a negative result
is that it provides boundaries on what can actually be achieved when aggregating
preferences. In particular, Arrow’s theorem shows that at least one of the required
conditions has to be omitted or relaxed in order to obtain a positive result. For in-
stance, if |U| = 2, IIA is trivially satisfied by any SWF and reasonable SWFs (such
as the majority rule) also satisfy the remaining conditions. In a much more elab-
orate attempt to circumvent Arrow’s theorem, Young [231] proposed to replace
IIA with local IIA (LIIA), which only requires IIA to hold for consecutive pairs
of alternatives in the social ranking. By throwing in a couple of other conditions
(such as anonymity and neutrality, which will be defined in Section 3) and restrict-
ing attention to linear individual preferences, Young completely characterizes an
aggregation function known as Kemeny’s rule.
Kemeny’s rule. Kemeny’s rule [140] yields all strict rankings that agree with as
many pairwise preferences of the agents as possible. That is, it returns
arg max_{≻} ∑_{i∈N} | ≻ ∩ ≽_i |,
where the maximum ranges over all strict rankings ≻ of the alternatives.
Since there can be more than one ranking that satisfies this property, Kemeny’s
rule is not really an SWF but rather a multi-valued SWF. (Young refers to these as
social preference functions.) Alternatively, Kemeny’s rule can be characterized
using maximum likelihood estimation [231, 232].4 Over the years, Kemeny’s rule
has been reinvented by many scholars in different fields. It is also known as the
median or linear ordering procedure [15, 53]. Kemeny’s rule is not only very
interesting from an axiomatic but also from a computational point of view. The
problem of computing a Kemeny ranking, as well as the closely related problem
of computing a Slater ranking (a ranking that agrees with the outcomes of as many
pairwise elections as possible), correspond to a computational problem on graphs
known as the minimum feedback arc set problem (in the case of Kemeny’s rule, the
weighted version of this problem). It has been shown that computing a Kemeny
ranking is NP-hard [20], even when there are just four voters [97]. Moreover,
deciding whether a given alternative is ranked first in a Kemeny ranking is
Θ_2^p-complete [131]. Nevertheless, under certain conditions, there is a polynomial-time
approximation scheme (PTAS) for the Kemeny problem [141]. For further details
on these problems, we refer to the works of Alon [2], Betzler et al. [23, 26], Brandt
et al. [48], Charon and Hudry [53], Conitzer [61], Conitzer et al. [73], Davenport
and Kalagnanam [84], Hudry [135, 136], and Ali and Meila [1].5
4 This is done under a model where there exists a "correct" ranking of the alternatives, and the
agents' preferences are noisy estimates of this correct ranking. This result relies on a particular
noise model; if the noise model is changed, the maximum likelihood solution can result in other
SWFs, though for yet other SWFs, it can be proved that no noise model would yield that SWF as
the solution [70, 76, 90, 209]. However, the Kemeny result is robust to some other generalizations
of the model [66, 222].
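Since the problem is NP-hard, exact computation in general involves search. For a handful of alternatives, a brute-force sketch over all strict rankings (illustrative Python, exponential in the number of alternatives) already suffices:

```python
from itertools import permutations

def kemeny_rankings(profile, alternatives):
    """Return the strict rankings maximizing the number of agreeing pairwise preferences.

    `profile` is a list of (weight, ranking) pairs, each ranking from best to worst.
    """
    def agreement(ranking):
        pos = {a: i for i, a in enumerate(ranking)}
        total = 0
        for weight, voter in profile:
            vpos = {a: i for i, a in enumerate(voter)}
            total += weight * sum(
                1 for a in alternatives for b in alternatives
                if a != b and pos[a] < pos[b] and vpos[a] < vpos[b]
            )
        return total

    scored = [(agreement(list(r)), list(r)) for r in permutations(alternatives)]
    best = max(score for score, _ in scored)
    return [r for score, r in scored if score == best]

# The Condorcet-paradox profile of Figure 6.1: three tied Kemeny rankings.
profile = [(1, ["a", "b", "c"]), (1, ["b", "c", "a"]), (1, ["c", "a", "b"])]
print(kemeny_rankings(profile, ["a", "b", "c"]))
```

On this profile the three cyclically shifted rankings tie, illustrating why Kemeny's rule is multi-valued.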
Rather than relaxing the explicit conditions in Arrow’s theorem, one may call
its implicit assumptions into question. For instance, in many applications, a full
social preference relation is not needed; rather, we just wish to identify the socially
most desirable alternatives. This corresponds to the framework considered in the
following section.6
a standard tool for political and group decision making” [232]. This has not yet happened, but the
website www.votefair.org provides an interface to use Kemeny’s rule for surveys, polls, and
elections at no charge.
6 This effectively reduces the codomain of the aggregation function. As we will see in Sec-
tion 3.2.2, a common technique to avoid negative results in social choice theory is to reduce the
domain of the function.
Pareto optimality now requires that an alternative should not be chosen if there
exists another feasible alternative that all agents unanimously prefer to the for-
mer – more precisely, a ∉ f(R, A) if there exists some b ∈ A such that b ≻_i a for
all i ∈ N. An SCF f is non-dictatorial if there is no agent i such that for all prefer-
ence profiles R and alternatives a, a ≻_i b for all b ∈ A \ {a} implies a ∈ f(R, A).7
Independence of irrelevant alternatives reflects the idea that choices from a set of
feasible alternatives should not depend on preferences over alternatives that are
infeasible, i.e., f(R, A) = f(R′, A) if R|_A = R′|_A. Interestingly, in the context of
SCFs, IIA constitutes no more than a framework requirement for social choice
and is not the critical assumption it used to be in the context of SWFs.
Finally, the weak axiom of revealed preference (WARP) demands that choice
sets from feasible sets are strongly related to choice sets from feasible subsets. Let
A and B be feasible sets such that B ⊆ A. WARP requires that the choice set of B
consists precisely of those alternatives in B that are also chosen in A, whenever this
set is non-empty. Formally, for all feasible sets A and B and preference profiles R,
Theorem 6.2 (Arrow, 1951, 1959) There exists no SCF that simultaneously sat-
isfies IIA, Pareto optimality, non-dictatorship, and WARP whenever |U| ≥ 3.
As the Arrovian conditions – Pareto optimality, IIA, non-dictatorship, and
WARP – cannot be satisfied by any SCF, at least one of them needs to be ex-
cluded or relaxed to obtain positive results. Clearly, dropping non-dictatorship
is unacceptable and, as already mentioned, IIA merely states that the SCF repre-
sents a reasonable model of preference aggregation [see, e.g., 29, 193]. Wilson
[215] has shown that without Pareto optimality only SCFs that are constant (i.e.,
completely unresponsive) or fully determined by the preferences of a single agent
are possible. Moreover, it could be argued that not requiring Pareto optimality
runs counter to the very idea of social choice. Accordingly, the only remaining
possibility is to relax WARP.
The intuition behind expansion is that if alternative a is chosen from some set
that contains another alternative b, then it will also be chosen in all supersets in
which b is chosen. Formally, SCF f satisfies expansion if for all A, B and R,
Top cycle. The top cycle is the smallest majoritarian SCF satisfying expansion
[28]. It consists of the maximal elements of the transitive closure of the weak ma-
jority relation [45, 88] and can be computed in linear time by using standard al-
gorithms for identifying strongly connected components in digraphs such as those
due to Kosaraju or Tarjan [see, e.g., 80].
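For small instances, the top cycle can also be computed with plain reachability rather than a dedicated linear-time SCC algorithm. The sketch below (illustrative code; it assumes the weak majority relation is given explicitly and, as usual, is complete) returns the alternatives that reach every other alternative in that relation.

```python
def top_cycle(alternatives, weak_majority):
    """Alternatives from which every other alternative is reachable in the
    weak majority relation; for a complete relation these are exactly the
    maximal elements of its transitive closure."""
    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for (a, b) in weak_majority:
                if a == x and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    return {a for a in alternatives if reachable(a) >= set(alternatives)}

# Majority cycle a -> b -> c -> a plus an alternative d beaten by everyone.
alts = {"a", "b", "c", "d"}
rel = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("b", "d"), ("c", "d")}
print(top_cycle(alts, rel))   # {'a', 'b', 'c'}
```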
8 Moreover, due to the inclusive character of expansion consistency conditions, they are easily
satisfied by very undiscriminatory SCFs. For instance, the trivial SCF, which always yields all
feasible alternatives, trivially satisfies expansion (and all of its weakenings).
Uncovered set. The uncovered set is the smallest majoritarian SCF satisfying
a weak version of expansion [164]. Interestingly, the uncovered set consists pre-
cisely of those alternatives that reach every other alternative on a majority rule
path of length at most two [202]. Based on this characterization, computing the
uncovered set can be reduced to matrix multiplication and is thus feasible in
polynomial time [42, 134].
Banks set. The Banks set is the smallest majoritarian SCF satisfying a weaken-
ing of weak expansion, called strong retentiveness [40]. In contrast to the previous
two SCFs, the Banks set cannot be computed in polynomial time unless P equals
NP. Deciding whether an alternative is contained in the Banks set is NP-complete
[47, 216]. Interestingly, some alternatives (and thus subsets) of the Banks set can
be found in linear time [133]. A very optimized (exponential-time) algorithm for
computing the Banks set was recently proposed by Gaspers and Mnich [122].9
Two other SCFs, namely the minimal covering set [95] and the bipartisan
set [147], have been axiomatized using a variant of contraction, which is im-
plied by WARP [43]. While the bipartisan set can be computed using a single
linear program, the minimal covering set requires a slightly more sophisticated,
yet polynomial-time algorithm [42]. In addition to efficient computability, the
minimal covering set and the bipartisan set satisfy a number of other desirable
properties [39, 151] (see also Section 3.2.5).
3 Voting
In the previous section, we started our formal treatment of social choice and en-
countered some of the fundamental limitations that we face. The purpose of pre-
senting these limitations at the outset is of course not to convince the reader that
social choice is hopeless and we should give up on it; it is too important for that.
(One is reminded of Churchill’s quote that “democracy is the worst form of gov-
ernment except for all those other forms that have been tried from time to time.”)
Rather, it is intended to get the reader to think about social choice in a precise
manner and to have realistic expectations for what follows. Now, we can move on
to some more concrete procedures for making decisions based on the preferences
of multiple agents.
9 Another SCF, the tournament equilibrium set [194], was, for more than 20 years, conjectured
to be the unique smallest majoritarian SCF satisfying retentiveness, a weakening of strong reten-
tiveness. This was recently disproven by Brandt et al. [49]. Deciding whether an alternative is
contained in the tournament equilibrium set of a tournament is NP-hard [47]. This problem is not
known to be in NP and may be significantly harder.
Of course, every SCF can also be seen as a voting rule. There are two reasons
we distinguish SCFs from voting rules. First, from a technical perspective, the
SCFs defined in the previous section were axiomatized using variable feasible
sets in order to salvage some degree of collective rationality. Second, some of
these SCFs (e.g., the top cycle) can hardly be considered voting rules because
they are not discriminatory enough. Of course, the latter is merely a gradual
distinction, but there have been attempts to formalize this [see, e.g., 10, 117, 195,
207]. When ignoring all conditions that relate choices from different feasible sets
with each other, we have much more freedom in defining aggregation functions.
For simplicity, we assume throughout this section that preferences are linear, i.e.,
there are no ties in individual preference relations.
An important property that is often required of voting rules in practice, called
resoluteness, is that they should always yield a unique winner. Formally, a voting
rule f is resolute if | f (R)| = 1 for all preference profiles R. Two natural symmetry
conditions are anonymity and neutrality. Anonymity requires that the outcome of
a voting rule is unaffected when agents are renamed (or more formally, when the
individual relations within a preference profile are permuted). In a similar vein,
neutrality requires that a voting rule is invariant under renaming alternatives.
Unfortunately, in general, anonymous and neutral voting rules cannot be
single-valued. The simplest example concerns two agents and two alternatives,
each of which is preferred by one of the voters. Clearly, a single alternative can
only be chosen by breaking anonymity or neutrality.10
In the remainder of this section, we will define some of the most common
voting rules.
Borda’s rule. Under Borda’s rule alternative a gets k points from voter i if i
prefers a to k other alternatives, i.e., the score vector is (|U| − 1, |U| − 2, . . . , 0).
Borda’s rule takes a special place within the class of scoring rules as it chooses
those alternatives with the highest average rank in individual rankings. While
Borda’s rule is not a Condorcet extension, it is the only scoring rule that never
gives a Condorcet winner the lowest accumulated score [204]. Another appealing
axiomatic characterization of Borda’s rule was given by Young [228].
Plurality rule. The score vector for the plurality rule is (1, 0, . . . , 0). Hence,
the cumulative score of an alternative equals the number of voters by which it is
ranked first.
Anti-plurality rule. The score vector for the anti-plurality rule (which is some-
times also called veto) is (1, . . . , 1, 0). As a consequence, it chooses those alterna-
tives that are least-preferred by the lowest number of voters.
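All three rules are instances of one small procedure. A minimal sketch (using the same profile format as the earlier beverage example) is:

```python
def scoring_winners(profile, alternatives, score_vector):
    """Winners under a positional scoring rule with the given score vector."""
    scores = {a: 0 for a in alternatives}
    for weight, ranking in profile:
        for position, a in enumerate(ranking):
            scores[a] += weight * score_vector[position]
    best = max(scores.values())
    return {a for a, s in scores.items() if s == best}, scores

profile = [(4, ["milk", "wine", "beer"]),
           (3, ["beer", "wine", "milk"]),
           (2, ["wine", "beer", "milk"])]
alts = ["milk", "beer", "wine"]
m = len(alts)
print(scoring_winners(profile, alts, [m - 1 - i for i in range(m)]))  # Borda
print(scoring_winners(profile, alts, [1] + [0] * (m - 1)))            # plurality
print(scoring_winners(profile, alts, [1] * (m - 1) + [0]))            # anti-plurality (veto)
```

On the beverage profile, Borda and anti-plurality select wine while plurality selects milk, echoing the discussion in Section 1.1.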
Due to their simplicity, scoring rules are among the most used voting rules in
the real world. Moreover, there are various elegant characterizations of scoring
rules. In Section 2, we introduced axioms that impose consistency restrictions
on choice sets when the set of feasible alternatives varies. Alternatively, one can
focus on changes in the set of voters. A very natural consistency property with
respect to a variable electorate, often referred to as reinforcement, was suggested
independently by Smith [204] and Young [228]. It states that all alternatives that
are chosen simultaneously by two disjoint sets of voters (assuming that there is at
least one alternative with this property) should be precisely the alternatives chosen
by the union of both sets of voters. When also requiring anonymity, neutrality, and
a mild technical condition, Smith [204] and Young [229] have shown that scoring
rules are the only voting rules satisfying these properties simultaneously.
A voting procedure, popularized by Brams and Fishburn [36], that is closely
related to scoring rules is approval voting. In approval voting, every voter can
approve any number of alternatives and the alternatives with the highest number of
approvals win. We deliberately called approval voting a voting procedure, because
technically it is not really a voting rule (unless we impose severe restrictions on the
domain of preferences by making them dichotomous). Various aspects of approval
voting (including computational ones) are analyzed in a recent compendium by
Laslier and Sanver [152].
6 voters: a ≻ b ≻ c        a's score: 6 + 7s2
3 voters: c ≻ a ≻ b        b's score: 8 + 6s2
4 voters: b ≻ a ≻ c        c's score: 3 + 4s2
4 voters: b ≻ c ≻ a
Figure 6.2: Example due to Fishburn [118], which shows that no scoring rule
is a Condorcet extension. Scores for the score vector (1, s2 , 0) are given on the
right-hand side.
Maximin. Under the maximin rule, we consider the magnitude of pairwise elec-
tion results (by how many voters one alternative was preferred to the other). We
evaluate every alternative by its worst pairwise defeat by another alternative; the
winners are those who lose by the lowest margin in their worst pairwise defeats.
(If there are any alternatives that have no pairwise defeats, then they win.)
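A minimal sketch of one common, margin-based reading of this rule (illustrative code, same profile format as before):

```python
def maximin_winners(profile, alternatives):
    """Evaluate each alternative by its worst pairwise margin; the best worst case wins."""
    def support(a, b):
        # Number of voters preferring a to b.
        return sum(w for w, r in profile if r.index(a) < r.index(b))

    worst = {a: min(support(a, b) - support(b, a)
                    for b in alternatives if b != a)
             for a in alternatives}
    best = max(worst.values())
    return {a for a, v in worst.items() if v == best}

profile = [(4, ["milk", "wine", "beer"]),
           (3, ["beer", "wine", "milk"]),
           (2, ["wine", "beer", "milk"])]
print(maximin_winners(profile, ["milk", "beer", "wine"]))   # {'wine'}
```

Since wine is the Condorcet winner of this profile, its worst pairwise margin is positive and maximin selects it, consistent with maximin being a Condorcet extension.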
Dodgson’s rule. Dodgson’s rule yields all alternatives that can be made a Con-
dorcet winner by interchanging as few adjacent alternatives in the individual
rankings as possible. Deciding whether an alternative is a Dodgson winner is
Θ_2^p-complete and thus computationally intractable [20, 130]. Various computa-
tional properties of Dodgson’s rule such as approximability and fixed-parameter
tractability have been studied [see, e.g., 24, 51, 52, 158]. Unfortunately, Dodg-
son’s rule violates various mild axioms that almost all other Condorcet extensions
satisfy [see, e.g., 38].
Ranked pairs. The ranked pairs rule generates a ranking of all alternatives (and
the first-ranked alternative can be considered the winner). It first sorts all pairwise
elections by the magnitude of the margin of victory. Then, starting with the pair-
wise election with the largest margin, it “locks in” these results in this order, so
that the winner of the current pairwise election must be ranked above the loser in
the final ranking – unless this would create a cycle due to previously locked-in
results, in which case we move on to the next pairwise election. A similar voting
rule was proposed by Schulze [192].
11 Young [230] actually defined his rule using weak Condorcet winners (see Exercise 15). Brandt
et al. [46] have shown that the hardness result by Rothe et al. [185] carries over to Young's original
definition.
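Returning to the ranked pairs procedure, a rough margin-based sketch (illustrative code; ties among equal margins are broken arbitrarily by sort order) is:

```python
def ranked_pairs_order(profile, alternatives):
    """Lock in pairwise victories from largest margin down, skipping any that create a cycle."""
    def support(a, b):
        return sum(w for w, r in profile if r.index(a) < r.index(b))

    victories = sorted(
        ((support(a, b) - support(b, a), a, b)
         for a in alternatives for b in alternatives
         if a != b and support(a, b) > support(b, a)),
        reverse=True)

    locked = set()   # pairs (a, b): a is locked in above b

    def reachable(start, relation):
        frontier, seen = [start], set()
        while frontier:
            x = frontier.pop()
            for (c, d) in relation:
                if c == x and d not in seen:
                    seen.add(d)
                    frontier.append(d)
        return seen

    for _, a, b in victories:
        if a not in reachable(b, locked):      # locking (a, b) must not create a cycle
            locked.add((a, b))

    # The final ranking orders alternatives by how many others they dominate transitively.
    return sorted(alternatives, key=lambda a: -len(reachable(a, locked)))

profile = [(4, ["milk", "wine", "beer"]),
           (3, ["beer", "wine", "milk"]),
           (2, ["wine", "beer", "milk"])]
print(ranked_pairs_order(profile, ["milk", "beer", "wine"]))   # ['wine', 'beer', 'milk']
```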
All SCFs mentioned in Section 2.2 (e.g., the top cycle, the uncovered set,
and the Banks set) also happen to be Condorcet extensions. This is because the
Condorcet criterion can be seen as a very weak variant of expansion consistency:
whenever an alternative is chosen in all two-element subsets, then it should also
be chosen from the union of all these sets. Many of the proposed Condorcet
extensions can be seen as refinements of these SCFs because they always yield
elements of, say, the top cycle or the uncovered set. Other prominent Condorcet
extensions are Kemeny’s rule and Slater’s rule (see Section 2.1).
STV. We have already mentioned the STV rule: it looks for the alternatives that
are ranked in first place the least often, removes them from all voters’ ballots (so
that some of them may now rank a different alternative first), and repeats. The
alternatives removed in the last round (which results in no alternatives being left
at all) win.
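A compact sketch of this elimination procedure, applied to the beverage profile from the beginning of the chapter (illustrative code; all alternatives tied for the fewest first places are removed at once):

```python
def stv_winners(profile, alternatives):
    """Repeatedly eliminate the alternatives with the fewest first places."""
    remaining = set(alternatives)
    while True:
        firsts = {a: 0 for a in remaining}
        for weight, ranking in profile:
            top = next(a for a in ranking if a in remaining)
            firsts[top] += weight
        fewest = min(firsts.values())
        losers = {a for a, v in firsts.items() if v == fewest}
        if losers == remaining:          # everyone would be eliminated: they all win
            return remaining
        remaining -= losers

profile = [(4, ["milk", "wine", "beer"]),
           (3, ["beer", "wine", "milk"]),
           (2, ["wine", "beer", "milk"])]
print(stv_winners(profile, ["milk", "beer", "wine"]))   # {'beer'}
```

As in Section 1.1, wine is eliminated first, after which beer defeats milk and wins.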
Bucklin’s rule. In the (simple version of) Bucklin’s rule, we first check whether
there is any alternative that is ranked first by more than half the voters; if so,
this alternative wins. If not, we check whether there are any alternatives that are
ranked in either first or second place by more than half the voters; if so, they win.
If not, we consider the first three positions, etc. When multiple alternatives cross
the n/2 threshold simultaneously, it is common to break ties by the margin by
which they crossed the threshold.
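A sketch of the simple version of the rule, without the tie-breaking refinement (illustrative code):

```python
def bucklin_winners(profile, alternatives):
    """Lower the threshold depth until some alternative is ranked among the
    top k positions by more than half of the voters."""
    n = sum(weight for weight, _ in profile)
    for depth in range(1, len(alternatives) + 1):
        counts = {a: 0 for a in alternatives}
        for weight, ranking in profile:
            for a in ranking[:depth]:
                counts[a] += weight
        winners = {a for a, c in counts.items() if c > n / 2}
        if winners:
            return winners
    return set(alternatives)

profile = [(4, ["milk", "wine", "beer"]),
           (3, ["beer", "wine", "milk"]),
           (2, ["wine", "beer", "milk"])]
print(bucklin_winners(profile, ["milk", "beer", "wine"]))   # {'wine', 'beer'} at depth 2
```

Both wine and beer cross the n/2 threshold in the second round; the refined rule described above would break this tie in favor of wine, which crossed by the larger margin.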
In order to gain more insight into the huge zoo of voting rules, various ax-
ioms that may or may not be satisfied by a voting rule have been put forward.
Sometimes a certain set of axioms completely characterizes a single voting rule
(such as the SCFs proposed in Section 2.2.2) or an interesting class of voting rules
(such as the class of scoring rules in Section 3.1.1). Another stream of research
studies the rationalization of voting rules by measuring the distance (according to
various metrics) of a given preference profile to the nearest preference profile that
satisfies certain consensus properties (e.g., being completely unanimous or ad-
mitting a Condorcet winner). This approach goes back to Dodgson’s voting rule
mentioned in Section 3.1.2 and covers many of the rules proposed in this section
[99, 100, 161].
3.2 Manipulation
So far, we have assumed that the preferences of all voters are known. In reality,
generally the voters need to report their preferences. A significant problem is
that a voter may be incentivized to report preferences other than its true ones.
For example, consider a plurality election between three alternatives, a, b, and
c. Consider voter i with preferences a i b i c. Moreover, suppose that voter
i believes that almost nobody else will rank a first, but it will be a close race
between b and c. Then, i may be best off casting a vote in which b is ranked first:
it has little hope of getting a to win, so it may be better off focusing on ensuring
that at least b will win.
One may wonder why manipulation is something to be avoided. First, the
possibility of manipulation leads to fairness issues since manipulative skills are
usually not spread evenly across the population. Second, energy and resources are
wasted on determining how best to manipulate. Third, it makes it difficult to eval-
uate whether the resulting outcome is in fact one that makes sense with respect to
the true preferences (as opposed to the reported ones). As we will see, the question
of how to manipulate is not only computationally but also conceptually problem-
atic. It raises various fundamental game-theoretic questions and makes it very
difficult to make predictions or theoretical statements about election outcomes.12
There is also a result in the theory of mechanism design known as the revelation
principle, which can be very informally described as saying that anything that
can be achieved by a mechanism in which agents play strategically, can also be
achieved by a mechanism in which agents are best off telling the truth, underlin-
ing again the importance of truthful voting (for more details see the chapter on
“Mechanism Design and Auctions” in this book). Unfortunately, as we shall see,
the problem of manipulation cannot be avoided in general as every single-valued
12 One reason for this is that voting games can have many different equilibria. For example, in
a plurality election, it can be an equilibrium for all voters to vote for either b or c, even though all
voters rank a first in their true preferences! This is so because if nobody else is expected to vote
for a, then it does not make sense to waste one’s vote on a. If such an equilibrium seems artificial,
imagine a society in which two parties dominate the political scene and put forward candidates b
and c, whereas a is a third-party candidate. Of course, there are other equilibria as well, which
will in general result in different winners. This makes it difficult to make any predictions about
strategic voting. One context in which we can make a sharp game-theoretic prediction of the
winner is the one in which the agents vote in sequence, one after the other, observing what the
earlier agents have voted (see also Section 4.2). Unfortunately, in this context, paradoxical results
can be exhibited where the game-theoretic outcome does not reflect the voters’ true preferences
well. For more detail, see the work of Desmedt and Elkind [89] and Xia and Conitzer [220].
than its own top choice b. If it manipulates and instead of truthfully declaring b
as its top choice decides to report a “smaller” alternative b , then either the win-
ner will not change or the winner will become even “smaller” and thus even less
attractive. On the other hand, if it reports a “larger” alternative b instead, then
it will not affect the median (and thus the winner) at all. Hence, any form of
manipulation will either damage its interests or have no effect at all. The exact
same argument would continue to hold even if instead of choosing the median, or
50th-percentile, voter, we chose the (say) 60th-percentile voter, even though this
clearly would not necessarily choose the Condorcet winner. The argument is also
easily modified to prove the stronger property of group-strategyproofness (where
a group of agents can join forces in attempting to manipulate the outcome).
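The argument can be replayed concretely. In the sketch below (illustrative; alternatives are identified with points on a line and voters report only their peaks), the median rule is applied to truthful and to misreported peaks, showing that lying never helps the deviating voter.

```python
def median_rule(reported_peaks):
    """Choose the median of the reported peaks (lower median for an even number of voters)."""
    ordered = sorted(reported_peaks)
    return ordered[(len(ordered) - 1) // 2]

true_peaks = [2, 5, 9]          # three voters with single-peaked preferences on a line
print(median_rule(true_peaks))  # 5

# The voter with peak 9 dislikes 5 and tries misreporting a smaller or a larger peak:
print(median_rule([2, 5, 1]))   # 2 -- even further from 9, so the lie backfires
print(median_rule([2, 5, 30]))  # 5 -- no effect; truth-telling is a dominant strategy
```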
Single-peakedness has also been studied from a computational point of view.
It is very easy to check whether a preference profile is single-peaked according to
a specific given ordering <. However, it is less obvious whether it can be checked
efficiently whether a preference profile is single-peaked according to some or-
dering <. Building on previous work by Bartholdi, III and Trick [18], Escoffier
et al. [106] proposed a linear-time algorithm for this problem. In other work,
Conitzer [63] and Farfel and Conitzer [116] investigated how to elicit the voters’
preferences by asking as few queries as possible when preferences are known to
be single-peaked (with the latter paper focusing on settings where agents have
most-preferred ranges of alternatives). The computational hardness of manip-
ulation (which will be introduced in the next section) for voting rules other than
median voting has also been examined in the context of single-peaked preferences
[46, 111, 213].
Another important domain of restricted preferences is that of value-restricted
preferences, which also guarantees the existence of a Condorcet winner and sub-
sumes many other domains such as that of single-peaked preferences [197, 201].
Definition 6.4 In the manipulation problem for a given resolute voting rule, we
are given a set of alternatives, a set of (unweighted) votes, and a preferred alter-
native p. We are asked whether there exists a single vote that can be added so that
p wins.14
One may object to various aspects of this definition. First of all, one may argue
that what the manipulator seeks to do is not to make a given alternative the winner,
but rather to get an alternative elected that is as high in its true ranking as possible.
This does not pose a problem: if the manipulator can solve the above problem, it
can simply check, for every alternative, whether it can make that alternative win,
and subsequently pick the best of those that can win. (Conversely, to get an alter-
native elected that is as high in its true ranking as possible, the manipulator needs
to check first of all whether it can make its most-preferred alternative win, which
comes down to the above problem.) Another objection is that the manipulator
generally does not know the votes of all the other voters. This is a reasonable
objection, though it should be noted that as long as it is possible that the manipu-
lator knows the votes of all the other voters, the above problem remains a special
case (and thus, any (say) NP-hardness results obtained for the above definition
still apply).
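For rules where no efficient algorithm is known, the decision problem can at least be solved by brute force over the manipulator's m! possible votes. A hypothetical sketch (the rule passed in can be any function from a profile to a set of winners; plurality is used here only as an example, with ties assumed to be broken in favor of p):

```python
from itertools import permutations

def manipulable(rule, votes, alternatives, p):
    """Return a single additional vote that makes p a winner under `rule`, or None.

    `votes` is a list of rankings (lists from best to worst).
    """
    for candidate_vote in permutations(alternatives):
        if p in rule(votes + [list(candidate_vote)], alternatives):
            return list(candidate_vote)
    return None

def plurality(votes, alternatives):
    counts = {a: sum(1 for v in votes if v[0] == a) for a in alternatives}
    best = max(counts.values())
    return {a for a, c in counts.items() if c == best}

votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(manipulable(plurality, votes, ["a", "b", "c"], "b"))
# ['b', 'a', 'c']: ranking b first creates a tie that b wins under favorable tie-breaking
```

This exhaustive search is exponential in the number of alternatives, which is exactly why the complexity results discussed below concern themselves with whether such a manipulation can be found efficiently.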
Inspired by early work by Bartholdi, III et al. [19], recent research in com-
puter science has investigated how to use computational hardness – primarily NP-
hardness – as a barrier against manipulation [see, e.g., 68, 74, 98, 110, 129]. Find-
ing a beneficial manipulation is known to be NP-hard for several rules, including
second-order Copeland [19], STV [17], ranked pairs [224], and Nanson and Bald-
win’s rules [166]. Many variants of the manipulation problem have also been
considered. In the coalitional manipulation problem, the manipulators can cast
multiple votes in their joint effort to make p win. Because the single-manipulator
14 Often, the problem is defined for irresolute voting rules; in this case, the question is either
whether p can be made one of the winners, or whether p can be made the unique winner. These
questions can be interpreted to correspond to the cases where ties are broken in favor of p, and
where they are broken against p, respectively.
case is a special case of this problem where the coalition happens to have size 1,
this problem is NP-hard for all the rules mentioned above. However, other rules
are also NP-hard to manipulate in this sense, including Copeland [109, 115],
maximin [224], and Borda [25, 85]. Finally, in the weighted version of this
problem, weights are associated with the voters, including the manipulators (a
vote of weight k counts as k unweighted votes). Here, many rules become NP-
hard to manipulate even when the number of alternatives is fixed to a small con-
stant [74, 129, 166].
In the destructive version of the problem, the goal is not to make a given alter-
native a win, but rather to make a given alternative a not win [74]. For contrast,
the regular version is called the constructive version. If the constructive version is
easy, then so is the destructive version, because to solve the destructive version it
suffices to solve the constructive version for every alternative other than a; but in
some cases, the destructive version is easy while the constructive version is not.
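The reduction just described fits in a few lines (a sketch; `constructive_solver` is a hypothetical procedure deciding whether the manipulator can make a given alternative the winner):

```python
# A sketch of the reduction described above: if the constructive manipulation
# problem can be solved, the destructive one can be solved as well.
def destructive_manipulable(constructive_solver, alternatives, a):
    """Can the manipulator ensure that alternative a does not win?"""
    return any(constructive_solver(p) for p in alternatives if p != a)
```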
Computational hardness has also been considered as a way of avoiding other
undesirable behavior. This includes control problems, where the chair of the elec-
tion has (partial) control over some aspects of the election (such as which alterna-
tives are in the running or which voters get to participate) and tries to use this to
get a particular alternative to win [16, 75, 113, 132, 179]. Another example is the
bribery problem, where some interested party attempts to bribe the voters to bring
about a particular outcome [108, 112, 113].
One downside of using NP-hardness to prevent undesirable behavior – whether
it be manipulation, control, or bribery, but let us focus on manipulation – is that
it is a worst-case measure of hardness. This means that if the manipulation prob-
lem is NP-hard, it is unlikely that there is an efficient algorithm that solves all
instances of the manipulation problem. However, there may still be an efficient
algorithm that solves many of these instances fast. If so, then computational hard-
ness provides only partial protection to manipulation, at best. It would be much
more desirable to show that manipulation is usually hard. Recent results have cast
doubt on whether this is possible at all. For instance, it was shown that when pref-
erences are single-peaked, many of the manipulation problems that are known to
be NP-hard for general preferences become efficiently solvable [46, 111]. In other
work on certain distributions of unrestricted preferences, both theoretical and ex-
perimental results indicate that manipulation is often computationally easy [e.g.,
71, 176, 177, 214, 217, 218]. Extending a previous result by Friedgut et al. [119],
Isaksson et al. [139] have recently shown that efficiently manipulable instances
are ubiquitous under fairly general conditions.
Theorem 6.4 (Isaksson et al., 2010) Let f be a neutral resolute voting rule and
assume that preferences are uniformly distributed. The probability that a random
preference profile can be manipulated by a random voter by submitting random
Barberà [12] and Procaccia [175] provide further examples and characterizations.
However, all of these SDSs require an extreme degree of randomization.15
inconsistent lotteries like always picking b from {a, b} and a from {a, b, c}.
However, for weaker (incomplete) preference relations over sets, more posi-
tive results can be obtained [e.g., 39, 41]. Brandt [39], for instance, has shown that
the minimal covering set and the bipartisan set (mentioned in Section 2.2.2) are
non-manipulable when one set of alternatives is preferred to another if and only if
everything in the former is preferred to everything in the latter.
nonmanipulator votes is only partially known to the manipulator [79], which is another problem
that is closely related to the possible/necessary winner problem.
4 Combinatorial Domains
So far we have presented the classical mathematical framework for studying dif-
ferent variants of the problem of social choice and we have seen examples of
questions regarding the computational properties of this framework. Next, we will
consider a social choice problem where computational considerations already play
a central role at the level of defining the formal framework to study this problem.
The problem in question is the problem of social choice in combinatorial do-
mains. To simplify matters, we will focus specifically on voting in combinatorial
domains.
Let us begin with an example. Suppose three agents need to agree on a menu
for dinner. The options for the starter are salad and oyster; the options for the
main course are trout and veal; and the options for the wine are red and white.
The favorite menus of our three agents are as follows.
Agent 1: salad-trout-white
Agent 2: salad-veal-red
Agent 3: oyster-veal-white
Agent 1 likes trout and naturally wants to combine this with a white wine; agent 2
likes veal (which may be paired with either red or white wine) and has a preference
for red wine; and agent 3 likes oyster and veal, which calls for a white wine. Now,
what menu should our agents choose as a group, and how should they make that
choice? Maybe the most natural approach is to use the plurality rule on each of the
three issues: there is a majority for salad, there is a majority for veal, and there is
a majority for white wine. That is, the group menu will be salad-veal-white. But
this very conceivably could be one of the worst possible choices for our agents:
like agent 2, they may very well all prefer to have a red wine with salad and veal.
19 The complexity of the possible winner problem for scoring rules has been completely characterized by Betzler and Dorn [22] and Baumeister and Rothe [21].
What went wrong here? The problem is that the preferences of the agents over
the choices made for each of the three issues are not independent. For instance,
our little story suggested that for all of them their preferred choice of wine depends
on what starter and main course they will actually get served. But voting issue-by-
issue completely ignores this dependency, and so we should not be too surprised
if we get a paradoxical outcome.
Note also that the next most obvious approach, which would be to directly
vote on full menus, does not work very well either. If we ask each agent only
for its most preferred menu (as we have done above), we will typically get three
different answers, and the best we can do is to randomly select one of the three.
We could refine this approach further, and ask, say, for their five most preferred
menus and apply, say, the Borda rule. This might lead to an acceptable solution
in our little example, but imagine we are dealing with a choice problem with 10
binary issues and thus 2^10 = 1,024 alternatives: the most preferred alternatives of
our three agents might very well be entirely disjoint again.
A full description of our example should actually list the full preferences of
each of our three agents over the combinatorial domain D = {salad, oyster} ×
{trout, veal} × {red, white}, i.e., over a set of eight alternatives. Note that the
number of alternatives is exponential in the number of issues. But this means that
even for examples with a slightly larger number of issues it can quickly become
practically infeasible for the agents to rank all the alternatives and communicate
this ranking. That is, there is a fundamental computational challenge hidden at the
very heart of voting in combinatorial domains: even a small problem description
immediately gives rise to a very large choice problem.
In our little example there actually is a good solution: For all three agents,
their preferences regarding the wine depend on the choices made for the starter
and the main course, while their preferences for those two issues do not depend
on anything else (we have not actually described our example in enough detail
before to be sure about the latter fact, but let us now assume that this is indeed
the case). We can use these dependencies to determine a good order in which to
vote on each of the three issues in sequence. As long as we vote on the wine at
the very end, there will not be any paradoxical outcome (nor will there be any
computational difficulty).20
So, if we first use the plurality rule to choose a starter and a main course, our
agents are likely to choose the salad and the veal. If we then fix these choices
and ask the agents to vote on the wine, they will select the red wine, yielding an
outcome (salad-veal-red) that is ideal for agent 2 and not unreasonable for the
other two.
20 This is assuming that agents do not vote strategically; we will discuss this point more at the
end of Section 4.2.
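The following Python sketch (not from the chapter) replays the dinner example under truthful sequential voting. The conditional preferences encoded for the three agents are one assignment consistent with the story told above; voting on the wine last indeed produces salad-veal-red.

```python
# A minimal sketch (not from the chapter) of sequential, issue-by-issue voting
# for the dinner example, assuming truthful voting: once earlier issues are
# fixed, each agent votes for its preferred value of the current issue.

ISSUES = ['starter', 'main', 'wine']
DOMAINS = {'starter': ['salad', 'oyster'],
           'main': ['trout', 'veal'],
           'wine': ['red', 'white']}

# Illustrative conditional preferences, consistent with the story above.
def agent1_vote(issue, fixed):
    if issue == 'starter': return 'salad'
    if issue == 'main':    return 'trout'
    return 'white' if fixed.get('main') == 'trout' else 'red'

def agent2_vote(issue, fixed):
    if issue == 'starter': return 'salad'
    if issue == 'main':    return 'veal'
    return 'red'

def agent3_vote(issue, fixed):
    if issue == 'starter': return 'oyster'
    if issue == 'main':    return 'veal'
    return 'white' if fixed.get('starter') == 'oyster' else 'red'

AGENTS = [agent1_vote, agent2_vote, agent3_vote]

def sequential_plurality(order):
    fixed = {}
    for issue in order:
        ballots = [agent(issue, fixed) for agent in AGENTS]
        fixed[issue] = max(DOMAINS[issue], key=ballots.count)   # majority of three
    return fixed

print(sequential_plurality(ISSUES))
# {'starter': 'salad', 'main': 'veal', 'wine': 'red'} -- agent 2's favorite menu
```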
The kind of paradox we have seen has long been observed and studied in
political science, typically under the name of “multiple-election paradoxes” [35].
As a problem that is inherently computational in nature it was first formulated by
Lang [148].
As the representation of an agent’s preferences plays a central role in social
choice in combinatorial domains, we will first review the most important knowl-
edge representation languages that have been used in the literature to this end.
We will then focus on two types of promising approaches: sequential voting and
voting by means of compactly represented preferences.
Consider, as an example, the following CP-net over three binary variables X, Y, and Z, where x and x̄ denote the two values of X (and similarly for Y and Z). Its dependency graph is X → Y → Z, and its conditional preference tables are:

X:            x ≻ x̄
Y:    x :     y ≻ ȳ
      x̄ :     ȳ ≻ y
Z:    xy :    z ≻ z̄
      xȳ :    z̄ ≻ z
      x̄y :    z̄ ≻ z
      x̄ȳ :    z̄ ≻ z
A CP-net induces a partial order: if two alternatives differ only in the instantiation
of a single variable, then we can look up the corresponding entry in the table for
that variable to find how the two alternatives should be ranked. The full partial
order is the transitive closure of the relations we obtain by interpreting the indi-
vidual preference statements in this manner. For instance, given the CP-net above,
we prefer xyz̄ to xȳz̄, because the two differ only in their assignment to Y , and the
first statement in the table for variable Y says that when X = x, then we should
prefer Y = y over Y = ȳ, everything else being equal (i.e., Z = z̄ in both cases).
The full preference relation induced by the CP-net above is the following partial
order (where an arrow represents ≻ and the rankings obtained by transitivity are
not shown explicitly):

xyz → xyz̄ → xȳz̄ → x̄ȳz̄ → x̄ȳz → x̄yz
             xȳz̄ → xȳz → x̄ȳz
             x̄ȳz̄ → x̄yz̄ → x̄yz
Note that, for instance, x̄ȳz̄ and xȳz are incomparable: the CP-net does not specify
which of the two the agent prefers.
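A minimal Python sketch (not from the chapter) of this construction for the CP-net shown above: generate all single-variable flips sanctioned by the conditional preference tables and take the transitive closure.

```python
# Materializing the partial order induced by the three-variable CP-net above.
from itertools import product

VARS = ['X', 'Y', 'Z']
DOMAINS = {'X': ['x', 'x̄'], 'Y': ['y', 'ȳ'], 'Z': ['z', 'z̄']}

def cpt_pref(var, assignment):
    """Return (better, worse) values of `var`, given the other variables."""
    if var == 'X':
        return ('x', 'x̄')
    if var == 'Y':
        return ('y', 'ȳ') if assignment['X'] == 'x' else ('ȳ', 'y')
    if assignment['X'] == 'x' and assignment['Y'] == 'y':   # var == 'Z'
        return ('z', 'z̄')
    return ('z̄', 'z')

alternatives = [dict(zip(VARS, vals)) for vals in product(*(DOMAINS[v] for v in VARS))]

# Direct comparisons: flip one variable, keep everything else fixed.
better_than = set()
for a in alternatives:
    for var in VARS:
        best, worst = cpt_pref(var, a)
        improved, worsened = {**a, var: best}, {**a, var: worst}
        better_than.add((tuple(improved.values()), tuple(worsened.values())))

# Transitive closure of the direct comparisons.
changed = True
while changed:
    changed = False
    for (a, b) in list(better_than):
        for (c, d) in list(better_than):
            if b == c and (a, d) not in better_than:
                better_than.add((a, d))
                changed = True

print((('x', 'y', 'z'), ('x̄', 'y', 'z')) in better_than)    # True: xyz ≻ x̄yz
u, v = ('x̄', 'ȳ', 'z̄'), ('x', 'ȳ', 'z')
print((u, v) in better_than or (v, u) in better_than)        # False: incomparable, as noted above
```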
Another important family of languages for preference representation is that of
prioritized goals [81, 148]. Prioritized goals are applicable when each of the
variables defining the combinatorial domain has exactly two possible values (e.g.,
true and false, or 1 and 0). The basic idea is to describe the goals of the agent
whose preferences we are modeling as formulas of propositional logic. For exam-
ple, the formula X ∨ Y expresses the goal of having at least one of the variables
X and Y take the value true, while X → ¬(Y ∧ Z) says that whenever X is true,
then it should not be the case that both Y and Z are true as well. Usually not all
goals will be satisfiable. An agent can indicate the importance of each of its goals
by labeling it with a number, its priority level (suppose a higher number indicates
higher importance). Different interpretations of this kind of language are possible.
One choice is the lexicographic interpretation, under which we prefer alternative
a to alternative b if there exists a k such that for every priority level above k both
a and b satisfy the same number of goals, while a satisfies more goals of priority
level k than b does. For example, if an agent has the goals X and ¬Y and the
former has higher priority than the latter, then this induces the preference order
xȳ ≻ xy ≻ x̄ȳ ≻ x̄y.
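A small Python sketch (not from the chapter) of the lexicographic interpretation; goals are represented as predicates over assignments, and the priority levels and names are illustrative only.

```python
# Lexicographic interpretation of prioritized goals over binary variables.

def lex_prefers(a, b, goals):
    """Does the agent strictly prefer assignment a to assignment b?
    `goals` maps each priority level (higher = more important) to a list of predicates."""
    for level in sorted(goals, reverse=True):      # highest priority first
        sat_a = sum(g(a) for g in goals[level])
        sat_b = sum(g(b) for g in goals[level])
        if sat_a != sat_b:
            return sat_a > sat_b
    return False                                   # equally good at every level

# Example from the text: goal X at the higher priority, goal ¬Y at the lower one.
goals = {2: [lambda m: m['X']], 1: [lambda m: not m['Y']]}

xy_bar = {'X': True, 'Y': False}     # xȳ
x_bar_y = {'X': False, 'Y': True}    # x̄y
print(lex_prefers(xy_bar, x_bar_y, goals))   # True: xȳ ≻ x̄y
```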
Both CP-nets and prioritized goals define preferences that are ordinal: either
linear or weak orders, as commonly used in classical social choice theory, or par-
tial orders and preorders, which can be particularly appropriate for applications
in multiagent systems, where we may want to model explicitly the fact that an
agent has bounded rationality and is lacking either the computational resources or
the necessary information to completely rank all possible pairs of alternatives. In
Section 5.1, in the context of discussing fair division problems, we will see further
examples of preference representation languages, namely languages for modeling
cardinal preferences (i.e., valuation functions).
If we assume that each issue can take only two possible values and that the agents'
true preferences are common knowledge, then it is clear how to vote strategically.
This is because in the last round (when voting over the last issue) the agents are
effectively voting over the two remaining alternatives, so each agent is best off
voting for its preferred one; based on this, in the second-to-last round, the agents
can predict which alternative will end up chosen in the last round as a function
of which value the current (second-to-last) issue ends up taking, so effectively
the agents are deciding between the corresponding two alternatives; and so on.
Specifically, this means that under these circumstances, strategic sequential voting
is bound to result in the election of the Condorcet winner, whenever it exists [145].
On the other hand, Xia et al. [226] show that, unfortunately, for some profiles
without a Condorcet winner, strategic sequential voting results in very poor out-
comes; in fact, this happens even when the agents’ preferences for earlier issues
never depend on the agents’ preferences for later issues, because they will not
necessarily vote truthfully. (Incidentally, the strategic sequential voting process is
a special case of multistage sophisticated voting [128, 159, 162].) Xia et al. [226]
also show that the outcome can be very sensitive to the order in which the issues
are voted on, potentially giving the agenda setter a significant amount (or even
complete) control over the outcome. The complexity of this control problem has
been studied in a non-strategic context [75].
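The backward-induction procedure described above can be sketched as follows (Python, not from the chapter). The preference profile used is an arbitrary illustrative one that happens to have a Condorcet winner, which, as stated above, strategic sequential voting then elects.

```python
# A minimal sketch of strategic sequential voting over binary issues by backward
# induction, assuming complete information and an odd number of agents.

def outcome(prefix, issues, domains, rankings):
    """Eventual alternative reached once the issues in `prefix` are fixed and the
    remaining issues are voted on, in order, by strategic agents."""
    if len(prefix) == len(issues):
        return tuple(prefix)
    current = issues[len(prefix)]
    # Eventual alternative for each way the current issue could be decided.
    continuations = {value: outcome(prefix + [value], issues, domains, rankings)
                     for value in domains[current]}
    votes = []
    for ranking in rankings:   # each agent votes for the value whose continuation it prefers
        votes.append(min(domains[current], key=lambda v: ranking.index(continuations[v])))
    winner = max(domains[current], key=votes.count)
    return continuations[winner]

issues = ['A', 'B']
domains = {'A': [0, 1], 'B': [0, 1]}
# Three agents' rankings over the four alternatives (best first); illustrative only.
rankings = [[(1, 0), (1, 1), (0, 0), (0, 1)],
            [(1, 0), (0, 0), (0, 1), (1, 1)],
            [(0, 1), (1, 0), (1, 1), (0, 0)]]
print(outcome([], issues, domains, rankings))   # (1, 0) -- the Condorcet winner of this profile
```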
How can we address such preferences, without falling back on making the
agents rank all the exponentially many alternatives, but rather by making use of a
compact preference representation language, such as CP-nets? This problem has
been introduced by Lang [148] under the name of “combinatorial voting,” i.e., vot-
ing by means of ballots that are compactly represented preference structures. Un-
fortunately, the computational complexity of this approach is often prohibitively
high. For example, Lang [148] shows that computing the election winners – when
each voter specifies its preferences using the language of prioritized goals and (a
suitable generalized form of) the plurality rule is used – is coNP-hard, even when
each voter only states a single goal. Similarly, deciding whether there exists a
Condorcet winner is coNP-hard under the same conditions. For languages that
can express richer preference structures, the complexity of winner determination
will typically be beyond NP.
One useful property of preferences represented by a CP-net is that, if we hold
the values of all but one issue fixed, then the CP-net specifies the agent’s prefer-
ences over that remaining issue. While it is not computationally efficient to do
so, conceptually, we can consider, for every issue and for every possible setting
of the other issues, all agents’ preferences. We can then choose winners based on
these “local elections” [78, 153, 223]. For example, we can look for an alternative
that defeats all of its neighboring alternatives (that is, the alternatives that differ
on only one issue from it) in pairwise elections. Of course, there may be more
than one such alternative, or none. The maximum likelihood approach mentioned
earlier in this chapter has also been pursued in this context [225].
Developing practical algorithms for voting in combinatorial domains is one of
the most pressing issues on the research agenda for computational social choice
in the near future.
5 Fair Division
So far we have discussed social choice, in its most general and abstract form, as the
problem of choosing one or several “alternatives,” or as the problem of ranking
them. An alternative might be a candidate to be elected to political office or it
might be a joint plan to be executed by a group of software agents. In principle,
it might also be an allocation of resources to a group of agents. In this section,
we specifically focus on this latter problem of multiagent resource allocation.
To underline our emphasis on fairness considerations we shall favor the term
fair division. In the economics literature, the broad research area concerned with
determining a fair and economically efficient allocation of resources in society is
known as welfare economics. We will introduce some of the fundamental concepts
from this literature, discuss them from an algorithmic point of view, and review
allocation. For a more extensive review of the variety of multiagent resource al-
location problems and computational aspects of fairness than is possible here we
refer to the survey article by Chevaleyre et al. [54].
v(∅) = 0        v({p, q}) = 5
v({p}) = 5      v({p, r}) = 5
v({q}) = 5      v({q, r}) = 8
v({r}) = 0      v({p, q, r}) = 8
That is, obtaining one of p and q (or both) has value 5 for the agent, and in the
event it obtains the latter, obtaining r on top of it results in an additional value
of 3.
• Under the utilitarian CUF, fu (u1 , . . . , un ) := ∑i∈N ui , i.e., the social welfare
of an allocation is the sum of the utilities of the individual agents. This
is a natural way of measuring the quality of an allocation: the higher the
average utility enjoyed by an agent, the higher the social welfare. On the
other hand, this CUF hardly qualifies as fair: an extra unit of utility awarded
to the agent currently best off cannot be distinguished from an extra unit of
utility awarded to the agent currently worst off. Note that authors simply
writing about “social welfare” are usually talking about utilitarian social
welfare.
• Under the egalitarian CUF, fe (u1 , . . . , un ) := min{ui | i ∈ N}, i.e., the so-
cial welfare of an allocation is taken to be the utility of the agent worst off
under that allocation. This CUF clearly does focus on fairness, but it is
less attractive in view of economic efficiency considerations. In the special
case where we are only interested in allocations where each agent receives
(at most) one item, the problem of maximizing egalitarian social welfare is
also known as the Santa Claus problem [9].
Any CUF gives rise to a social welfare ordering (SWO), a transitive and com-
plete order on the space of utility vectors (in the same way as an individual utility
function induces a preference relation). We can also define SWOs directly. The
most important example in this respect is the leximin ordering. For the following
definition, suppose that all utility vectors are ordered, i.e., u1 ≤ u2 ≤ · · · ≤ un .
Under the leximin ordering, (u1 , . . . , un ) is socially preferred to (v1 , . . . , vn ) if and
only if there exists a k ≤ n such that ui = vi for all i < k and uk > vk . This is a re-
finement of the idea underlying the egalitarian CUF. Under the leximin ordering,
we first try to optimize the well-being of the worst-off agent. Once our options in
this respect have been exhausted, we try to optimize the situation for the second
worst-off agent, and so forth.
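For concreteness, here is a minimal Python sketch (not from the chapter) of the utilitarian and egalitarian CUFs and of the strict leximin comparison between utility vectors.

```python
# Utilitarian and egalitarian CUFs, and the strict leximin comparison.

def utilitarian(u):
    return sum(u)

def egalitarian(u):
    return min(u)

def leximin_prefers(u, v):
    """Is utility vector u strictly leximin-preferred to v?"""
    for ui, vi in zip(sorted(u), sorted(v)):
        if ui != vi:
            return ui > vi
    return False

u, v = [3, 5, 5], [3, 3, 9]
print(utilitarian(u), utilitarian(v))   # 13 15 -- v is better for the utilitarian CUF
print(egalitarian(u), egalitarian(v))   # 3 3  -- the egalitarian CUF cannot separate them
print(leximin_prefers(u, v))            # True -- leximin refines the egalitarian comparison
```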
SWOs have been studied using the axiomatic method in a similar manner as
SWFs and SCFs. Let us briefly review three examples of axioms considered in
this area.
in this context. Instead, let us assume that agents are individually rational and my-
opic. This means that we assume that an agent will agree to participate in a deal
if and only if that deal increases the agent’s utility. On the other hand, it will not
try to optimize its payoff in every single deal and it does not plan ahead beyond
the next deal to be implemented.
Formally, a deal is a pair of allocations δ = (A, A′) with A ≠ A′, describing the
situation before and after the exchange of goods. Note that this definition permits
exchanges involving any number of agents and goods at a time. Bilateral deals,
involving only two agents, or simple deals, involving only one item, are special
cases. For the result we want to present in some detail here, we assume that a deal
may be combined with monetary side payments to compensate some of the agents
for a loss in utility. This can be modeled via a function p : N → R, mapping each
agent to the amount it has to pay (or receive, in case p(i) is negative), satisfying
p(1) + · · · + p(n) = 0, i.e., the sum of all payments made must equal the sum of
all payments received. A deal δ = (A, A′) is individually rational if there exists
a payment function p such that vi(A′(i)) − vi(A(i)) > p(i) for all agents i ∈ N,
with the possible exception of p(i) = 0 in case A(i) = A′(i). That is, a deal is
individually rational if payments can be arranged in such a way that for each
agent involved in the deal its gain in valuation exceeds the payment it has to make
(or its loss in valuation is trumped by the money it receives). We shall assume that
every deal made is individually rational in this sense. Note that we do not force
agents to make deals under these conditions; we simply assume that any deal that
is implemented is (at least) individually rational.
Now, by a rather surprising result due to Sandholm [189], any sequence of
individually rational deals must converge to an allocation with maximal utilitarian
social welfare. That is, provided all agents are individually rational and continue
to make individually rational deals as long as such deals exist, we can be certain
that the resulting sequence of deals must be finite and that the final allocation
reached must be socially optimal in the sense of maximizing utilitarian social
welfare. For a detailed discussion and a full proof of this result we refer to the
work of Endriss et al. [104]. The crucial step in the proof is a lemma that shows
that, in fact, a deal is individually rational if and only if it increases utilitarian
social welfare. Convergence then follows from the fact that the space of possible
allocations is finite.
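The lemma can be illustrated with a small Python sketch (not the chapter's proof): whenever a deal increases utilitarian social welfare, dividing the surplus equally yields side payments that sum to zero and leave every agent strictly better off, so the deal is individually rational. The two-agent valuations below are illustrative only.

```python
# Constructing individually rational side payments for a welfare-increasing deal.
# Valuations map bundles (frozensets of goods) to numbers; allocations map agents to bundles.

def social_welfare(valuations, allocation):
    return sum(valuations[i][allocation[i]] for i in valuations)

def equal_split_payments(valuations, alloc_before, alloc_after):
    """Return a payment function making the deal individually rational,
    or None if the deal does not increase utilitarian social welfare."""
    surplus = social_welfare(valuations, alloc_after) - social_welfare(valuations, alloc_before)
    if surplus <= 0:
        return None
    n = len(valuations)
    return {i: valuations[i][alloc_after[i]] - valuations[i][alloc_before[i]] - surplus / n
            for i in valuations}

valuations = {1: {frozenset(): 0, frozenset({'r'}): 0, frozenset({'p'}): 5, frozenset({'p', 'r'}): 5},
              2: {frozenset(): 0, frozenset({'r'}): 3, frozenset({'p'}): 5, frozenset({'p', 'r'}): 8}}
before = {1: frozenset({'p', 'r'}), 2: frozenset()}
after = {1: frozenset({'p'}), 2: frozenset({'r'})}

payments = equal_split_payments(valuations, before, after)
print(payments)                  # {1: -1.5, 2: 1.5}: agent 2 pays 1.5, agent 1 receives 1.5
print(sum(payments.values()))    # 0.0, as required of a payment function
```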
How useful is this convergence result in practice? Of course, the complexity
results discussed in Section 5.3 did not just go away: finding an allocation that
maximizes utilitarian social welfare is still NP-hard. Indeed, to decide whether it
is possible to implement yet another individually rational deal, our agents do have
to solve an NP-hard problem (in practice, most of the time they might find it easy
to identify a good deal, but in the worst case this can be very hard). Also the struc-
tural complexity of the deals required (in the worst case) is very high. Indeed, if
our agents use a negotiation protocol that excludes deals involving a certain num-
ber of agents or goods, then convergence cannot be guaranteed any longer [104].
On the other hand, a simple positive result shows that, if all valuation functions
are additive, then we can get away with a protocol that only allows agents to make
deals regarding the reallocation of one item at a time. Unfortunately, this is the
best result we can hope for along these lines: for no strictly larger class of valu-
ation functions will a simple protocol of one-item-at-a-time deals still suffice to
guarantee convergence [60].
Similar results are also available for other fairness and efficiency criteria,
such as Pareto optimality [104], egalitarian social welfare [104], and envy-
freeness [55]. Most of the work in the field is of a theoretical nature, but the
convergence problem has also been studied using agent-based simulations [4, 50].
Some of the results in the literature are based on the same notion of myopic
individual rationality used here and others rely on other rationality assumptions.
In fact, there are two natural perspectives regarding this point. First, we might
start by postulating reasonable assumptions regarding the rational behavior of in-
dividual agents and then explore what convergence results can be proven. Second,
we might start with a convergence property we would like to be able to guarantee,
and then design appropriate rationality assumptions that will allow us to prove
the corresponding theorem. That is, we may think of a multiagent system as,
first, a system of self-interested agents we cannot control (but about which we can
make certain assumptions) or, second, as a system of agents the behavior of which
we can design and program ourselves, as a tool for distributed problem solving.
Which perspective is appropriate depends on the application at hand.
Finally, the distributed approach to multiagent resource allocation also gives
rise to new questions regarding computational complexity. For instance, we might
ask how hard it is to decide whether a given profile of valuation functions and
a given initial allocation admit a path consisting only of individually rational
deals involving the exchange of a single item each to a socially optimal allo-
cation. Questions of this kind have been studied in depth by Dunne and col-
leagues [93, 94].
6 Conclusion
This chapter has been an introduction to classical social choice theory and an ex-
position of some of the most important research trends in computational social
choice. We have argued in the beginning that social choice theory, the mathe-
matical theory of collective decision making, has a natural role to play when we
think about the foundations of multiagent systems. As we are concluding the
chapter, we would like to relativize this statement somewhat: it is true that many
decision problems in multiagent systems are naturally modeled as problems of
social choice, but it is also true that many of the problems that one is likely to
encounter in practice will not fit the template provided by the classical formal
frameworks introduced here exactly, or will have additional structure that can be
exploited. More research is required to improve our understanding of best prac-
tices for adapting the elegant mathematical tools that computational social choice
can provide to the problems encountered by practitioners developing real multi-
agent systems. We hope that readers of this chapter will feel well equipped to
participate in this undertaking.
Let us conclude with a brief review of additional topics in computational social
choice, which we have not been able to cover in depth here, as well as with a few
pointers to further reading.
On the design side, there has been interest in designing voting rules that are false-name-
proof [227], that is, robust to a single voter participating under multiple identities.
While this is not an inherently computational topic, it is motivated by applica-
tions such as elections that take place on the Internet. The design of such rules
has been studied both in general [62] and under single-peaked preferences [208].
Unfortunately, the results are rather negative here. To address this, other work has
extended the model, for example by making it costly to obtain additional identi-
ties [212] or by using social network structure to identify “suspect” identities [77].
An overview of work on false-name-proofness is given by Conitzer and Yokoo
[72]. Another exciting new direction in the intersection of computational social
choice and mechanism design is that of approximate mechanism design without
money [178], where the goal is to obtain formal approximation ratios under the
constraint of strategyproofness, without using payments.
In terms of techniques, we have focused on the axiomatic method, on algo-
rithms, and on computational complexity. We have also discussed the use of tools
from knowledge representation (for the representation of preferences in combina-
torial domains). A further important research trend in computational social choice
has considered communication requirements in social choice. This includes top-
ics such as the amount of information that voters have to supply before we can
compute the winner of an election [69, 196], the most efficient form of storing an
intermediate election result that will permit us to compute the winner once the re-
maining ballots have been received [59, 219], whether voters can jointly compute
the outcome of a voting rule while preserving the privacy of their individual pref-
erences [44], and the number of deals that agents have to forge before a socially
optimal allocation of goods will be found [103].
Another technique we have not discussed concerns the use of tools developed
in automated reasoning to verify properties of social choice mechanisms and to
confirm or discover theorems within social choice theory. Examples in this line
of work include the verification of proofs of classical theorems in social choice
theory using a higher-order theorem prover [168], a fully automated proof of Arrow's
theorem for the special case of three alternatives [205], and the automated discov-
ery of theorems pertaining to the problem of ranking sets of objects [123].
(i.e., fair division) and cooperative game theory. Two highly recommended sur-
veys are those of Plott [173] and Sen [200].
The area of computational social choice (or certain parts thereof) has been
surveyed by several authors. Chevaleyre et al. [56] provide a broad overview of
the field as a whole. The literature on using computational complexity as a bar-
rier against manipulation in voting is reviewed by Faliszewski et al. [114] and
Faliszewski and Procaccia [107]; Faliszewski et al. [110] also discuss the com-
plexity of winner determination and control problems in depth. Chevaleyre et al.
[58] give an introduction to social choice in combinatorial domains. The survey
on multiagent resource allocation by Chevaleyre et al. [54] covers the basics of
fair division and also discusses connections to other areas relevant to multiagent
systems, particularly combinatorial auctions. Conitzer [64, 65] compares research
directions across mechanism design, combinatorial auctions, and voting. Endriss
[101] gives concise proofs of classical results such as Arrow’s theorem and the
Gibbard-Satterthwaite theorem, and then discusses application of logic in social
choice theory, e.g., in judgment aggregation and to model preferences in combina-
torial domains. Rothe et al. [186] provide a book-length introduction to computa-
tional social choice (in German), covering topics in voting, judgment aggregation,
and fair division, and focusing particularly on complexity questions. Finally, the
short monograph of Rossi et al. [183] on preference handling includes extensive
discussions of voting and matching from the point of view of computational social
choice.
Acknowledgments
Felix Brandt gratefully acknowledges DFG grants BR 2312/6-1, BR 2312/7-1,
BR 2312/9-1, and BR 2312/10-1, for support. Vincent Conitzer gratefully ac-
knowledges NSF awards IIS-0812113, IIS-0953756, and CCF-1101659, as well
as an Alfred P. Sloan fellowship, for support. The work of Ulle Endriss has been
partially supported by an NWO Vidi grant (639.022.706). All three authors fur-
thermore acknowledge the support of the ESF EUROCORES project “Computa-
tional Foundations of Social Choice.”
7 Exercises
1. Level 1 A utility function u : U → R is said to represent a preference relation ⪰ on U if, for all a, b ∈ U, u(a) ≥ u(b) if and only if a ⪰ b. Show that, when U is finite, a preference relation can be represented by a utility function if and only if it is transitive and complete.
(a) Level 1 Show that Pareto optimality is strictly stronger than non-
imposition. That is, show that every Pareto optimal SWF is non-
imposing and that there exists a non-imposing SWF that is not Pareto
optimal.
(b) Level 2 Show that Arrow’s theorem ceases to hold when we replace
Pareto optimality by non-imposition. That is, show that there ex-
ists a SWF that satisfies IIA and that is both non-imposing and non-
dictatorial.
3. Level 2 Prove that the four conditions in Theorem 6.2 are logically inde-
pendent by providing, for each of the conditions, an SCF that violates this
property but satisfies the other three.
4. Level 2 Show that every Copeland winner lies in the uncovered set and
hence reaches every other alternative on a majority rule path of length at
most two (assuming an odd number of voters).
5. Level 1 Consider the following preference profile for 100 voters (due to
Michel Balinski).
33   16   3   8   18   22
 a    b   c   c    d    e
 b    d   d   e    e    c
 c    c   b   b    c    b
 d    e   a   d    b    d
 e    a   e   a    a    a
Determine the winners according to plurality, Borda’s rule, Copeland’s rule,
STV, and plurality with runoff (which yields the winner of a pairwise com-
parison between the two alternatives with the highest plurality score).
10. Level 2 Assume there is an odd number of voters, and rank the alternatives
by their Copeland scores. Prove that there are no cycles in the pairwise
majority relation if and only if no two alternatives are tied in this Copeland
ranking.
(a) Every preference profile for two voters and three alternatives is single-
peaked.
(b) Every preference profile for two voters and more than three alterna-
tives is single-peaked.
(c) Every single-peaked preference profile is single-peaked with respect
to the linear order given by the preferences of one of the voters.
(d) Plurality and Condorcet winners coincide for single-peaked prefer-
ences.
(e) Plurality and Condorcet winners coincide for single-caved prefer-
ences.
(f) Borda and Condorcet winners coincide for single-peaked preferences.
12. Level 4 We have seen that any non-dictatorial voting rule can be manipu-
lated when we want that rule to operate on all possible preference profiles.
We have also seen that this problem can be avoided when we restrict the
domain of possible profiles appropriately, e.g., to single-peaked profiles.
What we have not discussed is the frequency of manipulability: how often
will we encounter a profile in which a voter has an incentive to manipu-
late? One way of studying this problem is by means of simulations: gener-
ate a large number of profiles and check for which proportion of them the
problem under consideration (here, manipulability) occurs. The standard
(a) Level 2 When there are n voters and m alternatives, how many bits of
communication does this protocol require in the worst case? Hints:
• If there are i alternatives left, how many bits does an agent need
to communicate to indicate its most-preferred one among them?
• If there are i alternatives left and we remove the one with the
fewest votes, what is an upper bound on how many agents need to
indicate a new most-preferred alternative among the i − 1 remain-
ing ones?
(b) Level 4 Using tools from communication complexity [144], a lower
bound of Ω(n log m) bits of information in the worst case has been
shown to hold for any communication protocol for the STV rule [69].
This leaves a gap with the result from (a). Can you close the gap,
either by giving a better protocol or a better lower bound?
(a) First state your answer (and your proof) with respect to the explicit
form of representing valuation functions (where the size of the rep-
resentation of a function is proportional to the number of bundles to
which it assigns a non-zero value).
(b) Then repeat the same exercise, this time assuming that valuation func-
tions are expressed using the language of weighted goals (without
restrictions to the types of formulas used). Hint: You might expect
that the complexity will increase, because now the input will be repre-
sented more compactly (on the other hand, as discussed in Section 5.3,
there was no such increase in complexity for the utilitarian CUF).
Note that both of these languages can express valuation functions that need
not be monotonic (that is, simply giving all the items to one agent will
usually not yield an allocation with maximal elitist social welfare).
18. Level 4 Consider a fair division problem with an odd number of agents.
Under the median-rank dictator CUF the social welfare of an allocation is
equal to the utility of the middle-most agent: fmrd (u1 , . . . , un ) := ui∗ , where
i∗ is defined as the (not necessarily unique) agent for which |{i ∈ N | ui ≤
ui∗ }| = |{i ∈ N | ui ≥ ui∗ }|. This is an attractive form of measuring social
welfare: it associates social welfare with the individual utility of a repre-
sentative agent, while being less influenced by extreme outliers than, for
instance, the utilitarian CUF. At the time of writing, most of the problems
discussed in the section on fair division have not yet been investigated for
the median-rank dictator CUF.
References
[1] A. Ali and M. Meila. Experiments with Kemeny ranking: What works when?
Mathematical Social Sciences, 2012. Forthcoming.
[3] A. Altman and M. Tennenholtz. Axiomatic foundations for ranking systems. Jour-
nal of Artificial Intelligence Research, 31:473–495, 2008.
[4] M. Andersson and T. Sandholm. Contract type sequencing for reallocative negotia-
tion. In Proceedings of the 20th International Conference on Distributed Comput-
ing Systems (ICDCS-2000), pages 154–160. IEEE Computer Society Press, 2000.
[5] K. J. Arrow. Social Choice and Individual Values. New Haven: Cowles Founda-
tion, 1st edition, 1951. 2nd edition 1963.
[8] E. Baharad and Z. Neeman. The asymptotic strategyproofness of scoring and Con-
dorcet consistent rules. Review of Economic Design, 4:331–340, 2002.
[9] N. Bansal and M. Sviridenko. The Santa Claus problem. In Proceedings of the 38th
Annual ACM Symposium on Theory of Computing (STOC), pages 31–40. ACM
Press, 2006.
[10] S. Barberà. The manipulation of social choice mechanisms that do not leave “too
much” to chance. Econometrica, 45(7):1573–1588, 1977.
[15] J. P. Barthelemy and B. Monjardet. The median procedure in cluster analysis and
social choice theory. Mathematical Social Sciences, 1:235–267, 1981.
[16] J. Bartholdi, III, C. Tovey, and M. Trick. How hard is it to control an election?
Mathematical and Computer Modelling, 16(8-9):27–40, 1992.
[17] J. Bartholdi, III and J. B. Orlin. Single transferable vote resists strategic voting.
Social Choice and Welfare, 8(4):341–354, 1991.
[18] J. Bartholdi, III and M. Trick. Stable matching with preferences derived from a
psychological model. Operations Research Letters, 5(4):165–169, 1986.
[20] J. Bartholdi, III, C. A. Tovey, and M. A. Trick. Voting schemes for which it can
be difficult to tell who won the election. Social Choice and Welfare, 6(3):157–165,
1989.
[21] D. Baumeister and J. Rothe. Taking the final step to a full dichotomy of the pos-
sible winner problem in pure scoring rules. Information Processing Letters, 2012.
Forthcoming.
[22] N. Betzler and B. Dorn. Towards a dichotomy for the possible winner problem in
elections based on scoring rules. Journal of Computer and System Sciences, 76(8):
812–836, 2010.
[27] D. Black. The Theory of Committees and Elections. Cambridge University Press,
1958.
[33] S. Bouveret and J. Lang. Efficiency and envy-freeness in fair division of indivisible
goods: Logical representation and complexity. Journal of Artificial Intelligence
Research, 32(1):525–564, 2008.
[35] S. Brams, D. Kilgour, and W. Zwicker. The paradox of multiple elections. Social
Choice and Welfare, 15(2):211–236, 1998.
[37] S. J. Brams and A. D. Taylor. Fair Division: From Cake-cutting to Dispute Reso-
lution. Cambridge University Press, 1996.
[38] F. Brandt. Some remarks on Dodgson’s voting rule. Mathematical Logic Quarterly,
55(4):460–463, 2009.
[40] F. Brandt. Minimal stable sets in tournaments. Journal of Economic Theory, 146
(4):1481–1499, 2011.
[41] F. Brandt and M. Brill. Necessary and sufficient conditions for the strategyproof-
ness of irresolute social choice functions. In K. Apt, editor, Proceedings of the 13th
Conference on Theoretical Aspects of Rationality and Knowledge (TARK), pages
136–142. ACM Press, 2011.
[42] F. Brandt and F. Fischer. Computing the minimal covering set. Mathematical
Social Sciences, 56(2):254–268, 2008.
[44] F. Brandt and T. Sandholm. Unconditional privacy in social choice. In R. van der
Meyden, editor, Proceedings of the 10th Conference on Theoretical Aspects of Ra-
tionality and Knowledge (TARK), pages 207–218. ACM Press, 2005.
[53] I. Charon and O. Hudry. A survey on the linear ordering problem for weighted or
unweighted tournaments. 4OR, 5(1):5–60, 2007.
[60] Y. Chevaleyre, U. Endriss, and N. Maudet. Simple negotiation schemes for agents
with simple preferences: Sufficiency, necessity and maximality. Journal of Au-
tonomous Agents and Multiagent Systems, 20(2):234–259, 2010.
[64] V. Conitzer. Making decisions based on the preferences of multiple agents. Com-
munications of the ACM, 53(3):84–94, 2010.
[66] V. Conitzer. Should social network structure be taken into account in elections?
Mathematical Social Sciences, 2012. Forthcoming.
[68] V. Conitzer and T. Sandholm. Universal voting protocol tweaks to make manipu-
lation hard. In Proceedings of the 18th International Joint Conference on Artificial
Intelligence (IJCAI), pages 781–788, 2003.
[70] V. Conitzer and T. Sandholm. Common voting rules as maximum likelihood esti-
mators. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial
Intelligence (UAI), pages 145–152, 2005.
[71] V. Conitzer and T. Sandholm. Nonexistence of voting rules that are usually hard
to manipulate. In Proceedings of the 21st National Conference on Artificial Intel-
ligence (AAAI), pages 627–634. AAAI Press, 2006.
[72] V. Conitzer and M. Yokoo. Using mechanism design to prevent false-name ma-
nipulations. AI Magazine, 31(4):65–77, 2010. Special Issue on Algorithmic Game
Theory.
[74] V. Conitzer, T. Sandholm, and J. Lang. When are elections with few candidates
hard to manipulate? Journal of the ACM, 54(3), 2007.
[75] V. Conitzer, J. Lang, and L. Xia. How hard is it to control sequential elections via
the agenda? In Proceedings of the Twenty-First International Joint Conference on
Artificial Intelligence (IJCAI), pages 103–108. AAAI Press, 2009.
[76] V. Conitzer, M. Rognlie, and L. Xia. Preference functions that score rankings and
maximum likelihood estimation. In Proceedings of the Twenty-First International
Joint Conference on Artificial Intelligence (IJCAI), pages 109–115. AAAI Press,
2009.
[79] V. Conitzer, T. Walsh, and L. Xia. Dominating manipulations in voting with partial
information. In Proceedings of the National Conference on Artificial Intelligence
(AAAI), pages 638–643. AAAI Press, 2011.
[83] C. d’Aspremont and L. Gevers. Equity and the informational basis of collective
choice. Review of Economic Studies, 44(2):199–209, 1977.
[91] L. E. Dubins and E. H. Spanier. How to cut a cake fairly. American Mathematical
Monthly, 68(1):1–17, 1961.
[95] B. Dutta. Covering sets and a new Condorcet choice correspondence. Journal of
Economic Theory, 44:63–80, 1988.
[96] B. Dutta, H. Peters, and A. Sen. Strategy-proof cardinal decision schemes. Social
Choice and Welfare, 28(1):163–179, 2007.
[97] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for
the web. In Proceedings of the 10th International Conference on World Wide Web,
pages 613–622. ACM Press, 2001.
[98] E. Elkind and H. Lipmaa. Hybrid voting protocols and hardness of manipulation. In
Proceedings of the 16th International Symposium on Algorithms and Computation
(ISAAC), volume 3827 of Lecture Notes in Computer Science (LNCS), pages 206–
215. Springer-Verlag, 2005.
[99] E. Elkind, P. Faliszewski, and A. Slinko. On the role of distances in defining voting
rules. In Proceedings of the 9th International Joint Conference on Autonomous
Agents and Multi-Agent Systems (AAMAS), pages 375–382. IFAAMAS, 2010.
[101] U. Endriss. Logic and social choice theory. In A. Gupta and J. van Benthem, edi-
tors, Logic and Philosophy Today, volume 2, pages 333–377. College Publications,
2011.
[102] U. Endriss and J. Lang, editors. Proceedings of the 1st International Workshop on
Computational Social Choice (COMSOC-2006), 2006. ILLC, University of Ams-
terdam.
[104] U. Endriss, N. Maudet, F. Sadri, and F. Toni. Negotiating socially optimal alloca-
tions of resources. Journal of Artificial Intelligence Research, 25:315–348, 2006.
[106] B. Escoffier, J. Lang, and M. Öztürk. Single-peaked consistency and its complexity.
In Proceedings of the 18th European Conference on Artificial Intelligence (ECAI),
pages 366–370. IOS Press, 2008.
[116] J. Farfel and V. Conitzer. Aggregating value ranges: Preference elicitation and
truthfulness. Journal of Autonomous Agents and Multiagent Systems, 22(1):127–
150, 2011. Special Issue on Computational Social Choice.
[117] M. Fey. Choosing from a large tournament. Social Choice and Welfare, 31(2):
301–309, 2008.
[118] P. C. Fishburn. The Theory of Social Choice. Princeton University Press, 1973.
[119] E. Friedgut, G. Kalai, and N. Nisan. Elections can be manipulated often. In Pro-
ceedings of the 49th Symposium on Foundations of Computer Science (FOCS),
pages 243–249. IEEE Computer Society Press, 2008.
[123] C. Geist and U. Endriss. Automated search for impossibility theorems in social
choice theory: Ranking sets of objects. Journal of Artificial Intelligence Research,
40:143–174, 2011.
[125] A. Gibbard. Manipulation of schemes that mix voting with chance. Econometrica,
45(3):665–681, 1977.
[127] M. Grabisch. k-order additive discrete fuzzy measures and their representation.
Fuzzy Sets and Systems, 92:167–189, 1997.
[132] E. Hemaspaandra, L. A. Hemaspaandra, and J. Rothe. Anyone but him: The com-
plexity of precluding an alternative. AI Journal, 171(5–6):255–285, 2007.
[136] O. Hudry. On the computation of median linear orders, of median complete pre-
orders and of median weak orders. Mathematical Social Sciences, 2012. Forth-
coming.
[141] C. Kenyon-Mathieu and W. Schudy. How to rank with few errors: A PTAS for
weighted feedback arc set on tournaments. In Proceedings of the 39th Annual ACM
Symposium on Theory of Computing (STOC), pages 95–103. ACM Press, 2007.
[142] K. Konczak and J. Lang. Voting procedures with incomplete preferences. In Pro-
ceedings of the 1st Multidisciplinary Workshop on Advances in Preference Han-
dling, 2005.
[143] S. Konieczny and S. Pino Pérez. Logic based merging. Journal of Philosophical
Logic, 40(2):239–270, 2011.
[146] C. Lafage and J. Lang. Logical representation of preferences for group deci-
sion making. In Proceedings of the 7th International Conference on Principles
of Knowledge Representation and Reasoning (KR-2000), pages 457–468. Morgan
Kaufmann Publishers, 2000.
[147] G. Laffond, J.-F. Laslier, and M. Le Breton. The bipartisan set of a tournament
game. Games and Economic Behavior, 5:182–201, 1993.
[148] J. Lang. Logical preference representation and combinatorial vote. Annals of Math-
ematics and Artificial Intelligence, 42(1–3):37–71, 2004.
[149] J. Lang and L. Xia. Sequential composition of voting rules in multi-issue domains.
Mathematical Social Sciences, 57(3):304–324, 2009.
[151] J.-F. Laslier. Tournament Solutions and Majority Voting. Springer-Verlag, 1997.
[152] J.-F. Laslier and M. R. Sanver, editors. Handbook on Approval Voting. Studies in
Choice and Welfare. Springer-Verlag, 2010.
[157] K. May. A set of independent, necessary and sufficient conditions for simple ma-
jority decisions. Econometrica, 20:680–684, 1952.
[161] E. Meskanen and H. Nurmi. Closeness counts in social choice. In M. Braham and
F. Steffen, editors, Power, Freedom, and Voting. Springer-Verlag, 2008.
[164] H. Moulin. Choosing from a tournament. Social Choice and Welfare, 3:271–291,
1986.
[168] T. Nipkow. Social choice theory in HOL: Arrow and Gibbard-Satterthwaite. Jour-
nal of Automated Reasoning, 43(3):289–304, 2009.
[170] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking:
Bringing order to the Web. Technical Report 1999–66, Stanford University, 1999.
[171] D. M. Pennock, E. Horvitz, and C. L. Giles. Social choice theory and recommender
systems: Analysis of the axiomatic foundations of collaborative filtering. In Pro-
ceedings of the 17th National Conference on Artificial Intelligence (AAAI), pages
729–734. AAAI Press, 2000.
[173] C. R. Plott. Axiomatic social choice theory: An overview and interpretation. Amer-
ican Journal of Political Science, 20(3):511–596, 1976.
[180] S. Ramezani and U. Endriss. Nash social welfare in multiagent resource alloca-
tion. In Agent-Mediated Electronic Commerce: Designing Trading Strategies and
Mechanisms for Electronic Markets, volume 59 of Lecture Notes in Business In-
formation Processing, pages 117–131. Springer-Verlag, 2010.
[185] J. Rothe, H. Spakowski, and J. Vogel. Exact complexity of the winner problem for
Young elections. Theory of Computing Systems, 36(4):375–386, 2003.
[189] T. Sandholm. Contract types for satisficing task allocation: I. Theoretical results.
In Proceedings of the AAAI Spring Symposium on Satisficing Models, 1998.
[193] T. Schwartz. The Logic of Collective Choice. Columbia University Press, 1986.
[195] A. Scott and M. Fey. The minimal covering set in large tournaments. Social Choice
and Welfare, 38(1):1–9, 2012.
[196] I. Segal. The communication requirements of social choice rules and supporting
budget sets. Journal of Economic Theory, 136:341–378, 2007.
[201] A. K. Sen and P. K. Pattanaik. Necessary and sufficient conditions for rational
choice under majority decision. Journal of Economic Theory, 1:178–202, 1969.
[205] P. Tang and F. Lin. Computer-aided proofs of Arrow’s and other impossibility
theorems. Artificial Intelligence, 173(11):1041–1053, 2009.
[209] M. Truchon. Borda and the maximum likelihood approach to vote aggregation.
Mathematical Social Sciences, 55(1):96–102, 2008.
[212] L. Wagman and V. Conitzer. Optimal false-name-proof voting rules with costly vot-
ing. In Proceedings of the National Conference on Artificial Intelligence (AAAI),
pages 190–195. AAAI Press, 2008.
[214] T. Walsh. Where are the really hard manipulation problems? The phase transi-
tion in manipulating the veto rule. In Proceedings of the 21st International Joint
Conference on Artificial Intelligence (IJCAI), pages 324–329. AAAI Press, 2009.
[215] R. B. Wilson. Social choice theory without the Pareto principle. Journal of Eco-
nomic Theory, 5:478–486, 1972.
[217] L. Xia and V. Conitzer. A sufficient condition for voting rules to be frequently
manipulable. In Proceedings of the 9th ACM Conference on Electronic Commerce
(ACM-EC), pages 99–108. ACM Press, 2008.
[218] L. Xia and V. Conitzer. Generalized scoring rules and the frequency of coalitional
manipulability. In Proceedings of the ACM Conference on Electronic Commerce
(EC), pages 109–118. ACM Press, 2008.
[219] L. Xia and V. Conitzer. Compilation complexity of common voting rules. In Pro-
ceedings of the National Conference on Artificial Intelligence (AAAI), pages 915–
920. AAAI Press, 2010.
[220] L. Xia and V. Conitzer. Stackelberg voting games: Computational aspects and
paradoxes. In Proceedings of the National Conference on Artificial Intelligence
(AAAI), pages 921–926. AAAI Press, 2010.
[221] L. Xia and V. Conitzer. Determining possible and necessary winners under com-
mon voting rules given partial orders. Journal of Artificial Intelligence Research,
41:25–67, 2011.
[223] L. Xia, V. Conitzer, and J. Lang. Voting on multiattribute domains with cyclic
preferential dependencies. In Proceedings of the National Conference on Artificial
Intelligence (AAAI), pages 202–207. AAAI Press, 2008.
[226] L. Xia, V. Conitzer, and J. Lang. Strategic sequential voting in multi-issue do-
mains and multiple-election paradoxes. In Proceedings of the ACM Conference on
Electronic Commerce (EC), pages 179–188. ACM Press, 2011.
[227] M. Yokoo, Y. Sakurai, and S. Matsubara. The effect of false-name bids in combi-
natorial auctions: New fraud in Internet auctions. Games and Economic Behavior,
46(1):174–188, 2004.
[229] H. P. Young. Social choice scoring functions. SIAM Journal on Applied Mathe-
matics, 28(4):824–838, 1975.
[231] H. P. Young. Condorcet’s theory of voting. The American Political Science Review,
82(4):1231–1244, 1988.
[233] D. Zuckerman. Linear degree extractors and the inapproximability of max clique
and chromatic number. Theory of Computing, 3(1):103–128, 2007.
Chapter 7
1 Introduction
Mechanism design is a strategic version of social choice theory, which adds the
assumption that agents will behave so as to maximize their individual payoffs.
For example, in an election agents may not vote their true preference. Like social
choice theory, however, the scope of mechanism design is broader than voting.
The most famous application of mechanism design is auction theory, to which
we devote the second part of this chapter. However, mechanism design has many
other applications.
Consider the transportation network described in Figure 7.1. The number next
to a given edge is the cost of transporting along that edge, but these costs are the
private information of the various shippers that own each edge. The task here is to
find the shortest (least-cost) path from S to T ; this is hard because the shippers may
lie about their costs. Your one advantage is that you know that they are interested
in maximizing their revenue. How can you use that knowledge to extract from
them the information needed to compute the desired path?
This is where mechanism design, or implementation theory, comes in. Mecha-
nism design is sometimes colloquially called “inverse game theory.” The problem
most conventionally addressed by game theory can be framed as follows: given
1 This chapter is distilled from Chapters 10 and 11 of Multiagent Systems: Algorithmic, Game-
Theoretic and Logical Foundations, published by Cambridge University Press [29]. This material
is reprinted with the permission of its original publisher.
[Figure 7.1: A transportation network. Each edge is owned by a different shipper and is labeled with that shipper's privately known cost of transporting along it; the task is to find the least-cost path from node s to node t.]
• O is a set of outcomes;
2.1 Implementation
Together, a Bayesian game setting and a mechanism define a Bayesian game. The
aim of mechanism design is to select a mechanism, given a particular Bayesian
game setting, whose equilibria have desirable properties. We now define the most
fundamental such property: that the outcomes that arise when the game is played
are consistent with a given social choice function.
[Figure 7.2: diagram omitted. In the original mechanism, each agent of type θi plays its equilibrium strategy si(θi); in the new mechanism, each agent simply reports a type, and the mechanism plays si(θi) on the agent's behalf, producing the original outcome.]
Figure 7.2: The revelation principle: how to construct a new mechanism with a
truthful equilibrium, given an original mechanism with equilibrium (s1 , . . . , sn ).
Formally, a direct mechanism is one in which the only action available to each agent
is to announce its private information. Since in a Bayesian game an agent’s pri-
vate information is its type, direct mechanisms have Ai = Θi . When an agent’s
set of actions is the set of all its possible types, it may lie and announce a type θ̂i
that is different from its true type θi . A direct mechanism is said to be truthful (or
incentive compatible) if, for any type vector θ, in equilibrium of the game defined
by the mechanism, every agent i’s strategy is to announce its true type, so that
θ̂i = θi . We can thus speak about incentive compatibility in dominant strategies
and Bayes–Nash incentive compatibility. Our claim that truthfulness can always
be achieved implies, for example, that the social choice functions implementable
by dominant-strategy truthful mechanisms are precisely those implementable by
strategyproof direct mechanisms. This means that we can, without loss of cover-
age, limit ourselves to a small sliver of the space of all mechanisms.
Theorem 7.1 (Revelation principle) If there exists any mechanism that imple-
ments a social choice function C in dominant strategies, then there exists a direct
mechanism that implements C in dominant strategies and is truthful.
A further concern is privacy: the direct mechanism requires agents to reveal their private information to the fullest degree. (Observe that the original mechanism may require them to reveal much
less information.) Finally, even if not objectionable on privacy grounds, this full
revelation can sometimes place an unreasonable burden on the communication
channel. For all these reasons, in practical settings one must apply the revelation
principle with caution.
3 Quasilinear Preferences
If we are to design a dominant-strategy truthful mechanism that is not dictatorial,
we are going to have to relax some of the conditions of the Gibbard–Satterthwaite
theorem. First, we relax the requirement that agents be able to express any pref-
erences and replace it with the requirement that agents be able to express any
preferences in a limited set. Second, we relax the condition that the mechanism
be onto. We now introduce our limited set of preferences.
Intuitively, we split outcomes into two pieces that are linearly related. First,
X represents a finite set of non-monetary outcomes, such as the allocation of an
object to one of the bidders in an auction or the selection of a candidate in an
election. Second, pi is the (possibly negative) payment made by agent i to the
mechanism, such as a payment to the auctioneer.
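Concretely (a sketch of the standard form, not a reproduction of the chapter's own displayed definition), writing vi(x) for agent i's valuation of the non-monetary choice x, as in the rest of this section, a quasilinear utility function evaluates an outcome o = (x, p1, . . . , pn) as

\[ u_i(o, \theta_i) \;=\; v_i(x) \;-\; p_i . \]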
What does it mean to assume that agents’ preferences are quasilinear? First,
it means that we are in a setting in which the mechanism can choose to charge or
reward the agents by an arbitrary monetary amount. Second, and more restrictive,
it means that an agent’s degree of preference for the selection of any choice x ∈ X
is independent of its degree of preference for having to pay the mechanism some
amount pi ∈ R. Thus an agent’s utility for a choice cannot depend on the total
amount of money that it has (e.g., an agent cannot value having a yacht more if it
is rich than if it is poor). Finally, it means that agents care only about the choice
selected and about their own payments: in particular, they do not care about the
monetary payments made or received by other agents.
Strictly speaking, we have defined quasilinear preferences in a way that fixes
the set of agents. However, we generally consider families of quasilinear prob-
lems, for any set of agents. In the following we assume that a quasilinear utility
function is still defined when any one agent is taken away. In this case the set of
non-monetary outcomes must be updated (e.g., in an auction setting the missing
agent cannot be the winner), and is denoted by O−i . Similarly, the utility functions
ui and the choice function C must be updated accordingly.
We have also made another restrictive assumption about quasilinear prefer-
ences, albeit one commonly made: we assume that the agent values money in
the same units as it values utility. This assumption is called transferable utility,
because it means that utility can be shifted from one agent to another through
monetary transfers. It implies a second assumption, called risk neutrality, which
means that the agent’s value for a unit of currency is independent of the total
amount of money the agent has. For a discussion of what happens when these
assumptions are violated, please see [29].
A mechanism in the quasilinear setting is a triple (A, x, ℘), where A = A1 × · · · × An is the set of action profiles,

• x : A → Π(X) maps each action profile to a distribution over choices, and

• ℘ : A → R^n maps each action profile to a payment for each agent.
In effect, we have split the function M into two functions x and ℘, where x is
the choice rule and ℘ is the payment rule. We will use the notation ℘i to denote
the payment function for agent i.
A direct revelation mechanism in the quasilinear setting is one in which each
agent is asked to state its type.
Furthermore, observe that it is also meaningful to extend the concept of valuation beyond settings in
which conditional utility independence holds; in such cases, we say that agents do not know their
own valuations. We describe such settings in [29].
agent i. We will use the notation v̂i ∈ Vi to denote the valuation that agent i declares
to such a direct mechanism, which may be different from its true valuation vi . We
denote the vector of all agents’ declared valuations as v̂ and the set of all possible
valuation vectors as V . Finally, we denote the vector of declared valuations from
all agents other than i as v̂−i .
Now we can state some properties that it is common to require of quasilinear
mechanisms.
For example, a mechanism is ex interim individually rational if no agent ever expects to lose by participating, given its own valuation and the distribution over the other agents' valuations:

∀i ∀vi :  E_{v−i | vi} [ vi( x( si(vi), s−i(v−i) ) ) − ℘i( si(vi), s−i(v−i) ) ] ≥ 0,

where s denotes the agents' equilibrium strategy profile.
Finally, in some domains there will be many possible mechanisms that satisfy
the constraints we choose, meaning that we need to have some way of choosing
among them. (And as we will see later, for other combinations of constraints no
mechanisms exist at all.) The usual approach is to define an optimization problem
that identifies the optimal outcome in the feasible set. For example, although we
have defined efficiency as a constraint, it is also possible to soften the constraint
and require the mechanism to achieve as much social welfare as possible. Here
we define some other quantities that a mechanism designer can seek to optimize.
First, the mechanism designer can take a selfish perspective, seeking to maximize its own expected revenue. Interestingly,
this goal turns out to be quite different from the goal of maximizing social wel-
fare. (We give an example of the differences between these approaches when we
consider single-good auctions in Section 5.)
The designer may instead care about fairness, intuitively seeking to make the least-happy agent the happiest. We also take an expected value over different val-
uation vectors, but we could instead have required a mechanism that does the best
in the worst case.
where s(v) denotes the agents’ equilibrium strategy profile in the worst equilib-
rium of the mechanism – that is, the one in which ∑i∈N vi (x (s(v))) is the smallest.
4 Efficient Mechanisms
Efficiency (Definition 7.10) is often considered to be one of the most important
properties for a mechanism to satisfy in the quasilinear setting. One reason is
that, whenever an inefficient choice is selected, it is possible to find a set of side
payments among the agents with the property that all agents would prefer the
efficient choice in combination with the side payments to the inefficient choice.
(Intuitively, the sum of agents’ valuations for the efficient choice is greater than
for the inefficient choice. Thus, the agents who prefer the efficient choice would
still strictly prefer it even if they had to make side payments to the other agents
so that each of them also strictly preferred the efficient choice.) A great deal of
research has considered the design of mechanisms that are guaranteed to select
efficient choices when agents follow dominant or equilibrium strategies. In this
section we survey these mechanisms.
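The proof below concerns Groves mechanisms, whose formal definition falls in a part of the chapter not reproduced here. As a reminder of the standard definition (the payment form also reappears in Theorem 7.4 below), a Groves mechanism is a direct quasilinear mechanism that selects a welfare-maximizing choice and pays each agent the declared welfare of the others, minus an arbitrary term that does not depend on the agent's own declaration:

\[ x(\hat v) \in \arg\max_{x \in X} \sum_{i} \hat v_i(x), \qquad \wp_i(\hat v) \;=\; h_i(\hat v_{-i}) \;-\; \sum_{j \neq i} \hat v_j\big(x(\hat v)\big). \]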
Proof. Consider a situation where every agent j other than i follows some arbitrary strategy v̂j. Consider agent i's problem of choosing the best strategy v̂i. As a shorthand, we write v̂ = (v̂−i, v̂i). The best strategy for i is one that solves

max_{v̂i} ( vi(x(v̂)) − ℘i(v̂) ).

Substituting in the Groves payment ℘i(v̂) = h(v̂−i) − ∑_{j≠i} v̂j(x(v̂)), and noting that h(v̂−i) does not depend on v̂i, the only way in which the declaration v̂i influences the maximization above is through the choice x(v̂). If possible, i would like to pick a declaration v̂i that will lead the mechanism to pick an x ∈ X which solves

max_x ( vi(x) + ∑_{j≠i} v̂j(x) ).    (7.1)

The Groves mechanism, however, selects the choice x(v̂) that maximizes the declared social welfare ∑_j v̂j(x) = v̂i(x) + ∑_{j≠i} v̂j(x). By declaring v̂i = vi, agent i makes these two maximization problems coincide. Thus, agent i leads the mechanism to select the choice that it most prefers by declaring v̂i = vi. Because this argument does not depend in any way on the declarations of the other agents, truth-telling is a dominant strategy for agent i.
Intuitively, the reason that Groves mechanisms are dominant-strategy truth-
ful is that agents’ externalities are internalized. Imagine a mechanism in which
agents declared their valuations for the different choices x ∈ X and the mechanism
selected the efficient choice, but in which the mechanism did not impose any pay-
ments on agents. Clearly, agents would be able to change the mechanism’s choice
to another that they preferred by overstating their valuation. Under Groves mech-
anisms, however, an agent’s utility does not depend only on the selected choice,
because payments are imposed. Since agents are paid the (reported) utility of all
the other agents under the chosen allocation, each agent becomes just as inter-
ested in maximizing the other agents’ utilities as in maximizing its own. Thus,
once payments are taken into account, all agents have the same interests.
Groves mechanisms illustrate a property that is generally true of dominant-
strategy truthful mechanisms: an agent’s payment does not depend on the amount
of its own declaration. Although other dominant-strategy truthful mechanisms ex-
ist in the quasilinear setting, the next theorem shows that Groves mechanisms are
the only mechanisms that implement an efficient allocation in dominant strategies
among agents with arbitrary quasilinear utilities.
Theorem 7.4 (Green–Laffont) An efficient social choice function C : (R^X)^n → X × R^n can be implemented in dominant strategies for agents with unrestricted quasilinear utilities only if ℘i(v) = h(v−i) − ∑_{j≠i} vj(x(v)).
We do not give the proof here; it appears in [29]. It has also been shown
that Groves mechanisms are unique among Bayes–Nash incentive-compatible ef-
ficient mechanisms, in a weaker sense. Specifically, any Bayes–Nash incentive-
compatible efficient mechanism corresponds to a Groves mechanism in the sense
that each agent makes the same ex interim expected payments and hence has the
same ex interim expected utility under both mechanisms.
Definition 7.19 (Clarke tax) The Clarke tax sets the hi term in a Groves mechanism as

hi(v̂−i) = ∑_{j≠i} v̂j( x(v̂−i) ),
First, note that because the Clarke tax does not depend on an agent i’s own
declaration v̂i , our previous arguments that Groves mechanisms are dominant-
strategy truthful and efficient carry over immediately to the VCG mechanism.
Now, we try to provide some intuition about the VCG payment rule. Assume that
all agents follow their dominant strategies and declare their valuations truthfully.
The second sum in the VCG payment rule pays each agent i the sum of every other
agent j = i’s utility for the mechanism’s choice. The first sum charges each agent
i the sum of every other agent’s utility for the choice that would have been made
had i not participated in the mechanism. Thus, each agent is made to pay its social
cost – the aggregate impact that its participation has on other agents’ utilities.
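Combining the Clarke tax with the Groves payment form of Theorem 7.4 gives the VCG payment rule that the preceding paragraph paraphrases (a reconstruction, since the displayed rule itself is not reproduced above):

\[ \wp_i(\hat v) \;=\; \sum_{j \neq i} \hat v_j\big(x(\hat v_{-i})\big) \;-\; \sum_{j \neq i} \hat v_j\big(x(\hat v)\big). \]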
What can we say about the amounts of different agents' payments to the mech-
anism? If some agent i does not change the mechanism's choice by its participa-
tion – that is, if x (v) = x (v−i ) – then the two sums in the VCG payment function
will cancel out. The social cost of i's participation is zero, and so it has to pay nothing.
[Figure: the network of Figure 7.1 redrawn with its nodes labeled A (the source) through F (the destination); the edge labels again give the shippers' costs (in particular AB = 3, BE = 1, and EF = 1, with the alternative path ACEF costing 6). The least-cost path is ABEF, with total cost 5.]
Note that in this example, the numbers labeling the edges in the graph denote
agents’ costs rather than utilities; thus, an agent’s utility is −c if a route involving
its edge (having cost c) is selected, and zero otherwise. The arg max in x will
amount to cost minimization. Thus, x (v) will return the shortest path in the graph,
which is ABEF. How much will agents have to pay? First, let us consider the
agent AC. The shortest path taking its declaration into account has a length of 5; since AC is not involved in that path, its entire cost falls on the other agents, whose total utility is therefore −5.
Likewise, the shortest path without AC’s declaration also has a length of 5. Thus,
its payment is pAC = (−5) − (−5) = 0. This is what we expect, since AC is not
pivotal. Clearly, by the same argument BD, CE, CF, and DF will all be made
to pay zero. Now let us consider the pivotal agents. The shortest path taking
AB’s declaration into account has a length of 5, and imposes a cost of 2 on other
agents. The shortest path without AB is ACEF, which has a cost of 6. Thus
pAB = (−6) − (−2) = −4: AB is paid 4 for its participation. Arguing similarly,
you can verify that pBE = (−6) − (−4) = −2, and pEF = (−7) − (−4) = −3.
Note that although EF had the same cost as BE, they are paid different amounts
for the use of their edges. This occurs because EF has more market power: for
the other agents, the situation without EF is worse than the situation without BE.
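The payment calculations above are mechanical enough to automate. The sketch below redoes them in Python; because the figure is not reproduced here, the edge costs are an assumed reconstruction chosen to be consistent with the path lengths quoted in the text (ABEF = 5, ACEF = 6, and 7 for the best path avoiding EF):

import itertools

# Assumed reconstruction of the network: edge -> privately known cost.
edges = {
    ("A", "B"): 3, ("A", "C"): 2, ("B", "D"): 2, ("B", "E"): 1,
    ("C", "E"): 3, ("C", "F"): 5, ("D", "F"): 2, ("E", "F"): 1,
}

def shortest_path(costs, source="A", target="F"):
    """Brute-force least-cost path; returns (total cost, tuple of edges used)."""
    best = (float("inf"), ())
    nodes = {n for e in costs for n in e}
    inner = nodes - {source, target}
    for r in range(len(inner) + 1):
        for mid in itertools.permutations(inner, r):
            path = (source,) + mid + (target,)
            legs = list(zip(path, path[1:]))
            if all(leg in costs for leg in legs):
                total = sum(costs[leg] for leg in legs)
                if total < best[0]:
                    best = (total, tuple(legs))
    return best

cost_all, path_all = shortest_path(edges)          # ABEF, cost 5
for edge in edges:
    # Cost borne by the other agents when this edge's owner participates...
    others_with = cost_all - edges[edge] if edge in path_all else cost_all
    # ...and the best the others can do when the edge is removed entirely.
    without = {e: c for e, c in edges.items() if e != edge}
    cost_without, _ = shortest_path(without)
    payment = -cost_without - (-others_with)       # VCG (Clarke) payment
    print(edge, payment)   # e.g. ('A','B') -4, ('B','E') -2, ('E','F') -3; non-pivotal edges pay 0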
We have seen that Groves mechanisms are dominant-strategy truthful and effi-
cient. We have also seen that no other mechanism has both of these properties
in general quasilinear settings. Thus, we might be a bit worried that we have not
been able to guarantee either individual rationality or budget balance, two proper-
ties that are quite important in practice. (Recall that individual rationality means
that no agent would prefer not to participate in the mechanism; budget balance
means that the mechanism does not lose money.) We will consider budget bal-
ance in Section 4.4; here we investigate individual rationality.
As it turns out, our worry is well founded: even with the freedom to set hi ,
we cannot find a mechanism that guarantees us individual rationality in an unre-
stricted quasilinear setting. However, we are often able to guarantee the strongest
variety of individual rationality when the setting satisfies certain mild restrictions.
Theorem 7.5 The VCG mechanism is ex post individually rational when the
choice-set monotonicity and no negative externalities properties hold.
Theorem 7.6 The VCG mechanism is weakly budget balanced when the no
single-agent effect property holds.
Indeed, we can say something more about VCG’s revenue properties: restrict-
ing ourselves to settings in which VCG is ex post individually rational as discussed
earlier, and comparing to all other efficient and ex interim individually rational
mechanisms, VCG turns out to collect the maximal amount of revenue from the
agents. This is somewhat surprising, since this result does not require dominant
strategies, and hence compares VCG to all Bayes–Nash mechanisms. A useful
corollary of this result is that VCG is as budget balanced as any efficient mecha-
nism can be: it satisfies weak budget balance in every case where any dominant
strategy, efficient, and ex interim individually rational mechanism would be able
to do so.
3. VCG is not “frugal”: prices can be many times higher than the true value of
the best allocation involving no winning agents.
Having listed these problems, however, we offer a caveat: although there exist
mechanisms that circumvent each of the drawbacks we discuss, none of the draw-
backs are unique to VCG, or even to Groves mechanisms. Indeed, in some cases
the problems are known to crop up in extremely broad classes of mechanisms.
5 Single-Good Auctions
We now consider the problem of allocating (discrete) resources among selfish
agents in a multiagent system. Auctions – an interesting and important applica-
tion of mechanism design – turn out to provide a general solution to this problem.
We describe various different flavors of auctions, including single-good and com-
binatorial auctions. In each case, we survey some of the key theoretical, practical,
and computational insights from the literature.
The auction setting is important for two reasons. First, auctions are widely
used in real life, in consumer, corporate, and government settings. Millions
of people use auctions daily on Internet consumer web sites to trade goods. More
complex types of auctions have been used by governments around the world to sell
important public resources such as access to electromagnetic spectrum. Indeed, all
financial markets constitute a type of auction (one of the family of so-called double
auctions). Auctions are also often used in computational settings to efficiently
allocate bandwidth and processing power to applications and users.
The second – and more fundamental – reason to care about auctions is that
they provide a general theoretical framework for understanding resource alloca-
tion problems among self-interested agents. Formally speaking, an auction is any
protocol that allows agents to indicate their interest in one or more resources and
that uses these indications of interest to determine both an allocation of resources
and a set of payments by the agents. Thus, auctions are important for a wide range
of computational settings (e.g., the sharing of computational power in a grid com-
puter on a network) that would not normally be thought of as auctions and that
might not even use money as the basis of payments.
It is important to realize that the most familiar type of auction – the ascending-
bid, English auction – is a drop in the ocean of auction types. Indeed, since
auctions are simply mechanisms for allocating goods, there is an infinite number
of auction types. In the most familiar types of auctions there is one good for
sale, one seller, and multiple potential buyers. Each buyer has its own valuation
for the good, and each wishes to purchase it at the lowest possible price. These
auctions are called single-sided, because there are multiple agents on only one
side of the market. Our task is to design a protocol for this auction that satisfies
certain desirable global criteria. For example, we might want an auction protocol
that maximizes the expected revenue of the seller. Or, we might want an auction
that is economically efficient; that is, one that guarantees that the potential buyer
with the highest valuation ends up with the good.
Given the popularity of auctions, on the one hand, and the diversity of auction
mechanisms, on the other, it is not surprising that the literature on the topic is vast.
In this section we provide a taste for this literature, beginning by concentrating on
auctions for selling a single good. We explore richer settings later in the chapter.
In a Japanese auction the auctioneer calls out successively increasing prices in a regular fashion,5 and after each call each agent must announce whether it is still in. When an agent drops out, its decision is irrevocable and it cannot re-enter the auction. The auction ends when there is exactly one agent left in; that agent must then purchase the good for the current price.
Formally, an auction is simply a quasilinear mechanism, given by a:

• set of agents N,
• set of outcomes O = X × Rn ,
• choice function x that selects one of the outcomes given the agents’ actions,
and
• payment function ℘ that determines what each agent must pay given all
agents’ actions.
Proof. Assume that all bidders other than i bid in some arbitrary way, and con-
sider i’s best response. First, consider the case where i’s valuation is larger than
the highest of the other bidders’ bids. In this case i would win and would pay
the next-highest bid amount, as illustrated in Figure 7.4a. Could i be better off
by bidding dishonestly in this case? If it bid higher, it would still win and would
still pay the same amount, as illustrated in Figure 7.4b. If it bid lower, it would
either still win and still pay the same amount (Figure 7.4c) or lose and pay zero
(Figure 7.4d).6 Since i gets non-negative utility for receiving the good at a price
less than or equal to its valuation, i cannot gain, and would sometimes lose by
bidding dishonestly in this case. Now consider the other case, where i’s valuation
is less than at least one other bidder’s bid. In this case i would lose and pay zero
(Figure 7.4e). If it bid less, it would still lose and pay zero (Figure 7.4f). If it
bid more, either it would still lose and pay zero (Figure 7.4g) or it would win and
pay more than its valuation (Figure 7.4h), achieving negative utility. Thus again, i
cannot gain, and would sometimes lose by bidding dishonestly in this case.
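The case analysis can also be checked by brute force. The sketch below (the valuations and the discretized bid grid are made up) verifies for a small second-price auction that truthful bidding is never beaten by any deviation, whatever the other bidders do:

import itertools

def second_price(bids):
    """Return (winner index, price) for a sealed-bid second-price auction."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    price = max(b for i, b in enumerate(bids) if i != winner)
    return winner, price

def utility(i, valuation, bids):
    winner, price = second_price(bids)
    return valuation - price if winner == i else 0

values = [7, 4, 9]               # assumed true valuations of three bidders
bid_grid = range(0, 13)          # possible integer bids
for i, v in enumerate(values):
    for others in itertools.product(bid_grid, repeat=len(values) - 1):
        truthful = list(others[:i]) + [v] + list(others[i:])
        for dev in bid_grid:     # every possible deviation for bidder i
            deviant = list(others[:i]) + [dev] + list(others[i:])
            assert utility(i, v, truthful) >= utility(i, v, deviant)
print("truthful bidding is never beaten by a deviation")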
In the IPV case, we can identify strong relationships between the second-price
auction and Japanese and English auctions. Consider first the comparison be-
tween second-price and Japanese auctions. In both cases the bidder must select
a number (in the sealed-bid case the number is the one written down, and in the
Japanese case it is the price at which the agent will drop out); the bidder with the highest amount wins, and pays the amount selected by the second-highest bid-
der. The difference between the auctions is that information about other agents’
bid amounts is disclosed in the Japanese auction. In the sealed-bid auction an
agent’s bid amount must be selected without knowing anything about the amounts
6 Figure 7.4d is oversimplified: the winner will not always pay i’s bid in this case. (Do you see
why?)
[Figure 7.4 panels, each marking i's true value and the price i pays: (a) bidding honestly, i has the highest bid; (b) i bids higher and still wins; (c) i bids lower and still wins; (d) i bids even lower and loses; (e) bidding honestly, i does not have the highest bid; (f) i bids lower and still loses; (g) i bids higher and still loses; (h) i bids even higher and wins.]
Figure 7.4: A case analysis to show that honest bidding is a dominant strategy in
a second-price auction with independent private values.
selected by others, whereas in the Japanese auction the amount can be updated
based on the prices at which lower bidders are observed to drop out. In general,
this difference can be important; however, it makes no difference in the IPV case.
Thus, Japanese auctions are also dominant-strategy truthful when agents have in-
dependent private values.
Obviously, the Japanese and English auctions are closely related. Thus, it is
not surprising to find that second-price and English auctions are also similar. One
connection can be seen through proxy bidding, a service offered on some online
auction sites such as eBay. Under proxy bidding, a bidder tells the system the
maximum amount it is willing to pay. The user can then leave the site, and the
system bids as the bidder’s proxy: every time the bidder is outbid, the system will
respond with a bid one increment higher, until the bidder’s maximum is reached.
It is easy to see that if all bidders use the proxy service and update it only once,
what occurs will be identical to a second-price auction (excepting that the winner’s
payment may be one bid increment higher).
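That equivalence is easy to simulate. The sketch below (the valuations, increment, and exact proxy protocol are assumptions, not taken from the text) runs proxies that were each set once and compares the closing price with the second-highest maximum:

def proxy_auction(max_bids, increment=1):
    """English auction in which every bidder uses a proxy set once to max_bids[i]."""
    standing_bidder, standing_bid = None, 0
    while True:
        # The proxy of any outbid bidder raises by one increment if it still can.
        challengers = [i for i, m in enumerate(max_bids)
                       if i != standing_bidder and m >= standing_bid + increment]
        if not challengers:
            return standing_bidder, standing_bid
        standing_bidder = max(challengers, key=lambda i: max_bids[i])
        standing_bid += increment

max_bids = [70, 40, 55]                # assumed proxy maxima (= valuations)
winner, price = proxy_auction(max_bids)
second_highest = sorted(max_bids)[-2]
print(winner, price, second_highest)   # here bidder 0 wins and pays exactly 55

In this run the closing price equals the second-highest maximum; depending on how the increments fall, it can instead end up one increment above it, which is exactly the discrepancy noted above.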
The main complication with English auctions is that bidders can place so-
called jump bids: bids that are greater than the previous high bid by more than the
minimum increment. Although it seems relatively innocuous, this feature complicates the analysis of English auctions.
In other words, the unique equilibrium of the auction occurs when each player bids (n − 1)/n of its valuation. This theorem can be proved using calculus, but the
proof is long and tedious. Furthermore, this proof only shows how to verify an
equilibrium strategy. How do we identify one in the first place? Although it is
also possible to do this from first principles (at least for straightforward auctions
such as first-price), we will explain a simpler technique below.
We must now find an expression for the expected value of the second-highest
valuation, given that bidder i has the highest valuation. It is helpful to know
the formula for the kth order statistic, in this case of draws from the uniform
distribution. The kth order statistic of a distribution is a formula for the expected
value of the kth-largest of n draws. For n independent and identically distributed
draws from [0, vmax ], the kth order statistic is
((n + 1 − k)/(n + 1)) vmax.    (7.2)
If bidder i’s valuation vi is the highest, then there are n − 1 other valuations
drawn from the uniform distribution on [0, vi ]. Thus, the expected value of the
second-highest valuation is the first-order statistic of n − 1 draws from [0, vi]. Substituting into Equation (7.2), we have (((n − 1) + 1 − 1)/((n − 1) + 1)) vi = ((n − 1)/n) vi. This confirms the equilibrium strategy from Theorem 7.10. It also gives us a suspicion (that turns
out to be correct) about the equilibrium strategy for first-price auctions under val-
uation distributions other than uniform: each bidder bids the expectation of the
second-highest valuation, conditioned on the assumption that its own valuation is
the highest.
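A quick Monte Carlo check (a sketch, with assumed uniform valuations on [0, 1]) illustrates both the order-statistic formula and the revenue claim discussed next: a first-price auction in which every bidder bids (n − 1)/n of its valuation and a truthful second-price auction both earn the seller about (n − 1)/(n + 1) in expectation:

import random

def expected_revenues(n, trials=200_000):
    first, second = 0.0, 0.0
    for _ in range(trials):
        vals = sorted(random.random() for _ in range(n))
        second += vals[-2]                  # second-price: winner pays the 2nd-highest value
        first += (n - 1) / n * vals[-1]     # first-price: winner bids (n-1)/n of its value
    return first / trials, second / trials

print(expected_revenues(5))                  # both close to (5 - 1) / (5 + 1) = 0.667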
A caveat must be given about the revenue equivalence theorem: this result
makes an “if” statement, not an “if and only if” statement. That is, while it is true
that all auctions satisfying the theorem’s conditions must yield the same expected
revenue, it is not true that all strategies yielding that expected revenue constitute
equilibria. Thus, after using the revenue equivalence theorem to identify a strat-
egy profile that one believes to be an equilibrium, one must then prove that this
strategy profile is indeed an equilibrium. This should be done in the standard way,
by assuming that all but one of the agents play according to the equilibrium and showing that the equilibrium strategy is a best response for the remaining agent.
Finally, recall that we assumed above that the first-price auction allocates the
good to the bidder with the highest valuation. The reason it was reasonable to
do this (although we could instead have proved that the auction has a symmetric,
increasing equilibrium) is that we have to check the strategy profile derived using
the revenue equivalence theorem anyway. Given the equilibrium strategy, it is easy
to confirm that the bidder with the highest valuation will indeed win the good.
6 Position Auctions
Search engines make most of their money – many billions of dollars annually –
by selling advertisements through what are called position auctions. In these
auctions, multiple different goods (keyword-specific “slots,” usually a list on the
right-hand side of a page of search results) are simultaneously offered for sale to
interested advertisers. Slots are considered to be more valuable the closer they
are to the top of the page, because this affects their likelihood of being clicked
by a user. Advertisers place bids on keywords of interest, and every time a user
searches for a keyword on which advertisers have bid, an auction is held. The
outcome of this auction is a decision about which ads will appear on the search
results page and in which order. Advertisers are required to pay only if a user
clicks on their ad.
We now give a formal model. As before, let N be the set of bidders (advertis-
ers), and let vi be i’s (commonly known) valuation for getting a click. Let bi ∈ R+
denote i’s bid, and let b( j) denote the jth-highest bid, or 0 if there are fewer than
j bids. Let G = {1, . . . , m} denote the set of goods (slots), and let α j denote the
expected number of clicks (the click-through rate) that an ad will receive if it
is listed in the jth slot. Observe that we assume that α does not depend on the
bidder’s identity. Observe that our model treats the auction as unrepeated, and
assumes that agents know each other’s valuations. The single-shot assumption is
motivated by the fact that advertisers tend to value clicks additively (i.e., the value
derived from a given user clicking on an ad is independent of how many other
users clicked earlier), at least when advertisers do not face budget constraints.
The perfect-information assumption makes sense because search engines allow
bidders either to observe other bids or to figure them out by probing the mecha-
nism.
The generalized first-price auction was the first position auction to be used by
search engine companies.
Definition 7.26 (VCG) In the position auction setting, the VCG mechanism
awards the bidder with the jth-highest bid the jth slot. If bidder i’s ad is ranked
in slot j and receives a click, it pays the auctioneer (1/αj) ∑_{k=j+1}^{m+1} b(k) (αk−1 − αk).
Intuitively, the key difference between the GSP and VCG is that the former
does not charge an agent its social cost, which depends on the differences between
click-through rates that other agents would receive with and without its presence.
Indeed, truthful bidding is not always a good idea under the GSP. Consider the
same bidders as in our running example, but change the click-through rate of slot
2 to α2 = 0.4. When all bidders bid truthfully we have already shown that bidder
1 would achieve expected utility of $3 (this argument did not depend on α2 ).
However, if bidder 1 changed its bid to $3, it would be awarded the second slot
and would achieve expected utility of 0.4($10 − $2) = $3.2. Thus the GSP is not
even truthful in equilibrium, let alone in dominant strategies.
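Because the running example's exact figures are not reproduced above, the following sketch uses assumed numbers that are consistent with the utilities quoted in this paragraph: per-click valuations of $10, $4, and $2, a top slot with click-through rate 0.5, and the second slot's rate changed to 0.4. It computes utilities under the standard generalized second-price rule (whose formal definition is likewise not reproduced above), for truthful bids and for the high bidder's deviation to $3:

def gsp_utilities(bids, values, alphas):
    """Generalized second-price auction: the k-th highest bid wins slot k
    and pays the (k+1)-th highest bid per click."""
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    utils = [0.0] * len(bids)
    for slot, i in enumerate(order[:len(alphas)]):
        next_bid = bids[order[slot + 1]] if slot + 1 < len(order) else 0.0
        utils[i] = alphas[slot] * (values[i] - next_bid)
    return utils

values = [10, 4, 2]            # assumed per-click valuations
alphas = [0.5, 0.4]            # click-through rates; slot 2 changed to 0.4
print(gsp_utilities([10, 4, 2], values, alphas))  # truthful: the $10 bidder earns 0.5*(10-4) = 3.0
print(gsp_utilities([3, 4, 2], values, alphas))   # deviating to $3: it earns 0.4*(10-2) = 3.2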
What can be said about the equilibria of the GSP? Briefly, it can be shown
that in the perfect-information setting the GSP has many equilibria. The dynamic
nature of the setting suggests that the most stable configurations will be locally
envy free: no bidder will wish that it could switch places with the bidder who won
the slot directly above its own. There exists a locally envy-free equilibrium of the
GSP that achieves exactly the VCG allocations and payments. Furthermore, all
other locally envy-free equilibria lead to higher revenues for the seller, and hence
are worse for the bidders.
What about relaxing the perfect-information assumption? Here, it is possible
to construct a generalized English auction that corresponds to the GSP, and to
show that this English auction has a unique equilibrium with various desirable
properties. In particular, the payoffs under this equilibrium are again the same as
the VCG payoffs, and the equilibrium is ex post, meaning that it is independent of
the underlying valuation distribution.
7 Combinatorial Auctions
We now consider a broader auction setting, in which a whole variety of different
goods are available in the same market. Switching to such an auction model is
important when bidders’ valuations depend strongly on which subset of the goods
they receive. Some widely studied practical examples include governmental auc-
tions for the electromagnetic spectrum, energy auctions, corporate procurement
auctions, and auctions for paths (e.g., shipping rights, bandwidth) in a network.
More formally, let us consider a setting with a set of bidders N = {1, . . . , n}
(as before) and a set of goods G = {1, . . . , m}. Let v = (v1 , . . . , vn ) denote the true
valuation functions of the different bidders, where for each i ∈ N, vi : 2^G → R. We
will usually be interested in settings where bidders have non-additive valuation
functions, for example valuing bundles of goods more than the sum of the values
for single goods. We identify two important kinds of non-additivity. First, when
two items are partial substitutes for each other (e.g., a Sony TV and a Toshiba TV,
or, more partially, a CD player and an MP3 player), their combined value is less
than the sum of their individual values. Strengthening this condition, when two
items are strict substitutes their combined value is the same as the value for either
one of the goods. For example, consider two non-transferable tickets for seats on
the same plane.
This is easy for the seller, but it makes things difficult for the bidders. In partic-
ular, it presents them with what is called the exposure problem: a bidder might
bid aggressively for a set of goods in the hopes of winning a bundle, but succeed
in winning only a subset of the goods and therefore pay too much. This problem
is especially likely to arise in settings where bidders’ valuations exhibit strong
complementarities, because in these cases bidders might be willing to pay sub-
stantially more for bundles of goods than they would pay if the goods were sold
separately.
The next-simplest method is to run essentially separate auctions for the dif-
ferent goods, but to connect them in certain ways. For example, one could hold
a multiround (e.g., Japanese) auction, but synchronize the rounds in the different
auctions so that as a bidder bids in one auction it has a reasonably good indica-
tion of what is transpiring in the other auctions of interest. This approach can
be made more effective through the establishment of constraints on bidding that
span all the auctions (so-called activity rules). For example, bidders might be al-
lowed to increase their aggregate bid amount by only a certain percentage from
one round to the next, thus providing a disincentive for bidders to fail to partic-
ipate in early rounds of the auction and thus improving the information transfer
between auctions. Bidders might also be subject to other constraints: for example,
a budget constraint could require that a bidder not exceed a certain total commit-
ment across all auctions. Both of these ideas can be seen in some government
auctions for electromagnetic spectrum (where the so-called simultaneous ascend-
ing auction was used) as well as in some energy auctions. Despite some successes
in practice, however, this approach has the drawback that it only mitigates the
exposure problem rather than eliminating it entirely.
A third approach ties goods together in a more straightforward way: the auc-
tioneer sells all goods in a single auction, and allows bidders to bid directly on
bundles of goods. Such mechanisms are called combinatorial auctions. This ap-
proach eliminates the exposure problem because bidders are guaranteed that their
bids will be satisfied “all or nothing.” For example a bidder may be permitted to
offer $100 for the pair (TV, DVD player), or to make a disjunctive offer “either
$100 for TV1 or $90 for TV2, but not both.” However, we will see that while
combinatorial auctions resolve the exposure problem, they raise many other ques-
tions. Indeed, these auctions have been the subject of considerable recent study in
both economics and computer science.
VCG has some attractive properties when applied to combinatorial auctions.
Specifically, it is dominant-strategy truthful, efficient, ex post individual rational,
and weakly budget balanced (the latter by Theorems 7.5 and 7.6). The VCG
combinatorial auction mechanism is not without shortcomings, however, as we
already discussed in Section 4.3.3. For example, a bidder who declares its valua-
tion truthfully has two main reasons to worry – one is that the seller will examine
its bid before the auction clears and submit a fake bid just below, thus increasing
the amount that the agent would have to pay if it wins. (This is called a shill
bid.) Another possibility is that both its competitors and the seller will learn its
true valuation and will be able to exploit this information in a future transaction.
Indeed, these two reasons are often cited as reasons why VCG auctions are rarely
seen in practice. Other issues include the fact that VCG is vulnerable to collusion
among bidders, and, conversely, to one bidder masquerading as several differ-
ent ones (so-called pseudonymous bidding or false-name bidding). Perhaps the
biggest potential hurdle, however, is computational, and it is not specific to VCG.
Any efficient combinatorial auction protocol must solve a core problem:
given the agents’ individual declarations v̂, it must determine the allocation
of goods to agents that maximizes social welfare. That is, we must compute
maxx∈X ∑i∈N v̂i (x). In single-good auctions this was simple – we just had to sat-
isfy the agent with the highest valuation. In combinatorial auctions, determining
the winners is a more challenging computational problem.
The winner determination problem can be written as the following integer program:

maximize  ∑_{i∈N} ∑_{S⊆G} v̂i(S) xS,i    (7.3)

subject to  ∑_{S⊆G : g∈S} ∑_{i∈N} xS,i ≤ 1    ∀g ∈ G    (7.4)

∑_{S⊆G} xS,i ≤ 1    ∀i ∈ N    (7.5)

xS,i ∈ {0, 1}    ∀S ⊆ G, i ∈ N    (7.6)
In this integer programming formulation, the valuations v̂i (S) are constants
and the variables are xS,i . These variables are Boolean, indicating whether bundle
S is allocated to agent i. The objective function (7.3) states that we want to max-
imize the sum of the agents’ declared valuations for the goods they are allocated.
Constraint (7.4) ensures that no overlapping bundles of goods are allocated, and
constraint (7.5) ensures that no agent receives more than one bundle. (This makes
sense since bidders explicitly assign a valuation to every subset of the goods.) Fi-
nally, constraint (7.6) is what makes this an integer program rather than a linear
program: no subset can be partially assigned to an agent.
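To make the optimization concrete, here is a brute-force sketch of the winner determination problem (the bids are made up, and enumerating assignments this way is practical only for toy instances, as the complexity discussion below explains):

def solve_wdp(bids):
    """bids maps (bidder, frozenset of goods) -> declared value.
    Returns (welfare, allocation) maximizing the sum of accepted bids, selling
    each good at most once and giving each bidder at most one bundle."""
    best = (0, [])
    items = list(bids.items())

    def search(idx, used_goods, used_bidders, welfare, chosen):
        nonlocal best
        if welfare > best[0]:
            best = (welfare, list(chosen))
        for k in range(idx, len(items)):
            (bidder, bundle), value = items[k]
            if bidder not in used_bidders and not (bundle & used_goods):
                chosen.append((bidder, bundle))
                search(k + 1, used_goods | bundle, used_bidders | {bidder},
                       welfare + value, chosen)
                chosen.pop()

    search(0, frozenset(), frozenset(), 0, [])
    return best

bids = {  # made-up toy instance
    (1, frozenset({"tv", "dvd"})): 100,
    (2, frozenset({"tv"})): 70,
    (3, frozenset({"dvd", "stereo"})): 80,
    (4, frozenset({"stereo"})): 40,
}
print(solve_wdp(bids))   # welfare 150: bidders 2 and 3 win their bundles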
The fact that the WDP is an integer program rather than a linear program is bad
news, since only the latter are known to admit a polynomial-time solution. Indeed,
a reader familiar with algorithms and complexity may recognize the combinatorial
auction allocation problem as a set packing problem (SPP). Unfortunately, it is
well known that the SPP is NP-complete. This means that it is not likely that
a polynomial-time algorithm exists for the problem. Worse, it so happens that
this problem cannot even be approximated uniformly, meaning that there does
not exist a polynomial-time algorithm and a fixed constant k > 0 such that for all
inputs the algorithm returns a solution that is at least (1/k) s∗, where s∗ is the value of
the optimal solution for the given input.
There are two primary approaches to getting around the computational prob-
lem. First, we can restrict ourselves to a special class of problems for which there
is guaranteed to exist a polynomial-time solution. Second, we can resort to heuris-
tic methods that give up the guarantee of polynomial running time, optimality of
solution, or both. In both cases, relaxation methods are a common approach. One
instance of the first approach is to relax the integrality constraint, thereby trans-
forming the problem into a linear program, which is solvable in polynomial time.
In general the solution results in “fractional” allocations, in which fractions of
goods are allocated to different bidders. If we are lucky, however, our solution to
the LP will just happen to be integral; the broadest case when such luck is assured
arises when the integer program’s constraint matrix is totally unimodular.
8 Conclusions
Mechanism design studies the design of protocols that achieve desired objectives
even in the presence of self-interested agents. We say that a social choice func-
tion is implementable if it can be achieved in the equilibrium of some mechanism.
The revelation principle demonstrates that we can restrict our attention to direct
and truthful mechanisms without changing the set of implementable social choice
functions. However, few social choice functions are implementable when agents
are allowed general preferences. We thus consider the case of quasilinear prefer-
ences, in which agents’ preferences for money are additively separable from their
preferences for the choice made by a mechanism. VCG is a particularly impor-
tant mechanism for the quasilinear setting. It guarantees efficiency and dominant
strategy truthfulness, and under additional assumptions also achieves weak budget
balance and individual rationality.
Auctions are mechanisms for the allocation of scarce resources among a set
of selfish agents. Various canonical auctions exist for the single-good setting.
Second-price auctions offer dominant strategies (and, indeed, are a special case of
VCG for this setting), while first-price auctions offer only Bayes–Nash equilibria.
However, both auction types achieve the same revenue for the seller in equilib-
rium, under standard assumptions about agents’ valuations. Auctions can also be
used in more complex settings. Two important examples are position auctions,
which are used to sell advertisements alongside search results on the Internet,
and combinatorial auctions, which sell multiple, heterogeneous goods in the same
auction, and allow bidders to specify valuations for arbitrary bundles of goods.
Mechanism design and auctions are covered to varying degrees in modern
game theory textbooks, but even better are the microeconomic textbook of [16]
and the excellent formal introduction to auction theory by [14]. More techni-
cal overviews from a computer science perspective are given in the introductory
chapters of [25], in [24], and in our own textbook [29], on which this chapter is
based. [12] is a large edited collection of many of the most important papers on
the theory of auctions, preceded by a thorough survey by the editor; this survey is
reproduced in [11]. Earlier surveys include [1], [33], and [17]. These texts cover
most of the canonical single-good auction types we discuss in the chapter. Spe-
cific publications that underlie some of the results covered in this chapter are as
follows.
The foundational idea of mechanisms as communication systems that select
outcomes based on messages from agents is due to [8], who also elaborated the
theory to include the idea that mechanisms should be “incentive compatible” [9].
The revelation principle was first articulated by [5] and was developed in the great-
est generality by Myerson [19, 20, 21]. In 2007, Hurwicz and Myerson shared a
Nobel Prize (along with Maskin, whose work we do not discuss in this chapter),
“for having laid the foundations of mechanism design theory.” Theorem 7.2 is due
to both Satterthwaite and Gibbard, in two separate publications [5, 28]. The VCG
mechanism was anticipated by [31], who outlined an extension of the second-price
auction to multiple identical goods. [7] explicitly considered the general family
of truthful mechanisms applying to multiple distinct goods (though the result had
appeared already in his 1969 Ph.D. dissertation). [2] proposed his tax for use with
public goods (i.e., goods such as roads and national defense, which are paid for
by all regardless of personal use). Theorem 7.4 is due to [6]; Theorem 7.7 is due
to that paper as well as to the earlier [10]. The fact that Groves mechanisms are
payoff equivalent to all other Bayes–Nash incentive-compatible efficient mecha-
nisms was shown by [13] and [32]; the former reference [13] also gave the results
that VCG is ex interim individually rational and that VCG collects the maximal
amount of revenue among all ex interim individually rational Groves mechanisms.
The Myerson–Satterthwaite theorem (7.8) appears in [22].
Vickrey’s seminal contribution [31] is still recommended reading for anyone
interested in auctions. In it Vickrey introduced the second-price auction and ar-
gued that bidders in such an auction do best when they bid sincerely. He also pro-
vided the analysis of the first-price auction under the independent private value
model with the uniform distribution described in this chapter. He even proved a special case of the revenue equivalence theorem for this setting.
9 Exercises
1. Level 2 Consider a potentially infinite outcome space O ⊂ [0, 1], and a finite
set N of n agents. Denote the utility of an agent with type θi for outcome
o as ui (o, θi ). Constrain the utility functions so that every agent has some
unique, most-preferred outcome b(θi) ∈ O, and so that |o′ − b(θi)| < |o′′ − b(θi)| implies that ui(o′, θi) > ui(o′′, θi). Consider a direct mechanism that
asks every agent to declare its most-preferred outcome and then selects the
median outcome. (If there are an even number of agents, the mechanism
chooses the larger of the two middle outcomes.)
Suppose a mechanism designer wants to elicit from a psychic the true probability p of the coin coming up heads. The designer could just offer to pay the psy-
chic a flat fee, but then the psychic’s utility would be the same, regardless of
the probability that the psychic reports, p̂. In order to induce the psychic to
reveal the true p, the mechanism designer commits to the following mech-
anism: if the coin comes up heads, then the psychic will be paid c + log2 p̂,
and if the coin comes up tails, then the psychic will be paid c + log2 (1 − p̂).
(a) Prove that the psychic’s utility is maximized by revealing the true p.
(b) Now consider a scenario where p, again known by the psychic, repre-
sents the probability that a buyer (agent 2) has a high valuation (200
rather than 100). Because p represents the psychic’s private informa-
tion, we can understand it as the psychic’s type. The seller knows that
the joint type distribution is as follows:
4. Level 2 Suppose you have some object that each of n agents desires, but
which you do not value. Assume that each agent i values it at vi , with vi ’s
drawn independently and uniformly from some positive real line interval,
say [0, 10^100]. Although you do not desire the object and also do not care about the actual values of the vi's, you need to compute √vi for each i.
Unfortunately, you face two problems. First, agents are not inclined to just
reveal to you anything about their vi ’s. Second, your computer is costly to
operate. It costs you 1 unit to determine the greater of two values, 2 units
to perform any basic arithmetic operation (+, −, ×, /), and anything more complicated (such as √x) costs 20 units. The (accurate) current time of day
can be observed without cost.
(a) How much would it cost to compute √vi for each i using a straight-
forward VCG mechanism? (When computing cost, ignore the revenue
that the auction will generate.) Hint: this part is very easy.
(b) Your answer above gives an upper bound on the cost of computing
the square roots of the valuations. Design an incentive-compatible,
dominant-strategy ("strategyproof") direct mechanism that will allow you to compute all √vi at minimal cost. Assume that agents can do
computations for free.
(c) In the previous part you were restricted to direct mechanisms. Show
that an indirect mechanism can achieve even lower cost.
5. Level 1 Consider a first-price auction with two bidders. Assume that they
have IPV valuations drawn uniformly from the interval [0, 10], and that they
are risk-neutral. We saw that s1(v1) = (1/2)v1 and s2(v2) = (1/2)v2 together form
a Bayes–Nash equilibrium for this game.
(a) Assuming that bidder 2 is instead using the bidding strategy s2 (v2 ) =
v2 (i.e., bidder 2 bids its full valuation), what is the best response
bidding strategy s1 (v1 ) for bidder 1?
(b) Now consider instead a second-price auction. However, suppose the
mechanism has a buggy implementation of max: most of the time the
mechanism works correctly, but with some probability p that is strictly
less than 1/3, it awards the object to the second-highest bidder (instead
of the highest bidder). In all cases it correctly calculates price as the
second highest bid. Assuming that bidder 2 is still bidding truthfully,
compute the best-response strategy for bidder 1. Is it still truthful?
(a) Model this problem as an auction. State all the relevant assumptions
that you make in building this model, and explain why they are rea-
sonable.
(b) Find a symmetric Bayes–Nash equilibrium of this game. You may as-
sume that for this game there exists a symmetric, pure-strategy equilib-
rium for which the bid amount is a monotonically increasing function
of the agent’s valuation.
Say that bidder i has simple multiunit preferences if there are numbers vi and di such that the bidder values di or more units of
the good at vi , and any smaller number of units at 0. Say that i has known
simple multiunit preferences if di is common knowledge. Consider mech-
anisms running in the multiunit setting where each agent has quasilinear,
known simple multiunit preferences. In this setting, a direct mechanism
is one where each agent i declares a valuation v̂i for its bundle size, and
the mechanism chooses an allocation of its k units among the agents and
payments that each agent must make.
An allocation rule is monotone if for any agent i and any profile v̂−i of
declarations of the other agents, there exists a critical value κi (v̂−i ) ∈ (R ∪
{∞}) such that
(1) any declaration v̂i > κi (v̂−i ) is a winning declaration (i.e., causes i to
be allocated its desired number of units), and
(2) any declaration v̂i < κi (v̂−i ) is a losing declaration (i.e., causes i not to
be allocated its desired bundle).
(a) Let v1s denote the first-highest bid for slot s. (Similarly define v1t ,
v1b , v2s , v2t and v2b ). Express the VCG revenue as a function of these
values. Assume that the values are zero if there is no corresponding
bidder. Assume that the auctioneer breaks ties in favor of bidders who
want both slots.
(b) Demonstrate (with a specific example) that VCG’s revenue can de-
crease as these quantities increase.
(c) Consider a setting where one of the agents is capable of creating a
second identity (at some very small cost α), and submitting bids using
both identities. (If it does so it gets every good won by either identity,
and must pay for both.) Demonstrate that VCG is no longer dominant-
strategy truthful in this case.
(d) Consider the following auction mechanism: allocate both goods to
whichever advertiser has submitted the highest bid, and charge him or
her the amount of the second highest bid (ignoring the fi element of
their types). Prove that this mechanism has the following properties:
i. the agents have a dominant strategy of reporting truthfully, using
only their true identities, and
ii. in equilibrium, the mechanism generates at least half as much rev-
enue as VCG would if the agents submitted truthful reports.
9. Level 3 Using data from an auction website (e.g., eBay), estimate bidders’
valuation distributions, using (e.g.) kernel density estimation. Do this for
several dissimilar kinds of goods. How different are the maximum like-
lihood estimates of the valuation distributions across these auctions? How
much social welfare would be lost if bidders played the equilibrium strategy
from one auction setting in another?
10. Level 4 Mechanism design and auction theory are based on a perfect-
rationality model of agent behavior. Investigate how the theory is impacted
by a more realistic model of behavior. (E.g., revenue equivalence holds
only under a set of given assumptions. Consider which auction design a
seller should choose if agents are not perfectly rational.)
References
[1] R. Cassady. Auctions and Auctioneering. University of California Press, 1967.
[2] E.H. Clarke. Multipart pricing of public goods. Public Choice, 11:17–33, 1971.
[3] P.C. Cramton, Y. Shoham, and R. Steinberg, editors. Combinatorial Auctions. MIT
Press, Cambridge, MA, 2006.
[4] B. Edelman, M. Schwarz, and M. Ostrovsky. Internet advertising and the general-
ized second price auction: Selling billions of dollars worth of keywords. American
Economic Review, 97(1):242–259, March 2007.
[6] J. Green and J.J. Laffont. Characterization of satisfactory mechanisms for the reve-
lation of preferences for public goods. Econometrica, 45(2):427–438, 1977.
[10] L. Hurwicz. On the existence of allocation systems whose manipulative Nash equi-
libria are Pareto optimal. Unpublished, 1975.
[11] Paul Klemperer. Auction theory: A guide to the literature. Journal of Economic
Surveys, 13(3):227–286, July 1999.
[12] Paul Klemperer, editor. The Economic Theory of Auctions. Edward Elgar, 1999.
[13] V. Krishna and M. Perry. Efficient Mechanism Design. Technical report, Pennsyl-
vania State University, 1998.
[14] Vijay Krishna. Auction Theory. Elsevier Science, New York, 2002.
[17] R. McAfee and J. McMillan. Auctions and bidding. Journal of Economic Litera-
ture, 25(3):699–738, 1987.
[18] R. Müller. Tractable cases of the winner determination problem. In Cramton et al.
[3], chapter 13, pages 319–336.
[22] R. Myerson and M. Satterthwaite. Efficient mechanisms for bilateral trading. Jour-
nal of Economic Theory, 29(2):265–281, 1983.
[26] J.G. Riley and W.F. Samuelson. Optimal auctions. The American Economic Review,
71(3):381–392, 1981.
[28] M.A. Satterthwaite. Strategy-proofness and Arrow’s conditions: Existence and cor-
respondence theorems for voting procedures and social welfare functions. Journal
of Economic Theory, 10:187–217, 1975.
[29] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-
Theoretic, and Logical Foundations. Cambridge University Press, New York, 2009.
[33] Robert Wilson. Auction theory. In J. Eatwell, M. Milgate, and P. Newman, editors,
The New Palgrave Dictionary of Economics, volume I. Macmillan, London, 1987.
Chapter 8
1 Introduction
In many multiagent systems, agents can improve their performance by forming
coalitions, i.e., pooling their efforts and resources so as to achieve the tasks at
hand in a more efficient way. This holds both for cooperative agents, i.e., agents
who share a common set of goals, and for selfish agents who only care about their
own payoffs. For cooperative agents, to find the optimal collaboration pattern,
it suffices to identify the best way of splitting agents into teams. In contrast,
when the agents are selfish, we also have to specify how to distribute the gains
from cooperation, since each agent needs to be incentivized to participate in the
proposed solution.
In this chapter, we discuss coalition formation in multiagent systems for both
selfish and cooperative agents. To deal with selfish agents, we introduce classic
solution concepts of coalitional game theory that capture the notions of stability
and fairness in coalition formation settings. We then give an overview of existing
representation formalisms for coalitional games. For each such formalism, we
discuss the complexity of computing the solution concepts defined earlier in the
chapter, focusing on algorithms whose running time is polynomial in the number
of agents n. In the second half of the chapter, we focus on practical approaches for
finding an optimal partition of agents into teams. We present the state-of-the-art
algorithms for this problem, and compare their relative strengths and weaknesses.
∗ The first two authors have contributed equally to the chapter.
Example 8.1 If several researchers from different universities write a joint paper,
each researcher receives a payoff from its own university: the paper can count
toward promotion or tenure, receive an internal prize, or, sometimes, be rewarded
with a monetary bonus. However, these payoffs are allocated to individual re-
searchers, and, with the exception of a bonus payment, cannot be transferred from
one researcher to another.
Settings similar to the one in Example 8.1 are modeled by assuming that each
coalitional action corresponds to a vector of payoffs – one for each member of
the coalition. Games represented in this manner are known as games with non-
transferable utility, or NTU games.
It is important to note that in NTU settings two coalitional actions may be
incomparable. For instance, consider the 2-player coalition {a1 , a2 } that chooses
between actions x and y. Suppose that whenever the players choose x, player
a1 gets a payoff of 5, whereas player a2 gets a payoff of 1; on the other hand,
if players choose y, player a1 gets 2 and player a2 gets 7. Obviously, player
a1 prefers x to y, even though action y has a higher total utility, whereas player
a2 prefers y to x. In contrast, in TU games, all players prefer the action(s) that
result(s) in the highest sum of payoffs, as they can distribute the total payoff so
1 Recently, games with overlapping coalitions have also been considered; see, e.g. [12].
that everyone is better off. This intracoalitional competition makes NTU games
more difficult to analyze, which may explain why TU games received much more
attention in the multiagent literature. We will follow this trend, and for the rest of
the chapter focus on TU games only.
Now, in each of the examples considered so far, the payoffs that each coalition
could attain were determined by the identities and actions of the coalition mem-
bers. However, there are cases where a coalition’s productivity also depends on
the coalition structure that it is a part of, i.e., it may be influenced by the actions of
non-members. This is the case, for instance, in market-like environments, where
each coalition provides a service, and the payment it can charge for its service de-
pends on the competition it faces. While this phenomenon can be observed both
in TU and in NTU settings, traditionally, it has been studied in the transferable
utility model only. Transferable utility games where the value of each coalition
may depend on the coalition structure it appears in are known as partition func-
tion games [37]. On the other hand, games where the value of each coalition is
the same in every coalition structure are known as characteristic function games.
Clearly, characteristic function games form a proper subclass of partition function
games, and tend to be much easier to work with. Thus, from now on, we will
further restrict our attention to characteristic function games.
2 Definitions
In this section, we will formally define characteristic function games as well as
several important subclasses of these games.
Definition 8.1 A characteristic function game G is given by a pair (A, v), where
A = {a1, . . . , an} is a finite set of players, or agents, and v : 2^A → R is a character-
istic function, which maps each subset, or coalition, of agents C to a real number
v(C). This number is referred to as the value of the coalition C.
We remark that we can represent a characteristic function game by explicitly
listing all coalitions together with their values; the size of this naive representation
is exponential in n. However, in practice we are usually interested in games that
admit a succinct representation and can be analyzed in time polynomial in n. A
number of such representations have been considered in the literature; we will
discuss some of them in Section 4.
We will now present two examples of characteristic function games.
Example 8.2 Charlie (C), Marcie (M), and Pattie (P) want to pool their savings
to buy ice cream. Charlie has c dollars, Marcie has m dollars, Pattie has p dollars,
and the ice cream packs come in three different sizes: (1) 500g which costs $7,
(2) 750g which costs $9, and (3) 1000g which costs $11. The children value
ice cream, and assign no utility to money. Thus, the value of each coalition is
determined by how much ice cream it can buy.
This situation corresponds to a characteristic function game with the set of
players A = {C, M, P}. For c = 3, m = 4, p = 5, its characteristic function v is given by
v(∅) = 0, v({C}) = v({M}) = v({P}) = 0, v({C, M}) = v({C, P}) = 500,
v({M, P}) = 750, v({C, M, P}) = 1000.
For c = 8, m = 8, p = 1, its characteristic function v is given by
v(∅) = 0, v({C}) = v({M}) = 500, v({P}) = 0, v({C, P}) = v({M, P}) = 750,
v({C, M}) = 1250, v({C, M, P}) = 1250.
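Since the coalition values in this example are fully determined by the pack prices, they can be recomputed mechanically: the value of a coalition is the largest number of grams its pooled budget can buy. A minimal Python sketch (the function names are illustrative, not part of the chapter) reproduces both variants:

```python
from itertools import combinations

# Pack sizes in grams and prices in dollars, as in Example 8.2.
PACKS = [(500, 7), (750, 9), (1000, 11)]

def best_grams(budget, packs=PACKS):
    """Largest number of grams purchasable with the given budget,
    allowing any multiset of packs (an unbounded-knapsack recursion)."""
    best = [0] * (budget + 1)
    for b in range(1, budget + 1):
        for grams, price in packs:
            if price <= b:
                best[b] = max(best[b], best[b - price] + grams)
    return best[budget]

def characteristic_function(savings):
    """Map every coalition (a frozenset of agent names) to its value."""
    agents = list(savings)
    v = {frozenset(): 0}
    for r in range(1, len(agents) + 1):
        for coalition in combinations(agents, r):
            v[frozenset(coalition)] = best_grams(sum(savings[a] for a in coalition))
    return v

v1 = characteristic_function({"C": 3, "M": 4, "P": 5})
print(v1[frozenset("CM")], v1[frozenset("MP")], v1[frozenset("CMP")])   # 500 750 1000
v2 = characteristic_function({"C": 8, "M": 8, "P": 1})
print(v2[frozenset("CM")], v2[frozenset("CMP")])                        # 1250 1250
```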
It is usually assumed that the value of the empty coalition ∅ is 0, i.e., v(∅) = 0.
Moreover, it is often the case that the value of each coalition is non-negative (i.e.,
agents form coalitions to make a profit), or else that the value of each coalition is
non-positive (i.e., agents form coalitions to share costs). Throughout this chapter,
we will mostly focus on the former scenario, i.e., we assume that v(C) ≥ 0 for all
C ⊆ A. However, all our definitions and results can be easily adapted to the latter
scenario.
2.1 Outcomes
An outcome of a characteristic function game consists of two parts: (1) a partition
of players into coalitions, and (2) a payoff vector, which distributes the value of
each coalition among its members.
Formally, a coalition structure over A is a collection of non-empty coalitions
CS = {C1 , . . . ,C|CS| } such that
• C1 ∪ · · · ∪ C|CS| = A, and
• Ci ∩ Cj = ∅ for all i ≠ j.
Since we have assumed that the value of each coalition is non-negative, superadditivity implies monotonicity: if a game G = (A, v) is superadditive and C ⊆ C′, then v(C) ≤ v(C′) − v(C′ \ C) ≤ v(C′). However, the converse is not
necessarily true: consider, for instance, a game where the value of the characteristic function grows logarithmically with the coalition size, i.e., v(C) = log |C|.
In superadditive games, there is no compelling reason for agents to form a
coalition structure consisting of multiple coalitions: the agents can earn at least
as much profit by forming the grand coalition, i.e., the coalition that contains all
agents. Therefore, for superadditive games it is usually assumed that the agents
form the grand coalition, i.e., the outcome of a superadditive game is of the form
({A}, x) where x satisfies ∑ni=1 xi = v(A). Conventionally, {A} is omitted from
the notation, i.e., an outcome of a superadditive game is identified with a payoff
vector for the grand coalition.
Proof. For the “only if” direction, assume that G = (A, v) is convex, and consider
two coalitions C, C′ such that C ⊂ C′ ⊂ A and a player ai ∈ A \ C′. By setting
X = C′, Y = C ∪ {ai}, we obtain
v(C′ ∪ {ai}) − v(C′) = v(X ∪ Y) − v(X) ≥ v(Y) − v(X ∩ Y) = v(C ∪ {ai}) − v(C).
Moreover, every convex game is superadditive: for any two disjoint coalitions C′ and C′′, convexity implies
v(C′ ∪ C′′) ≥ v(C′) + v(C′′) − v(C′ ∩ C′′) = v(C′) + v(C′′) (here we use our assumption that v(∅) = 0).
To see that the converse is not always true, consider a game G = (A, v), where
A = {a1, a2, a3}, and v(C) = 1 if |C| ≥ 2 and v(C) = 0 otherwise. It is easy to
check that this game is superadditive. On the other hand, for C′ = {a1, a2} and
C′′ = {a2, a3}, we have v(C′) = v(C′′) = 1, v(C′ ∪ C′′) = 1, v(C′ ∩ C′′) = 0, so the
convexity condition v(C′ ∪ C′′) + v(C′ ∩ C′′) ≥ v(C′) + v(C′′) is violated.
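Both properties are easy to test exhaustively for small games. The sketch below (helper names are illustrative) checks superadditivity, i.e., v(C ∪ C′) ≥ v(C) + v(C′) for disjoint C and C′, and convexity, i.e., v(X ∪ Y) + v(X ∩ Y) ≥ v(X) + v(Y) for all X and Y, and confirms that the three-player game above is superadditive but not convex.

```python
from itertools import combinations

def subsets(agents):
    for r in range(len(agents) + 1):
        for c in combinations(agents, r):
            yield frozenset(c)

def is_superadditive(agents, v):
    """v(C u C') >= v(C) + v(C') for all disjoint coalitions C, C'."""
    return all(v[c1 | c2] >= v[c1] + v[c2]
               for c1 in subsets(agents) for c2 in subsets(agents)
               if not (c1 & c2))

def is_convex(agents, v):
    """v(X u Y) + v(X n Y) >= v(X) + v(Y) for all coalitions X, Y."""
    return all(v[x | y] + v[x & y] >= v[x] + v[y]
               for x in subsets(agents) for y in subsets(agents))

# The game above: three players, v(C) = 1 iff |C| >= 2.
agents = ["a1", "a2", "a3"]
v = {c: (1 if len(c) >= 2 else 0) for c in subsets(agents)}
print(is_superadditive(agents, v), is_convex(agents, v))   # True False
```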
3 Solution Concepts
Any partition of agents into coalitions and any payoff vector that respects this par-
tition correspond to an outcome of a characteristic function game. However, not
all outcomes are equally desirable. For instance, if all agents contribute equally
to the value of a coalition, a payoff vector that allocates the entire payoff to one
of the agents is less appealing than the one that shares the profits equally among
all agents. Similarly, an outcome that incentivizes all agents to work together is
preferable to an outcome that some of the agents want to deviate from.
More broadly, one can evaluate the outcomes according to two sets of criteria:
(1) fairness, i.e., how well each agent’s payoff reflects its contribution, and (2) sta-
bility, i.e., what the incentives are for the agents to stay in the coalition structure.
These two sets of criteria give rise to two families of payoff division schemes, or
solution concepts. We will now discuss each of them in turn.
The Shapley value is usually defined for superadditive games. As argued above, for such games an
outcome can be identified with a payoff vector for the grand coalition, i.e., the
Shapley value prescribes how to share the value of the grand coalition in a fair
way.
To present the formal definition of the Shapley value, we need some additional
notation. Given a characteristic function game G = (A, v), let ΠA denote the set of
all permutations of A, i.e., one-to-one mappings from A to itself. Given a permuta-
tion π ∈ ΠA , we denote by Cπ (ai ) the coalition that consists of all predecessors of
ai in π, i.e., we set Cπ (ai ) = {a j ∈ A | π(a j ) < π(ai )}. The marginal contribution
of an agent ai with respect to a permutation π in a game G = (A, v) is denoted by
ΔGπ(ai) and is given by
ΔGπ(ai) = v(Cπ(ai) ∪ {ai}) − v(Cπ(ai));
this quantity measures by how much ai increases the value of the coalition con-
sisting of its predecessors in π when it joins them. Informally, the Shapley value
of a player ai is its average marginal contribution, where the average is taken over
all permutations of A. More formally, we have the following definition.
Definition 8.6 Given a characteristic function game G = (A, v), the Shapley value
of a player ai ∈ A is denoted by ϕi(G) and is given by
ϕi(G) = (1/n!) ∑_{π∈ΠA} ΔGπ(ai).
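For very small games, Definition 8.6 can be evaluated literally by enumerating all n! permutations. The following sketch (helper names are illustrative; the characteristic function is passed as a dictionary from frozensets to values) does exactly that; on the first ice cream game it yields ϕC = 250 and ϕM = ϕP = 375.

```python
from itertools import permutations
from math import factorial

def shapley_values(agents, v):
    """Shapley values computed by averaging marginal contributions
    over all permutations, exactly as in Definition 8.6."""
    phi = {a: 0.0 for a in agents}
    for pi in permutations(agents):
        predecessors = set()
        for a in pi:
            # Marginal contribution of a to the coalition of its predecessors.
            phi[a] += v[frozenset(predecessors | {a})] - v[frozenset(predecessors)]
            predecessors.add(a)
    return {a: phi[a] / factorial(len(agents)) for a in agents}

# The first ice cream game (c = 3, m = 4, p = 5).
v = {frozenset(): 0, frozenset("C"): 0, frozenset("M"): 0, frozenset("P"): 0,
     frozenset("CM"): 500, frozenset("CP"): 500, frozenset("MP"): 750,
     frozenset("CMP"): 1000}
print(shapley_values("CMP", v))   # {'C': 250.0, 'M': 375.0, 'P': 375.0}
```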
The Shapley value has many attractive properties. In what follows, we list four
of them; the proofs of Propositions 8.2–8.5 are left as an exercise for the reader.
First, the Shapley value is efficient, i.e., it distributes the value of the grand
coalition among all agents.
Proposition 8.2 For any characteristic function game G = (A, v), we have
∑ni=1 ϕi (G) = v(A).
Second, the Shapley value does not allocate any payoffs to players who do
not contribute to any coalition. Formally, given a characteristic function game
G = (A, v), a player ai ∈ A is said to be a dummy if v(C) = v(C ∪ {ai }) for every
C ⊆ A. It is not hard to see that the Shapley value of a dummy player is 0.
Third, the Shapley value treats symmetric players in the same way. Formally, given a characteristic function game G = (A, v), we say that players ai and a j are symmetric in G if v(C ∪ {ai}) = v(C ∪ {a j}) for every
coalition C ⊆ A \ {ai , a j }. It turns out that symmetric players have equal Shapley
values.
Definition 8.7 Given a characteristic function game G = (A, v), the Banzhaf index of a player ai ∈ A is denoted by βi(G) and is given by
βi(G) = (1/2^{n−1}) ∑_{C⊆A\{ai}} [v(C ∪ {ai}) − v(C)].
It is not hard to verify that the Banzhaf index satisfies properties (2), (3), and (4)
in the list above. However, it does not satisfy property (1), i.e., efficiency.
Example 8.4 Consider a characteristic function game G = (A, v), where v(A) = 1
and v(C) = 0 for every C ⊂ A. We have ϕi(G) = 1/n and βi(G) = 1/2^{n−1} for each ai ∈ A.
Since efficiency is a very desirable property of a payoff distribution scheme, some
researchers also consider the normalized Banzhaf index ηi (G), which is defined
as
ηi(G) = βi(G) / ∑_{aj∈A} βj(G).
While this version of the Banzhaf index satisfies efficiency, it loses the additivity
property.
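Definition 8.7 translates just as directly into code. The sketch below (helper names are illustrative) computes the Banzhaf index and its normalized variant; on the game of Example 8.4 with n = 3 it returns βi = 1/4 for every player, in line with the discussion above.

```python
from itertools import combinations

def banzhaf_indices(agents, v):
    """Banzhaf index of every agent (Definition 8.7): the average marginal
    contribution over all coalitions of the other agents."""
    beta = {}
    for a in agents:
        others = [b for b in agents if b != a]
        total = 0.0
        for r in range(len(others) + 1):
            for coalition in combinations(others, r):
                c = frozenset(coalition)
                total += v[c | {a}] - v[c]
        beta[a] = total / 2 ** (len(agents) - 1)
    return beta

def normalized_banzhaf(agents, v):
    beta = banzhaf_indices(agents, v)
    s = sum(beta.values())
    return {a: beta[a] / s for a in agents}

# Example 8.4 with n = 3: v(A) = 1 and v(C) = 0 for every proper subset C.
v = {frozenset(c): (1 if len(c) == 3 else 0)
     for r in range(4) for c in combinations("abc", r)}
print(banzhaf_indices("abc", v))   # {'a': 0.25, 'b': 0.25, 'c': 0.25}
```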
3.3 Core
We have introduced two solution concepts that attempt to measure the agents’
marginal contribution. In contrast, the solution concepts considered in this and
subsequent sections are defined in terms of coalitional stability.
Consider a characteristic function game G = (A, v) and an outcome (CS, x) of
this game. Let x(C) denote the total payoff of a coalition C under a payoff vector x,
i.e., x(C) = ∑i:ai ∈C xi . Now, if x(C) < v(C), then the agents in C have an incentive
to deviate since they could do better by abandoning CS and forming a coalition of
their own. For example, if the agents were to share the extra profit equally among
themselves, every agent ai ∈ C would receive a payoff of xi + (v(C) − x(C))/|C| instead of
xi . An outcome where no subset of agents has an incentive to deviate is called
stable, and the set of all such outcomes is called the core of G [29].
Definition 8.8 The core of a characteristic function game G = (A, v) is the set of
all outcomes (CS, x) such that x(C) ≥ v(C) for any C ⊆ A.
In a superadditive game, the outcomes are payoff vectors for the grand coali-
tion, so for such games the core can be defined as the set of all vectors x that
satisfy: (1) xi ≥ 0 for all ai ∈ A, (2) x(A) = v(A), and (3) x(C) ≥ v(C) for all
C ⊆ A.
The outcomes in the core are stable and therefore they are more likely to arise
when a coalitional game is played. However, some games have empty cores.
Example 8.5 Consider the game G = (A, v), where A = {a1 , a2 , a3 }, v(C) = 1
if |C| ≥ 2 and v(C) = 0 otherwise. We claim that this game has an empty core.
Indeed, suppose that the core of G is non-empty. Since G is superadditive, its core
contains a vector x = (x1 , x2 , x3 ), where x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, and x1 + x2 +
x3 = 1. The latter constraint implies that xi ≥ 1/3 for some ai ∈ A. But then for
C = A \ {ai } we have v(C) = 1, x(C) ≤ 2/3, which means that (x1 , x2 , x3 ) is not in
the core. This contradiction shows that the core of G is empty.
Observe that the set of all outcomes in the core of a superadditive game can
be characterized by the following linear feasibility program (LFP):
xi ≥ 0 for each ai ∈ A
∑_{i:ai∈A} xi = v(A) (8.1)
∑_{i:ai∈C} xi ≥ v(C) for each C ⊆ A
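The feasibility program (8.1) can be handed to any linear programming solver. The sketch below uses scipy.optimize.linprog purely as an example of such a solver (the scipy dependency is an assumption of this sketch, not something the chapter relies on); it returns a core element when one exists and None otherwise, and reports that the game of Example 8.5 indeed has an empty core.

```python
from itertools import combinations
from scipy.optimize import linprog

def core_element(agents, v):
    """Solve the linear feasibility program (8.1): return a payoff vector in
    the core of the superadditive game (A, v), or None if the core is empty."""
    n = len(agents)
    index = {a: i for i, a in enumerate(agents)}
    A_ub, b_ub = [], []
    for r in range(1, n):                      # x(C) >= v(C), i.e., -x(C) <= -v(C)
        for coalition in combinations(agents, r):
            row = [0.0] * n
            for a in coalition:
                row[index[a]] = -1.0
            A_ub.append(row)
            b_ub.append(-v[frozenset(coalition)])
    A_eq, b_eq = [[1.0] * n], [v[frozenset(agents)]]   # x(A) = v(A)
    res = linprog(c=[0.0] * n, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n)
    return list(res.x) if res.success else None

# Example 8.5: three players, v(C) = 1 iff |C| >= 2; the core is empty.
v = {frozenset(c): (1 if len(c) >= 2 else 0)
     for r in range(4) for c in combinations("abc", r)}
print(core_element("abc", v))   # None
```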
Theorem 8.1 A simple game G = (A, v) has a non-empty core if and only if it has
a veto player. Moreover, an outcome (x1 , . . . , xn ) is in the core of G if and only if
xi = 0 for any player ai who is not a veto player in G.
for the sake of contradiction that we have v(C) > x(C) for some coalition C =
{ai1 , . . . , ais }. We can assume without loss of generality that π(ai1 ) ≤ · · · ≤ π(ais ),
i.e., the members of C appear in π ordered as ai1 , . . . , ais . We can write v(C) as
By adding up these inequalities, we obtain v(C) ≤ x(C), i.e., coalition C does not
have an incentive to deviate, which is a contradiction.
Observe that the construction used in the proof of Theorem 8.2 immediately im-
plies that in a convex game the Shapley value is in the core: indeed, the Shapley
value is a convex combination of outcomes constructed in the proof of Theo-
rem 8.2, and the core can be shown to be a convex set. However, Theorem 8.2
does not, in general, enable us to check whether a given outcome of a convex
game is in the core of that game.
The least core of G is its ε∗ (G)-core. The quantity ε∗ (G) is called the value of the
least core of G.
To see that the least core is always non-empty, observe that we can modify the
linear feasibility program (8.1) so as to obtain a linear program for the value of
the least core as well as a payoff vector in the least core: it suffices to minimize ε subject to
the constraints ∑_{i:ai∈A} xi = v(A) and ∑_{i:ai∈C} xi ≥ v(C) − ε for every coalition C ⊆ A with C ∉ {∅, A}.
Among the payoff vectors in the least core, we can pick the ones that minimize the second highest
deficit d2 = max{v(C) − x(C) | C ⊆ A, v(C) − x(C) < d1}, where d1 denotes the largest deficit, and remove all other
payoff vectors. We can continue this procedure until the set of the surviving pay-
off vectors stabilizes. The resulting set can be shown to consist of a single payoff
vector: this payoff vector is known as the pre-nucleolus. If, at each step, we only
consider imputations (rather than arbitrary payoff vectors), we obtain the nucle-
olus. The nucleolus is an attractive solution concept, as it arguably identifies the
most stable outcome of a game. However, its formal definition involves an expo-
nentially long vector, and therefore the nucleolus is not easy to compute from the
first principles. However, some classes of games defined on combinatorial struc-
tures (see Section 4) admit efficient algorithms for computing the nucleolus: see,
e.g., [19, 26, 36].
The kernel [17] consists of all outcomes where no player can credibly demand
a fraction of another player’s payoff. Formally, for any player ai we define its
surplus over the player a j with respect to a payoff vector x as the quantity
suri, j(x) = max{v(C) − x(C) | C ⊆ A, ai ∈ C, a j ∉ C}.
Intuitively, this is the amount that ai can earn without the cooperation of a j , by
asking a set C \ {ai } to join it in a deviation, and paying each player in C \ {ai }
what it used to be paid under x. Now, if suri, j (x) > sur j,i (x), player ai should
be able to demand a fraction of player a j ’s payoff – unless player a j already
receives the smallest payment that satisfies the individual rationality condition,
i.e., v({a j }). Following this intuition, we say that an imputation x is in the
kernel of a superadditive game G if for any pair of players (ai , a j ) we have ei-
ther: (1) suri, j (x) = sur j,i (x), or (2) suri, j (x) > sur j,i (x) and x j = v({a j }), or (3)
suri, j (x) < sur j,i (x) and xi = v({ai }).
The bargaining set [38] is defined similarly to the core. However, in contrast
to the definition of the core, we only take into account coalitional deviations that
are themselves stable, i.e., do not admit a counterdeviation. Consequently, the
bargaining set contains the core, and the containment is sometimes strict. In fact,
the bargaining set can be shown to contain the least core [22], which implies that
the bargaining set is guaranteed to be non-empty.
4 Representation Formalisms
It would be desirable to have a representation language that allows us to encode
all coalitional games so that the description size of each game is polynomial in
the number of agents n. However, a simple counting argument shows that no
representation formalism can encode each coalitional game using poly(n) bits;
this is true even if we restrict ourselves to simple games. Therefore, one needs to trade off expressiveness against succinctness: the representation formalisms discussed below are either guaranteed to be succinct but not complete, or complete but succinct only for some games.
Definition 8.11 A weighted voting game G is given by a triple (A, w, q), where A
is the set of players, |A| = n, w = (w1 , . . . , wn ) ∈ Rn is a vector of weights, and
q ∈ R is a quota. The characteristic function v of a game G = (A, w, q) is given by
v(C) = 1 if ∑i:ai ∈C wi ≥ q and v(C) = 0 otherwise.
It is usually assumed that all weights and the quota are integers given in binary; it
can be shown that this assumption can be made without loss of generality. Further,
most of the work on weighted voting games assumes that all weights are non-
negative; observe that in this case weighted voting games are simple games.
Weighted voting games are used to model decision making in voting bodies;
for instance, the game described in Example 8.3 is a weighted voting game with
quota q = 51 and weights 40, 22, 30, 9 for players L, M, C, G, respectively.
Indeed, the Shapley value and the Banzhaf index in such games are often viewed
as measures of a party’s voting power in a parliament and have therefore received
significant attention from political scientists. In such settings it is usually assumed
that the quota q is at least half of the players’ total weight; however, in general
task execution scenarios the quota q can take any value between 0 and ∑ni=1 wi .
It is important to note that a player’s power in a weighted voting game is not
necessarily proportional to its weight. Indeed, in Example 8.3, the Liberal party
and the Moderate party have the same Shapley value (namely, 1/3), even though
their weights differ by almost a factor of 2. Moreover, the Green party is a dummy
and thus its Shapley value is 0, even though it has a non-zero weight. Observe also
that if we changed the quota to, say, q = 60, the balance of power would change:
for instance, we would have v({M,C}) = 0, but v({M,C, G}) = 1, so G would no
longer be a dummy.
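These observations are easy to verify numerically. The sketch below (helper names are illustrative) computes Shapley values in the weighted voting game of Example 8.3 using the standard coalition-based form ϕi = ∑_{C⊆A\{ai}} (|C|! (n − |C| − 1)! / n!) [v(C ∪ {ai}) − v(C)], and confirms that players L, M, and C each receive 1/3, while G receives 0.

```python
from itertools import combinations
from math import factorial

def wvg_value(coalition, weights, quota):
    """Characteristic function of a weighted voting game (Definition 8.11)."""
    return 1 if sum(weights[a] for a in coalition) >= quota else 0

def shapley_wvg(weights, quota):
    """Shapley values via the coalition-based form of Definition 8.6."""
    agents, n = list(weights), len(weights)
    phi = {}
    for a in agents:
        others = [b for b in agents if b != a]
        total = 0.0
        for r in range(n):
            for c in combinations(others, r):
                if wvg_value(c + (a,), weights, quota) > wvg_value(c, weights, quota):
                    total += factorial(r) * factorial(n - r - 1) / factorial(n)
        phi[a] = total
    return phi

# Example 8.3: quota 51, weights 40 (L), 22 (M), 30 (C), 9 (G).
print(shapley_wvg({"L": 40, "M": 22, "C": 30, "G": 9}, 51))
# {'L': 0.333..., 'M': 0.333..., 'C': 0.333..., 'G': 0.0}
```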
Specifically, Elkind et al. [23] showed that the former problem is NP-hard, while
the latter problem is coNP-complete. On the positive side, they also showed that if
all weights are polynomially bounded, one can check in polynomial time whether
an outcome is in the core. It is currently open whether a similar easiness result
holds for the problem of checking the non-emptiness of the core, although it is
conjectured that this problem remains hard even for small weights [23].
Elkind et al. analyze the complexity of computing the value of the least core
and the nucleolus [24, 26]. Again, a familiar picture emerges: both problems
are hard when weights are given in binary, but easy when weights are given in
unary. Moreover, even for large weights, the value of the least core admits a fully
polynomial-time approximation scheme (FPTAS), i.e., an algorithm that, given a
weighted voting game G = (A, w, q) and a parameter δ, outputs a value ε′ that
satisfies ε ≤ ε′ ≤ (1 + δ)ε, where ε is the true value of the least core of G, and runs
in time that is polynomial in the number of players n, the maximum weight, and
1/δ.
Vector weighted voting games are widely used in practice: for instance, the
European Union decision-making system is a 27-player 3-weighted voting game,
where the three component games correspond to the commissioners, countries,
and population [9].
From the computational perspective, vector weighted voting games are similar
to the ordinary weighted voting games if k is bounded by a constant, but become
harder to deal with if k is viewed as part of the input: for instance, Elkind et
al. [27] show that deciding whether a player is a dummy in a k-weighted voting
game is coNP-complete even if all weights are in {0, 1} (recall that, in contrast,
for weighted voting games this problem is easy as long as all weights are polyno-
mially bounded).
Now, we have seen that vector weighted voting games are more expressive
than weighted voting games; but are they fully expressive? We will now show
that the answer is “yes,” i.e., any simple game can be represented as a k-weighted
voting game for a suitable value of k; this holds even if all weights are required to
be in {0, 1}.
Theorem 8.3 Any simple game G = (A, v) with |A| = n can be represented as
a k-weighted voting game G1 ∧ . . . ∧ Gk , where k ≤ 2n and all weights in each
component game are either 0 or 1.
Proof. Let C_1, . . . , C_k be a list of all losing coalitions in G. For each j = 1, . . . , k, define a weighted voting game G_j = (A, w^j, q_j),
where q_j = 1 and w^j_i = 1 if ai ∉ C_j, w^j_i = 0 if ai ∈ C_j. Observe that a coalition C
is a winning coalition in G_j if and only if it contains some agent ai ∈ A \ C_j.
We claim that G is equivalent to G′ = G_1 ∧ . . . ∧ G_k. Indeed, if C ⊆ A is a
losing coalition in G, then C = C_j for some j = 1, . . . , k, and therefore C loses in
the corresponding component game and hence in G′. On the other hand, if C ⊆ A
is a winning coalition in G, then, by monotonicity, C is not contained in any losing
coalition, i.e., for any coalition C_j in our list we have C \ C_j ≠ ∅ and hence C is
a winning coalition in G_j. Since this holds for any j = 1, . . . , k, C is a winning
coalition in G′. To complete the proof, it remains to observe that k ≤ 2^n.
In induced subgraph games [21], players are vertices of a weighted graph, and
the value of a coalition is the total weight of its internal edges. It can be checked
that if all weights are non-negative, this game is convex and therefore has a non-
empty core. However, if we allow negative weights, the core may be empty, and,
moreover, checking whether an outcome is in the core becomes coNP-complete.
In contrast, the Shapley value in this game is easy to compute even if the weights
can be negative: the Shapley value of a vertex x is half of the total weight of the
edges that are incident to x.
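The closed form is easy to verify numerically. In the sketch below (the example graph and helper names are our own), Shapley values are computed by brute force over permutations and compared with half of the total weight of each vertex's incident edges.

```python
from itertools import permutations
from math import factorial

def induced_subgraph_value(coalition, edge_weights):
    """Value of a coalition: total weight of edges with both endpoints inside it."""
    return sum(w for (u, x), w in edge_weights.items()
               if u in coalition and x in coalition)

# A small weighted graph on four vertices (weights chosen arbitrarily).
edges = {("a", "b"): 3, ("b", "c"): -2, ("c", "d"): 5, ("a", "d"): 1}
agents = ["a", "b", "c", "d"]

phi = {a: 0.0 for a in agents}
for pi in permutations(agents):
    seen = set()
    for a in pi:
        phi[a] += (induced_subgraph_value(seen | {a}, edges)
                   - induced_subgraph_value(seen, edges))
        seen.add(a)
phi = {a: p / factorial(len(agents)) for a, p in phi.items()}

half_incident = {a: sum(w for e, w in edges.items() if a in e) / 2 for a in agents}
print(phi)            # equals half_incident
print(half_incident)  # {'a': 2.0, 'b': 0.5, 'c': 1.5, 'd': 3.0}
```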
In network flow games [33, 34], the players are edges of a network with a source
and a sink. Each edge has a positive integer capacity, indicating how much flow it
can carry. The value of a coalition C is the maximum amount of flow that can be
sent from the source to the sink using the edges in C only. Various stability-related
solution concepts for this class of games were studied in [31] and subsequently
in [19].
One can also consider a variant of network flow games where the value of a
coalition is 1 if it can carry at least k units of flow from the source to the sink, and
0 otherwise. Such games are called threshold network flow games, and have been
studied in [6] and subsequently in [2].
In a marginal contribution net (MC-net) [32], the game is described by a set of rules R, each of the form Br → ϑr, where Br is a Boolean formula over variables b1, . . . , bn; a rule applies to a coalition C if Br is satisfied by the truth assignment that sets bi to true if and only if ai ∈ C. The value of a coalition is the total value of the rules that apply to it:
v(C) = ∑_{r∈R : C satisfies Br} ϑr.
Example 8.6 The MC-net that consists of the rules R = {b1 ∧ b2 → 5, b2 → 2},
corresponds to a coalitional game G = (A, v), where A = {a1 , a2 }, v({a1 }) = 0,
v({a2 }) = 2, v({a1 , a2 }) = 7.
An MC-net is said to be basic if the left-hand side of any rule is a conjunction
of literals, i.e., variables and their negations. In this case, we can write a rule
r ∈ R as (Pr , Nr ) → ϑr , where Pr and Nr are the sets of agents that correspond to
positive and negative literals in Br , respectively. Thus, r is applicable to coalition
C if C contains every agent in Pr and none of the agents in Nr . It is not hard to see
that any coalitional game G = (A, v) with |A| = n can be represented by a basic
MC-net with 2n − 1 rules: for each non-empty coalition C ⊆ A we create a rule
(∧_{i:ai∈C} bi) ∧ (∧_{i:ai∉C} ¬bi) → v(C).
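Evaluating a basic MC-net amounts to one subset test per rule. The sketch below (the rule encoding as pairs of agent sets is our own) reproduces Example 8.6.

```python
def mc_net_value(coalition, rules):
    """Value of a coalition under a basic MC-net: sum the values of all rules
    whose positive agents all belong to the coalition and whose negative
    agents are all outside it."""
    return sum(value for positive, negative, value in rules
               if positive <= coalition and not (negative & coalition))

# Example 8.6: rules b1 ^ b2 -> 5 and b2 -> 2 over agents a1, a2.
rules = [({"a1", "a2"}, set(), 5), ({"a2"}, set(), 2)]
print(mc_net_value({"a1"}, rules))        # 0
print(mc_net_value({"a2"}, rules))        # 2
print(mc_net_value({"a1", "a2"}, rules))  # 7
```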
In a superadditive game, the value of every coalition C satisfies
v(C) ≥ max_{CS∈P^C, CS≠{C}} ∑_{C′∈CS} v(C′). (8.3)
Now, if the inequality (8.3) holds with equality, then there is no need to store the
value of C as it can be computed from the values of the smaller coalitions. There-
fore, we can represent G by listing the values of all coalitions of size 1 as well as
the values of the coalitions for which there is a synergy, i.e., the inequality (8.3) is
strict.
By construction, the SCG representation is complete. Moreover, it is succinct
when there are only a few groups of agents that can collaborate productively. Fur-
ther, it allows for an efficient procedure for checking whether an outcome is in the
core: it can be shown that if an outcome is not in the core, then there is a “syn-
ergetic” coalition, i.e., one whose value is given explicitly in our representation,
which can profitably deviate. However, the SCG representation has a major draw-
back: computing the value of a coalition may involve finding an optimal partition
of the players into subcoalitions, and is therefore NP-hard.
This representation is more compact than that of [46] when the number of skills
is large (so that the domain of the function u is very large), but the game can be
described in terms of a small number of tasks, or if the function F can be encoded
succinctly.
that a j, ak ∉ C it holds that v(C ∪ {a j}) = v(C ∪ {ak}). We will refer to the sets
A^1, . . . , A^T as agent types. Then the value of any coalition depends solely on how
many agents of each type it contains. More precisely, given a coalition C ⊆ A, we
define the coalition-type of C as a vector ψ = ⟨n1, . . . , nT⟩, where ni = |C ∩ A^i|. It is
immediate that two coalitions of the same coalition-type have the same value. This
means that the conventional characteristic function v : 2A → R can be replaced
with the more concise type-based characteristic function, vt : Ψ → R, which is
defined on the set
Ψ = {⟨n1, . . . , nT⟩ | 0 ≤ ni ≤ |A^i|}
of all possible coalition-types. To represent this function, we only need to store
O(n^T) coalitional values, since |Ψ| = (|A^1| + 1) × · · · × (|A^T| + 1) ≤ (n + 1)^T. Thus,
for small values of T , this representation is significantly more succinct than the
standard one. On the other hand, it is obviously complete: in the worst case, all
agents have different types and vt coincides with v.
CS∗ ∈ arg max_{CS∈PA} V(CS).
To date, there are two main representations of the space of possible coalition struc-
tures. The first, proposed by Sandholm et al. [60], is called the coalition structure
graph. In this undirected graph, every node represents a coalition structure. These
nodes are categorized into levels PA1 , . . . , PAn , where level PAi contains the nodes
that represent all coalition structures containing exactly i coalitions. An edge con-
nects two coalition structures if and only if: (1) they belong to two consecutive
levels PAi and PAi−1 , and (2) the coalition structure in PAi−1 can be obtained from
the one in PAi by merging two coalitions into one. A four-agent example can be
seen in Figure 8.1.
Having described the main representations of the search space, in the remain-
ing subsections we will present different approaches to the coalition structure gen-
eration problem, some of which are built upon those representations.
Theorem 8.4 Given a coalition C ⊆ A, let f(C) be the value of an optimal partition of C, i.e., f(C) = max_{P∈P^C} V(P). Then
f(C) = v(C) if |C| = 1, and
f(C) = max{ v(C), max_{{C′,C′′}∈P^C} [f(C′) + f(C′′)] } otherwise. (8.4)
Proof. The proof is trivial when |C| = 1. Thus, for the remainder of the proof
we will assume that |C| > 1. Let opt(C) be some optimal partition of C, i.e.,
opt(C) ∈ argmaxP∈PC V (P). We will make use of the following lemma.
Lemma 8.1 Let P∗ = opt(C), and suppose that P∗ = P′ ∪ P′′ for two disjoint non-empty sets of coalitions P′ and P′′ whose unions are C′ and C′′, respectively. Then P′ is an optimal partition of C′ and P′′ is an optimal partition of C′′.
Proof of Lemma 8.1 To prove the lemma, observe that P∗ = P′ ∪ P′′ and V(P∗) =
V(P′) + V(P′′). Suppose for the sake of contradiction that P′ was not an optimal
partition of C′. Then there exists another partition P̂ ∈ P^{C′} such that
V(P̂) > V(P′). However, since P̂ ∪ P′′ is a partition of C, and since V(P̂ ∪ P′′) =
V(P̂) + V(P′′) > V(P∗), it follows that P∗ cannot be an optimal partition of C, a
contradiction. Assuming that P′′ is not an optimal partition of C′′ leads to a contradiction as well, by a similar argument. Thus, the proof of the lemma is complete.
Lemma 8.1 shows that if |opt(C)| > 1, then there exists a coalition structure
{C′, C′′} ∈ P^C such that opt(C) = opt(C′) ∪ opt(C′′). On the other hand, if
|opt(C)| = 1, then surely we would have opt(C) = {C} and V(opt(C)) = v(C).
Equation (8.4) covers both possibilities by taking the maximum over v(C) and
max_{{C′,C′′}∈P^C} [f(C′) + f(C′′)].
The way DP works is by iterating over all the coalitions of size 1, and then over
all those of size 2, and then size 3, and so on until size n: for every such coalition
C, it computes f(C) using equation (8.4). As can be seen, whenever |C| > 1, the
equation requires comparing v(C) with max_{{C′,C′′}∈P^C} [f(C′) + f(C′′)]. The result
of this comparison is stored in a table, t, which has an entry for every coalition.
In particular, if v(C) was greater, then the algorithm sets t[C] = C, so that it can
later on remember that it is not beneficial to split C into two coalitions. Otherwise,
it sets t[C] = arg max_{{C′,C′′}∈P^C} [f(C′) + f(C′′)] to remember the best way of
splitting C into two coalitions. By the end of this process, f (A) will be computed,
which is by definition equal to V (CS∗ ). It remains to compute CS∗ itself. This
is done recursively using the table t. The running time of this algorithm can be
shown to be O(3^n).
The execution of the algorithm is illustrated by the following example.
Example 8.7 Given A = {a1 , a2 , a3 , a4 }, suppose that t[A] = {{a1 , a2 }, {a3 , a4 }},
i.e., it is most beneficial to split A into {a1 , a2 } and {a3 , a4 }. Moreover, suppose
that t[{a1 , a2 }] = {{a1 }, {a2 }}, while t[{a3 , a4 }] = {a3 , a4 }, i.e., it is most bene-
ficial to split {a1 , a2 } into {a1 } and {a2 }, but it is not beneficial to split {a3 , a4 }.
In this case, CS∗ = {{a1 }, {a2 }, {a3 , a4 }}.
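A compact rendering of this dynamic program (our own code, written directly from equation (8.4) and the table t described above, rather than from any published pseudocode) stores f and t over frozensets and then unfolds t recursively, exactly as in Example 8.7; it is practical only for small n.

```python
from itertools import combinations

def dp_optimal_coalition_structure(agents, v):
    """Optimal coalition structure via the dynamic program (8.4);
    v maps frozensets to values."""
    agents = list(agents)
    f, t = {}, {}
    for size in range(1, len(agents) + 1):        # coalitions by increasing size
        for coalition in combinations(agents, size):
            C = frozenset(coalition)
            f[C], t[C] = v[C], None               # None means: keep C whole
            first, rest = coalition[0], coalition[1:]
            # Enumerate splits {C', C''} once by always keeping `first` in C'.
            for r in range(len(rest) + 1):
                for chosen in combinations(rest, r):
                    c1 = frozenset((first,) + chosen)
                    c2 = C - c1
                    if c2 and f[c1] + f[c2] > f[C]:
                        f[C], t[C] = f[c1] + f[c2], (c1, c2)
    def unfold(C):                                # recover the structure from t
        if t[C] is None:
            return [C]
        c1, c2 = t[C]
        return unfold(c1) + unfold(c2)
    grand = frozenset(agents)
    return f[grand], unfold(grand)
```

Here the table t plays exactly the role described above: an entry of None records that splitting C is not beneficial, while a pair (C′, C′′) records the best two-way split.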
Although DP is guaranteed to find an optimal coalition structure, Rahwan and
Jennings [53] showed that many of its operations are in fact redundant. Based
on this, they developed an improved dynamic programming algorithm (IDP) that
avoids all redundant operations. To date, IDP is the fastest algorithm that can
find an optimal solution in O(3^n) time. This is significantly less than ω(n^{n/2}) –
the time required to exhaustively enumerate all coalition structures. However, the
disadvantage is that IDP provides no interim solution before completion, meaning
that it is not possible to trade computation time for solution quality.
This problem can be approached by (1) dividing the space into subsets, and (2)
identifying a sequence in which these subsets are searched so that the worst-case
bound on solution quality is guaranteed to improve after each subset. The first
such algorithm was developed by Sandholm et al. [60], and is mainly based on the
following theorem.
Theorem 8.5 The bound β = n on solution quality can be established by searching the 2^{n−1} coalition structures in PA1 ∪ PA2; moreover, no bound can be established by searching at most 2^{n−1} coalition structures in any other way.
Proof. For a partial search to establish a bound on solution quality, every coalition
C ⊆ A must appear in at least one of the searched coalition structures. This is due
to the possibility of having a single coalition whose value is arbitrarily greater
than the values of other coalitions. Now, since the grand coalition appears in
PA1, and every other coalition C ⊂ A appears in {C, A\C} ∈ PA2, the value of the
best coalition structure in PA1 ∪ PA2 is at least max_{C⊆A} v(C). On the other hand,
since CS∗ can include at most n coalitions, its value cannot be greater than
n × max_{C⊆A} v(C). This means that V(CS∗) / max_{CS∈PA1∪PA2} V(CS) ≤ n.
As for the number of searched coalition structures, the reader can check that
|PA1 ∪ PA2| = 2^{n−1}. What remains is to show that no bound can be established
by searching a different set of at most 2^{n−1} coalition structures. This is done by
proving that PA1 ∪ PA2 is the unique subset of PA of size at most 2^{n−1} in which
every coalition appears in some coalition structure. We leave this as an exercise
for the reader.
Based on this theorem, the algorithm starts by searching the bottom two levels.
After that, if additional time is available, the algorithm searches the remaining
levels one by one, starting from the top level and moving downward. Sandholm
et al. proved that the bound improves with this search. In particular, once the
algorithm completes searching level PAi, the bound becomes β = ⌈n/h⌉, where h =
⌈(n − i)/2⌉ + 2. The only exception is when n ≡ h − 1 (mod h) and n ≡ i (mod 2),
in which case the bound becomes β = ⌊n/h⌋. Importantly, this means that after
searching the bottom two levels and establishing the bound β = n, one can very
easily drop (i.e., improve) the bound to β = ⌊n/2⌋ by searching the top level,
which only contains one coalition structure.
A different approach was proposed by Dang and Jennings [16]. Their algo-
rithm starts by searching the bottom two levels, as well as the top one (as Sand-
holm et al.’s algorithm does). After that, however, instead of searching the re-
maining levels one by one (as Sandholm et al. do), the algorithm searches through
certain subsets of all remaining levels. Specifically, it searches the coalition struc-
tures that have at least one coalition of size at least ⌊n(d − 1)/d⌋ (with d running
from ⌈(n + 1)/4⌉ down to 2). Dang and Jennings proved that, for any given value
of d, the algorithm establishes a bound of 2d − 1.
So far, we have seen how certain bounds can be established by searching
certain subsets of the search space. However, with the exception of β = n and
β = ⌊n/2⌋, we still do not know the minimum subset that must be searched in
order to establish a desired bound. To this end, let us introduce the following no-
tation. For any integer partition I ∈ In , let PI denote the set of possible partitions
of I. For instance, P{1,1,2} consists of the following four partitions: {{1, 1, 2}},
{{1, 1}, {2}}, {{1, 2}, {1}}, and {{1}, {1}, {2}}. Moreover, for any set of integer
partitions ℐ ⊆ In, let S(ℐ) be the set that consists of every non-empty subset of
every integer partition in ℐ, i.e.,
S(ℐ) = ⋃_{I∈ℐ} {J | J ⊆ I, J ≠ ∅}.
For example, given ℐ = {{1, 1, 2}, {1, 3}}, the set S(ℐ) consists of the following
subsets: {1}, {2}, {3}, {1, 1}, {1, 2}, {1, 3}, {1, 1, 2}. Finally,
for any integer partition I ∈ In and any set of integer partitions ℐ ⊆ In, let η(ℐ, I)
denote the minimum number of subsets in S(ℐ) that are required to construct a
partition in PI. Formally,
η(ℐ, I) = min{|S| : S ⊆ S(ℐ), S ∈ PI} if there exists S ⊆ S(ℐ) such that S ∈ PI, and
η(ℐ, I) = +∞ otherwise.
For example, given I = {1, 1, 1, 3} and ℐ = {{1, 1, 2}, {1, 3}}, the minimum number of subsets in S(ℐ) that are required to construct a partition of I is 2, and those
subsets are {1, 1} and {1, 3}. Therefore, we have η(ℐ, I) = 2. Rahwan et al. [55]
showed that this definition is crucial when determining the minimum subset that
must be searched in order to establish a certain bound. Specifically, they prove the
following theorem.
Theorem 8.6 For any real value b, 1 ≤ b ≤ n, and for any ℐ ⊆ In, we can establish a bound β = b by searching ∪_{I∈ℐ} PAI if and only if the following holds:
∀I ∈ In, η(ℐ, I) ≤ b. (8.5)
Furthermore, the minimum set of coalition structures that must be searched in order to establish a bound β = b is ∪_{I∈In(b)} PAI, where In(b) is defined as follows:
In(b) ∈ arg min_{ℐ⊆In : ∀I∈In, η(ℐ,I)≤b} |∪_{I∈ℐ} PAI|.
Therefore, it would be useful to have an algorithm that can efficiently search those
subspaces. In what follows, we present an algorithm that does exactly that.
In particular, the average value of the coalition structures in PAI can be computed without enumerating them:
∑_{CS∈PAI} V(CS) / |PAI| = ∑_{i∈I} I(i) · AvgAi, (8.6)
where I(i) denotes the number of parts of I that are equal to i, and AvgAi denotes the average value of the coalitions of size i.
Proof. For any C ⊆ A, the number of coalition structures in PAI that contain C
depends solely on the size of C. In other words, this number is equal for any two
coalitions that are of the same size. Let us denote this number by N^{|C|}_I; formally,
for every C ⊆ A we set N^{|C|}_I = |{CS ∈ PAI | C ∈ CS}|. Then we have
∑_{CS∈PAI} V(CS) = ∑_{C⊆A} N^{|C|}_I · v(C) = ∑_{i∈I} N^i_I ∑_{C:|C|=i} v(C) = ∑_{i∈I} N^i_I · C(n, i) · AvgAi,
where C(n, i) is the binomial coefficient (i.e., the number of possible coalitions of
size i). Thus, to prove (8.6) it suffices to prove that
∑_{i∈I} N^i_I · C(n, i) · AvgAi / |PAI| = ∑_{i∈I} I(i) · AvgAi.
This can be done by proving that the following holds for all i ∈ I:
N^i_I · C(n, i) = I(i) · |PAI|. (8.7)
Observe that every CS ∈ PAI contains exactly I(i) coalitions of size i. Thus:
∑_{C:|C|=i} N^{|C|}_I = ∑_{C:|C|=i} ∑_{CS∈PAI : C∈CS} 1 = ∑_{CS∈PAI} ∑_{C∈CS:|C|=i} 1 = ∑_{CS∈PAI} I(i) = |PAI| · I(i).
We have shown that ∑_{C:|C|=i} N^{|C|}_I = |PAI| · I(i). On the other hand, since N^{|C|}_I
is equal for all coalitions of size |C|, we obtain ∑_{C:|C|=i} N^{|C|}_I = C(n, i) · N^i_I. Thus,
equation (8.7) holds.
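Equation (8.6) can be sanity-checked by brute force for a small game: enumerate every coalition structure, group the structures by the integer partition formed by their coalition sizes, and compare the average value within each group against ∑_{i∈I} I(i) · AvgAi. The sketch below (random coalition values and helper names of our own choosing) performs this check for five agents.

```python
from itertools import combinations
from collections import Counter, defaultdict
from math import comb
import random

def all_partitions(agents):
    """Yield every partition of the given list of agents (small n only)."""
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    for r in range(len(rest) + 1):
        for tail in combinations(rest, r):
            block = frozenset((first,) + tail)
            remaining = [a for a in rest if a not in block]
            for sub in all_partitions(remaining):
                yield [block] + sub

random.seed(0)
agents = list("abcde")
n = len(agents)
v = {frozenset(c): random.randint(0, 10)
     for r in range(1, n + 1) for c in combinations(agents, r)}

totals, counts = defaultdict(float), defaultdict(int)
for cs in all_partitions(agents):
    key = tuple(sorted(len(c) for c in cs))      # the integer partition of n
    totals[key] += sum(v[c] for c in cs)
    counts[key] += 1

avg = {i: sum(v[frozenset(c)] for c in combinations(agents, i)) / comb(n, i)
       for i in range(1, n + 1)}                 # AvgAi
for key in totals:
    lhs = totals[key] / counts[key]              # average of V(CS) over PAI
    rhs = sum(mult * avg[i] for i, mult in Counter(key).items())
    assert abs(lhs - rhs) < 1e-9                 # equation (8.6)
print("equation (8.6) verified for all integer partitions of", n)
```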
∑_{j=1,...,2^n} zi,j · xj = 1 for i = 1, 2, . . . , n
xj ∈ {0, 1} for j = 1, 2, . . . , 2^n
Theorem 8.8 Let Ink ⊆ In be a set in which every integer partition contains at most
k integers that are greater than 1. Then, the best coalition structure in ∪I∈Ink PAI is
within a bound β = n/(2k) from optimum.
Proof. Assume that CS∗ contains ℓ multiagent coalitions, where ℓ > k. Let
C1, . . . , Cℓ−k be the ℓ − k coalitions with the smallest values in CS∗. Let us split
each coalition Ci, i = 1, . . . , ℓ − k, into single-agent coalitions; denote the resulting
coalition structure by CSk. Clearly, CSk ∈ ∪_{I∈Ink} PAI. Furthermore, the total value
of C1, . . . , Cℓ−k is at most ((ℓ − k)/ℓ) · V(CS∗), and the values of the single-agent coalitions
are non-negative. Hence, we have V(CSk) ≥ (k/ℓ) · V(CS∗). It remains to observe that
ℓ ≤ n/2.
• Independent (ID): This is when Pr ∩ Pr′ = Pr ∩ Nr′ = Pr′ ∩ Nr = ∅.
Theorem 8.9 A set of rules R is feasible if and only if (1) it includes no pair of
rules that are connected by an edge of type IC, and (2) for any two rules in R
that are connected by an edge of type CD, it is not possible to reach one from the
other via a series of edges of type CS.
To understand the intuition behind the proof, consider an example of three rules,
r1, r2, r3. Suppose that for i = 1, 2, 3 we have ri = (Pi, Ni) → ϑi, where P1 =
{a1, a2}, N1 = ∅, P2 = {a2, a3}, N2 = ∅, and P3 = {a3, a4}, N3 = {a1}. Here,
r1 and r2 are connected by an edge of type CS. Thus, they must be applicable to
a single coalition in CS, say C′, such that P1 ∪ P2 ⊆ C′. Similarly, an edge of type
CS connects r2 and r3, and so they must be applicable to a single coalition in CS,
say C′′, such that P2 ∪ P3 ⊆ C′′. Now, since P1 ∪ P2 overlaps with P2 ∪ P3, and since
the coalitions in CS are pairwise disjoint, we must have C′ = C′′. This means that
r1 , r2 , r3 must all be applicable to the same coalition, i.e., the edge between r1 and
r3 must not be of the type IC or CD. However, in our example, we happen to have
an edge of type CD between r1 and r3 . Therefore, any rule set containing r1 , r2 , r3
is not feasible.
Based on Theorem 8.9, Ohta et al. proposed the following MIP formulation.
max ∑_{r∈R} ϑr · xr subject to:
x_{ri} + x_{rj} ≤ 1 for each edge (ri, rj) of type IC (8.8)
y^e_{ri} = 0 for each edge e = (ri, rj) of type CD with j > i (8.9)
y^e_{rj} ≥ 1 for each edge e = (ri, rj) of type CD with j > i (8.10)
y^e_{rk} ≤ y^e_{rℓ} + (1 − x_{rk}) + (1 − x_{rℓ}) and
y^e_{rℓ} ≤ y^e_{rk} + (1 − x_{rk}) + (1 − x_{rℓ}) for each edge (rk, rℓ) of type CS (8.11)
Bachrach et al. [4] considered the coalition structure generation problem in coali-
tional skill games (see Section 4.3.3). While this problem is, in general, very hard
computationally, Bachrach et al. showed that it admits an efficient algorithm as
long as the number of tasks m and the treewidth of a certain associated hyper-
graph are small. To describe their algorithm, we need a few additional definitions.
Given a skill game with a skill set S, its skill graph is a hypergraph g = ⟨V, E⟩ in
which every agent corresponds to a vertex, and every skill si ∈ S is represented as a
hyperedge esi ∈ E that connects all agents that possess this skill. The “complexity”
of a hypergraph can be measured using the notion of treewidth. The following
definition is reproduced from [30] (an illustration is provided in Figure 8.3).
Figure 8.3: A skill graph and its tree decomposition with width 2.
Let CSG(m, w) be the class of all coalitional skill games where the number of
tasks is at most m and the treewidth of the corresponding skill graph is at most
w. We will now show that, for fixed m and w, the coalition structure generation
problem for a game in CSG(m, w) can be solved in time polynomial in the number
of agents n and the number of skills k (but exponential in m and w).
To start, observe that a single task can be performed multiple times by a single
coalition structure CS. To be more precise, a task that requires a skill which only
x agents share can be performed at most x times (this is when each one of those
x agents appears in a different coalition in CS). Let d denote the largest number
of agents sharing a single skill; note that d ≤ w + 1. Then a coalition structure
can accomplish at most dm tasks. Based on this, we will define a candidate task
solution as a set {Γ1, . . . , Γh} where each Γi is a subset of Γ, and h ≤ dm. For every
coalition structure CS = {C1, . . . , Ch}, we say that CS accomplishes {Γ1, . . . , Γh} if Ci accomplishes all tasks in Γi, for i = 1, . . . , h. We say that {Γ1, . . . , Γh} is feasible if there
exists at least one coalition structure that accomplishes it. Clearly, the total value
obtained by accomplishing these tasks is ∑_{i=1}^{h} F(Γi). The problem of finding an
optimal coalition structure is thus equivalent to the problem of finding a feasible
set of task subsets that maximizes ∑_{i=1}^{h} F(Γi). To solve this problem, it is sufficient to iterate over all possible choices of {Γ1, . . . , Γh}: for each such choice we find
the coalition structure that accomplishes it, or determine that it is not feasible.
Next, we show how this can be done for a fixed set {Γ1, . . . , Γh} in time polynomial in
n and k; the bound on the running time follows as the number of candidate task
solutions is bounded by (2^m)^{dm} ≤ (2^m)^{(w+1)m}.
To this end, observe that every coalition structure can be viewed as a coloring
of the agents, where all agents with the same color form a coalition. Based on
this, for each choice of {Γi }hi=1 , let us define a constraint satisfaction problem2
whose underlying graph is the skill graph g, where:
• the domain (i.e., the possible values) of each variable (i.e., agent) consists
of the possible colors (i.e., the possible coalitions that the agent can join);
• the domain of every bag consists of the possible colorings of the agents
in the bag. The size of this domain is O(hw+1 ) = O(((w + 1)m)w+1 ) since
every bag contains at most w + 1 agents, and every agent has h possible
colors;
2 For more details on constraint satisfaction problems, see [59].
• the constraints are of two types. The first prevents an agent from getting
different colors in two neighboring bags. This, in turn, ensures that every
agent gets the same color in all bags (due to the structure of the tree decom-
position). The second type of constraints is exactly the same as the one in
the primal problem (i.e., if a skill is required for at least one task in Γi , then
at least one agent in Ci possesses that skill).
Note that a solution to the dual problem is in fact a valid solution to the primal
problem. Since the underlying graph of the dual problem is a tree, it can be solved
in time polynomial in n and k [4, 59].
Aziz and de Keijzer [3] and Ueda et al. [71] studied the coalition structure gener-
ation problem under the agent-type representation (see Section 4.3.4). Recall that
under this representation the game is given by a partition of the set of agents A
into T types A^1, . . . , A^T and a type-based characteristic function vt : Ψ → R, where
Ψ = {⟨n1, . . . , nT⟩ | 0 ≤ ni ≤ |A^i|}. Thus, a coalition structure can be viewed as a
multiset of coalition-types whose coordinate-wise sum is ⟨|A^1|, . . . , |A^T|⟩. The coalition
structure generation problem can then be solved by dynamic programming over two
tables Q and R indexed by the vectors in Ψ, with updates of the form
Q[⟨n1, . . . , nT⟩] = Q[⟨n1 − x1, . . . , nT − xT⟩] + vt(⟨x1, . . . , xT⟩),
R[⟨n1, . . . , nT⟩] = ⟨R[⟨n1 − x1, . . . , nT − xT⟩], ⟨x1, . . . , xT⟩⟩.
By the end of this process, we compute Q[⟨|A^1|, . . . , |A^T|⟩] and R[⟨|A^1|, . . . , |A^T|⟩],
which provide the solution to the coalition structure generation problem. Filling
out each cell of R and Q requires O(n^T) operations, and the size of each table is
|Ψ| < n^T. Hence, the algorithm runs in time O(n^{2T}).
• coalitions that do not contain ai . For those, every positive or negative con-
straint that contains ai has no effect, and so can be removed.
Thus, the problem of dealing with c(A, P, N, S) is replaced with two simpler
problems; we can then apply the same procedure recursively. This process can be
visualized as a tree, where the root is c(A, P, N, S), and each node has two outgo-
ing edges: one leads to a subtree containing some agent a j and the other leads to a
subtree that does not contain a j . As we move down the tree, the problems become
simpler and simpler, until one of the following two cases is reached: (1) a case
where one can easily generate the feasible coalitions, which is called a base case,
or (2) a case where one can easily verify that there are no feasible coalitions (i.e.,
the constraints cannot be satisfied), which we call an impossible case (see [54] for
more details). This is illustrated in Figure 8.4 (A), where the edge labels ai and
ai indicate whether the branch contains, or does not contain, ai , respectively. By
Figure 8.4: Feasible coalitions and coalition structures: given a basic CCF, (A)
shows how to generate feasible coalitions, while (B) shows how to generate feasi-
ble coalition structures.
generating the feasible coalitions in all base cases, one ends up with the feasible
coalitions in c(A, P, N, S).
The tree structure described above also facilitates the search for an optimal
feasible coalition structure. Indeed, observe that every such tree contains exactly
one path that (1) starts with the root, (2) ends with a leaf, and (3) consists of
edges that are each labeled with ai for some ai ∈ A. In Figure 8.4, for example,
this path is the one connecting c(A, P, N, S) to baseCase15 . Now, let us denote by
A∗ the sequence of agents that appear in the labels of this path. For instance, in
Figure 8.4, we have A∗ = ⟨a5, a2, a1, a8⟩. Finally, let us denote by a∗i the i-th agent
in A∗ .
With these definitions in place, we can now present the coalition structure
generation algorithm in [54]; we will call this algorithm DC as it uses a divide-
and-conquer technique. The basic idea is to create lists L∗1, . . . , L∗_{|A∗|+1}, where
L∗1 consists of the base cases that contain a∗1, each L∗i, i = 1, . . . , |A∗|, consists of
the base cases that contain a∗i but not a∗1, . . . , a∗_{i−1}, and L∗_{|A∗|+1} consists of the base
cases that do not contain a∗1, . . . , a∗_{|A∗|}. This is illustrated in Figure 8.4 (B). Importantly, by constructing the lists in this way, we ensure that every feasible coalition
structure contains exactly one coalition from L∗1, and at most one coalition from
each L∗i, i > 1. Thus, the algorithm picks a coalition, say C1, from some base case
in L1∗ , and checks whether {C1 } is a feasible coalition structure. If not, then the
agents in C1 are added to the negative constraints of all base cases in L2∗ . This
places further constraints on the coalitions in those base cases, so as to ensure
that they do not overlap with C1 . Next, the algorithm picks a coalition, say C2 ,
from some base case in L2∗ , and checks whether {C1 ,C2 } is a feasible coalition
structure, and so on. Eventually, all feasible coalition structures are examined. To
speed up the search, the algorithm applies a branch-and-bound technique (see [54]
for more details). This algorithm was compared to the integer programming for-
mulation in Section 5.3.3, where z contains a column for every feasible coalition,
instead of a column for every possible coalition. This comparison showed that DC
outperforms the integer programming approach by orders of magnitude.
6 Conclusions
We gave a brief overview of basic notions of cooperative game theory, followed
by a discussion of a number of representation formalisms for coalitional games
that have been proposed in the literature. We then presented several algorithms
for finding an optimal coalition structure, both under the standard representation,
and under the more succinct encodings discussed earlier in the chapter. There are
several other approaches to the optimal coalition structure generation problem,
which we were unable to cover due to space constraints; this problem continues
to attract a lot of attention from the multiagent research community due to its
challenging nature and numerous applications.
We would like to conclude this chapter by giving a few pointers to the lit-
erature. Most standard game theory textbooks provide some coverage of coop-
erative game theory; the well-known text of Osborne and Rubinstein [47] is a
good example. There are also several books that focus exclusively on cooperative
games [11, 15, 48]. A very recent book by Chalkiadakis et al. [13] treats the topics
covered in the first part of this chapter in considerably more detail than we do, and
also discusses coalition formation under uncertainty. However, its coverage of the
coalition structure generation problem is much less comprehensive than ours.
7 Exercises
1. Level 1 Compute the Shapley values of all players in the two variants of the
ice cream game described in Example 8.2. Do these games have non-empty
cores?
2. Level 1 Argue that any n-player induced subgraph game can be represented
as a basic MC-net with O(n2 ) rules.
3. Level 1 Given the characteristic function shown in Table 8.1, where the
value of the grand coalition is 165, identify the optimal coalition structure
using the same steps as those of the integer partition-based (IP) algorithm.
7. Level 2 Consider two simple games G1 = (A, v1 ) and G2 = (A, v2 ) with the
same set of players A. Suppose that a player i ∈ A is not a dummy in both
games. Can we conclude that i is not a dummy in the game G∩ = (A, v∩ ),
with the characteristic function v∩ given by v∩ (C) = min{v1 (C), v2 (C)}?
What about the game G∪ = (A, v∪ ), where v∪ is given by v∪ (C) =
max{v1 (C), v2 (C)}?
8. Level 2 Prove that any outcome in the core maximizes the social welfare,
i.e., for any coalitional game G it holds that if (CS, x) is in the core of G,
then CS ∈ arg max_{CS′∈PA} V(CS′).
References
[1] George E. Andrews and Kimmo Eriksson. Integer Partitions. Cambridge University
Press, Cambridge, UK, 2004.
[2] Haris Aziz, Felix Brandt, and Paul Harrenstein. Monotone cooperative games and
their threshold versions. In AAMAS’10: Ninth International Joint Conference on
Autonomous Agents and Multi-Agent Systems, pages 1107–1114, 2010.
[3] Haris Aziz and Bart de Keijzer. Complexity of coalition structure generation. In
AAMAS’11: Tenth International Joint Conference on Autonomous Agents and Multi-
Agent Systems, pages 191–198, 2011.
[4] Yoram Bachrach, Reshef Meir, Kyomin Jung, and Pushmeet Kohli. Coalitional
structure generation in skill games. In AAAI’10: Twenty-Fourth AAAI Conference
on Artificial Intelligence, pages 703–708, 2010.
[5] Yoram Bachrach and Jeffrey S. Rosenschein. Coalitional skill games. In AAMAS’08:
Seventh International Conference on Autonomous Agents and Multi-Agent Systems,
pages 1023–1030, 2008.
[6] Yoram Bachrach and Jeffrey S. Rosenschein. Power in threshold network flow
games. Autonomous Agents and Multi-Agent Systems, 18(1):106–132, 2009.
[7] John F. Banzhaf. Weighted voting doesn’t work: A mathematical analysis. Rutgers
Law Review, 19:317–343, 1965.
[10] Peter Borm, Herbert Hamers, and Ruud Hendrickx. Operations research games: A
survey. TOP, 9:139–199, 2001.
[11] Rodica Brânzei, Dinko Dimitrov, and Stef Tijs. Models in Cooperative Game The-
ory. Springer, 2005.
[12] Georgios Chalkiadakis, Edith Elkind, Evangelos Markakis, Maria Polukarov, and
Nicholas R. Jennings. Cooperative games with overlapping coalitions. Journal of
Artificial Intelligence Research (JAIR), 39:179–216, 2010.
[13] Georgios Chalkiadakis, Edith Elkind, and Michael Wooldridge. Computational As-
pects of Cooperative Game Theory. Morgan and Claypool, 2011.
[15] Imma Curiel. Cooperative Game Theory and Applications. Kluwer, 1997.
[16] Viet D. Dang and Nicholas R. Jennings. Generating coalition structures with fi-
nite bound from the optimal guarantees. In AAMAS’04: Third International Joint
Conference on Autonomous Agents and Multi-Agent Systems, pages 564–571, 2004.
[17] Morton Davis and Michael Maschler. The kernel of a cooperative game. Naval
Research Logistics Quarterly, 12(3):223–259, 1965.
[19] Xiaotie Deng, Qizhi Fang, and Xiaoxun Sun. Finding nucleolus of flow game. In
SODA’06: 17th ACM-SIAM Symposium on Discrete Algorithms, pages 124–131,
2006.
[20] Xiaotie Deng, Toshihide Ibaraki, and Hiroshi Nagamochi. Algorithmic aspects of
the core of combinatorial optimization games. Mathematics of Operations Research,
24(3):751–766, 1999.
[21] Xiaotie Deng and Christos Papadimitriou. On the complexity of cooperative solution
concepts. Mathematics of Operations Research, 19(2):257–266, 1994.
[22] Ezra Einy, Ron Holzman, and Dov Monderer. On the least core and the Mas-Colell
bargaining set. Games and Economic Behavior, 28:181–188, 1999.
[23] Edith Elkind, Georgios Chalkiadakis, and Nicholas R. Jennings. Coalition struc-
tures in weighted voting games. In ECAI’08: Eighteenth European Conference on
Artificial Intelligence, pages 393–397, 2008.
[24] Edith Elkind, Leslie Ann Goldberg, Paul Goldberg, and Michael Wooldridge. On
the computational complexity of weighted voting games. Annals of Mathematics
and Artificial Intelligence, 56(2):109–131, 2009.
[25] Edith Elkind, Leslie Ann Goldberg, Paul Goldberg, and Michael Wooldridge. A
tractable and expressive class of marginal contribution nets and its applications.
Mathematical Logic Quarterly, 55(4):362–376, 2009.
[26] Edith Elkind and Dmitrii Pasechnik. Computing the nucleolus of weighted voting
games. In SODA’09: 20th ACM-SIAM Symposium on Discrete Algorithms, 2009.
[27] Piotr Faliszewski, Edith Elkind, and Michael Wooldridge. Boolean combinations
of weighted voting games. In AAMAS’09: 8th International Joint Conference on
Autonomous Agents and Multiagent Systems, pages 185–192, 2009.
[28] Thomas A. Feo and Mauricio G. C. Resende. Greedy randomized adaptive search
procedures. Journal of Global Optimization, 6:109–133, 1995.
[30] Georg Gottlob, Nicola Leone, and Francesco Scarcello. Hypertree decompositions:
A survey. In MFCS’01: 26th International Symposium on Mathematical Founda-
tions of Computer Science, pages 37–57, 2001.
[31] Daniel Granot and Frieda Granot. On some network flow games. Mathematics of
Operations Research, 17(4):792–841, 1992.
[32] Samuel Ieong and Yoav Shoham. Marginal contribution nets: a compact repre-
sentation scheme for coalitional games. In ACM EC’05: 6th ACM Conference on
Electronic Commerce, pages 193–202, 2005.
[33] Ehud Kalai and Eitan Zemel. Generalized network problems yielding totally bal-
anced games. Operations Research, 30(5):998–1008, 1982.
[34] Ehud Kalai and Eitan Zemel. Totally balanced games and games of flow. Mathe-
matics of Operations Research, 7(3):476–478, 1982.
[35] Helena Keinänen. Simulated annealing for multi-agent coalition formation. In KES-
AMSTA’09: Third KES International Symposium on Agent and Multi-Agent Sys-
tems: Technologies and Applications, pages 30–39, 2009.
[36] Walter Kern and Daniël Paulusma. Matching games: the least core and the nucleo-
lus. Mathematics of Operations Research, 28(2):294–308, 2003.
[37] William Lucas and Robert Thrall. n-person games in partition function form. Naval
Research Logistics Quarterly, pages 281–298, 1963.
[38] Andreu Mas-Colell. An equivalence theorem for a bargaining set. Journal of Math-
ematical Economics, 18:129–139, 1989.
[39] Michael Maschler, Bezalel Peleg, and Lloyd S. Shapley. Geometric properties of
the kernel, nucleolus, and related solution concepts. Mathematics of Operations
Research, 4:303–338, 1979.
[40] Tomomi Matsui and Yasuko Matsui. A survey of algorithms for calculating power
indices of weighted majority games. Journal of the Operations Research Society of
Japan, 43(1):71–86, 2000.
[41] Yasuko Matsui and Tomomi Matsui. NP-completeness for calculating power in-
dices of weighted majority games. Theoretical Computer Science, 263(1-2):305–
310, 2001.
[42] Nicola Di Mauro, Teresa M. A. Basile, Stefano Ferilli, and Floriana Esposito. Coali-
tion structure generation with GRASP. In AIMSA’10: Fourteenth International Con-
ference on Artificial Intelligence: Methodology, Systems, and Applications, pages
111–120, 2010.
[43] Tomasz Michalak, Jacek Sroka, Talal Rahwan, Michael Wooldridge, Peter McBur-
ney, and Nicholas R. Jennings. A distributed algorithm for anytime coalition
structure generation. In AAMAS’10: Ninth International Joint Conference on Au-
tonomous Agents and Multi-Agent Systems, pages 1007–1014, 2010.
[44] Pragnesh Jay Modi. Distributed Constraint Optimization for Multiagent Systems.
PhD thesis, University of Southern California, Los Angeles, CA, USA, 2003.
[45] Naoki Ohta, Vincent Conitzer, Ryo Ichimura, Yuko Sakurai, Atsushi Iwasaki, and
Makoto Yokoo. Coalition structure generation utilizing compact characteristic func-
tion representations. In CP’09: Fifteenth International Conference on Principles
and Practice of Constraint Programming, pages 623–638, 2009.
[46] Naoki Ohta, Atsushi Iwasaki, Makoto Yokoo, Kohki Maruono, Vincent Conitzer,
and Tuomas Sandholm. A compact representation scheme for coalitional games in
open anonymous environments. In AAAI’06: Twenty-First National Conference on
Artificial Intelligence, pages 697–702, 2006.
[47] Martin Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press, 1994.
[48] David Peleg and Peter Sudhölter. Introduction to the Theory of Cooperative Games.
Springer, 2007.
[49] Adrian Petcu and Boi Faltings. A scalable method for multiagent constraint op-
timization. In IJCAI’05: Nineteenth International Joint Conference on Artificial
Intelligence, pages 266–271, 2005.
[50] Kislaya Prasad and Jerry S. Kelly. NP-completeness of some problems concerning
voting games. International Journal of Game Theory, 19(1):1–9, 1990.
[51] Talal Rahwan and Nicholas R. Jennings. An algorithm for distributing coalitional
value calculations among cooperative agents. Artificial Intelligence, 171(8–9):535–
567, 2007.
[52] Talal Rahwan and Nicholas R. Jennings. Coalition structure generation: Dynamic
programming meets anytime optimisation. In AAAI’08: Twenty-Third AAAI Confer-
ence on Artificial Intelligence, pages 156–161, 2008.
[53] Talal Rahwan and Nicholas R. Jennings. An improved dynamic programming algo-
rithm for coalition structure generation. In AAMAS’08: Seventh International Con-
ference on Autonomous Agents and Multi-Agent Systems, pages 1417–1420, 2008.
[54] Talal Rahwan, Tomasz P. Michalak, Edith Elkind, Piotr Faliszewski, Jacek Sroka,
Michael Wooldridge, and Nicholas R. Jennings. Constrained coalition formation. In
AAAI’11: Twenty-Fifth AAAI Conference on Artificial Intelligence, pages 719–725,
2011.
[55] Talal Rahwan, Tomasz P. Michalak, and Nicholas R. Jennings. Minimum search
to establish worst-case guarantees in coalition structure generation. In IJCAI’11:
Twenty-Second International Joint Conference on Artificial Intelligence, pages 338–
343, 2011.
Chapter 8 379
[56] Talal Rahwan, Sarvapali D. Ramchurn, Viet D. Dang, and Nicholas R. Jennings.
Near-optimal anytime coalition structure generation. In IJCAI’07: Twentieth Inter-
national Joint Conference on Artificial Intelligence, pages 2365–2371, 2007.
[57] Talal Rahwan, Sarvapali D. Ramchurn, Andrea Giovannucci, Viet D. Dang, and
Nicholas R. Jennings. Anytime optimal coalition structure generation. In AAAI’07:
Twenty-Second Conference on Artificial Intelligence, pages 1184–1190, 2007.
[58] Talal Rahwan, Sarvapali D. Ramchurn, Andrea Giovannucci, and Nicholas R. Jen-
nings. An anytime algorithm for optimal coalition structure generation. Journal of
Artificial Intelligence Research (JAIR), 34:521–567, 2009.
[59] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach.
Prentice Hall, Upper Saddle River, N.J., 2nd edition, 2003.
[60] Tuomas Sandholm, Kate Larson, Martin Andersson, Onn Shehory, and Fernando
Tohmé. Coalition structure generation with worst-case guarantees. Artificial Intelli-
gence, 111(1–2):209–238, 1999.
[61] David Schmeidler. The nucleolus of a characteristic function game. SIAM Journal
on Applied Mathematics, 17:1163–1170, 1969.
[63] Sandip Sen and Partha Dutta. Searching for optimal coalition structures. In IC-
MAS’00: Sixth International Conference on Multi-Agent Systems, pages 286–292,
2000.
[64] Lloyd S. Shapley. A value for n-person games. In H. W. Kuhn and A. W. Tucker,
editors, Contributions to the Theory of Games, volume II, pages 307–317. Princeton
University Press, 1953.
[65] Lloyd S. Shapley and Martin Shubik. The assignment game I: The core. Interna-
tional Journal of Game Theory, 1:111–130, 1972.
[66] Onn Shehory and Sarit Kraus. Methods for task allocation via agent coalition for-
mation. Artificial Intelligence, 101(1–2):165–200, 1998.
[67] Tammar Shrot, Yonatan Aumann, and Sarit Kraus. On agent types in coalition
formation problems. In AAMAS’10: Ninth International Joint Conference on Au-
tonomous Agents and Multi-Agent Systems, pages 757–764, 2010.
[68] Tamas Solymosi and T. E. S. Raghavan. An algorithm for finding the nucleolus of
assignment games. International Journal of Game Theory, 23:119–143, 1994.
380 Chapter 8
[69] Alan D. Taylor and William S. Zwicker. Simple Games. Princeton University Press,
1999.
[70] Suguru Ueda, Atsushi Iwasaki, Makoto Yokoo, Marius Calin Silaghi, Katsutoshi
Hirayama, and Toshihiro Matsui. Coalition structure generation based on distributed
constraint optimization. In AAAI’10: Twenty-Fourth AAAI Conference on Artificial
Intelligence, pages 197–203, 2010.
[71] Suguru Ueda, Makoto Kitaki, Atsushi Iwasaki, and Makoto Yokoo. Concise char-
acteristic function representations in coalitional games based on agent types. In
IJCAI’11: Twenty-Second International Joint Conference on Artificial Intelligence,
pages 393–399, 2011.
[72] D. Yun Yeh. A dynamic programming approach to the complete set partitioning
problem. BIT Numerical Mathematics, 26(4):467–474, 1986.
1 Introduction
In open multiagent systems, agents are autonomous and therefore their behavior
is not deterministic. On the one hand, this is a desirable feature essential to the
nature of the multiagent paradigm. The designer of an agent has to take into account the autonomy of the other agents when programming the way it will interact with them. This relaxes the constraints on other agents' behavior and makes run-time adaptation a must for any agent that wants to be competitive. On the other hand, autonomy also brings vulnerabilities. An agent cannot assume that the other individuals will follow the same code of conduct. Each agent has its own interests, which do not necessarily align with the interests of the others.
Similar to what happens in human societies, artificial societies need some kind
of mechanism to guarantee a certain degree of control. Traditionally, the control in
electronic environments has been approached only from a computational security
perspective. The field of trust-based computing has been interested in develop-
ing secure systems aimed at preventing a set of well-defined attacks. Proposed
solutions often rely on cryptographic algorithms. Public-key infrastructures and
trusted platform modules are examples of secure systems using asymmetric key
algorithms [9]. Even if these techniques can be used to ensure specific properties
such as authentication or message integrity, they do not secure a multiagent sys-
tem regarding aspects like the truth of messages or the subjective fulfillment of a
service. Furthermore, this approach requires the existence of a few reliable trusted third parties to provide public and private keys, credentials, or secure deployment infrastructures. These assumptions are unrealistic if decentralization and openness are necessary features.
Control then has to be implemented with an approach compatible with the specific characteristics of open multiagent systems. The term soft security has
been proposed by Rasmusson and Jansson [32] to refer to control techniques that
provide a degree of protection without being too restrictive for the system devel-
opment. The general idea is to provide social control mechanisms that do not
prevent every occurrence of undesirable events but that are able to adapt the sys-
tem to prevent them from appearing again in the future. Trust and reputation
mechanisms have been working in human societies for a long time to implement
soft security.
Inspired by their importance in human relations, the use of social trust and reputation in agent interactions has been proposed. Their use is dual. From a local perspective, they are integrated into an agent's decision process when it involves other agents, in order to decide with whom to interact. From a global perspective, they can be used as social control mechanisms [5]. The particularity of social
control mechanisms is that the individuals themselves are in charge of supervising
each other. Fear of ostracism is used as a deterrent. An agent that does not behave
as expected would be distrusted by its neighbors and socially excluded.
The implementation of trust and reputation mechanisms for software agents
needs first an explicit representation of these social evaluations. Section 2 sum-
marizes the different formalisms used to represent them. Then, sections 3 and 4
describe the processes required to implement, respectively, trust and reputation
mechanisms. Finally, the connections between trust and other agreement tech-
nologies are described in section 5.
2 Computational Representation of
Trust and Reputation Values
There is no unique way to represent trust and reputation values. Existing models use different formalisms, the choice of which is justified by the type of reasoning the agents have to do. The two main characteristics guiding this choice are the simplicity and the expressiveness of the values. It is, however, difficult to combine these two characteristics, as they pull in opposite directions, so a trade-off between them usually has to be made. The advantage of a simple representation is that it facilitates the calculation functions and the reasoning mechanisms. However, simplicity implies less information, and therefore the kind of reasoning that can be
done is less sophisticated. On the other hand, expressive values require more com-
putational and storage capacity as well as complex reasoning algorithms to take
advantage of that expressiveness. This section shows some of the representations
that can be used for trust and reputation values.
ambiguity brings about interoperability problems in the event that agents have to
exchange their own values.
takes advantage of the linguistic modifiers in fuzzy sets. Figure 9.2 shows how linguistic modifiers affect a fuzzy set representing a reputation value. The idea is that by adding these modifiers the issuer expresses the degree of precision of the reputation value represented by the fuzzy set (for example, "somewhat right" expresses less precision than "extremely right"). This is reflected in the width of the fuzzy set and is interpreted as a measure of the reliability of the reputation value; in other words, "the reliability of reputation is implicitly represented in the shape of the fuzzy set."
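To make this idea concrete, the following sketch (our own illustration, not taken from any particular model; the modifier names and scaling factors are assumptions) represents a reputation value as a triangular fuzzy set whose width is scaled by a linguistic modifier, so that a wider set implies a less precise and less reliable value.

# Illustrative sketch: a triangular fuzzy set for a reputation value in [0, 1],
# widened or narrowed by a (hypothetical) linguistic modifier.

def triangular(center, half_width):
    """Return the membership function of a triangular fuzzy set."""
    def mu(x):
        if half_width <= 0:
            return float(x == center)
        return max(0.0, 1.0 - abs(x - center) / half_width)
    return mu

# Hypothetical modifiers: a larger factor means a wider set, i.e., less precision.
MODIFIERS = {"extremely": 0.5, "fairly": 1.0, "somewhat": 2.0}

def reputation_fuzzy_set(value, modifier="fairly", base_half_width=0.1):
    """Fuzzy set for a reputation value; the modifier scales its width."""
    return triangular(value, base_half_width * MODIFIERS[modifier])

if __name__ == "__main__":
    precise = reputation_fuzzy_set(0.8, "extremely")
    vague = reputation_fuzzy_set(0.8, "somewhat")
    # Membership of 0.7 is higher under the vaguer (wider, less reliable) set.
    print(precise(0.7), vague(0.7))  # 0.0 0.5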
of integration between the trust and reputation model and the rest of the elements
of the agent. Integration is essential if we consider deliberative architectures, and,
more specifically, one of the most successful architectures in the MAS field, the
BDI architecture (see Chapter 1). As we said, trust and reputation are meant to
be used in agents’ reasoning mechanisms, so they have to be represented in the
same way as any other mental states. Therefore in a BDI architecture, the trust
and reputation values should be represented in terms of beliefs. Using beliefs to
represent trust or reputation raises two main issues. The first one is to define the
content and the semantics of this specific belief. The second issue consists of
linking the belief to the aggregated data grounding it.
The representation of a trust or reputation belief has to include all the constitutive elements of that belief. For instance, if we rely on the socio-cognitive theory proposed by Castelfranchi and Falcone [6], claiming that "an agent i trusts another agent j in order to do an action α with respect to a goal ϕ", the formal representation has to be able to express all this information. More specifically, it means that trust is about an agent and has to be relative to a given action and a given goal. Such a formalization has been done in the ForTrust model [16] through the definition of a specific predicate OccTrust(i, j, α, ϕ) holding for specific instances of a trustor (i), a trustee (j), an action (α), and a goal (ϕ). The OccTrust(i, j, α, ϕ) predicate is used to represent the concept of occurrent trust, which refers to a trust belief holding here and now.
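As a minimal sketch of what such a representation could look like in practice (our own simplification, not the ForTrust implementation), an occurrent-trust predicate can be stored as a structured term in the agent's belief base and queried like any other belief:

# Minimal sketch: representing OccTrust(i, j, alpha, phi) as a structured
# belief term in an agent's belief base (illustrative, not the ForTrust code).
from dataclasses import dataclass

@dataclass(frozen=True)
class OccTrust:
    trustor: str   # i
    trustee: str   # j
    action: str    # alpha
    goal: str      # phi

belief_base = set()
belief_base.add(OccTrust("i", "j", "deliver(item)", "item_received"))

def trusts_for(trustor, trustee, action, goal):
    """Check whether an occurrent-trust belief holds here and now."""
    return OccTrust(trustor, trustee, action, goal) in belief_base

print(trusts_for("i", "j", "deliver(item)", "item_received"))  # True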
The problem of linking beliefs to an aggregation and weighting of values (for example, using a numerical representation of trust and reputation) has been tackled in BDI+RepAge [30] through the integration of the RepAge model with a BDI architecture. The link consists of transforming each of the probability values of the probability distribution used in RepAge into a belief. Following the definition that reputation is "what a social entity says about a target regarding his/her behavior" (see Section 4 for a discussion of this definition), the final belief introduced into the belief base of the agent reflects the certainty that the corresponding evaluation is communicated among the agents in the group. The approach used in BDI+RepAge is based on LBC [27], a belief language and logic that makes it possible to ground the reputation values into beliefs and then reason about them. Among other things, LBC introduces a belief predicate S representing what is believed to be said in the community.
For example, from a reputation value calculated by RepAge saying that the reputation of agent j playing the role seller is Rep(j, seller, [0.6, 0.1, 0.1, 0.1, 0.1]), where the probability distribution is defined over the discrete set {VBadProduct, BadProduct, OKProduct, GoodProduct, VGoodProduct}, the system would generate the set of beliefs shown in Figure 9.3. The predicate S(buy(j), GoodProduct, 0.1, seller), for example, represents the belief "people say that the probability of receiving a good product from the seller j is 0.1." Notice that the belief is about what people say, and not about the quality of j as a seller.

[Figure 9.3: The RepAge probability distribution over {VBadProduct, BadProduct, OKProduct, GoodProduct, VGoodProduct} and the corresponding beliefs.]
One relevant aspect here is that because the reputation/trust can be represented
in the agent’s mind as a belief (or set of beliefs), it can be used like any other belief
to perform a standard BDI reasoning without extending the BDI model.
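The following sketch illustrates this transformation (illustrative only; the tuple encoding of the S predicate and the function name are our own choices, not the BDI+RepAge code): each probability in the RepAge distribution becomes one S belief about what is said in the community.

# Illustrative sketch: turning a RepAge-style reputation value (a probability
# distribution over evaluation levels) into S beliefs, as described in the text.
LEVELS = ["VBadProduct", "BadProduct", "OKProduct", "GoodProduct", "VGoodProduct"]

def repage_to_beliefs(target, role, distribution, action="buy"):
    """Return one S(...) belief per probability in the distribution."""
    assert abs(sum(distribution) - 1.0) < 1e-9
    return [("S", f"{action}({target})", level, p, role)
            for level, p in zip(LEVELS, distribution)]

beliefs = repage_to_beliefs("j", "seller", [0.6, 0.1, 0.1, 0.1, 0.1])
for b in beliefs:
    print(b)
# ('S', 'buy(j)', 'VBadProduct', 0.6, 'seller'), ... one belief per level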
[Figure 9.4: The general structure of a trust model. Social information, direct experiences, and communicated experiences feed the epistemic level (images, reputations, and trust evaluations), which in turn feeds the motivational level (trust decisions and actions).]
• Direct experiences: direct interactions between the trustor and the trustee.
This source is usually considered as the most reliable because it comes from
a direct perception without intermediaries. An exception would be if the
sensors of the agent that allow it to perceive the interaction are not reliable.
Almost every multiagent trust model uses the agents’ direct experiences as
a source for image calculation.
Besides images, a trustor can use reputation as a source for trust. Reputation is
available since the agent evolves in a society in which social evaluations are com-
municated. Reputation should not be confused with communicated experiences.
This concept and its social construction are detailed in section 4.
The second step described in the figure is the trust decision process. The goal
of this process is to determine if a trustee should be trusted for a given task. The
trustor is now in a situation where it is in its own interest that the trustee behaves
in an expected way, and the trustor has to decide if it will intend to rely on the
trustee. Obviously, trust evaluations are taken into account when making this decision. But the decision should also take into account the current context, since trustworthiness may depend on it. For example, I can trust my car mechanic to fix my car correctly (the trust evaluations indicate he or she is a good mechanic), but I need my car urgently and I see that my mechanic has many cars waiting to be repaired. Although he or she is a good mechanic, I know that my mechanic will not be able to do a good job in that context. It is then probably better to choose another mechanic with worse trust evaluations but more availability.
We have seen the general structure of a trust model. In the following subsections we will look in more depth at the different aspects of trust.
the most reliable source of experience because it is assumed that the trustor has
a correct perception of its own interactions and of its local satisfaction criteria.
Communicated experiences are useful to increase the input size of the trust cal-
culation process so that the resulting value is more accurate. However, they may
also introduce noise or false information in case agents have different satisfac-
tion criteria (e.g., one agent considers an interaction as satisfactory while another
one may have considered it differently), or if some agents create fake experience
reports (e.g., a group of malicious agents trying to recommend each other).
The nature of the inputs as well as the chosen formalisms for trust evaluations
have an influence on the trust calculation process.
theory of trust presented in [6]. They consist of internal and external attributions
about the trustee:
• the internal attribution of the trustee means that it should have the intention
of behaving as expected;
$\mathrm{Bel}_i\, G^{*}\bigl((\kappa \wedge \mathrm{Choice}_i\, \mathsf{F}\varphi) \rightarrow (\mathrm{Intends}_j(\alpha) \wedge \mathrm{Capable}_j(\alpha) \wedge \mathrm{After}_{j:\alpha}\, \varphi)\bigr)$
Dispositional trust is believed to be true for a truster i toward a trustee j for doing an action α in order to achieve a goal ϕ when the conditions κ hold if the truster has potentially the goal ϕ when κ holds (PotGoal_i(ϕ, κ)), and if the truster believes that, when it has the goal ϕ and κ holds (κ ∧ Choice_i Fϕ), the trustee j intends to perform α (Intends_j(α)), is capable of doing it (Capable_j(α)), and has the power to satisfy ϕ by doing α (After_{j:α} ϕ). The condition κ is used to contribute to the multidimensionality of the model (together with the action α). It represents a general context that can be a state of the world, for example, the previous sending of a query from i to j to perform α.
dispositional trust when the conditions κ hold; (ii) has the goal ϕ; (iii) believes
that the conditions κ currently hold.
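As a rough sketch of how an agent might check these conditions over a simple belief base (a propositional simplification of the logical definition above, not the ForTrust implementation):

# Illustrative sketch: occurrent trust derived from dispositional trust,
# following the conditions described in the text (simplified, propositional).
def occurrent_trust(beliefs, i, j, alpha, phi, kappa):
    """i trusts j here and now for alpha/phi if (i) i dispositionally trusts j
    under conditions kappa, (ii) i has the goal phi, and (iii) i believes kappa holds."""
    return (("DispTrust", i, j, alpha, phi, kappa) in beliefs
            and ("Goal", i, phi) in beliefs
            and ("Holds", kappa) in beliefs)

beliefs = {
    ("DispTrust", "i", "j", "repair(car)", "car_fixed", "query_sent"),
    ("Goal", "i", "car_fixed"),
    ("Holds", "query_sent"),
}
print(occurrent_trust(beliefs, "i", "j", "repair(car)", "car_fixed", "query_sent"))  # True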
Several competitions were organized between 2005 and 2008. But the ART competitions have also emphasized the limitations of statistical approaches. The best trust models performed well in the specific scenario of the testbed, but it was not really possible to explain the reasons for this good performance, nor to exploit the lessons learned in real applications. Moreover, it was shown that different trust models were not able to exchange trust information because of the high heterogeneity of their representations. Interoperability between heterogeneous trust models [45] requires a rigorous semantic definition of trust concepts. Sociological groundings are necessary to give trust concepts a precise semantics that would facilitate interoperability not only with other software agents but also with human trust decisions.
of time as a social mechanism that helps these societies to regulate the behavior
of their individuals. Apart from this social functionality, reputation also has an
individual dimension, acting as one of the main sources used by individuals to
build trust relations.
Similar to what happens with the notion of trust, many definitions of reputation can be found in the literature, some of them (wrongly) mixing the concept of reputation with that of trust, or assuming as general some properties that cannot always be taken for granted. We will base our definition on the work by Pinyol et al. [27]. In this work the authors define an ontology for reputation that is in turn based on the previous work by Sabater et al. in the RepAge model [37].
We define reputation as “what a social entity says about a target regarding
his or her behavior." Let us take this definition apart. A "social entity" is defined as a set of individuals plus a set of social relations among these individuals, or properties that identify them as a group in the eyes of its own members and the society at large. Ruben [34] defines a social entity as "a group which is irreducible to the sum of its individual members, and so must be studied as a phenomenon in its own right." Examples of social entities in human societies are companies, sports clubs, neighborhoods, street bands, etc. Notice that we are talking about a group of individuals without singling anyone out. It is not what A, B, or C are saying at an individual level but what they say in the name of the group as a whole. When we talk about reputation we lose track of the single individuals that constitute the group responsible for the reputation value. This property is important because it allows reputation to be an efficient mechanism for spreading social evaluations by reducing the fear of retaliation [28]. Because it is the social entity that is responsible for the evaluation, and not the issuer at an individual level, the degree of responsibility that the issuer takes on is much smaller. This makes the issuer more inclined to spread the evaluation even if it is not so sure about it.
The second important aspect concerns the word "says." In the computational reputation models literature, reputation is also defined as the "opinion" that a social entity has about a target. We prefer the word "says" because it stresses two other important aspects of reputation: first, that the reputation is not necessarily a belief of the issuer, and second, that reputation cannot exist without communication. It makes no sense to talk about reputation if there is no exchange of evaluations. You could imagine a social entity whose individuals share an evaluation but where there is no communication among them or with the rest of the society. In that case we cannot talk about reputation but only about a shared evaluation. Reputation only appears when the evaluation is communicated and circulates, being identified as an evaluation associated with the social entity and not with the individual who performs the communicative act. In this case the
fact that the opinion is believed to be true or not by the communicator is irrelevant. In fact, an individual can help to spread a reputation while believing that the contrary is true. What is relevant is that it is communicated and attributable to the social entity.
Finally, reputation (like trust) is always associated with a specific behavior or property. It makes no sense to talk about reputation as a general property and, when this is done, it is because the object of the reputation is implicit.
is being communicated and (ii) that the individuals who share the image are a
good sample of what the whole social entity thinks. The former is usually ful-
filled (we know the image exists because it is communicated) but very few models
take into account the latter. Usually, as soon as the first communicated image is
received, the model already gives a reputation value. Generalizing a reputation
value just from a few images is not very reliable. To lessen the effects of this,
some models incorporate a measure of the reliability that a reputation value has
(see section 2.6). An example of a model that takes this into account is the Re-
GreT system [35]. What in ReGreT is called the witness reputation is nothing
else but a reputation calculated from communicated images. ReGreT uses social
information to detect the most representative and credible members of a social
entity (those that can represent better the general thinking of the group) and uses
those images to build the reputation. Moreover, it assigns to each reputation value a reliability value that reflects how confident the model is in that calculation. The reliability value is based on the credibility of the individuals whose images are used to build the reputation.
knowledge. The ReGreT model [35] makes extensive use of inherited reputation. Specifically, it takes into account inherited reputation coming from the role the agent is playing (what the authors call system reputation) and inherited reputation coming from the social relations (what they call neighborhood reputation). The FIRE model [17] also uses information about the role to calculate what its authors call role-based trust. This role-based trust is, in spite of its name, a reputation value inherited from the reputation that the role has in the society, automatically transformed into a trust value.
When we talk about an object instead of an individual, reputation can also be inherited through other kinds of structural relations. For example, if a scientific journal has a good reputation, a paper published in it inherits part of this reputation just because there is a "part of" relation between them. One model that takes structural relations into account to calculate reputation is that of Osman et al. [25]. The proposed algorithm makes it possible to calculate how reputation propagates in a structural graph, taking into account the structural relations among individuals and objects.
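For illustration only (this is not the algorithm of Osman et al. [25]), a very naive propagation rule could pass a fraction of an object's reputation to the objects related to it by a "part of" edge:

# Illustrative sketch: naive propagation of reputation along "part-of" relations
# in a structural graph (e.g., journal -> paper). Not the algorithm of [25].
def propagate(reputation, part_of, decay=0.5):
    """Give each object without a reputation a fraction of its container's value."""
    derived = dict(reputation)
    for part, whole in part_of.items():
        if part not in derived and whole in derived:
            derived[part] = decay * derived[whole]
    return derived

reputation = {"journal_A": 0.9}           # known reputation values in [0, 1]
part_of = {"paper_1": "journal_A"}        # paper_1 is part of journal_A
print(propagate(reputation, part_of))     # {'journal_A': 0.9, 'paper_1': 0.45}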
Inherited reputation is usually used as an initial approximation of the actual reputation value. Once the agent starts interacting with the other agents and has enough information to use images as a source for reputation, or starts receiving communicated reputations, inherited reputation should be overwritten by those more reliable sources.
In ReGreT, these reputation sources are combined into a single value, with each component weighted by its reliability. Writing W, N, and S for the witness, neighborhood, and system reputation components, D for a default value used when no other information is available, and RL for the reliability of each component, the reputation that agent a assigns to agent b for behavior ϕ is

$$R_{a \to b}(\varphi) \;=\; \sum_{i \in \{W,N,S,D\}} \xi_i \cdot R^{i}_{a \to b}(\varphi)$$

where the weights are

$$\xi_W = RL^{W}_{a \to b}(\varphi), \qquad \xi_N = RL^{N}_{a \to b}(\varphi) \cdot (1 - \xi_W), \qquad \xi_S = RL^{S}_{a \to b}(\varphi) \cdot (1 - \xi_W - \xi_N), \qquad \xi_D = 1 - \xi_W - \xi_N - \xi_S.$$
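A minimal sketch of this aggregation in code (illustrative only; the reputation values and reliability values for each source are assumed to be given):

# Sketch of the weighted aggregation above: witness (W), neighborhood (N),
# system (S), and default (D) reputations combined using reliability weights.
def aggregate_reputation(rep, rel):
    """rep: reputation value per source; rel: reliability in [0, 1] per source."""
    xi = {}
    xi["W"] = rel["W"]
    xi["N"] = rel["N"] * (1 - xi["W"])
    xi["S"] = rel["S"] * (1 - xi["W"] - xi["N"])
    xi["D"] = 1 - xi["W"] - xi["N"] - xi["S"]
    return sum(xi[k] * rep[k] for k in ("W", "N", "S", "D"))

rep = {"W": 0.8, "N": 0.6, "S": 0.5, "D": 0.5}   # example reputation values
rel = {"W": 0.7, "N": 0.5, "S": 0.2}             # example reliability values
print(aggregate_reputation(rep, rel))            # 0.725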
A central service is responsible for collecting the raw information (images, social
relations, etc.) from individuals in a social entity and for calculating a reputa-
tion value using an aggregation mechanism. This value is then available to the
individuals as a measure of reputation.
A centralized reputation system has the following advantages:
• The fact that reputation values are public allows newcomers to benefit from
this information even without having any previous knowledge about the so-
ciety.
However, a centralized reputation system also has drawbacks:
• The individuals have to trust the central service regarding the impartiality of the calculation.
• The mechanism used to calculate the reputation does not take into account personal preferences and biases.
• The central repository is a bottleneck for the system. It introduces a vulnerability in case of failure of the central service, and it may cause system overload or slowness if agents require very frequent access to reputation values and send a huge amount of queries and data to the central repository.
In a decentralized reputation system, by contrast, each agent gathers the relevant information and calculates reputation values by itself:
• Each agent can decide the method that it wants to follow to calculate reputation.
• On the other hand, it can take some time for the agent to obtain enough information from the rest of the society to calculate a reliable reputation value, and it is not so easy for newcomers to start using reputation in a society that does not have a centralized reputation service.
Many multiagent models fall in this category (ReGreT [35], Travos [43], FIRE [17]).
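To make the centralized variant described above concrete, here is a toy sketch of a central reputation service (our own illustration, not any particular system) that collects ratings and serves an aggregated value to any agent that asks for it:

# Toy sketch of a centralized reputation service: agents submit ratings,
# the service aggregates them and serves the result to anyone who asks.
from collections import defaultdict

class CentralReputationService:
    def __init__(self):
        self._ratings = defaultdict(list)   # target -> list of ratings in [0, 1]

    def submit(self, rater, target, rating):
        """Store a rating; a real service would also authenticate the rater."""
        self._ratings[target].append(rating)

    def reputation(self, target):
        """Aggregate by simple averaging; unknown targets have no reputation."""
        ratings = self._ratings.get(target)
        return sum(ratings) / len(ratings) if ratings else None

service = CentralReputationService()
service.submit("a", "seller_j", 0.9)
service.submit("b", "seller_j", 0.7)
print(service.reputation("seller_j"))   # 0.8 -- publicly available to newcomers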
As we have seen in section 3, reputation is one of the elements that can contribute
to building trust in a trustee. Usually reputation is used when there is a lack of
direct information. This is how it is used for instance in the ReGreT model [35].
In that model what is called direct experience is the main source for building trust,
but if that source is not available, the model relies on reputation. Specifically, the
model considers three types of reputation values, depending on the information
source: information from third parties, social relations, and roles.
4.4.2 Ballot-Stuffing
Ballot-stuffing is an attack in which an agent sends more feedback reports than the number of interactions it has actually been involved in as a partner. The main counterattacks described in the literature are filtering out feedback that comes from peers suspected of ballot-stuffing, and using per-interaction feedback rates instead of accumulated feedback counts.
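As a rough sketch of the second countermeasure (the counting scheme is our own illustration): feedback is weighted by the number of recorded interactions, so extra reports beyond that number carry no additional weight.

# Illustrative sketch: cap the weight of feedback by the number of recorded
# interactions, so an agent cannot gain influence by "stuffing" extra reports.
def effective_feedback(feedback_count, interaction_count):
    """Count at most one unit of feedback per recorded interaction."""
    return min(feedback_count, interaction_count)

# Agent x reports 50 positive ratings about y but only interacted 3 times with y.
print(effective_feedback(feedback_count=50, interaction_count=3))  # 3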
4.4.4 Whitewashing
Whitewashing occurs when an agent changes its identifier in order to escape previous bad feedback. A more sophisticated attack happens when whitewashing is combined with collusion and unfair ratings: a group of malicious agents collude to enable whitewashing by using unfair ratings to increase the reputation of the agent that has just changed its identity. This is no different from a Sybil attack (see Section 4.4.6).
4.4.5 Collusion
Collusion occurs when a group of agents cooperate with one another in order to take advantage of the system and of other agents. It is not an attack per se but an enhancer of other attacks. Collusion is difficult to counter. One possibility is to detect a significant and recurrent deviation in the feedback of different agents regarding the same targets. If that detection leads to identifying a small group of agents that keep recommending each other (while other agents provide different feedback), this group can be suspected of collusion and excluded from the system. However, this detection is hard to perform if there is no global view of the agent society or if the number of colluding agents is high.
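A very coarse sketch of such a detection heuristic (our own illustration; as noted above, real detection is considerably harder): flag pairs of agents whose mutual ratings deviate strongly from what the rest of the society says about them.

# Coarse illustration of collusion detection: flag pairs of agents whose mutual
# ratings deviate strongly from the average rating the rest of the society gives.
def suspicious_pairs(ratings, threshold=0.4):
    """ratings: dict (rater, target) -> value in [0, 1]."""
    def outside_avg(target, exclude):
        vals = [v for (r, t), v in ratings.items() if t == target and r not in exclude]
        return sum(vals) / len(vals) if vals else None

    pairs = set()
    for (a, b), v_ab in ratings.items():
        v_ba = ratings.get((b, a))
        if v_ba is None:
            continue
        avg_b = outside_avg(b, {a, b})
        avg_a = outside_avg(a, {a, b})
        if (avg_a is not None and avg_b is not None
                and v_ab - avg_b > threshold and v_ba - avg_a > threshold):
            pairs.add(frozenset((a, b)))
    return pairs

ratings = {("a", "b"): 0.95, ("b", "a"): 0.9,   # a and b praise each other
           ("c", "a"): 0.2, ("c", "b"): 0.3,    # the rest disagree
           ("d", "a"): 0.1, ("d", "b"): 0.25}
print(suspicious_pairs(ratings))   # {frozenset({'a', 'b'})}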
4.4.6 Sybil Attacks
Sybil attacks are a type of security threat that can be launched in scenarios in which it is easy to create fake identities. The attack consists of creating enough identities so that a single agent can subvert the normal functioning of the system. An example of a Sybil attack would be the improvement of one's own reputation through artificial feedback from many Sybil identities.
All reputation mechanisms have an inertia: when the behavior of an agent changes, it takes some time for the reputation value to reflect the new reality. This inertia is necessary to avoid reputation values that oscillate in reaction to the slightest change, and depending on the model it can be larger or smaller. This attack consists of exploiting the lag between a change in behavior and its reflection in the reputation value in order to obtain a benefit. The malevolent agent raises its reputation to a certain point and then starts defrauding, taking advantage of the good reputation while the reputation value is still high. Once the reputation decreases, it again shows good behavior to rebuild its reputation, repeating the cycle. The solution to this attack can rely on two aspects: first, adjusting the reaction time of the reputation mechanism so that it reacts quickly enough to changes in behavior; second, giving the agent the possibility of detecting patterns that reveal this cyclic behavior in the reputation value.
[Figure 9.6: An argumentation example about the quality of b as an informer. Argument S1 contains Rep(a, seller, VB), Comm(b, Rep(a, seller, VB)), Img(b, inf, VG), SharedEval(b, inf, VG), and Comm(c, Img(b, inf, VG)); it is attacked by argument S2, which contains SharedEval(b, inf, VB) and Comm(d, Img(b, inf, VB)).]
5.1 Argumentation
In a scenario with autonomous entities that have to interact while at the same time pursuing their own self-interested goals, disagreements are normal. This makes it necessary to establish dialogues to try to reach a consensus. This is a central problem in the field of argumentation (see Chapter 5).
We can see the relation of trust and reputation models to argumentation from two different, and at the same time complementary, perspectives:
• Argumentation for trust/reputation: Arguments can be used to explain a
trust/reputation value. A straightforward approach is to provide the other agent with the "raw" data that has been used to calculate the value [38], where by "raw" data we mean the data that has not yet been evaluated by the agent. This raw data can be seen as the argument that gives support to the value. The main drawback of this approach is the privacy problem of providing basic data like contracts or other kinds of sensitive information,
• Trust/reputation for argumentation: In this case trust and reputation are not
the object of the argumentation process but mechanisms that help to decide
about the reliability of the arguments. During an argumentation dialogue
about a social evaluation, it can be either that the object of the evaluation is
known by the two agents but they disagree on the value (that is the case in
the example in Figure 9.6 where there is a disagreement about the quality
of b as an informer) or that the attack is based on information that the first
agent did not know before (for example new information that reveals that b
is not a good informer). In both cases (especially in the latter) the trust in
the agents that are sending the information and their reputation as informers
are crucial elements that can be used to decide about the acceptance of the
argument/attack. Therefore, if the sender of the information is a trustworthy agent, I can accept what it says without requiring any further certainty, or even if the information contradicts what I know.
5.2 Negotiation
As stated in Chapter 4, the capacity to negotiate with other agents is a skill usually
demanded of an autonomous agent. Similar to what happens with argumentation,
the relation between trust/reputation and negotiation models can be at different
levels:
5.3 Norms
According to the definition of Marsh [23], trust (and by extension reputation)
refers to an expectation about an uncertain behavior. Trust evaluations are built
based on the compliance of past behavior of the trustee and are used by the trustor
to predict the trustee’s future behavior. Even if the expectations are often hidden
in the trust calculation function, they can also be made explicit by using norms.
The use of norms is covered in more depth in Chapter 2, in particular, for situa-
tions in which norms are used by electronic institutions in order to control agents’
behaviors.
The concept of norms has various implementations in multiagent systems. In
the categories of norms identified by Tuomela [44], rules (r-norms for Tuomela)
represent hard constraints that have to be respected in a society, social norms
(s-norms) consist of preferences shared by a group of individuals, and personal
norms (p-norms or m-norms) are individual expectations. Rules are usually en-
forced by electronic institutions. But this solution requires an intrusive control of agents and the centralization of some tasks by trusted third parties. If such characteristics are not feasible or not desired, trust and reputation, acting as social control mechanisms, can be used to induce agents to behave as expected.
Trust is then used to assess the agent’s norm obedience [22]. When individual
norms are considered, trust evaluations are subjective and performed locally by
agents. Contracts may also be used to make explicit the trustor’s expectations.
The distance between the outcome of a transaction and the initial contract is then an experience to consider as an input of the trust calculation (e.g., ReGreT [35] uses such experiences). With global norms attached to groups or to the whole society, the expectations are attached to a set of agents. Each agent of the group should use trust mechanisms to contribute to social control by supervising the compliance of its neighbors' behavior with rules and social norms. Trust models that attach trust
evaluations to given norms [14, 42] are able to represent different trust values
about the same trustee and provide a multidimensional evaluation covering local
and global norms.
5.4 Organizations
Besides norms, other organizational concepts bring interesting properties when
combined with trust and reputation. When agents are organized in groups, it becomes possible to place some information at the group level. Reputation, for instance, can be attached to an explicit group of agents, and an agent may have
different reputations in different groups. The reputation concept is represented in
this way in the ForTrust model [16] by the predicate Reput(I, j, α, ϕ, κ) stating
that agent j has the reputation in the group I to achieve ϕ by doing α when the
conditions κ hold. An explicit representation of the group I is required to be
integrated with the formal representation of reputation.
A generalization of trust assessments can be done using the concept of roles. Trust is then attached not only to agents but also to general roles. There is a mutual relation between role-based and agent-based trust: an agent's behaviors influence the trust attached to its role, but the evaluation of the agent also depends on the trust in its roles. It is then possible to estimate the trustworthiness of an agent without
any prior experience with it, by considering the role it is playing. However, it is
essential that roles are defined and explicitly attached to agents. Sometimes, roles
represent stereotypes of agents and are built dynamically by clustering agents ac-
cording to exhibited characteristics [15].
One solution to this problem is the use of a common ontology. So far, only two ontologies [4, 29] have been specifically oriented to trust and reputation. These ontologies define a set of terms related to trust and reputation unambiguously, so that every agent that adopts the ontology can use it as a bridge to communicate with other agents about trust and reputation. The ontology of Pinyol et al. [29] also establishes how to convert between different representations of trust and reputation values. For example, it provides a way to convert a reputation value initially represented as a probability distribution into a real number. This way, a model that internally uses a real value to represent trust can use information coming from an agent that uses probability distributions.
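As an illustration of such a conversion (the mapping of levels to equally spaced points is our own choice, not the exact mapping defined in [29]):

# Illustrative sketch: converting a reputation value expressed as a probability
# distribution over ordered evaluation levels into a single real number in [0, 1].
LEVELS = ["VBad", "Bad", "OK", "Good", "VGood"]

def distribution_to_real(distribution):
    """Expected value, mapping the ordered levels onto equally spaced points."""
    assert len(distribution) == len(LEVELS) and abs(sum(distribution) - 1.0) < 1e-9
    points = [i / (len(LEVELS) - 1) for i in range(len(LEVELS))]  # 0.0 .. 1.0
    return sum(p * x for p, x in zip(distribution, points))

print(distribution_to_real([0.6, 0.1, 0.1, 0.1, 0.1]))  # 0.25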
This, however, does not solve the problem of subjectivity. Although the agents can share the meaning of what an image, a reputation, or a shared voice is, this does not mean that they also share the scale of values associated with each concept. Moreover, the interests of the agents may be different, and so their evaluations of a particular event may differ as well: something that is good for one agent can be very bad for another. One possible solution, proposed by Koster et al. [20], is that the agent aligns its trust model with that of its partner before starting to consider social evaluations coming from it. This alignment process, however, is not an easy task and requires a considerable amount of shared information between the two agents. In order to reduce the amount of previously shared information, another solution is to use argumentation, as we saw in the previous section when discussing the use of argumentation for trust/reputation. Justifying a communicated value can give the receiver clues about the motivation behind that value and whether that motivation fits its own situation. The idea is to use argumentation dialogues to "understand" why the other agent is giving that social evaluation. By doing so, the agent can decide to modify its own beliefs, adapt the information that it receives, or simply discard that information.
6 Conclusions
There is no doubt that computational trust and reputation models have become an important topic in the area of MASs. Currently it is almost inconceivable to deploy an agent in an open multiagent system without giving it the capability of evaluating the trustworthiness of its partners. At the same time, the nature of a MAS makes mechanisms of social control, like reputation, necessary to guarantee its good functioning. It is this relevance that has resulted in a large number of computational trust and reputation models being proposed in the last few years. However, the fact that this topic lies at the crossroads of different disciplines (psychology, sociology, cognitive science, artificial intelligence, economics, etc.) makes it difficult and slow to achieve a consensus even on the main definitions.
The initial and mainstream approach was to define mathematical functions computing a single trust value, or a small set of trust values, from a set of inputs (usually observations of agents' behaviors). Recently this tendency has changed, and much more effort is now dedicated to semantic aspects, with the objective of building trust and reputation representations that are closer to the real concepts involved in human relations. This is mainly due to the fact that MASs are currently seen as global systems populated not only by artificial entities but also by human beings. Therefore, the old kind of agent that knew about computational protocols and languages and used black boxes to make its calculations is no longer enough. The new challenge is a new generation of agents that can communicate (in the broad sense of the word) with humans. To achieve that, more cognitive approaches are appearing that take into account not only the final value but also the "path" that leads to that value. The motivations behind these two approaches mainly depend on the nature of the trust decisions to be made. For instance, if we need an automatic decision that should make accurate predictions, mathematical approaches are relevant. If it is important to be precise about the semantics of the concepts, a symbolic socio-cognitive approach seems better suited. It makes little sense, however, to oppose these two approaches. They should instead be seen as complementary, and modern trust and reputation models should provide accurate computation mechanisms that lead to the instantiation of well-defined concepts.
Another important challenge is the integration of the trust and reputation model with the rest of the elements of the agent. The black-box, reactive approach, where the trust and reputation model is a passive element of the architecture waiting for queries about the trust or reputation value assigned to a possible partner, may be satisfactory for simple applications but is clearly insufficient for more complex MASs. We have to see the trust and reputation module as an element that can justify the returned values when necessary and that participates proactively in the decision making, proposing actions that improve the knowledge the agent has about the society.
7 Exercises
1. Level 1 Provide a critical discussion on the advantages/drawbacks of adopt-
ing a mathematical approach to trust representation and calculation versus
a symbolic one.
7. Level 2 Design a heuristic for the aggregation function like that in Sec-
tion 4.1.4 to obtain a reputation evaluation that takes into account the con-
text.
8. Level 3 Trust and reputation models are meant to be used in an agent rea-
soning process besides other agreement technologies. Explain how they
can be combined with argumentation techniques and what the contribution
of trust/reputation is for argumentation and vice versa.
9. Level 3 Design a trust model that takes into account the elements depicted
in Figure 9.4 to calculate trust evaluations. Consider that the agents use
10. Level 4 Evaluating and comparing existing trust models is a difficult task
for which there is not yet any satisfactory solution. Describe the main diffi-
culties that prevent the development of an evaluation tool and propose your
ideas to overcome them.
12. Level 4 Implement an agent that will send unfair ratings to the centralized
reputation mechanism implemented in the previous exercise. Extend this
agent implementation in order to simulate a collusion in which agents use
unfair ratings to recommend each other. Try several executions of a MAS
composed of a different number of attackers to estimate the proportion of
agents for which your centralized reputation mechanism does not provide
reliable reputation values.
References
[1] Alfarez Abdul-Rahman and Stephen Hailes. Supporting trust in virtual commu-
nities. In Proceedings of the 33rd Hawaii International Conference on System Sci-
ences (HICSS), volume 6, page 6007, Washington, DC, USA, 2000. IEEE Computer
Society.
[2] Matt Blaze, Joan Feigenbaum, and Angelos D. Keromytis. Keynote: Trust manage-
ment for public-key infrastructures. In Proceedings of the 1998 Security Protocols
International Workshop, volume 1550, pages 59 – 63, Cambridge, England, April
1998. Springer LNCS.
[3] J. Carbo, J. M. Molina, and J. Davila. Trust management through fuzzy reputation.
Int. Journal in Cooperative Information Systems, 12(1):135–155, 2003.
[4] Sara J. Casare and Jaime S. Sichman. Using a functional ontology of reputation to
interoperate different agent reputation models. Journal of the Brazilian Computer
Society, 11(2):79–94, November 2005.
[5] Cristiano Castelfranchi. Engineering social order. In Proceedings of the 2nd Inter-
national Workshop on Engineering Societies in the Agent’s World (ESAW’00), 2000.
[6] Cristiano Castelfranchi and Rino Falcone. Trust Theory: A Socio-Cognitive and
Computational Model; electronic version. Wiley Series in Agent Technology. John
Wiley & Sons Ltd., Chichester, 2010.
[7] R. Conte and M. Paolucci. Reputation in Artificial Societies: Social Beliefs for
Social Order. Kluwer Academic Publishers, 2002.
[8] Chrysanthos Dellarocas and Charles A. Wood. The sound of silence in online feed-
back: Estimating trading risks in the presence of reporting bias. Manage. Sci.,
54:460–476, March 2008.
[9] Whitfield Diffie and Martin Hellman. New directions in cryptography. IEEE Trans-
actions on Information Theory, November 1976.
[12] Karen Fullam, Tomas Klos, Guillaume Muller, Jordi Sabater, Andreas Schlosser, Zvi
Topol, Suzanne Barber, Jeffrey Rosenschein, Laurent Vercouter, and Marco Voss. A
specification of the Agent Reputation and Trust (ART) testbed: Experimentation
and competition for trust in agent societies. In Frank Dignum, Virginia Dignum,
Sven Koenig, Sarit Kraus, Munindar P. Singh, and Michael Wooldridge, editors,
Proceedings of the 4th International Conference on Autonomous Agents and Multi-
Agent Systems (AAMAS 2005), pages 512–518, Utrecht, Netherlands, July 2005.
ACM Press.
[13] Diego Gambetta. Can We Trust Trust?, pages 213–237. Basil Blackwell, 1988.
[14] Amandine Grizard, Laurent Vercouter, Tiberiu Stratulat, and Guillaume Muller. A
peer-to-peer normative system to achieve social order. In V. Dignum, N. Fornara,
and P. Noriega, editors, Coordination, Organizations, Institutions, and Norms in
Agent Systems, volume LNCS 4386 of Lecture Notes in Computer Science, pages
274–289, Berlin, Germany, 2007. Springer-Verlag.
[15] Ramón Hermoso, Holger Billhardt, and Sascha Ossowski. Role evolution in open
multi-agent systems as an information source for trust. In Proceedings of the
9th International Conference on Autonomous Agents and Multiagent Systems (AA-
MAS’10), volume 1, pages 217–224, 2010.
[16] Andreas Herzig, Emiliano Lorini, Jomi F. Hübner, and Laurent Vercouter. A logic
of trust and reputation. Logic Journal of the IGPL, Normative Multiagent Systems,
18(1):214–244, February 2010.
[17] Trung Dong Huynh, Nicholas R. Jennings, and Nigel R. Shadbolt. An integrated
trust and reputation model for open multi-agent systems. Journal of Autonomous
Agents and Multi-Agent Systems, 13(2):119–154, 2006.
[18] Audun Jøsang and Jennifer Golbeck. Challenges for robust trust and reputation
systems. In 5th International Workshop on Security and Trust Management (STM
2009), 2009.
[19] Audun Jøsang, Roslan Ismail, and Colin Boyd. A survey of trust and reputation
systems for online service provision. Decis. Support Syst., 43:618–644, March 2007.
[20] Andrew Koster, Jordi Sabater-Mir, and Marco Schorlemmer. Trust alignment: a sine
qua non of open multi-agent systems. In Robert Meersman, Tharam Dillon, and Pilar
Herrero, editors, Proceedings of On the Move to Meaningful Internet Systems (OTM
2011, Part I), volume 7044 of LNCS, pages 182–191, Hersonissos, Greece, 2011.
Springer.
[21] Eleni Koutrouli and Aphrodite Tsalgatidou. Reputation-based trust systems for P2P
applications: Design issues and comparison framework. In Simone Fischer-Hübner,
Steven Furnell, and Costas Lambrinoudakis, editors, Trust and Privacy in Digi-
tal Business, volume 4083 of Lecture Notes in Computer Science, pages 152–161.
Springer Berlin / Heidelberg, 2006. 10.1007/11824633_16.
[22] Emiliano Lorini and Robert Demolombe. Trust and norms in the context of computer
security: Toward a logical formalization. In R. van der Meyden and L. van der Torre,
editors, International Workshop on Deontic Logic in Computer Science (DEON),
volume 5076 of LNCS, pages 50–64. Springer-Verlag, 2008.
[24] Stephen Marsh and Pamela Briggs. Examining trust, forgiveness and regret as
computational concepts. In Jennifer Golbeck, editor, Computing with Social Trust,
Human-Computer Interaction Series, pages 9–43. Springer London, 2009.
[25] Nardine Osman, Carles Sierra, and Jordi Sabater-Mir. Propagation of opinions in
structural graphs. In 19th European Conference on Artificial Intelligence (ECAI-
10), 2010.
[27] Isaac Pinyol. Milking the Reputation Cow: Argumentation, Reasoning and Cogni-
tive Agents. Number 44 in Monografies de l’Institut d’Investigació en Intel.ligència
Artificial. IIIA-CSIC, 2011.
[28] Isaac Pinyol, Mario Paolucci, Jordi Sabater-Mir, and Rosaria Conte. Beyond accu-
racy. Reputation for partner selection with lies and retaliation. In Multi-Agent-Based
Simulation VIII, volume 5003, pages 128–140. Springer, 2008.
[29] Isaac Pinyol, Jordi Sabater-Mir, and Guifre Cuni. How to talk about reputation
using a common ontology: From definition to implementation. In Ninth Workshop
on Trust in Agent Societies, pages 90–102, 2007.
[30] Isaac Pinyol, Jordi Sabater-Mir, Pilar Dellunde, and Mario Paolucci. Reputation-
based decisions for logic-based cognitive agents. Autonomous Agents and Multi-
Agent Systems, 2010. 10.1007/s10458-010-9149-y.
[32] Lars Rasmusson and Sverker Jansson. Simulated social control for secure Inter-
net commerce. In Proceedings of the 1996 Workshop on New Security Paradigms,
NSPW ’96, pages 18–25, New York, NY, USA, 1996. ACM.
[33] Martin Rehák and Michal Pěchouček. Trust modeling with context representation
and generalized identities. In Matthias Klusch, Koen Hindriks, Mike Papazoglou,
and Leon Sterling, editors, Cooperative Information Agents XI, volume 4676 of Lec-
ture Notes in Computer Science, pages 298–312. Springer Berlin / Heidelberg, 2007.
[35] Jordi Sabater. Trust and Reputation for Agent Societies. Number 20 in Monografies
de l’Institut d’Investigació en Intel.ligència Artificial. IIIA-CSIC, 2003.
[36] Jordi Sabater and Carles Sierra. Review on computational trust and reputation mod-
els. Artif. Intell. Rev., 24:33–60, September 2005.
[37] Jordi Sabater-Mir, Mario Paolucci, and Rosaria Conte. Repage: REPutation and
imAGE among limited autonomous partners. Journal of Artificial Societies and
Social Simulation (JASSS), 9(2):3, 2006.
[38] Michael Schillo, Petra Funk, and Michael Rovatsos. Using trust for detecting de-
ceitful agents in artificial societies. Applied Artificial Intelligence, 14(8):825–849,
2000.
[39] Sandip Sen, Anish Biswas, and Sandip Debnath. Believing others: Pros and cons.
Artificial Intelligence, 142:279–286, 2000.
[40] Sandip Sen and Neelima Sajja. Robustness of reputation-based trust: Boolean case.
In M. Gini, T. Ishida, C. Castelfranchi, and W. L. Johnson, editors, Proceedings
of the First International Joint Conference on Autonomous Agents and Multi-Agent
Systems (AAMAS’02), volume 1, pages 288–293. ACM, 2002.
[42] Viviane Torres Silva, Ramón Hermoso, and Roberto Centeno. A hybrid reputation
model based on the use of organizations. In Jomi Fred Hübner, Eric Matson, Olivier
Boissier, and Virginia Dignum, editors, Coordination, Organizations, Institutions
and Norms in Agent Systems IV, pages 111–125. Springer-Verlag, Berlin, Heidel-
berg, 2009.
[43] W. T. L. Teacy, J. Patel, N. R. Jennings, and M. Luck. Travos: Trust and reputation in
the context of inaccurate information sources. Autonomous Agents and Multi-Agent
Systems, 12(2):183–198, March 2006.
[44] Raimo Tuomela. The Importance of Us: A Philosophical Study of Basic Social
Norms. Stanford University Press, 1995.
[45] Laurent Vercouter, Sara J. Casare, Jaime S. Sichman, and Anarosa A. F. Brandao.
An experience on reputation models interoperability based on a functional ontology.
In 20th International Joint Conference on Artificial Intelligence 2007, Hyderabad,
India, January 2007.
[46] Laurent Vercouter and Guillaume Muller. L.I.A.R.: Achieving social control in open
and decentralised multi-agent systems. Applied Artificial Intelligence, 24(8):723–
768, September 2010.
[47] William H. Winsborough, Kent E. Seamons, and Vicki E. Jones. Automated trust
negotiation. In DARPA Information Survivability Conference and Exposition, vol-
ume 1, 2000.
[48] Marianne Winslett, Ting Yu, Kent E. Seamons, Adam Hess, Jared Jacobson, Ryan
Jarvis, Bryan Smith, and Lina Yu. Negotiating trust on the web. IEEE Internet
Computing, 6(6):30–37, 2002.
[49] Bin Yu and Munindar P. Singh. An evidential model of distributed reputation man-
agement. In Proceedings of the First International Joint Conference on Autonomous
Agents and Multiagent Systems (AAMAS), pages 294–301. ACM Press, 2002.
Multiagent Learning
1 Introduction
One of the key properties attributed to an agent operating in an unknown environ-
ment is its ability to learn from its experiences. For a single-agent system, this
generally consists of building a mapping from the agent's inputs (sensor readings
and internal state) to an output (action). How that mapping is constructed depends
on many factors, and there are many algorithms that have been extensively studied,
including learning automata [49], reinforcement learning [72], neuro-evolutionary
algorithms [26] and biologically inspired methods [12].
When extending such algorithms to multiagent learning, two new key issues
arise: how do agents account for the collective action of other agents in the sys-
tem, and how do agents select actions that not only provide a direct benefit but
also shape the actions of other agents in the future? The first issue addresses an
agent’s perception, in that it forces an agent to differentiate between the poten-
tially stochastic changes to an environment and the actions of intelligent agents
and to exploit this knowledge. The second issue addresses an agent’s impact, in
that it forces an agent to select actions that lead to desired behavior through its
interactions with other agents’ actions. These two issues together lead to both
theoretical (convergence) and practical (signal to noise in rewards) complications
and render the direct application of single-agent learning algorithms problematic.
Yet, multiagent learning must solve these problems because it is a fundamen-
tal component of multiagent systems both from scientific and engineering per-
spectives. From a scientific perspective, studying the interactions among learning
agents provides insights into many social phenomena from game theory to com-
modities trading to resource allocation problems. From an engineering perspec-
tive, learning agents provide a conceptually proven approach to distributed control
problems such as load balancing, sensor networks, multirobot coordination, and
air traffic management. The benefits of understanding the dynamics of multiagent
systems are numerous, in that it would provide desirable system characteristics,
including:
• Scalability: A multiagent system removes the need for full information flow
in both the sensory and action directions, and hence multiagent systems
generally scale better than centralized systems.
For a multiagent system to realize those potential gains, agents need to interact
with each other and quickly adapt to changing environments and the changing
strategies of their peers.
As a consequence, machine learning algorithms are crucial to development
and deployment of adaptive multiagent approaches [4, 19, 33, 35, 43, 62, 64, 70,
73, 77, 79, 80, 83, 87]. The subfield of multiagent learning studies agent defi-
nitions, algorithms, interactions, and reward structures to create adaptive agents
that can function in environments where their actions shape and are shaped by the
actions of other agents. Though most multiagent learning algorithms are based
on traditional single-agent learners (e.g., reinforcement learning), they need to be
modified to effectively deal with the challenges stemming from the interaction
among multiple independent agents.
In this chapter we introduce the basics of multiagent learning, present its chal-
lenges and approaches and provide examples of domains that can benefit from
such approaches. The remainder of the chapter is organized as follows.
Section 2 elaborates on the main differences between single and multiagent
learning algorithms, and highlights the challenges that multiagent learning algo-
rithms must address. In Section 3 we concisely introduce the necessary back-
ground on single-agent reinforcement learning, present the multiagent Markov de-
cision process formulation, and introduce a number of state-of-the-art multiagent
counts for their contribution to the system. This problem has also been stud-
ied [4, 47, 81, 89, 90, 91, 92, 98].
In most multiagent problems, these two credit assignment problems are confounded
and each agent needs to not only determine its contribution to the full system
performance, but also determine the relative impact of each of its actions in a se-
quence. Solving both these credit assignment problems at once is difficult since
each reward is a function of actions from different agents over different time steps.
Temporal difference methods, such as Q-learning, are an important class of meth-
ods for addressing the temporal credit assignment problem for single-agent rein-
forcement learning. Unfortunately they do not address the structural credit assign-
ment problem present in multiagent problems and can have very low performance
when there are many agents.
Because it is essential for designing the proper agent rewards, this credit as-
signment problem is critical in both cooperative and competitive multiagent sys-
tems. In the former, it allows for the design of agent rewards that explicitly seek to
maximize a global reward (e.g., win the soccer game in robotic soccer, maximize
the information gathered in multirover exploration). In the latter, it crystallizes the competition for shared resources and allows the creation of appropriate incentive structures to improve system performance (e.g., reducing overall traffic congestion).
in this section, one can replace “reward” with “objective,” “evaluation,” “payoff,” or “utility” and
retain the same content.
directly [21]. In more complex systems, agents may need local rewards that in-
crease their ability to learn good policies. Unfortunately this “reward shaping”
problem becomes more difficult in domains involving continuous state spaces,
and dynamic and stochastic environments. For example, reward structures that
have been shown to perform well in static environments do not necessarily lead to
good system behavior in dynamic environments.
One way to assess the impact of a reward structure is to have reward charac-
teristics that are independent of the domains and learning algorithms used. Two
such characteristics are reward “alignment” and “sensitivity,” which capture
necessary, but often conflicting, properties.
• Alignment – formalized as “factoredness” [4, 81] – defines how well two re-
wards are matched in terms of their assessment of the desirability of partic-
ular actions. Intuitively, high values of factoredness mean that actions that
optimize one reward will also optimize the other reward. In that sense, hav-
ing agent rewards have high factoredness with respect to the system reward
results in a system where agents that aim to improve their own performance
also tend to improve system performance.
Based on these characteristics, let us analyze three potential agent rewards for
a given system, as system performance hinges on balancing the degree of factored-
ness and learnability for each agent. In general, a reward with high factoredness
will have low learnability and a reward with high learnability will have low fac-
toredness [102]. Consider the following three reward functions for an agent:
1. Full system reward: Each agent receives the full system reward. By definition
it is fully factored, meaning every action that is good for the agent is good
for the system. This reward has been used successfully on multiagent prob-
lems with few agents [21]. However, since each agent’s reward depends on
the actions of all the other agents, it generally has poor learnability, a prob-
lem that gets progressively worse as the size of the system grows. This is
because, when an agent’s reward changes, it is difficult for the agent to de-
termine whether that change was caused by its own action or by the actions
of the other agents in the system. It is therefore only suited for problems
with a small number of agents.
2. Local reward: Each agent receives the local component of the full system
reward. In contrast to the global reward, which requires full state informa-
tion, the local reward is composed of the components of the global reward
that depend on the states of agent i. Because it does not depend on the
states of other agents, this reward is “perfectly learnable.” However, de-
pending on the domain, it may have a low degree of factoredness, in that
having each agent optimize its own performance may or may not promote
coordinated system level behavior.
3. Difference reward: Each agent receives a reward that measures its im-
pact. This reward is based on the difference between the system reward and
the system reward that would have resulted had the agent performed some
“null” action – hence the name [5, 77, 79, 81, 102]. This reward has gen-
erally high factoredness as the removed “null” action term does not depend
on the agent’s action. As a result, any impact an agent has on this reward
comes from its impact on the global reward. Hence, any actions that im-
prove/deteriorate the difference reward also improve/deteriorate the system
reward. Furthermore, this reward usually has better learnability than the
system reward because subtracting out the null action term removes some
of the effects of other agents (i.e., noise) from the agent’s reward. While
having good properties, this reward can be impractical to compute because
it requires knowledge about the system state. The formal definition and ap-
plication of this reward is presented in the air traffic case study in Section 7.
The key then for devising good agent rewards is to balance the individual
agent’s learning with its coordination with other agents. Since directly using the
system reward is impractical in most real-world cases, shaping the agent reward
is critical for both local and difference rewards. In both cases, how the miss-
ing information is handled (ignored, estimated) determines the trade-off between
factoredness and learnability, which in turn directly determines the system perfor-
mance [4, 79].
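To make this trade-off concrete, the following minimal Python sketch (illustrative only; the congestion-style system reward and all function names are assumptions, not taken from this chapter) contrasts the full system reward, a local reward, and a difference reward for a toy problem in which each agent picks one of several resources:

    import numpy as np

    def global_reward(actions, n_resources, capacity):
        """Full system reward G(z): penalize every over-capacity resource."""
        counts = np.bincount(actions, minlength=n_resources)
        over = np.maximum(counts - capacity, 0)
        return -float(np.sum(over ** 2))

    def local_reward(actions, i, capacity):
        """Local reward: agent i only sees the load on the resource it selected."""
        load = int(np.sum(actions == actions[i]))
        return -float(max(load - capacity, 0) ** 2)

    def difference_reward(actions, i, n_resources, capacity):
        """Difference reward D_i = G(z) - G(z without agent i), i.e., the 'null' action."""
        g = global_reward(actions, n_resources, capacity)
        g_without_i = global_reward(np.delete(actions, i), n_resources, capacity)
        return g - g_without_i

    # Six agents choose among three resources, each with capacity 2.
    actions = np.array([0, 0, 0, 1, 1, 2])
    print(global_reward(actions, 3, 2))         # -1: resource 0 is over capacity by 1
    print(local_reward(actions, 0, 2))          # -1: agent 0 sits on the congested resource
    print(difference_reward(actions, 0, 3, 2))  # -1: removing agent 0 relieves the congestion

The difference reward isolates agent 0's marginal impact on the system while ignoring congestion it cannot influence, which is exactly the learnability benefit discussed above.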
settings, and reward shaping. We consider these difficulties as the main complex-
ity factors behind agents that learn concurrently.
The settings we follow in this chapter are those of stochastic or Markov
games (see Section 3.5), evolutionary game theory (see Section 4), swarm intel-
ligence (see Section 5), and neuro-evolutionary algorithms (see Section 6). In
this section we describe two simple learning algorithms, action-value learning
and cross learning (a kind of learning automaton [13]), as an introduction to the
reinforcement learning approach to multiagent learning presented in Section 3.
step t, take the action that agent k took at time step t − 1,” or “at time step t do
the action you performed at time step t − 2,” or “at time step t take the action that
yielded the highest reward in the last three time steps.” Each of these is a policy
and the agent can update its probability vector as described above. However, as
is the case with the action-value approach, this method need not converge, since the
outcome of the policies depends on the actions of other agents. In both cases,
the simple learning approaches provide a method for learning to take actions and
tracking a non-stationary process, but cannot provide guarantees of convergence.
In the next section, we discuss the extensions of these simple learning concepts
to more diverse and complex situations encompassing multiple states, delayed
rewards, and the general Markov decision context.
Figure 10.1: Basic reinforcement learning scheme: At time step t, the agent is
in state s_t and takes action a_t. This results in the agent receiving reward r_t and
moving to state s_{t+1} [72].
V^π(s) = E[ ∑_{t=0}^{∞} γ^t R(s_t) | s_0 = s ]   (10.4)

V^π_{t+1}(s) = R(s) + γ ∑_{s'∈S} T(s, π(s), s') V^π_t(s')   (10.5)
The process of updating state value functions based on current estimates of
successor state values is referred to as bootstrapping. The depth of successor
states considered in the update can be varied, i.e., one can perform a shallow
bootstrap where one only looks at immediate successor states or a deep bootstrap
where successors of successors are also considered. The value functions of suc-
cessor states are used to update the value function of the current state. This is
called a backup operation. Different algorithms use different backup strategies,
e.g., sample backups (sample a single successor state) or full backups (sample all
successor states).
The goal of an MDP is to find the optimal policy, i.e., the policy that receives
the most reward. The optimal policy π* is such that V^{π*}(s) ≥ V^π(s) for all s ∈ S
and all policies π.
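As an illustration of the full backup in Equation 10.5, the following minimal Python sketch evaluates a fixed policy on a small MDP; the toy transition and reward arrays are assumptions chosen only for demonstration:

    import numpy as np

    def evaluate_policy(T, R, policy, gamma=0.9, sweeps=100):
        """Repeatedly apply V(s) <- R(s) + gamma * sum_s' T(s, pi(s), s') V(s')."""
        V = np.zeros(len(R))
        for _ in range(sweeps):
            V = np.array([R[s] + gamma * T[s, policy[s]].dot(V) for s in range(len(R))])
        return V

    # Toy two-state, two-action MDP: T[s, a, s'] and R(s) are made-up numbers.
    T = np.array([[[0.8, 0.2], [0.1, 0.9]],
                  [[0.5, 0.5], [0.0, 1.0]]])
    R = np.array([0.0, 1.0])
    policy = np.array([1, 1])          # always take action 1
    print(evaluate_policy(T, R, policy))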
less good at that moment to check whether they might be better in the long term
and so is prone to yield suboptimal solutions.
An alternative is to behave greedily most of the time but once in a while select a
random action to make sure we do not miss the better actions in the long term. This
approach is called the ε-greedy exploration method in which the greedy option is
chosen with high probability 1 − ε and with a small probability ε a random action
is selected. The benefit of this method is that when we play a sufficiently large
number of iterations every action will be sufficiently sampled to guarantee that we
have learned for all actions a_i its true value Q^*(s, a_i). This ensures that an agent
learns to play the optimal action in the long term. For a thorough overview, we
refer to [101]. Though annealing can be used for ε as well, in practice this is often
unnecessary.
Another alternative is to use a “softmax” approach, or Boltzmann exploration,
where the good actions have an exponentially higher probability of being selected
and the degree of exploration is based on a temperature parameter τ. An action a j
is chosen with probability:
p_j = e^{Q(s,a_j)/τ} / ∑_i e^{Q(s,a_i)/τ}   (10.6)
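A minimal Python sketch of the two exploration schemes, assuming the Q-values for the current state are already available as a vector (all names and parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def epsilon_greedy(q_values, epsilon=0.1):
        """With probability epsilon pick a uniformly random action, else the greedy one."""
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))
        return int(np.argmax(q_values))

    def boltzmann(q_values, tau=1.0):
        """Softmax exploration (Equation 10.6): p_j proportional to exp(Q(s, a_j) / tau)."""
        prefs = np.asarray(q_values, dtype=float) / tau
        prefs -= prefs.max()                       # subtract the max for numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return int(rng.choice(len(q_values), p=probs))

    q = [0.2, 1.0, 0.5]
    print(epsilon_greedy(q, epsilon=0.25), boltzmann(q, tau=0.5))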
" #
Q(s, a) → (1 − α)Q(s, a) + α r + γ max
Q(s , a ) (10.7)
a
where α is the learning rate, and γ the discount-rate. A similar model-free tem-
poral difference learning method is the SARSA learner. In this case, the update
algorithm is given by:
Q(s, a) ← (1 − α)Q(s, a) + α [ r + γ Q(s', a') ]   (10.8)
where a' denotes the action taken in state s' (this is sometimes denoted by π(s')
to indicate that policy π is followed from state s'). The key difference
between SARSA learning and Q-learning is that in SARSA learning, the Q-value
update depends on the action selected at s' and hence on the policy. Q-learning, on
the other hand, is policy independent, since the Q-value update is based on the
best possible action from state s' and, thus, does not depend on how the actions
are chosen. Both algorithms are proven to converge to optimal policies, given that
no abstractions have been applied, though in many practical problems with large
state spaces, solving the full MDP is both time-intensive and impractical.
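The following sketch shows the two updates side by side, assuming a simple tabular representation of Q (the nested-dictionary layout and all parameter values are illustrative choices, not prescribed by the chapter):

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
        """Equation 10.7 (off-policy): bootstrap on the best action available in s'."""
        target = r + gamma * max(Q[s_next].values())
        Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
        """Equation 10.8 (on-policy): bootstrap on the action a' actually taken in s'."""
        target = r + gamma * Q[s_next][a_next]
        Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target

    # Q is a nested dictionary: Q[state][action] -> estimated value.
    Q = {0: {"left": 0.0, "right": 0.0}, 1: {"left": 0.0, "right": 0.0}}
    q_learning_update(Q, s=0, a="right", r=1.0, s_next=1)
    sarsa_update(Q, s=1, a="left", r=0.0, s_next=0, a_next="right")
    print(Q)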
When an environment’s model (i.e., transition function T and reward func-
tion R) is known or can be learned, that model can be used directly to evaluate
the potential outcomes of an agent’s actions. Such a “model-based” RL has two
key advantages over model-free methods: First, given the existence of a model,
the agent needs much less interaction with the environment, making it desirable
in situations where the cost of interacting with the environment is high. Second,
having a model provides a method for evaluating policies off-line – again mini-
mizing the need for direct interaction with the environment. Adaptive real time
dynamic programming (RTDP) is one of the earliest model-based reinforcement
learning algorithms and has the simple flow shown in Algorithm 10.2 [8].
• Full state, joint action MDP with team reward ⟨S, A, T, R, Π⟩: Find a single
policy π mapping system state s ∈ S to system action a ∈ A that maximizes
team rewards r ∈ R.
Note that in repeated games player i either tries to maximize the limit of the
average of stage rewards
max_{π_i} lim_{T→∞} (1/T) ∑_{t=1}^{T} τ_i(t)   (10.9)

or the discounted sum of stage rewards ∑_{t=1}^{T} τ_i(t) δ^{t−1} with 0 < δ < 1, where τ_i(t)
is the immediate stage reward for player i at time step t.
          b0    b1
   a0     r1    r2
   a1     r3    r4
Suppose that agent 1 (or the row player) has Q-values for all four joint actions;
then the reward it can expect to accumulate will depend on the strategy adopted
by the second (or column) player. Therefore a JAL agent will keep a model of
the strategies of other agents i participating in the game such that it can compute
the expected value of joint actions in order to select good subsequent actions bal-
ancing exploration and exploitation. A JAL then assumes that the other players i
will choose actions in accordance with the model it keeps on the strategies of the
other players. Such a model can be simply implemented through a fictitious play
approach, in which one estimates the probability with which an agent will play a
specific action based on the frequencies of their past plays. In such a way expected
values can be computed for the actions of a JAL based on the joint actions. For
instance in the above example we would have the following expected value EV
for the first player’s actions:
with Pr^1_{b_j} the probability with which player 1 believes the other player will choose
action b_j, implemented through a fictitious play approach. Using these EV values,
player 1 can now implement, e.g., a Boltzmann exploration strategy for action
selection.
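A minimal sketch of this idea, assuming the joint-action Q-values of the 2 × 2 game above are stored in a dictionary and the opponent model is the empirical frequency of past plays (all payoff values and names are illustrative):

    from collections import Counter

    def opponent_model(history):
        """Fictitious play: estimate Pr(b_j) from the empirical frequencies of past plays."""
        counts = Counter(history)
        total = sum(counts.values())
        return {b: c / total for b, c in counts.items()}

    def expected_values(Q, beliefs, own_actions):
        """EV(a_i) = sum_j Pr(b_j) * Q(a_i, b_j), the quantity a JAL uses to pick actions."""
        return {a: sum(p * Q[(a, b)] for b, p in beliefs.items()) for a in own_actions}

    # Joint-action values for the 2x2 game above; the numbers stand in for r1..r4.
    Q = {("a0", "b0"): 3.0, ("a0", "b1"): 0.0,
         ("a1", "b0"): 5.0, ("a1", "b1"): 1.0}
    beliefs = opponent_model(["b0", "b1", "b0", "b0"])   # opponent played b0 three times out of four
    print(expected_values(Q, beliefs, ["a0", "a1"]))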
agent system, the Q-function for an agent becomes Q(s, a1 , ..., an ), rather than
the single-agent Q-function, Q(s, a). Given these assumptions Hu and Wellman
define a Nash Q-value as the expected sum of discounted rewards when all agents
follow specified Nash equilibrium strategies from the next period on. The al-
gorithm uses the Q-values of the next state to update those of the current state.
More precisely, the algorithm makes updates with future Nash equilibrium pay-
offs, whereas single-agent Q-learning updates are based on the agent’s own max-
imum payoff. To be able to learn these Nash equilibrium payoffs, the agent must
observe not only its own reward, but those of others as well (as was the case in the
JAL algorithm).
The algorithm proceeds as follows. The learning agent, indexed by i, learns
its Q-values by starting with arbitrary values at time 0. An option is to let
Q^i_0(s, a_1, ..., a_n) = 0 for all s ∈ S, a_1 ∈ A_1, ..., a_n ∈ A_n. At each time t, agent i
observes the current state, and takes its action. After that, it observes its own re-
ward, actions taken by all other agents, rewards of others, and the new state s'.
Having this information it then computes a Nash equilibrium π_1(s'), ..., π_n(s') for
the stage game (Q^1_t(s'), ..., Q^n_t(s')), and updates its Q-values according to:
Q^i_{t+1}(s, a_1, ..., a_n) = (1 − α_t) Q^i_t(s, a_1, ..., a_n) + α_t [ r^i_t + β NashQ^i_t(s') ]

where NashQ^i_t(s') = π_1(s') · · · π_n(s') · Q^i_t(s').
NashQ^i_t(s') is agent i’s payoff in state s' for the selected equilibrium. In or-
der to calculate the Nash equilibrium (π_1(s'), ..., π_n(s')), agent i needs to know
Q^1_t(s'), ..., Q^n_t(s'). Since this information about other agents’ Q-values is not avail-
able, this has to be learned as well. Since i can observe other agents’ immediate
rewards and previous actions, it can use that information to learn the other agents’
Q-functions as well.
The algorithm is guaranteed to converge to a Nash equilibrium, provided certain
technical conditions hold. Littman tackled these restrictive conditions of Nash-
Q and introduced Friend or Foe Q-learning [44], which converges to Nash-
equilibrium with fewer restrictions than Nash-Q. For more details on Nash-Q we
refer to [34].
(generalized infinitesimal gradient ascent) algorithm beyond two actions using the
regret measure by Zinkevich [103]. Regret measures how much worse an algo-
rithm performs compared to the best static strategy, with the goal of guaranteeing
at least zero average regret (i.e., no-regret) in the limit. Since GIGA reduces to
IGA in two-player, two-action games, it does not achieve convergence in all types
of games. As a response to the fact that the IGA algorithm does not converge in all
two-player-two-action games, IGA-WoLF (Win or Learn Fast) was introduced by
Bowling [15] in order to improve the convergence properties of IGA. The policies
of infinitesimal gradient ascent with WoLF learning rates are proven to converge
to the Nash equilibrium policies in two-agent, two-action games [15]. The learn-
ing rate is made large if WoLF is losing. Otherwise, the learning rate is kept small
as a good strategy has been found. In contrast to other reinforcement learning
algorithms, IGA-WoLF assumes that the agents possess a lot of information about
the payoff structure. In particular, agents are sometimes unable to compute the
gradient of the reward function, which this algorithm requires, because that infor-
mation is not available. Another well-known gradient-ascent type algorithm
is policy hill climbing (PHC) explained in [15]. PHC is a simple adaptive strat-
egy based on an agent’s own actions and rewards, which performs hill climbing
in the space of mixed policies. It maintains a Q-table of values for each of its
base actions, and at every time step it adjusts its mixed strategy by a small step to-
wards the greedy policy of its current Q-function. Also the PHC-WoLF algorithm
needs prior information about the structure of the game. Related algorithms to
infinitesimal gradient ascent have been devised to tackle this issue, such as for in-
stance the WPL (weighted policy learner) algorithm of Abdallah and Lesser; see
[1]. The GIGA-WoLF algorithm extended the GIGA algorithm with the WoLF
principle [14] improving on its convergence properties. The algorithm basically
keeps track of two policies of which one is used for action selection and the other
is used for approximating the Nash equilibrium.
Prisoner’s Dilemma:
          C       D
   C    3, 3    0, 5
   D    5, 0    1, 1

Battle of the Sexes:
          O       F
   O    1, 2    0, 0
   F    0, 0    2, 1

Matching Pennies:
          H        T
   H    1, −1   −1, 1
   T   −1, 1     1, −1
Three distinct classes of 2 × 2 normal form games are identified in [25]. The
first class consists of games with one pure Nash equilibrium, such as the Prisoner’s
Dilemma (both players play D or Defect). The second class of games has two pure
and one mixed Nash equilibrium. The Battle of the Sexes game belongs to this
class (the pure ones are (O, O) and (F, F), the calculation of the mixed one is
left as an exercise). Finally, the third class of games has only one mixed Nash
equilibrium; an example is the Matching Pennies game (both players play both
actions with probability 0.5).
The second concept, Pareto optimality, describes a solution in a game in which
no player’s payoff can be improved without reducing the payoff of at least one
other player. More formally we have: a strategy combination
s = (s_1, ..., s_n) for n agents in a game is Pareto optimal if there does not exist
another strategy combination s' in which each player i receives at least the same
payoff P_i and at least one player j receives a strictly higher payoff than P_j. As
an example, the Nash equilibrium (D, D) in the Prisoner’s Dilemma game is not
Pareto optimal. Playing profile (C,C) is Pareto optimal, but not Nash.
Now we can state that a strategy s is an ESS if ∀ s' ≠ s there exists some δ ∈ ]0, 1[
such that ∀ ε : 0 < ε < δ the inequality

u(s, ε s' + (1 − ε) s) > u(s', ε s' + (1 − ε) s)

holds. The condition ∀ ε : 0 < ε < δ expresses that the share of mutants needs to
be sufficiently small.
Note that one can frame the idea of ESS without the need to consider agents
breeding. Instead one can consider agents being able to observe the average payoff
of agents playing different strategies, and making a decision as to which strategy
to adopt based on that payoff. In such a case, the “mutant” strategy is simply
a new strategy adopted by some agents – if other agents observe that they do
well, they will switch to this strategy also, while if the new strategy does not do
comparatively well, the agents that play it will switch back to the original strategy.
dx_i/dt = [ (Ax)_i − x · Ax ] x_i   (10.12)
In Equation 10.12, xi represents the density of action i in the population, and A
is the payoff matrix that describes the different payoff values that each individual
replicator receives when interacting with other replicators in the population. The
state x of the population can be described as a probability vector x = (x1 , x2 , ..., xn ),
which expresses the different densities of all the different types of replicators in
the population. Hence (Ax)i is the payoff that replicator i receives in a popula-
tion with state x and x · Ax describes the average payoff in the population. The
growth rate dx_i/(x_i dt) of the proportion of action i in the population equals the differ-
ence between the action’s current payoff and the average payoff in the population.
[25, 31, 97] detail further information on these issues.
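As an illustration, the following sketch integrates Equation 10.12 with a simple Euler scheme; the Prisoner’s Dilemma payoff matrix and the step size are assumptions made only for demonstration:

    import numpy as np

    def replicator_step(x, A, dt=0.01):
        """One Euler step of Equation 10.12: dx_i/dt = [(Ax)_i - x.Ax] x_i."""
        fitness = A @ x               # (Ax)_i, the payoff of each replicator type
        average = x @ fitness         # x.Ax, the average payoff in the population
        return x + dt * x * (fitness - average)

    # Prisoner's Dilemma payoffs for the row player (actions C, D).
    A = np.array([[3.0, 0.0],
                  [5.0, 1.0]])
    x = np.array([0.9, 0.1])          # start with 90% cooperators
    for _ in range(2000):
        x = replicator_step(x, A)
    print(x)                          # the population drifts toward all-defect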
Usually, we are interested in models of multiple agents that evolve and learn
concurrently, and therefore we need to consider multiple populations. For simplic-
ity, the discussion focuses on only two such learning agents. As a result, we need
two systems of differential equations, one for each agent. This setup corresponds
to a replicator dynamics for asymmetric games, where the available actions or
strategies of the agents belong to two different populations.
This translates into the following coupled replicator equations for the two pop-
ulations:
Figure 10.4: Visualization of the replicator dynamics of BoS, PD, and MP games.
dx_i/dt = x_i [ (Ay)_i − x^T Ay ]   (10.13)

dy_i/dt = y_i [ (Bx)_i − y^T Bx ]   (10.14)
where x (y) is the frequency distribution for player 1 (2), and A (B) represents its
individual payoff matrix.
Equations 10.13 and 10.14 indicate that the growth rate of the types in each
population is additionally determined by the composition of the other population,
in contrast to the single population (learner) case described by Equation 10.12.
As an example we plot the direction fields of these equations for the three ex-
ample games, i.e., Battle of the Sexes, Prisoner’s Dilemma, and Matching Pennies,
in Figure 10.4. A direction field is a graphical representation of the solutions of the
differential equations that does not require solving them analytically. The plots visu-
alize the dynamics qualitatively and show how all possible initial policies of the
agents will evolve over time. Moreover they show the attractors and their basins.
The x-axis shows the probability with which the first player plays its first action
and the y-axis shows the probability with which the second player plays its first
action. For instance in the PD game the x-axis shows the probability with which
the first player plays cooperate – the y-axis shows the same probability for the
second player.
these components exhibit properties that are usually assigned to intelligent enti-
ties, the more the theory of interactive decision making, i.e., game theory, be-
comes relevant to understand, model, and steer the interactions of these intelligent
components. Agent technology, which has become an important research field
within computer science, exactly studies computer systems that are capable of au-
tonomous action taking in order to optimize some given performance measure.
For an overview of the application of game theory to multiagent systems we refer
to [53] and to Chapter 17. Here we lay out the use of evolutionary game theory
for the purpose of evolution and learning in multiagent systems.
Building models of agents that evolve and behave optimally requires insight
into the type and form of these agents’ interactions with the environment and other
agents in the system. In much work on multiagent adaptation and learning, this
modeling is very similar to that used in a standard game theoretical model: players
are assumed to have complete knowledge of the environment, are hyperrational,
and optimize their individual payoff disregarding what this means for the utility of
the entire population. A more recent approach relaxes the strong assumptions of
classical game theory and follows the evolutionary approach. The basic properties
of multiagent systems seem to correspond well with those of evolutionary game
theory. First of all, a multiagent system is a distributed dynamic system, which
makes it hard to model using a rather static theory like traditional game theory.
Secondly, a multiagent system consists of actions and interactions between two
or more independent agents, who each try to accomplish a certain, possibly co-
operative or conflicting, goal. No agent is guaranteed to be completely informed
about the other agents’ intentions or goals, nor about the complete state of the
environment. Furthermore,
evolutionary game theory offers us a solid basis to understand dynamic iterative
situations in the context of strategic interactions and this fits well with the dynamic
nature of a typical multiagent system. Not only do the fundamental assumptions
of evolutionary game theory and multiagent systems seem to fit each other rather
well, but there is also a formal relationship between the replicator equations of
evolutionary game theory and reinforcement learning. This relation will allow for
studying the theoretical properties of multiagent learning in greater detail.
environment, facing unobservable actions and rewards of other agents and non-
stationarity of the environment. All these complicating properties of multiagent
systems make it hard to engineer learning algorithms capable of finding optimal
solutions.
Recent debate in the MAL community gave direction to a new research agenda
for the field [64]. An important problem of MAL that stands out is the lack of a
theoretical framework such as exists for the single-agent case. For this purpose
we employ an evolutionary game theoretic approach by formally linking and an-
alyzing the relation between RL and replicator dynamics (RD). More precisely,
in [13, 52, 84, 86] the authors derived a formal link between the replicator equa-
tions of evolutionary game theory and such reinforcement learning techniques as
Q-learning and learning automata. In particular this link showed that in the limit
these learning algorithms converge to a certain form of the RD. This makes it pos-
sible to establish equilibria using the RD that tell us what states a given learning
system will settle into over time and what intermediate states it will go through.
There are a number of benefits to exploiting this link: one, the model predicts
desired parameters for achieving Nash equilibria with high utility; two, the in-
tuitions behind a specific learning algorithm can be theoretically analyzed and
supported by using the basins of attraction; three, it was shown how the frame-
work can easily be adapted and used to analyze new MAL algorithms, such as, for
instance, lenient Q-learning, regret minimization, etc. [38, 52].
Extensions of the framework to multiple states (e.g., switching dynamics) and
continuous strategy spaces have been explored as well and can be found in, e.g.,
[30, 88, 93]. In this chapter we limit ourselves to describing the link between
stateless Q-learning (and its two variants FAQ and LFAQ) and the basic replicator
equations. More precisely we present the dynamical system of Q-learning. These
equations are derived by constructing a continuous time limit of the Q-learning
model, where Q-values are interpreted as Boltzmann probabilities for the action
selection. For reasons of simplicity we consider games between two players. The
equations for the first player are
dx_i/dt = x_i ατ ( (Ay)_i − x · Ay ) + x_i α ∑_j x_j ln(x_j / x_i)   (10.15)

dy_i/dt = y_i ατ ( (Bx)_i − y · Bx ) + y_i α ∑_j y_j ln(y_j / y_i)   (10.16)
Equations 10.15 and 10.16 express the dynamics of both Q-learners in terms
of Boltzmann probabilities. Each agent (or player) has a probability vector over
its action set, more precisely x1 , ..., xn over action set a1 , ..., an for the first player
and y1 , ..., ym over b1 , ..., bm for the second player.
For a complete mathematical derivation and discussion of these equations we
refer to [84, 85]. Comparing (10.15) or (10.16) with the RD in (10.12), we see
that the first term of (10.15) or (10.16) is exactly the RD and thus takes care of
the selection mechanism (see [97]). The mutation mechanism for Q-learning is
therefore left in the second term, and can be rewritten as:
x_i α [ ∑_j x_j ln(x_j) − ln(x_i) ]   (10.17)
In equation (10.17) we recognize two entropy terms, one over the entire probabil-
ity distribution x, and one over strategy xi .
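A minimal sketch of these dynamics, assuming a two-player game and a simple Euler integration of Equation 10.15 (the payoff matrices, step size, and parameter values are illustrative assumptions):

    import numpy as np

    def q_dynamics_step(x, y, A, alpha=0.05, tau=1.0, dt=0.01):
        """One Euler step of Equation 10.15: a replicator (selection) term plus an
        entropy-driven (mutation) term induced by Boltzmann exploration."""
        fitness = A @ y
        selection = alpha * tau * x * (fitness - x @ fitness)
        mutation = alpha * x * (x @ np.log(x) - np.log(x))
        return x + dt * (selection + mutation)

    # Symmetric Prisoner's Dilemma: both players use the same payoff matrix.
    A = np.array([[3.0, 0.0], [5.0, 1.0]])
    x = np.array([0.6, 0.4])
    y = np.array([0.5, 0.5])
    for _ in range(5000):
        x, y = q_dynamics_step(x, y, A), q_dynamics_step(y, x, A)
    print(x, y)   # exploration keeps the policies mixed instead of purely defecting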
Other RL methods for which the dynamics have been derived are the follow-
ing:
Frequency Adjusted Q-learning (FAQ) [37] is a variation of the value-
based Q-learning method, which modulates the learning step size to be in-
versely proportional to the action selection probability. This modulation leads
to more rational behavior that is less susceptible to initial overestimation of
the action values. The update rule for FAQ-learning is Q_i(t+1) ← Q_i(t) +
min(β/x_i, 1) · α [ r(t+1) + γ max_j Q_j(t) − Q_i(t) ], where α and β are learning step
size parameters, and γ is the discount factor. The Boltzmann action-selection
mechanism is used with a temperature τ: x_i = e^{Q_i · τ^{−1}} / ∑_j e^{Q_j · τ^{−1}}.
Lenient FAQ-learning (LFAQ) is a variation of FAQ-learning. Leniency
has been shown to improve convergence to the optimal solution in coordination
games [52]. Leniency is introduced by having the FAQ method collect κ rewards
for an action, before updating this action’s Q-value based on the highest perceived
reward.
Finite Action-set Learning Automata (FALA) [49] is a policy-based learn-
ing method. [49] considers the linear reward-inaction update scheme. It updates
its action selection probability based on a fraction α of the reward received. The
probability is increased for the selected action, and decreased for all other actions.
The update rules for FALA are xi (t + 1) ← xi (t) + αri (t + 1)(1 − xi (t)) if i is the
action taken at time t, and x_j(t+1) ← x_j(t) − α r_i(t+1) x_j(t) for all actions j ≠ i.
Regret Minimization (RM) [38] is another policy-based learning method.
It updates its policy based on the loss (regret) incurred for playing that policy,
with respect to some other policy. [38] studies the polynomial weights method,
which calculates the loss with respect to the optimal policy in hindsight. Again,
a learning step size parameter α controls the update process. The method updates
the weight of the actions by wi (t + 1) ← wi (t) (1 − αli (t + 1)). Normalization of
these weights gives the action selection probabilities.
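The two policy updates above can be sketched in a few lines of Python; the reward is assumed here to be normalized to [0, 1] for FALA, and all names and parameter values are illustrative:

    import numpy as np

    def fala_update(x, i, r, alpha=0.05):
        """Linear reward-inaction: shift probability toward the rewarded action i."""
        x = x.copy()
        x[i] += alpha * r * (1.0 - x[i])            # selected action
        mask = np.arange(len(x)) != i
        x[mask] -= alpha * r * x[mask]              # all other actions j != i
        return x

    def rm_update(w, losses, alpha=0.05):
        """Polynomial weights: w_i <- w_i (1 - alpha * l_i); normalizing gives the policy."""
        w = w * (1.0 - alpha * np.asarray(losses))
        return w, w / w.sum()

    x = np.array([0.5, 0.5])
    print(fala_update(x, i=0, r=1.0))
    w, policy = rm_update(np.ones(2), losses=[0.2, 0.8])
    print(policy)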
FALA:   dx_i/dt = α x_i [ (Ay)_i − x^T Ay ]

RM:     dx_i/dt = λ x_i [ (Ay)_i − x^T Ay ] / ( 1 − λ [ max_k (Ay)_k − x^T Ay ] )
Figure 10.5: Overview of the evolutionary dynamics of the studied learning meth-
ods. Only the dynamics of the first player are given; the dynamics of the second
player can be found by substituting B for A, swapping x and y, and swapping the
matrix indexes in the ui rule of LFAQ.
Figure 10.6: Policy trajectories of FAQ and LFAQ in three different games.
fitness) to find optimal solutions. The key difference though is how the reinforce-
ment signal is used to modify an individual’s behavior. In this chapter, we do not
delve into the details of this relationship.
Currently the most well-known swarm intelligence algorithms are pheromone-
based (stigmergic), such as ant colony optimization. For an overview, we refer
to [12, 23]. Recently, interest has grown in non-pheromone-based approaches,
mainly inspired by the foraging behavior of honeybees. For an overview we re-
fer to Lemmens et al. [41, 42]. Below we concisely explain the basics of both
approaches.
Figure 10.7: Ant foraging. Individual ants explore both paths. Then the shorter
path on the right gets reinforced more and becomes the dominant path.
Nodes of such a graph are called states, and in one iteration t an ant produces a so-
lution to the given problem (e.g., a route between cities in the traveling salesman
problem). At the start all edges are initialized with a certain amount of pheromone
τ_0. The probability for an ant k to move from state s_i to state s_j at iteration t is
given by

p^k_{ij}(t) = [τ_{ij}(t)]^α [η_{ij}]^β / ∑_{l ∈ N^k_i} [τ_{il}(t)]^α [η_{il}]^β
with τi j (t) the amount of pheromone along the edge from state si to state s j at
iteration t. The parameter α controls the weight given to the pheromone part of
the equation. ηi j expresses the desirability of moving to state s j ; for instance in
the traveling salesman problem it would correspond to the visibility of the next
city, formally described by the inverse of the distance between cities (states) si
and s_j, i.e., 1/d(i, j). β controls the weight given to the visibility part of the equation.
Nik is the set of unvisited nodes of ant k that are reachable from state si . The
intuition behind the rule is that ants prefer to move to nodes or states connected to
the current state by short edges with high pheromone presence.
Once all ants have found their own solution or route, pheromones are updated
on all edges according to the following equation:
τ_{ij}(t+1) = (1 − ρ) τ_{ij}(t) + ∑_{k=1}^{m} Δτ^k_{ij}(t)   (10.19)
where ρ controls the evaporation of pheromone and Δτ^k_{ij}(t) is the amount of trail
pheromone per unit of length laid on edge (i, j) by the kth ant, i.e.,
Δτ^k_{ij}(t) = 1 / L_k(t)   (10.20)
where edge (i, j) belongs to the solution generated by ant k. Lk (t) is the length
of the solution found by ant k at iteration t. As with the biological counterpart,
shorter solutions or routes are preferred.
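A minimal Python sketch of the two ingredients, the probabilistic state transition and the pheromone update of Equation 10.19, under the assumption of a tiny symmetric distance matrix (all names and parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)

    def next_state(tau, eta, i, unvisited, a=1.0, b=2.0):
        """Choose the next state among the unvisited neighbors of i, with probability
        proportional to pheromone^a times desirability^b (cf. the transition rule above)."""
        candidates = np.array(sorted(unvisited))
        weights = (tau[i, candidates] ** a) * (eta[i, candidates] ** b)
        return int(rng.choice(candidates, p=weights / weights.sum()))

    def update_pheromone(tau, tours, lengths, rho=0.1):
        """Equation 10.19: evaporate everywhere, then deposit 1/L_k on each edge of ant k's tour."""
        tau = (1.0 - rho) * tau
        for tour, length in zip(tours, lengths):
            for i, j in zip(tour[:-1], tour[1:]):
                tau[i, j] += 1.0 / length
        return tau

    # Tiny three-city example: eta is the inverse distance, tau starts uniform.
    dist = np.array([[np.inf, 1.0, 2.0], [1.0, np.inf, 1.0], [2.0, 1.0, np.inf]])
    eta = 1.0 / dist
    tau = np.full((3, 3), 0.1)
    print(next_state(tau, eta, i=0, unvisited={1, 2}))
    tau = update_pheromone(tau, tours=[[0, 1, 2, 0]], lengths=[4.0])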
Figure 10.8: Path integration. By tracking all angles steered and all distances
covered (i.e., solid arrow), bees have an up-to-date vector indicating their nest at
all times (i.e., dashed arrow).
pends on four principles. First, individuals in a swarm only have local awareness.
Their view of the environment is limited to their direct surroundings and they do
not take their previous actions into account for future actions. Second, swarms
are decentralized. Although central control is possible over short distances with
a small number of individuals, the performance of such control deteriorates with
distance or increasing number of agents. Therefore, decentralized control is a ne-
cessity. Third, to implement control, social insects rely on local interaction. Local
interaction is used to solve a subproblem of the task at hand, for example, finding
the shortest path in a segment of the total search space. Fourth, by using local
interaction the swarm displays self-organization on a global level.
• Replacement refers to the final step where the new individual is reinserted
into the population, usually at the expense of removing another individual.
For example, the worst individual can be removed to make room for the
new individual, or an individual may be removed based on a probability
that depends on its fitness.
Initialization Phase:
  At t = 0:
  for each policy π_k with k ∈ (1, N_pop) do
    Use policy π_k for a fixed number of steps
    Evaluate performance of π_k
  end
Training Phase:
  for t < t_max do
    Select a policy π_i from the population of policies:
      with probability ε: π_current ← π_i
      with probability 1 − ε: π_current ← π_best
    Modify the parameters of π_current to produce π′ (mutation)
    Use policy π′ for a fixed number of steps
    Evaluate performance of π′
    Replace π_worst with π′
    t ← t + 1
  end
Algorithm 10.3: An evolutionary search algorithm to determine the parameters
of a policy. This general algorithm uses a population of N pop , selects the best
policy with probability 1 − ε at each step, generates a new solution by perturbing
the parameters of the selected policy, and deterministically replaces the worst
policy in the population at the end of each step.
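A runnable illustration in the spirit of Algorithm 10.3 might look as follows; here a policy is assumed to be a plain parameter vector and the evaluation function is a stand-in for running the policy for a fixed number of steps (all names and values are illustrative):

    import numpy as np

    rng = np.random.default_rng(2)

    def evolve(evaluate, n_params, n_pop=10, epsilon=0.2, sigma=0.1, t_max=200):
        """Population-based policy search in the spirit of Algorithm 10.3:
        select a parent, mutate it, evaluate the child, and replace the worst member."""
        population = [rng.normal(size=n_params) for _ in range(n_pop)]
        fitness = [evaluate(p) for p in population]
        for _ in range(t_max):
            if rng.random() < epsilon:
                parent = population[rng.integers(n_pop)]       # explore a random member
            else:
                parent = population[int(np.argmax(fitness))]   # exploit the best member
            child = parent + sigma * rng.normal(size=n_params) # mutation
            worst = int(np.argmin(fitness))
            population[worst], fitness[worst] = child, evaluate(child)
        return population[int(np.argmax(fitness))]

    # Placeholder evaluation: maximize -||theta - 1||^2, standing in for an episode return.
    best = evolve(lambda theta: -float(np.sum((theta - 1.0) ** 2)), n_params=5)
    print(best)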
in a manner that captures its salient features, the search will be far more effective.
In addition, it is important to select a representation where similar policies will be
“close” in parameter space, and where small changes to the parameters will not
lead to significantly different policies. Indeed, the effectiveness of the new indi-
vidual generation step is directly related to the effectiveness of the representation.
The evaluation step, on the other hand, applies the selective pressure to guide
the search. Having the appropriate evaluation function is critical to finding good
solutions, and in ensuring that policies that offer a new and useful advancement are
identified and kept in the population (this is another instance of the fundamental
credit assignment problem discussed in Section 2.2). The impact of the evaluation
function is even more critical in large coevolutionary systems where agents oper-
ate in collaboration or competition with one another. Indeed, the co-evolution step
brings the same concerns as a search in a non-stationary domain. Coevolution has
been successfully applied in many domains, including multi-rover coordination
where robots are not only evolved to learn good policies, but also to cooperate
with one another [39, 50, 51].
7.1 Motivation
The efficient, safe, and reliable management of the ever increasing air traffic is
one of the fundamental challenges facing the aerospace industry today. On a typi-
cal day, more than 40,000 commercial flights operate within the US airspace [68],
and the scheduling allows for very little room to accommodate deviations from
the expected behavior of the system. As a consequence, the system is slow to re-
spond to developing weather or airport conditions, leading potentially minor local
delays to cascade into large regional congestion. Current air traffic management
Of these, the regional and national flow are ideally suited for algorithmic improve-
ments as they fit between long-term planning by the FAA and the very short-term
decisions by air traffic controllers.
where B_a(z) is the delay of each aircraft a caused by the agents’ controls. For con-
trols that delay aircraft (discussed in Section 7.3), B_a(z) is simply the amount
of delay applied to that aircraft. For controls involving rerouting (Section 7.3), the
delay is the amount of additional time it takes an aircraft to go on its new route
instead of its scheduled route.
The total congestion penalty is a sum over the congestion penalties over the
sectors of observation, S:
C(z) = ∑_{s∈S} C_s(z)   (10.23)
where
C_s(z) = ∑_t Θ(k_{s,t} − c_s)(k_{s,t} − c_s)^2 ,   (10.24)
where cs is the capacity of sector s as defined by the FAA, and Θ(·) is the step func-
tion that equals 1 when its argument is greater than or equal to zero, and has a value
of zero otherwise. Intuitively, Cs (z) penalizes a system state where the number of
aircraft in a sector exceeds the FAA’s official sector capacity. Each sector capacity
is computed using various metrics, which include the number of air traffic con-
trollers available. The quadratic penalty is intended to provide strong feedback to
return the number of aircraft in a sector to below the FAA-mandated capacities.
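A small sketch of this penalty, assuming the sector counts k_{s,t} are available as an array (names and numbers are illustrative):

    import numpy as np

    def congestion_penalty(counts, capacity):
        """Equations 10.23-10.24: quadratic penalty whenever a sector exceeds its capacity.
        counts[s, t] is the number of aircraft in sector s at time t; capacity[s] is c_s."""
        over = counts - capacity[:, None]
        over = np.where(over >= 0, over, 0)     # the step function Theta(k_{s,t} - c_s)
        return float(np.sum(over ** 2))

    counts = np.array([[3, 5, 4],               # sector 0 over three time steps
                       [1, 2, 2]])              # sector 1 over three time steps
    capacity = np.array([4, 2])
    print(congestion_penalty(counts, capacity)) # (5 - 4)^2 = 1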
1. Their number can vary depending on need. The system can have as many
agents as required for a given situation (e.g., agents coming “live” around
an area with developing weather conditions).
2. Because fixes are stationary, collecting data and matching behavior to re-
ward is easier.
3. Because aircraft flight plans consist of fixes, agents will have the ability to
affect traffic flow patterns.
4. They can be deployed within the current air traffic routing procedures, and
can be used as tools to help air traffic controllers rather than compete with
or replace them.
The second issue that needs to be addressed is determining the action set of the
agents. Again, an obvious choice may be for fixes to “bid” on aircraft, affecting
their flight plans. Though appealing from a free flight perspective, that approach
makes the flight plans too unreliable and significantly complicates the scheduling
problem (e.g., arrival at airports and the subsequent gate assignment process).
Three key actions can be selected:
1. Miles in Trail (MIT): Agents control the distance aircraft have to keep from
each other while approaching a fix. With a higher MIT value, fewer aircraft
will be able to go through a particular fix during congested periods, because
aircraft will be slowing down to keep their spacing. Therefore setting high
MIT values can be used to reduce congestion downstream of a fix.
2. Ground Delays: An agent controls how long aircraft that will eventually go
through a fix should wait on the ground. Imposing a ground delay will cause
aircraft to arrive at a fix later. With this action, congestion can be reduced if
some agents choose ground delays and others do not, as this will spread out
the congestion. However, note that if all the agents choose the same ground
delay, then the congestion will simply happen at a later moment in time.
3. Rerouting: An agent controls the routes of aircraft going through its fix, by
diverting them to take other routes that will (in principle) avoid the conges-
tion.
The third issue that needs to be addressed is what type of learning algorithm
each agent will use. The selection of the agent and action space (as well as the
state space) directly impacts this choice. Indeed, each agent aims to select the
action that leads to the best system performance, G (given in Equation 10.21). For
delayed-reward problems, sophisticated reinforcement learning systems such as
temporal difference may have to be used. However, due to our agent selection and
agent action set, the air traffic congestion domain modeled in this chapter only needs
to utilize immediate rewards. As a consequence, a simple table-based immediate
reward reinforcement learner is used. Our reinforcement learner is equivalent to
an ε-greedy action-value learner [72]. At every episode an agent takes an action
and then receives a reward evaluating that action. After taking action a and receiv-
ing reward R an agent updates its value for action a, V (a) (which is its estimate of
the value for taking that action [72]) as follows:
V(a) ← (1 − λ)V(a) + λR

where λ is the learning rate. At every time step, the agent chooses the action with
the highest table value with probability 1 − ε and chooses a random action with
probability ε. In the experiments described in this chapter, λ is equal to 0.5 and
ε is equal to 0.25. The parameters were chosen experimentally, though system
performance was not overly sensitive to these parameters.
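A minimal sketch of such an agent (class and variable names are illustrative; the value update follows the form given above, and the reward is assumed to arrive immediately after each episode’s action):

    import numpy as np

    rng = np.random.default_rng(3)

    class FixAgent:
        """Stateless action-value learner: V(a) is updated from the immediate reward
        and actions are chosen epsilon-greedily (lambda and epsilon follow the text)."""
        def __init__(self, n_actions, lam=0.5, epsilon=0.25):
            self.V = np.zeros(n_actions)
            self.lam = lam
            self.epsilon = epsilon

        def act(self):
            if rng.random() < self.epsilon:
                return int(rng.integers(len(self.V)))
            return int(np.argmax(self.V))

        def update(self, action, reward):
            # V(a) <- (1 - lambda) V(a) + lambda * R
            self.V[action] = (1 - self.lam) * self.V[action] + self.lam * reward

    agent = FixAgent(n_actions=3)                # e.g., three miles-in-trail settings
    a = agent.act()
    agent.update(a, reward=-7.5)                 # an illustrative (negative) system-based reward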
The final issue that needs to be addressed is selecting the reward structure
for the learning agents. The first and most direct approach is to let each agent
receive the system performance as its reward. However, in many domains such
a reward structure leads to slow learning. We will therefore also set up a second
set of reward structures based on agent-specific rewards. Given that agents aim to
maximize their own rewards, a critical task is to create “good” agent rewards, or
rewards that when pursued by the agents lead to good overall system performance.
In this work we focus on “difference rewards,” which aim to provide a reward
that is both sensitive to that agent’s actions and aligned with the overall system
reward [3, 80, 102]. This difference reward is of the form [5, 78, 80, 102]

D_i(a) ≡ G(a) − G(a_{−i} + c_i)

where a_{−i} denotes the action of all agents other than i, and c_i is a constant “action”
that replaces agent i’s actual action.2
In this domain, Di cannot be directly computed, as G cannot be expressed in
analytical form, and depends on sector counts obtained from a simulator. This is
a key issue with difference objectives, and various estimates have been proposed
to overcome this issue [6, 57, 77]. For example, precomputed difference rewards
or estimates based on the functional form of G have provided good results [6, 77].
Similarly, modeling G and using that model to estimate D has shown promise in
newer applications to air traffic [57]. Below, we briefly present the first set of
results.
2 This notation uses zero padding and vector addition rather than concatenation to form full
state vectors from partial state vectors.
[Plots: system reward versus number of steps, and system reward versus number
of agents, for agents using D, D est, G, and Monte Carlo.]
Figure 10.12: Scaling properties of agents controlling miles in trail on the con-
gestion problem, with α = 5. Number of aircraft is proportional to the number of
agents, ranging from 500 – 2,500 aircraft (note that sector capacities have also been
scaled according to the number of aircraft).
[Figure 10.13: system reward versus number of steps for agents using D, D est,
G, and Monte Carlo.]
In all experiments the goal of the system is to maximize the system perfor-
mance given by G(z) with the parameter α = 5. In all experiments, to make the
agent results comparable to the Monte Carlo estimation, the best policies chosen
by the agents are used in the results. All results are an average of thirty indepen-
dent trials, with the differences in the mean (σ/√n) shown as error bars, though in
most cases the error bars are too small to see.
We also performed experiments with real data from the Chicago and New
York areas. We present the results from the New York area, where congestion
peaks at certain times during the day. This is a difficult case, since in many situa-
tions slowing down aircraft to avoid a current congestion just adds aircraft to a
future congestion. The results show (Figure 10.13) that agents using D were able
to overcome this challenge and significantly outperform agents using the other re-
wards. In addition agents using the estimate of D were able to perform better than
agents using G.
7.5 Summary
In this section we presented the applicability of the discussed methods to a diffi-
cult real-world domain. Indeed, the efficient, safe, and reliable management of air
traffic flow is a complex problem, requiring solutions that integrate control poli-
cies with time horizons ranging from minutes up to multiple days. The presented
results show that a multiagent learning approach provides significant benefits in
this difficult problem. The keys to success were in defining the agents (agents as
fixes), their actions (miles in trail), their learning algorithm (action-value learn-
ers), and their reward structure (difference rewards).
8 Conclusions
Multiagent learning is a young and exciting field that has produced many interest-
ing research results and has seen a number of important developments in a rela-
tively short period of time. This chapter has introduced the basics, challenges, and
state-of-the-art of multiagent learning. Foundations of different multiagent learn-
ing paradigms have been introduced, such as reinforcement learning, evolutionary
game theory, swarm intelligence, and neuro-evolution. Moreover, examples and
pointers to the relevant literature of the different paradigms have been provided.
On top of this a real-world case study on multiagent learning, from the air traffic
control domain, has been extensively described.
Over the past years multiagent learning has made great progress at the inter-
section of game theory and reinforcement learning, where the field has concen-
trated much of its attention. Recently, the field has also started to take a broader and more interdisci-
plinary approach to MAL, which is an important step toward efficient multiagent
learning in complex applications. As an example we have introduced and dis-
cussed the potential of evolutionary game theory, swarm intelligence, and neuro-
evolutionary approaches for MAL. Specifically, we believe that in order to be
successful in complex systems, explicit connections should be drawn between the
different paradigms.
9 Exercises
1. Level 1 Create your own MDP for a domain that you consider interesting.
You should clearly specify the states, actions, transition probabilities, and
rewards. There should be at least three states and at least three actions. Try
to make it a relatively “interesting” MDP, i.e., make it so that there are not
too many deterministic transitions, and that it is not immediately obvious
what the optimal policy is. Draw your MDP.
2. Level 1 Calculate the value V (i) for each state i in the table shown on page
469, given a discount factor of 0.9. Hint: draw a diagram first.
3. Level 1-2 The subsidy game can be described by the following situation.
There are two new standards that enable communication via different proto-
cols. The consumers and suppliers can be described by probability vectors
that show which standard is supported by which fractions. One protocol
is 20% more energy efficient, hence the government wants to support that
standard. Usually, the profits of the consumers and suppliers are directly pro-
portional to the fraction of the corresponding type that supports its standard.
However, the government decides to subsidize early adopters of the better
protocol. Such subsidies are expensive and the government only wants to
spend as much as necessary. They have no market research information and
consider any distribution of supporters on both sides equally likely. Fur-
thermore, they know that the supporters are rational and their fractions will
change according to the replicator dynamics.
This game is a variation of the pure coordination game. A subsidy parameter
s ∈ {0, 11} is added, which can be used to make one action dominant. Figure
10.14 illustrates the game in its general form.
          s1         s2
   s1   10, 10      0, s
   s2    s, 0      12, 12
• Identify the Nash equilibria in the game; is (are) there Pareto optimal
solution(s)?
• Which one(s) is (are) evolutionary stable? Why? What can you say
about the basins of attraction?
• What is the effect of adding the subsidy in the second instance of the
game?
Instance 1:
          s1        s2
   s1   10, 10     0, 0
   s2    0, 0     12, 12

Instance 2:
          s1        s2
   s1   10, 10     0, 11
   s2   11, 0     12, 12
Figure 10.16: The dynamics of the game without subsidy (left) and with subsidy
(right).
5. Level 4 Repeat the previous exercise (same games), but now perform the
comparison in mixed play. In the mixed play experiments, games are played
by heterogeneous pairs of players, meaning that both players implement dif-
ferent learning methods. Plot and compare the dynamics for the following
cases:
• Q-learning vs. FAQ-learning
• RM vs. FAQ-learning
• FAQ-learning vs. LFAQ-learning
• RM vs. LFAQ-learning
6. Level 2-3 Implement the joint action learning algorithm for stateless games
and compare its convergence behavior with (independent) Q-learning on the
matrix games shown in Figures 10.17 and 10.18 (draw clear conclusions).
          b0    b1
   a0     10     0
   a1      0    10
          a0     a1    a2
   b0     11    −30     0
   b1    −30      7     6
   b2      0      0     5
7. Level 1-2 This question concerns the swarm intelligence paradigm. Forag-
ing is the task of locating and acquiring resources. Typically, this task has
to be performed in an unknown and possibly dynamic environment. The
problem consists of two phases. First, leaving the starting location (e.g., the
nest) in search for food. Second, returning to the starting location loaded
with food. In its simplest form (e.g., an open-field-like environment), this
is a problem domain that can be solved by a single agent or multiple, inde-
pendent agents which act in parallel. If agents want to solve the problem by
cooperation, getting to the solution of the problem becomes more complex.
Performance can be measured in time used before all items are collected and
the number of items collected in a certain time span. Foraging can be seen as
an abstract problem with regard to more complex real-world problems such
as network routing, information retrieval, transportation, and patrolling.
• Explain the essential similarities and differences, and advantages and
disadvantages between ant and bee self-organization. Relate this to
the foraging problem.
• Would it be possible to combine the best of both worlds, i.e., of bee and
ant systems, into one new hybrid algorithm for the task of foraging? If
yes, describe how; if no, motivate your answer.
8. Level 2 Consider a congestion problem where each agent takes one of Nk
actions. Each action has a parameter bk and the number of agents that select
action k is given by xk . The total number of agents selecting the same action
as agent i is denoted by xki . The system-level objective function is:
G(z) = ∑_{k=1}^{N_k} (x_k − b_k)^2   (10.27)
Now, someone gives you the following agent objective functions for agent
i. Discuss each and why you would expect them to work or not work (use
factoredness and/or learnability to make your point if necessary).
(a) g_i = (x_{k_i} − b_k)^2 − (x_{k_i})^2
(b) g_i = (x_{k_i} − b_k)^2 − (x_{k_i} − b_k − 1)^2
(c) g_i = (x_{k_i} − b_k)^2 − b_k^2
(d) g_i = ∑_{k=1}^{N_k} (x_k − b_k)^2 − ∑_{k=1}^{N_k} b_k^2
9. Level 2 Repeat the previous exercise for the case where bk = b ∀k. (That
is, the action parameters are the same for all actions.) How do your answers
change?
10. Level 2 Consider a multiagent system with N agents and the following
system-level objective function:
G(z) = | ∑_{i=1}^{N} a_i e_i | / ∑_{k=1}^{N} a_k ,   (10.28)
(the state vector z in this case consists of the set of all the agents’ actions, or
z = {a_1, a_2, ..., a_i, ..., a_N}).
(a) Derive the difference objective for agent i given ci = 0 (i.e., the agent
removes itself from the system). Simplify your answer.
(b) Derive the difference objective for agent i given ci = 0.5 (i.e., the agent
takes half an action, an average over its possible choices). Simplify
your answer.
(c) Both difference objectives are factored by definition. What can you
deduce about their learnability? Why?
11. Level 2 Consider a 5 × 10 gridworld. The agent starts at a random location
and has four actions (moves in each of the four directions). There is a reward
of 100 to catch T1, a target that starts at the bottom right square and moves
randomly by one square at each time step. In addition, there is a reward of
−2 for every time step the agent is in this gridworld.
(a) Devise a reinforcement algorithm to catch T1. Clearly state all system
parameters (inputs, states, outputs, etc.)
(b) Now, the target uses a different algorithm. Instead of moving ran-
domly, it moves in a direction opposite the agent in each time step,
if possible, and randomly if not. Can the agent still catch the target?
Explain the behavior of the system.
(c) Now, have two agents start at the same location and move in this grid.
What behavior do you observe now? Do the agents get a benefit from
each other? If so, is that direct or incidental? Do they cooperate?
(d) Suggest one simple modification to the learning that will make agents
cooperate more explicitly.
12. Level 3 Repeat the previous exercise when the agent uses a neural network
to map its states to actions directly and uses a neuro-evolutionary algorithm
to train this neural network. Can you suggest a method to coevolve the
agents so that they directly cooperate?
13. Level 3 Arthur’s El Farol bar problem [7] is a perfect example of a con-
gestion game. In this problem each agent i decides whether to attend a bar
by predicting, based on its previous experience, whether the bar will be too
crowded to be “rewarding” at that time, as quantified by a system reward
G. The congestion game structure means that if most agents think the at-
tendance will be low (and therefore choose to attend), the attendance will
actually be high, and vice versa. Consider a modified version of this problem
where the N agents pick one out of K nights to attend the bar every week.
The system reward in any particular week is [4, 102]:
G(z) ≡ ∑_{k=1}^{K} x_k(z) e^{−x_k(z)/b} ,   (10.29)
(a) Derive a simple “local” reward for each agent in this problem (e.g.,
the agent reward should depend on information easily available to the
agent). Discuss the factoredness and learnability of the reward you
derived.
(b) Derive a difference reward for each agent. What is a good “fixed vec-
tor” ci for this case? For at least two values of ci , discuss the locality of
the information and the factoredness and learnability of the resulting
difference rewards.
(c) Perform a simulation for this problem with the following parameters:
(i) the capacity of each night is 4 (b = 4), K = 6, and there are 30 agents
in the system; and (ii) the capacity of each night is 4 (b = 4), K = 5, and
there are 50 agents in the system.
Plot the performance of three agent rewards (G, difference, and local) and a
histogram of sample attendance profiles. Discuss the simulation results.
where the states z and z' only differ in the states of agent i, and u[x] is the
unit step function, equal to 1 if x > 0. The numerator counts the number
of states where g_i(z) − g_i(z') and G(z) − G(z') have the same sign, and the
denominator counts the total number of states. Intuitively, the degree of
factoredness gives the fraction of states in which a change in the state of
agent i has the same impact on gi and G. A high degree of factoredness
means that the agent reward gi moves in the same direction (up or down)
as the global reward G based on a change to the system state. A system in
which all the agent rewards equal G has a degree of factoredness of 1.
Now, this definition of alignment does not take into account by how much
gi and G change, nor does it take into account which parts of the state space
are likely to be visited. Provide two improvements to the concept of fac-
toredness to allow better prediction of whether an agent reward will lead to
good system behavior:
(a) Define a new concept of reward alignment that is maximized when
large improvements to gi lead to large improvements of G.
(b) Define a new concept of reward alignment where states that are more
likely to be visited have higher weight in the computation of align-
ment, and having gi and G misaligned on states unlikely to be visited
do not impact the computation of alignment.
15. Level 4 Based on your new definition quantifying the degree of alignment
between a system reward and an agent reward, derive a new agent reward
that will lead to good system behavior. (Hint: the difference reward pre-
sented in Section 2 leads to a fully factored system; derive an agent reward
that will lead to a fully “aligned” system with your new definition of align-
ment.)
References
[1] S. Abdallah and V. R. Lesser. A multiagent reinforcement learning algorithm with
non-linear dynamics. Journal of Artificial Intelligence Research (JAIR), 33:521–
549, 2008.
[2] A. Agogino and K. Tumer. Learning indirect actions in complex domains: Action
suggestions for air traffic control. Advances in Complex Systems, 12:493–512,
2009.
[5] A. K. Agogino and K. Tumer. Efficient evaluation functions for evolving coordi-
nation. Evolutionary Computation, 16(2):257–288, 2008.
[8] A. G. Barto, S. J. Bradtke, and S. P. Singh. Learning to act using real-time dynamic
programming. Artif. Intell., 72(1–2):81–138, 1995.
[11] C. Blum and M. Sampels. An ant colony optimization algorithm for shop schedul-
ing problems. Journal of Mathematical Modelling and Algorithms, 3(3):285–308,
2004.
[13] T. Borgers and R. Sarin. Learning through reinforcement and replicator dynamics.
Journal of Economic Theory, 77:115–153, 1997.
[15] M. Bowling and M. Veloso. Multiagent learning using a variable learning rate.
Artificial Intelligence, 136:215–250, 2002.
[18] K. Chellapilla and D. B. Fogel. Evolution, neural networks, games, and intelli-
gence. Proceedings of the IEEE, pages 1471–1496, September 1999.
[22] T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value func-
tion decomposition. Journal of Artificial Intelligence, 13:227–303, 2000.
[23] M. Dorigo and T. Stützle. Ant Colony Optimization. MIT Press, 2004.
[26] F. Gomez and R. Miikkulainen. Solving non-Markovian control tasks with neu-
roevolution. In Proceedings of the International Joint Conference on Artificial
Intelligence (IJCAI-99), pages 1356–1361, Stockholm, Sweden, 1999.
[27] F. Gomez and R. Miikkulainen. Active guidance for a finless rocket through neu-
roevolution. In Proceedings of the Genetic and Evolutionary Computation Confer-
ence, Chicago, Illinois, 2003.
[31] J. Hofbauer and K. Sigmund. Evolutionary Games and Population Dynamics. Cam-
bridge University Press, 1998.
[32] F. Hoffmann, T.-J. Koo, and O. Shakernia. Evolutionary design of a helicopter au-
topilot. In Advances in Soft Computing - Engineering Design and Manufacturing,
Part 3: Intelligent Control, pages 201–214, 1999.
[38] T. Klos, G. J. van Ahee, and K. Tuyls. Evolutionary dynamics of regret minimiza-
tion. In ECML/PKDD (2), pages 82–96, 2010.
[40] M. Knudson and K. Tumer. Adaptive navigation for autonomous robots. Robotics
and Autonomous Systems, 59:410–420, 2011.
[46] J. Maynard-Smith and J. Price. The logic of animal conflict. Nature, 246:15–18,
1973.
[48] D. Moriarty and R. Miikkulainen. Forming neural networks through efficient and
adaptive coevolution. Evolutionary Computation, 5:373–399, 2002.
[51] L. Panait, S. Luke, and R. P. Wiegand. Biasing coevolutionary search for op-
timal multiagent behaviors. IEEE Transactions on Evolutionary Computation,
10(6):629–645, 2006.
[53] S. Parsons and M. Wooldridge. Game theory and decision theory in multi-agent
systems. Autonomous Agents and Multi-Agent Systems, 5(12):243–254, 2002.
[54] M. Pechoucek, D. Sislak, D. Pavlicek, and M. Uller. Autonomous agents for air-
traffic deconfliction. In Proceedings of the Fifth International Joint Conference on
Autonomous Agents and Multi-Agent Systems, Hakodate, Japan, May 2006.
[56] W. B. Powell and B. van Roy. Approximate dynamic programming for high-
dimensional dynamic resource allocation problems. In J. Si, A. G. Barto, W. B.
Powell, and D. Wunsch, editors, Handbook of Learning and Approximate Dynamic
Programming. Wiley-IEEE Press, Hoboken, NJ, 2004.
[57] S. Proper and K. Tumer. Modeling difference rewards for multiagent learning
(extended abstract). In Proceedings of the Eleventh International Joint Conference
on Autonomous Agents and Multiagent Systems, Valencia, Spain, June 2012.
[59] R. Roberts, C. Pippin, and T. Balch. Learning outdoor mobile robot behaviors by
example. Journal of Field Robotics, 26(2):176–195, 2009.
[63] J. Shepherd III and K. Tumer. Robust neuro-control for a micro quadrotor. In Pro-
ceedings of the Genetic and Evolutionary Computation Conference, pages 1131–
1138, Portland, WA, July 2010.
[64] Y. Shoham, R. Powers, and T. Grenager. If multi-agent learning is the answer, what
is the question? Artif. Intell., 171(7):365–377, 2007.
[68] B. Sridhar, T. Soni, K. Sheth, and G. B. Chatterji. Aggregate flow model for air-
traffic management. Journal of Guidance, Control, and Dynamics, 29(4):992–997,
2006.
[71] P. Stone and M. Veloso. Multiagent systems: A survey from a machine learning
perspective. Autonomous Robots, 8(3):345–383, July 2000.
[77] K. Tumer and A. Agogino. Distributed agent-based air traffic flow management.
In Proceedings of the Sixth International Joint Conference on Autonomous Agents
and Multi-Agent Systems, pages 330–337, Honolulu, HI, May 2007.
[79] K. Tumer and N. Khani. Learning from actions not taken in multiagent systems.
Advances in Complex Systems, 12:455–473, 2009.
[80] K. Tumer and D. Wolpert, editors. Collectives and the Design of Complex Systems.
Springer, New York, 2004.
[81] K. Tumer and D. Wolpert. A survey of collectives. In Collectives and the Design
of Complex Systems, pages 1–42. Springer, 2004.
[83] K. Tuyls and S. Parsons. What evolutionary game theory tells us about multiagent
learning. Artificial Intelligence, 171(7):406–416, 2007.
[87] K. Tuyls and G. Weiss. Multiagent learning: Basics, challenges, and prospects.
AI Magazine, 2012, to appear.
[88] K. Tuyls and R. Westra. Replicator dynamics in discrete and continuous strategy
spaces. In A. M. Uhrmacher and D. Weyns, editors, Agents, Simulation and Appli-
cations. Taylor and Francis, 2008.
[92] J. M. Vidal and E. H. Durfee. The moving target function problem in multi-agent
learning. In Proceedings of the Third International Conference on Multi-Agent
Systems, pages 317–324. AAAI/MIT press, July 1998.
[95] C. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University,
1989.
[98] S. Whiteson, M. E. Taylor, and P. Stone. Empirical studies in action selection for
reinforcement learning. Adaptive Behavior, 15(1), 2007.
[102] D. H. Wolpert and K. Tumer. Optimal payoff functions for members of collectives.
Advances in Complex Systems, 4(2/3):265–279, 2001.
1 Introduction
Planning is important to an agent because its current and upcoming choices of
actions can intentionally establish, or accidentally undo, the conditions that later
actions depend upon to reach desirable states of the world. Hence, planning in
single-agent systems is concerned with how an agent can efficiently model and
select from alternative sequences of actions, preferably without considering ev-
ery possible sequence. In a multiagent setting, the added complication is that
decisions an agent makes about near-term actions can impact the future actions
that other agents can (or cannot) take. Similarly, knowing what actions other
agents plan to take in the future could impact an agent’s current action choices.
And, unlike single agents, multiple agents can act concurrently. Therefore an
agent’s choice of action at any given time can impact and be impacted by the ac-
tion choices of other agents at the same time. Because the space of possible joint
courses of action the agents could take grows exponentially with the number of
agents (as we will detail later), planning in a multiagent world is inherently in-
tractable, a problem that is compounded in dynamic, partially-observable, and/or
non-deterministic environments. Yet, when agents are cooperative, as we will assume
in this chapter, they should strive to make decisions that, collectively and over time,
achieve their joint objectives as effectively as possible.
The above paragraph captures in a simplistic way the themes of this chap-
ter. By multiagent control, we refer to how agents in a multiagent system can
be provided with and utilize information to make better decisions about what to
do now so that their joint actions can further the achievement of joint objectives.
Multiagent planning focuses not just on current decisions, but also on sequences
of decisions, allowing an agent to “look ahead” so as to establish conditions for
another agent that allow it to achieve desired shared goals. From the cooperative
perspective, an agent should be willing to incur local cost if by doing so it en-
ables other agents to achieve benefits that more than offset that cost. Multiagent
execution extends and in some senses combines multiagent planning and control,
where agents not only proactively plan their (inter)actions to guide the evolution of
their shared environment, but also reactively control their behaviors in response
to emergent or unlikely events.
This chapter builds on the topics of the previous chapters to describe the con-
cepts and algorithms that comprise the foundations of multiagent control, plan-
ning, and execution. We assume that the reader is already familiar with protocols
of interaction; here those protocols are used in the context of coordinating coop-
erative multiagent action. We also assume the reader is familiar with traditional
AI search techniques, planning algorithms and representations, and models for
reasoning under uncertainty. We make liberal use of the relevant concepts as we
delve into their multiagent analogues.
izing entity (an agent, a human system designer, a group of people comprising
a standards body) will have devised and disseminated some guidelines, such as
interaction plans (aka protocols) and the rules for using them, which the agents
count upon to communicate and cooperate with each other. How and whether the
environment can itself provide the structure to allow dissimilar agents to converge
on cooperative plans for non-trivial problems, or can engender the unguided emer-
gence of languages and protocols that enable cooperation, is beyond the scope of
this chapter.
The preceding thus sets the table for our exploration of multiagent planning
and control techniques in this chapter. We begin in Section 3 by looking at the
process of creating centralized guidelines that push agents to make good control
decisions (about current actions) and/or planning decisions (about sequences of
actions). As we shall see, in some cases these guidelines can guarantee that the
control decisions or plans that each agent makes in adherence to the guidelines
must be jointly coordinated. In such cases, coordination precedes planning.
We then turn to the opposite case (Section 4), where planning precedes coordination,
where the guidelines are weaker (typically, more general-purpose) and hence do
not constrain the space of joint plans much. Instead, each agent can elaborate its
own plan to achieve its assigned goals, and then these plans are coordinated by,
for example, adjusting the timing of agents’ activities to preclude interference.
Unfortunately, for a variety of interesting problems, including problems where
unexpected events can occur at runtime, (local) planning of individual actions
and (multiagent) coordination of interactions need to be done together, in an
interleaved manner. In Section 5, we look at how such multiagent sequen-
tial decision-making problems can be formulated as decentralized (partially-
observable) Markov decision processes, and describe techniques for finding opti-
mal and approximately-optimal solutions (joint plans) in such problems.
Finally, finding good joint plans is only useful if agents can successfully ex-
ecute them. Since planning is done using a model of the environment, and that
model might not correctly represent the actual environment at the time the plan is
executed, agents should monitor the progress of their plans against expectations,
and repair their joint plans in response to deviations. These ideas are familiar
(though still challenging) in the single-agent planning world; we conclude this
chapter (Section 6) describing strategies to handle similar problems in the multi-
agent setting.
tities to fill in and follow. Examples of such interaction plan templates abound
in this book. These templates can take the form of interagent protocols, defining
the possible sequences of communicative acts between agents, where the content
of these acts can be domain dependent. For example, agents solving a distributed
constraint satisfaction problem follow protocols for exchanging information about
tentative assignments of values to variables, or of no-good assignments that col-
lectively violate constraints [67]. As another example, agents solving a resource
allocation problem can work within auction mechanisms that have been designed
to cause information exchanges to converge on efficient allocations (Chapter 7).
Note that this process can recurse. If a precursor of a state to avoid leads
inexorably to the undesirable state, then the precursor state should be added to
the states to be avoided, and the algorithm should work backwards from it too.
Similarly, if there is a way to go assuredly to a sought state from its precursor, the
precursor can be added to the set of sought states.
or can be constructed from the top down, by tasking one or more organizational
designers with forming an organizational structure that the collection of agents
then adopts. Both cases involve a search over (part of) the space of designs.
To make the design process more concrete, we here summarize one approach
developed by Sims, Corkill, and Lesser [57]. A core idea is to view the design
of an organization much like the creation of a hierarchical plan: given goals and
environmental conditions, decompose the goals into component subgoals, identify
agent roles whose preconditions are met by the environment and whose expected
effects match the subgoals, and compose an organization out of the resultant roles.
Then, match agents to the roles to instantiate the organization.
More precisely, the ORGANIZATIONSEARCH algorithm takes a sorted list of
candidate partial organizations, and steps through the list until the following pro-
cedure returns:
1. Decompose the organization's top-level goal(s) into component subgoals.
2. Identify roles whose preconditions are met by the environment and whose expected effects match the subgoals, and compose a candidate organizational structure from those roles.
3. Then use information about agent capabilities to assign agents to the roles.
4. If all roles can be assigned, return the organizational design, else return fail-
ure, triggering the search to continue with other candidate decompositions.
The design search begins with the candidate partial plan corresponding to
the organization’s top-level goal(s), and terminates as soon as a completely in-
stantiated organizational design has been found. Because the candidate list is
kept sorted using a heuristic, the first successful returned design is adopted; even
though a better design might be possible, the costs of an exhaustive search for it
argue for heuristic termination.
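A minimal sketch of this kind of search loop is given below; the helper functions decompose and assign_agents stand in for the decomposition and capability-matching knowledge discussed above and are assumptions of this sketch, not part of the published algorithm.

```python
def organization_search(candidates, decompose, assign_agents):
    """Walk a heuristically sorted list of candidate partial organizations and
    return the first fully instantiated design (hypothetical helper functions)."""
    for partial in candidates:            # already sorted best-first by a heuristic
        roles = decompose(partial)        # expand goals into roles; None signals failure
        if roles is None:
            continue
        assignment = assign_agents(roles) # match agent capabilities to the roles
        if assignment is not None:
            return roles, assignment      # the first success is adopted
    return None                           # no candidate could be instantiated
```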
The above algorithmic outline ignores a variety of details about how knowl-
edge about decompositions and agent capabilities is collected, stored, and re-
trieved, and about complications that arise from, for example, assigning a single
role to multiple agents that then themselves need to coordinate. Yet, it is funda-
mentally like a planning algorithm. As has been pointed out elsewhere, the dis-
tinction between an organizational role and an abstract plan step is blurry [21]. In
both cases, the agent is expected to dynamically elaborate the specification given
its current circumstances, with the expectation that any suitable elaboration will
fulfill the responsibilities to the rest of the organization/plan.
Agents working in a distributed sensor network lack global awareness of the prob-
lem, and thus cooperate by forming local interpretations based on their local sen-
sor data and the tentative partial interpretations received from others, and then
sharing their own tentative partial interpretations. As a result, these agents need to
cooperate to solve their subtasks, and might formulate tentative results along the
way that turn out to be unnecessary. This style of collective problem solving has
been termed functionally accurate (it gets the answer eventually, but with possibly
many false starts) and cooperative (it requires iterative exchange) [40].
Functionally-accurate cooperation has been used extensively in distributed
problem solving for tasks such as interpretation and design, where agents only
discover the details of how their subproblem results interrelate through tentative
formulation and iterative exchange. For this method to work well, participating
agents need to treat the partial results they have formulated and received as tenta-
tive, and therefore might have to entertain and contrast several competing partial
hypotheses at once. A variety of agent architectures can support this need; in par-
ticular, blackboard architectures [15] have often been employed as semi-structured
repositories for storing multiple competing hypotheses.
Exchanging tentative partial solutions can impact completeness, precision, and
confidence. When agents can synthesize partial solutions into larger (possibly still
partial) solutions, more of the overall problem is covered by the solution. When
an agent uses a result from another to refine its own solutions, precision is in-
creased. And when an agent combines confidence measures of two (corroborat-
ing or competing) partial solutions, the confidence it has in the solutions changes.
In general, most distributed problem-solving systems assume similar representa-
tions of partial solutions (and their certainty measures), which makes combining
them straightforward, although some researchers have considered challenges in
crossing between representations, such as combining different uncertainty mea-
surements [68].
In functionally-accurate cooperation, the iterative exchange of partial results
is expected to lead, eventually, to some agent having enough information to keep
moving the overall problem solving forward. Given enough information ex-
change, therefore, the overall problem will be solved. Of course, without being
tempered by some control decisions, this style of cooperative problem solving
could incur dramatic amounts of communication overhead and wasted computa-
tion. For example, if agents share too many results, a phenomenon called dis-
traction can arise: it turns out that they can begin to all gravitate toward doing
the same problem-solving actions (synthesizing the same partial results into more
complete solutions). That is, they all begin exploring the same part of the search
space. For this reason, limiting communication is usually a good idea, as is giving
agents some degree of skepticism in how they assimilate and react to information
from others. We address these issues next.
Organizational structuring can provide the basis for making good decisions
about where agents should direct their attention and apply their communication re-
sources. The organization defines control and communication protocols between
agents by providing messaging templates and patterns to agents that trigger appro-
priate information exchange. As a simple example, the organization can provide
an agent with simple communication rules, such that if the agent creates a lo-
cal hypothesis that matches the rule pattern (e.g., characterizes an event near a
boundary with other agents), then the agent should send that hypothesis to the
specified agents. Similarly, if an agent receives a hypothesis from another, the
organizational structure can dictate the degree to which it should believe and act
on (versus being skeptical about) the hypothesis.
Organization structures thus provide static guidelines about who is generally
interested in what results. But this ignores timing issues. When deciding whether
to send a result, an agent really wants to know whether the potential recipient
is likely to be interested in the result now (or soon). Sending a result that is
potentially useful but that turns out not to be will, at best, clutter up the memory of
the recipient and, at worst, distract the recipient from the useful work
that it otherwise would have done. On the other hand, refraining from sending a
result for fear of these negative consequences can lead to delays in the pursuit of
worthwhile results and even to the failure of the system to converge on reasonable
solutions at all because some links in the solution chain were broken.
When cluttering memory is not terrible and when distracting garden paths are
short, then the communication strategy can simply be to send all partial results.
On the other hand, when it is likely that an exchange of a partial result will distract
a subset of agents into redundant exploration of a part of the solution space, it is
better to refrain, and only send a partial result when the agent that generated it has
completed everything that it can do with it locally. For example, in a distributed
theorem-proving problem, an agent might work forward through a number of res-
olutions toward the sentence to prove, and might transmit the final resolvent that
it has formed when it could progress no further.
Between the extremes of sending everything and sending only locally-
complete results are a variety of gradations [22], including sending a small partial
result early on (to potentially spur the recipient into pursuing useful related results
earlier). For example, in a distributed vehicle monitoring problem, sensing agents
in neighboring regions need their maps to agree on how vehicles move from one
region to the other. Rather than waiting until it forms its own local map before
telling its neighbor, an agent can send a preliminary piece of its map near the
boundary early on, to stimulate its neighbor into forming a complementary map
(or determining that no such map is possible and that the first agent is pursuing a
dubious interpretation path).
So far, we have concentrated on how agents decide when and with whom to
voluntarily share results. But the decision could clearly be reversed: agents could
only send results when requested. When the space of results formed is large and
only a few are really needed by others, then sending requests (or more generally,
goals) to others makes more sense. This strategy has been explored in distributed
vehicle monitoring [17], as well as in distributed theorem proving [25, 42].
It is also important to consider the delays in iterative exchange compared to a
blind inundation of information. A request followed by a reply incurs two commu-
nication delays, compared to the voluntary sharing of an unrequested result. But
sharing too many unrequested results can introduce substantial overhead. Clearly,
there is a trade-off between reducing the information exchanged through iterative
messaging and reducing the delay in getting the needed information to its destination
by sending many messages at the same time. Sen, for example, has looked at this
in the context of distributed meeting scheduling [52]. Our experience as human
meeting schedulers tells us that finding a meeting time could involve a series of
proposals of specific times until one is acceptable, or it could involve having the
participants send all of their available times at the outset. Most typically, however,
practical considerations leave us somewhere between these extremes, sending sev-
eral well-chosen options at each iteration.
Finally, the communication strategies outlined have assumed that messages
are assured of getting through. If messages get lost, then results (or requests for
results) will not get through. But since agents do not necessarily expect mes-
sages from each other, a potential recipient will be unable to determine whether
or not messages have been lost. One solution to this is to require that messages be
acknowledged, and that an agent sending a message will periodically repeat the
message (sometimes called “murmuring”) until it gets an acknowledgment [41].
Or, a less obtrusive but more uncertain method is for the sending agent to pre-
dict how the message will affect the recipient, and to assume the message made it
through when the predicted change of behavior is observed.
things that they previously did not know, forging relationships and dependencies
that mutually benefit them, making commitments that allow each to pursue some
tasks with confidence that others will pursue related tasks, etc.
Formulating protocols is thus akin to formulating social laws or organizational
structures: given properties of desired states of the world, predefine patterns of
actions that if jointly followed will bring them about. In fact, the creation of
choreographed service-oriented computing frameworks can treat the problem of
composing and sequencing services as a planning problem (Chapter 3).
For the remainder of this section, however, we focus not on where protocols
come from, but rather on how they can serve to control the interactions of agents
toward a particular outcome. We will illustrate the ideas by examining one of the
very first multiagent protocols, the contract-net protocol, and one of its first ap-
plications, which is to establish a distributed sensor network [18]. In distributed
sensor network establishment (DSNE), roles (areas of sensing responsibility, re-
sponsibilities for integrating partial interpretations into more complete ones) need
to be assigned to agents, where the population of agents might be initially un-
known or dynamically changing. Thus, the purpose of the protocol is to exchange
information in a structured way to converge on assignments of roles to particular
agents.
At the outset, it is assumed that a particular agent is given the task of monitor-
ing a wide geographic area. This agent has expertise in how to perform the overall
task, but is incapable of sensing all of the area from its own locality. Therefore,
the first step is that an agent recognizes that to perform its task better (or at all) it
should enlist the help of other agents. As a consequence, it then needs to create
subtasks to offload to other agents. In the DSNE problem, it can use its repre-
sentation of the structure of the task to identify that it needs sensing done (and
sensed data returned) from remote areas. Given this decomposition, it then uses
the protocol to match these sensing subtasks with available agents.
The agent announces a request for bids for each subtask. The important aspects
of the announcement for our purposes here are the eligibility specification, the
task abstraction, and the bid specification. (Attributes of message structures are
described more fully in Chapter 3.) To be eligible for this task requires that the
bidding agent have a sensor position within the required sensing area and that
it have the desired sensing capabilities. Agents that meet these requirements can
then analyze the task abstraction (what, at an abstract level, is the task being asked
of the bidders?) and can determine the degree to which they are willing and able to
perform the task. An eligible agent can then bid on the task, where the content of
a bid is dictated by the bid specification.
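A minimal sketch of one announce-bid-award round of such a protocol is shown below in Python. Message passing is collapsed into direct function calls, and make_bid, accept_award, and rate_bid are hypothetical helpers standing in for the bid specification, the award message, and the manager's bid-evaluation criteria.

```python
from dataclasses import dataclass

@dataclass
class Announcement:
    task_abstraction: str          # what the task is, at an abstract level
    eligibility: callable          # predicate over a bidder's capabilities
    bid_specification: tuple       # which fields a bid must contain

def run_contract_net(agents, announcement, rate_bid):
    """One announce/bid/award round (illustrative only)."""
    bids = []
    for agent in agents:
        if announcement.eligibility(agent):                 # eligibility specification
            bid = agent.make_bid(announcement)              # content follows the bid specification
            if bid is not None:
                bids.append((agent, bid))
    if not bids:
        return None            # the manager may give up, re-announce, or relax eligibility
    winner, _ = max(bids, key=lambda ab: rate_bid(ab[1]))   # award the task to the best bid
    winner.accept_award(announcement)
    return winner
```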
The agent with the task receives back zero or more bids. If it gets no bids,
then it can give up, try again (since the population of agents might be changing),
1. Each agent builds its local plan as if it were alone in the environment.
2. Agents exchange (relevant parts of) their local plans and identify problems that could arise from interactions among those plans.
3. For problems that can arise, agents inject additional constraints into their
plans (typically, over the timing of their actions relative to each other) to
prevent such problems.
4. If all problems are prevented, then the agents are done. Otherwise, if some
problems cannot be prevented, then one or more agents develops an alterna-
tive local plan, and the coordination problem repeats with the new portfolio
of agent plans.
step is a fully grounded (or variable free) instance of an operator from the agent’s
set of operators. An operator a in this representation has a set of preconditions
(pre(a)) and postconditions (post(a)), where each condition c ∈ pre(a) ∪ post(a)
is a positive or negative (negated) first-order literal. The set pre(a) represents the
set of preconditions that must hold for the agent to carry out operator a, and the
set post(a) represents the postconditions, or effects, of executing the operator on
an agent’s world state.
A standard formulation of a single-agent plan is a partial-order, causal-link
(POCL) plan. POCL plans capture temporal and causal relations between steps in
the partial-order plan. The definition of a POCL plan here is based on Bäckström
[2], though it follows common conventions in the POCL planning community [63]
to include special steps representing the initial and goal states of the plan.
A threat to a causal link ⟨si, sj, c⟩ by a step sk can be resolved by adding either ⟨sk, si⟩
or ⟨sj, sk⟩ to ≺T. That is, order the threatening step either before or after the link. An
open precondition c of a plan step sj ∈ S can be satisfied by adding a causal link
⟨si, sj, c⟩, where step si ∈ S establishes the needed condition (and si is not ordered
after sj). If this requires that a new step si be added to S, then its preconditions
become new open preconditions in the plan.
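The following sketch shows one plausible way to represent such plans in Python; the encoding of literals as (predicate, arguments, sign) triples and the particular method names are assumptions of the sketch rather than part of the formal definition.

```python
from dataclasses import dataclass, field

# A literal is a tuple (predicate, args, positive), e.g. ("On", ("A", "B"), True).
def negates(effects, condition):
    pred, args, positive = condition
    return (pred, args, not positive) in effects

@dataclass(frozen=True)
class Step:
    name: str           # e.g. "Move(A,T,B)"
    pre: frozenset      # preconditions of the operator instance
    post: frozenset     # postconditions (effects)

@dataclass
class POCLPlan:
    steps: set = field(default_factory=set)          # S, including the Start and Finish steps
    ordering: set = field(default_factory=set)       # the ordering relation, as (earlier, later) pairs
    causal_links: set = field(default_factory=set)   # triples (s_i, s_j, c)

    def open_preconditions(self):
        """Step preconditions not yet supported by any causal link."""
        supported = {(dst, c) for (_, dst, c) in self.causal_links}
        return {(s, c) for s in self.steps for c in s.pre if (s, c) not in supported}

    def candidate_threats(self):
        """Steps whose effects negate the condition of some causal link; whether a
        candidate is a real threat also depends on the ordering constraints."""
        return {(s, link) for s in self.steps for link in self.causal_links
                if s not in (link[0], link[1]) and negates(s.post, link[2])}
```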
As a simple example, consider the blocks world situation portrayed in Fig-
ure 11.1. Say Agent1 has a goal of achieving a state where block A is on block B.
Using the POCL algorithm, it creates the plan shown in Figure 11.2, where actions
and their parameters are given in the squares, their preconditions (postconditions)
are given to the left (right) of the square, causal links are the narrow black arrows,
and ordering constraints are the wide gray arrows. Notice that the plan is partially
ordered, in that before block A can be stacked on B, both blocks must be cleared,
but they can be cleared in either order. Finally, in Figure 11.3 is a plan for Agent2
to stack block B on C. Neither agent cares where block D ends up.
[Figure 11.2: Agent1's POCL plan to achieve On(A,B), with steps Move(C,A,T), Move(D,B,T), and Move(A,T,B).]
[Figure 11.3: Agent2's POCL plan to achieve On(B,C), with steps Move(D,B,T) and Move(B,T,C).]
Definition 11.3 A parallel-step conflict exists in a parallel plan when there are
steps sj and si where post(si) is inconsistent with post(sj), sj ⊀T si, si ⊀T sj, and
⟨si, sj⟩ ∉ #.
However, unlike open conditions and causal-link conflicts, parallel-step con-
flicts can always be resolved, no matter what other flaw resolution choices are
made. Recall that to repair a parallel-step conflict between steps si and s j , we
need only ensure that the steps are non-concurrent, either by adding si ≺T s j or
s j ≺T si to the plan. Given an acyclic plan P, there will always be at least one way
of ordering every pair of steps in the plan such that the plan P remains acyclic.
This can be trivially shown by considering the four possible existing orderings of
any pair of steps si and s j in plan P. First, si and s j could be unordered. In this
case, we can add either si ≺T s j or s j ≺T si to the plan without introducing cycles
in the network of steps. Second, si ≺T s j is in the plan, in which case the parallel-
step conflict has already been resolved. The same is true when s j ≺T si is in the
plan. Finally, si ≺T s j and s j ≺T si could both hold in the plan, but in this case the
plan already has a cycle, and so repairing the parallel-step conflict becomes moot.
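The argument above translates directly into a small resolution procedure. The sketch below assumes the ordering relation ≺T is stored as a set of (earlier, later) pairs and that the plan is acyclic, as in the discussion.

```python
def reachable(ordering, src, dst):
    """True if dst is reachable from src by following the ordering constraints."""
    frontier, seen = [src], set()
    while frontier:
        node = frontier.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(later for earlier, later in ordering if earlier == node)
    return False

def resolve_parallel_step_conflict(ordering, s_i, s_j):
    """Make s_i and s_j non-concurrent by adding one ordering constraint,
    choosing a direction that cannot introduce a cycle in an acyclic plan."""
    if (s_i, s_j) in ordering or (s_j, s_i) in ordering:
        return ordering                      # already resolved
    if reachable(ordering, s_j, s_i):        # s_j already (transitively) precedes s_i
        return ordering | {(s_j, s_i)}
    return ordering | {(s_i, s_j)}           # otherwise ordering s_i first is safe
```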
The parallel plan model captures the idea of concurrency, but it is not rich
enough to describe the characteristics of a multiagent plan, in which we also need
to represent the agents involved, and to which actions they are assigned. To do so,
we extend the definition of a parallel plan to a multiagent parallel POCL plan.
Now, just as in the single-agent case, the multiagent planning problem can
be solved by making an inconsistent multiagent parallel POCL plan consistent,
where each plan step is assigned to an agent capable of executing the step. More
formally, the multiagent plan coordination problem (MPCP) is the problem, given a set
of agents A and the set of their associated POCL plans P, of finding a consistent
and optimal multiagent parallel POCL plan M composed entirely of steps drawn
from P (in which agents are only assigned to steps that originate from their own
individual plans) that results in the establishment of all agents’ goals, given the
collective initial state of the agents. The MPCP can thus be seen as a restricted
form of the more general multiagent planning problem in which new actions are
not allowed to be added to any agent’s plan.
This definition of the MPCP imposes a set of restrictions on the kinds of multi-
agent plan coordination problems that can be represented. Because an agent can
only be assigned steps that originated in its individual plan, this definition does
not model coordination problems where agents would have to reallocate their ac-
tivities. Further, because only individually-planned steps are considered, the def-
inition does not capture problems where additional action choices are available if
agents work together; that is, an agent when planning individually will not con-
sider an action that requires participation of one or more other agents. Finally, in
keeping within the “classical” planning realm, the definition inherits its associated
limitations, such as assuming a closed world with deterministic actions where the
initial state is fully observable.
For any given multiagent parallel plan, there may be many possible consistent
plans one could create by repairing the various plan flaws. However, not all
consistent plans will be optimal. Based on the assumptions outlined previously
concerning the nature of the multiagent plan coordination problem (namely, that
the final plan must be assembled solely from the original agents’ plans), an op-
timal multiagent plan will be one that minimizes the total cost of the multiagent
plan:
Definition 11.5 Total step cost measures the cost of a multiagent parallel plan by
the aggregate costs of the steps in the plan.
This simple, global optimality definition is not the only one that could be used
for the MPCP, but corresponds to the most widely adopted single-agent optimality
criterion. Other relevant definitions include ones minimizing the time the agents
take to execute their plans (exploiting parallelism), maximizing the load balance
of the activities of the agents, or some weighted combination of various factors.
[Figure 11.4: The initial multiagent parallel plan, formed as the union of Agent1's and Agent2's individual plans from Figures 11.2 and 11.3.]
As illustrated in Figure 11.4, an initial multiagent parallel plan can simply be the
union of the individual plan structures of the agents, and thus might contain flaws
due to potential interactions between the individual plans. The initial multiagent
plan can then be incrementally modified as needed (by both asserting new coordi-
nation decisions and retracting the individual planning decisions of the agents) to
resolve the flaws. We call this approach coordination by plan modification.
From the initial (as yet uncoordinated) multiagent plan, plan coordination
takes place by repairing any flaws due to interactions between the plans. The
types of flaws are exactly the same as in parallel POCL plans: open precondi-
tions, causal-link threats, and parallel-step flaws. Assuming each of the individual
plans is consistent, there should be no open precondition flaws to resolve, at least
to begin with. Causal link threats within each agent’s plan should not exist, but
new threats arise when an action in one agent’s plan threatens a link in another
[Figure 11.5: The multiagent plan after resolving the causal-link threat by ordering Agent1's Move(A,T,B) after Agent2's Move(B,T,C).]
agent’s plan. In the running example (Figure 11.4), Agent1’s Move(A,T,B) step
results in block B no longer being clear (¬Cl(B)), which threatens the causal link
between Agent2’s Move(D,B,T) and Move(B,T,C) steps. The flaw can be resolved
by adding to the ordering constraints that Move(A,T,B) come after Move(B,T,C),
as shown in Figure 11.5. Similarly, as before, parallel-step flaws can also be han-
dled by adding in ordering constraints, though the running example has no such
flaws.
The multiagent parallel POCL plan in Figure 11.5 could still be considered
flawed, however, because Agent1 and Agent2 both are planning on moving block
D from block B to the table. In the best case, one of these agents would execute the
action, and then the other, before attempting its own action, would recognize that it can
simply skip the action because the desired effects already hold. However, some plan
execution systems would treat the situation as a deviation from expectations and
attempt to repair the plan by inserting actions to (re)establish the conditions that
the step expected. In other words, the second agent to execute the Move(D,B,T)
action might put block D back onto block B just so that it can move it off. This
is obviously wasteful of time and energy. And, even worse, if the two agents
were to attempt their Move(D,B,T) actions at about the same time, their effectors
(grippers) might collide, and the agents might disable themselves!
The MPCP thus introduces a new type of flaw that affects the correctness, or at
least the optimality, of the multiagent plan. Specifically, a step from some agent’s
plan could be redundant given the presence of steps in others’ plans. Note that
redundancy does not require that the agents seek the same effect. For example, if
Agent1 had included action Move(D,B,T) to achieve On(D,T), while Agent2 had
planned that action to achieve Cl(B), the action taken by one agent has the side
effect of satisfying the other. In such cases, redundant steps may be removable
without introducing new open precondition flaws.
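A simple check for candidate redundant steps, building on the plan representation sketched earlier, is shown below. It flags a step when every condition it supplies through a causal link is also produced by a step from a different agent's plan; whether removal is actually safe still depends on the ordering constraints, as discussed above.

```python
def potentially_redundant_steps(multiagent_plan, owner):
    """Return steps whose useful effects (the conditions they supply via causal
    links) are all also produced by some step belonging to a different agent.
    `owner` maps each step to the agent whose individual plan contributed it."""
    redundant = set()
    for step in multiagent_plan.steps:
        supplied = {c for (src, _, c) in multiagent_plan.causal_links if src == step}
        if not supplied:
            continue
        for other in multiagent_plan.steps:
            if other is step or owner[other] == owner[step]:
                continue
            if supplied <= other.post:
                redundant.add(step)      # its causal links could be redirected to `other`
                break
    return redundant
```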
allel plan, and by initializing the current best solution, Solution, to null. Then,
while the queue is not empty, it selects a multiagent plan M with the lowest total
step cost from the queue.
A bounding test is applied to M to determine whether it is possible for M to
have a lower total step cost than the best Solution found so far (if any). A lower
bound on total step cost is computed for plan M by working backwards from the
goal steps to find all plan steps that contribute flagged causal links (as will shortly
be explained) to achieving the goal. If the lower bound of M is below the cost of
Solution, the algorithm proceeds.
PMA next applies a SolutionTest to M, to derive a consistent solution from M,
where any flaws in M other than step redundancy flaws are iteratively resolved by
adding ordering constraints. The SolutionTest thus conducts a depth-first search
through the space of flaw resolutions to find a consistent solution, heuristically
ordering the search to prioritize flaws for which there are fewer alternative resolu-
tions, which is a minimum remaining values (MRV) heuristic [50]. If the consis-
tent solution from M has a lower total step cost than Solution, it replaces Solution.
The PMA algorithm then selects and adjusts a non-flagged causal link in
M. In Figure 11.5, consider the causal link into condition Cl(B) for Agent1’s
Move(A,T,B) step. That causal link could originate from either Agent1’s
Move(D,B,T) step (as it does in the figure) or from Agent2's Move(D,B,T) step.
The PMA makes copies of M that differ only in this refinement of the source of
[Figure 11.6: An optimal coordinated multiagent plan in which the redundant Move(D,B,T) step is retained only in Agent2's plan.]
the causal link, and in each “flags” the causal link so that it is not branched on
again, because each of the refinements is added to the queue and might later be
further modified. If in a refinement of M the redirection of causal links results in
a plan step having no outgoing causal links, then that plan step is removed. When
that refinement is enqueued, it will move forward in the queue since its total step
cost will be lower.
The process of dequeuing, testing, and refining continues until the queue is
empty, at which point the current value of Solution represents the lowest cost
multiagent parallel plan derivable from the input plans. PMA then resolves any
parallel-step conflicts in Solution (which, as was previously noted, must be re-
solvable), and returns Solution. In our simple example, the input from Figure 11.4
will lead to two possible solutions, one of which is shown in Figure 11.6 and the
other where the Move(D,B,T) action is instead done by Agent1. These both have
the same number of steps. If the total step cost function also considers parallel
activity, the solution shown would be preferred.
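The branch-and-bound structure of PMA can be summarized in a few lines. In the sketch below, the bounding test, the SolutionTest, and the causal-link refinement step are passed in as helper functions; their internals follow the description above, but their signatures are assumptions of this sketch.

```python
import heapq

def plan_modification_search(initial_plan, total_step_cost, lower_bound,
                             solution_test, refine_causal_link):
    """Skeleton of the PMA branch-and-bound loop described above."""
    counter = 0                                    # tie-breaker so the heap never compares plans
    queue = [(total_step_cost(initial_plan), counter, initial_plan)]
    solution, best_cost = None, float("inf")
    while queue:
        _, _, plan = heapq.heappop(queue)
        if lower_bound(plan) >= best_cost:         # bounding test: cannot beat the best solution
            continue
        candidate = solution_test(plan)            # resolve non-redundancy flaws by ordering
        if candidate is not None and total_step_cost(candidate) < best_cost:
            solution, best_cost = candidate, total_step_cost(candidate)
        for child in refine_causal_link(plan):     # branch on the source of one unflagged link
            counter += 1
            heapq.heappush(queue, (total_step_cost(child), counter, child))
    return solution
```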
(a) Agents form a total order. Top agent is the current superior.
(b) Current superior sends down its plan to the others.
(c) Other agents change their plans to work properly with those of the current
superior. Before confirming with the current superior, an agent also
double-checks that its plan changes do not conflict with previous superiors.
(d) Once no further changes are needed among the plans of the inferior agents,
the current superior becomes a previous superior and the next agent in the
total order becomes the superior. Return to step (b). If there is no next agent,
then the protocol terminates and the agents have coordinated their plans.
sions at the primitive level. Thus, agents might communicate and coordinate at
an abstract planning level. This not only can have computational benefits (fewer
combinations of joint steps to reason about), but also can have flexibility benefits
at execution time. For instance, in our example of robots in a shared workspace, if
robots only coordinate at the level of entering and leaving rooms, then each robot
retains flexibility to change its planned movements within a room without needing
to renegotiate with the other. However, coordinating at an abstract level tends to
lead to less efficient joint plans (e.g., a robot idling waiting for another to exit a
room rather than carefully jointly working within the room). Further, to antici-
pate potential primitive interactions at abstract levels means that agents need to
summarize for abstract steps the repercussions of the alternative refinements that
might be made [11]. Algorithm 11.2 summarizes a simple algorithm for solving
the multiagent plan coordination problem for agents that have hierarchical plans.
[Figure: Two agents interacting with a shared environment; agent 1 selects actions a1 and receives observations o1, agent 2 selects a2 and receives o2, and the world generates a reward r.]
doors. Behind one door is a tiger and behind the other is a large treasure. Each
agent may open one of the doors or listen. If either agent opens the door with
the tiger behind it, a large penalty is given. If the door with the treasure behind it
is opened and the tiger door is not, a reward is given. If both agents choose the
same action (e.g., both open the same door), a larger positive reward or a smaller
penalty is given to reward this cooperation. If an agent listens, a small penalty is
given and an observation is seen that is a noisy indication of which door the tiger
is behind. Once a door is opened, the game resumes from its initial state with the
tiger's and treasure's locations randomly reshuffled.
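The qualitative structure of this problem is easy to encode. The reward and observation-accuracy numbers in the sketch below are illustrative assumptions (published versions of the multiagent tiger problem use specific values that differ from these); only the structure described above is taken from the text.

```python
import random

STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]
OBS = ["hear-left", "hear-right"]

def joint_reward(state, a1, a2):
    """Illustrative payoffs: listening costs a little; opening the tiger door is
    heavily penalized; coordinated choices are better than uncoordinated ones."""
    tiger = "left" if state == "tiger-left" else "right"
    opens = [a for a in (a1, a2) if a != "listen"]
    if not opens:
        return -2.0                                   # both agents listen
    if a1 == a2:                                      # coordinated opening of the same door
        return -50.0 if a1.endswith(tiger) else 20.0
    r = 0.0                                           # uncoordinated choices
    for a in opens:
        r += -100.0 if a.endswith(tiger) else 10.0
    return r - (2.0 if len(opens) == 1 else 0.0)      # a lone listener still pays a small cost

def observe(state, action, accuracy=0.85, rng=random):
    """Listening yields a noisy hint of the tiger's location; opening yields no information."""
    if action != "listen":
        return rng.choice(OBS)
    correct = "hear-left" if state == "tiger-left" else "hear-right"
    wrong = "hear-right" if correct == "hear-left" else "hear-left"
    return correct if rng.random() < accuracy else wrong
```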
This class of problems has raised several questions about the feasibility of
decision-theoretic planning in multiagent settings: Are DEC-POMDPs signifi-
cantly harder to solve than POMDPs? What features of the problem domain af-
fect the complexity, and how? Is optimal dynamic programming possible? Can
dynamic programming be made practical? Can locality of agent interaction be
exploited to improve algorithm scalability? Research in recent years has signifi-
cantly increased the understanding of these issues and produced a solid foundation
for multiagent planning in stochastic environments. We describe below some of
the key results and lessons learned.
Figure 11.9: Optimal policy trees for the multiagent tiger problem with horizons
1–4. The policy trees of both agents are the same in this case. The expected values
as a function of the horizon are: V1 = −2, V2 = −4, V3 = 5.19, V4 = 4.80.
Figure 11.10: Optimal policy tree for the multiagent tiger problem with horizon
5. The expected value in this case is V5 = 7.03.
memory state of the agent. Starting with an initial controller state, at each state
an agent chooses an action based on its internal state and then branches to a new
internal state based on the observation received. Both the action selection and
controller transitions could be deterministic or stochastic; higher value could be
obtained using stochastic mappings, but the optimization problem is harder.
Figure 11.11 shows optimal deterministic controllers for the infinite-horizon
multiagent tiger problem with a discount factor of 0.9. The large arrow points
to the initial state. The figure shows a one-node controller (same controller per
agent) with an expected value of −20, and a two-node controller, which represents
the same policy and has the same value. With so little memory, the optimal policy
is to listen all the time and not risk opening a door. Figure 11.12 shows an optimal
deterministic solution with three-node controllers. The value in this case is −14.12,
and the two agents' policies differ. Figure 11.13 shows an optimal deterministic
solution with four-node controllers. The value in this case is −3.66, and the agents'
policies again differ.

Figure 11.11: Optimal one-node and two-node deterministic controllers for the
multiagent tiger problem.

[Figure 11.12: Optimal three-node deterministic controllers for the multiagent tiger problem.]
Figure 11.14 shows stochastic two-node controllers for this problem. The
large arrow points to the initial state. Each state leads to multiple actions shown in
rectangles with the probability of the action attached to the link. Each observation
then leads to a stochastic transition to one of the two states. The value in this case
is −19.30, a slight improvement over the two-node deterministic controller. A
three-node stochastic controller (not shown) can achieve a value of −9.94.
Formally, these solutions assign a local policy to each agent i, δi , which is a
mapping from local histories of observations or internal memory states to actions.
A joint policy, δ = ⟨δ1, ..., δn⟩, is a tuple of local policies, one for each agent.
For a finite-horizon problem with T steps, the value of a joint policy δ with
initial state s0 is
V^δ(s_0) = E[ ∑_{t=0}^{T−1} R(a^t, s^t) | s_0, δ ].
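For policy trees this expectation can be computed recursively: the value of a joint policy at a state is the immediate reward of the root actions plus the expected value of the subtrees selected by the agents' observations. The sketch below assumes a model object exposing R, T, and O as dictionaries of probabilities, and represents each tree as an (action, children) pair; these interfaces are assumptions of the sketch.

```python
def evaluate_joint_policy(model, state, trees):
    """Expected value of joint policy trees from `state` (finite horizon).
    Each tree is a pair (action, children) where children maps the agent's own
    observation to a subtree; `model` is an assumed interface exposing R, T, O."""
    actions = tuple(t[0] for t in trees)                       # joint action at the roots
    value = model.R(state, actions)
    if all(not t[1] for t in trees):                           # leaves: horizon exhausted
        return value
    for s_next, p_s in model.T(state, actions).items():        # P(s' | s, a)
        for obs, p_o in model.O(s_next, actions).items():      # joint observation distribution
            subtrees = tuple(t[1][o] for t, o in zip(trees, obs))
            value += p_s * p_o * evaluate_joint_policy(model, s_next, subtrees)
    return value

def policy_value(model, b0, trees):
    """Value under an initial state distribution b0 (a dict state -> probability)."""
    return sum(p * evaluate_joint_policy(model, s, trees) for s, p in b0.items())
```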
[Figure 11.13: Optimal four-node deterministic controllers for the multiagent tiger problem.]
[Figure 11.14: Stochastic two-node controllers for the multiagent tiger problem (Agent 1 and Agent 2), with action probabilities attached to the links.]
largely independent agents. They move and take actions using their private actua-
tors, which are often totally independent of the actions of other robots operating in
the environment. This property is called transition independence. Another useful
property that is sometimes satisfied is observation independence – guaranteeing
that the observations of one agent depend only on a component of the underlying
state that is not affected by the actions of the other agents. Analysis of the problem
shows that these assumptions could lead to a problem of lower complexity. For
example, a DEC-MDP that satisfies transition and observation independence can
be solved in exponential time [30].
where a are the actions at the roots of the trees q, and qo are the subtrees of q reached
after obtaining observations o.
In multiagent settings, agents have to reason about the possible future policies
of the other agents in order to choose optimal actions. The standard belief-state
of a POMDP – a probability distribution over world states – is no longer suitable.
Instead, it is necessary to use a multiagent belief state, which is a probability
distribution over system states and policies of all other agents: bi ∈ Δ(S × Q−i ).
Other forms of multiagent belief states could be used to capture the uncertainty
about the beliefs or intentions of other agents, but the above form of belief state,
also called generalized belief state, is the one used in this chapter because it is
most commonly used in existing algorithms.
One of the early classes of algorithms for solving DEC-POMDPs is the “Joint
Equilibrium-Based Search for Policies” (JESP), which seeks to find a joint policy
that is locally optimal. That is, the solution cannot be improved by any single
agent, given the policies assigned to the other agents. The best algorithm in this
class, DP-JESP, incorporates three key ideas [44]. First, the policy of each agent
is modified while keeping the policies of the others fixed. Second, dynamic pro-
gramming is used to iteratively construct policies. Third, and most notably, only
reachable belief states of the DEC-POMDP are considered for policy construction.
This leads to a significant improvement, because there is only an exponential num-
ber of different belief states for one agent as opposed to the doubly exponential
number of possible joint policies. Algorithm 11.3 summarizes the operation of
DP-JESP [44]. This approach only guarantees local optimality and still leads to
exponential complexity due to the exponential number of possible belief points.
The algorithm could solve small benchmark problems up to horizon 7.
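The outer loop of JESP is essentially alternating best response. The sketch below shows that loop; the best_response helper, which in DP-JESP is implemented by dynamic programming over reachable multiagent belief states, is left abstract here.

```python
def jesp(agents_policies, best_response, joint_value, max_sweeps=100):
    """Joint Equilibrium-based Search for Policies, in outline: repeatedly improve
    one agent's policy while holding the others fixed, until no agent can improve.
    `best_response(i, policies)` returns agent i's best policy against the others."""
    value = joint_value(agents_policies)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(agents_policies)):
            candidate = list(agents_policies)
            candidate[i] = best_response(i, agents_policies)
            new_value = joint_value(candidate)
            if new_value > value + 1e-9:
                agents_policies, value, improved = candidate, new_value, True
        if not improved:
            break                      # locally optimal: no single agent can improve
    return agents_policies, value
```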
An exact dynamic programming (ExactDP) algorithm for solving DEC-
POMDPs has been developed as well [32]. In every iteration, this algorithm first
performs an exhaustive backup of the policy trees from the previous iteration, then prunes all the
dominated policies. A policy of an agent is dominated by another policy when the
value of every complete policy that includes it as a subtree can be improved (or
preserved) by replacing it with the other policy. This property must be satisfied
for every set of policies of the other agents and every belief state. It is clear that
dominated policies are not needed to construct an optimal solution. Dominance
can be tested efficiently using a linear program. The algorithm can solve partially-
observable stochastic games with minimal changes, as it can produce all the non-
dominated policies for each agent. This process, summarized in Algorithm 11.4,
is the first DP algorithm that could produce a globally optimal solution of a DEC-
POMDP. Unsurprisingly, this approach runs out of memory very quickly because
the number of possible (non-dominated) joint policies grows doubly exponentially
over the horizon. Even with very significant pruning, the algorithm can only solve
small benchmark problems with horizons 4–5. But it introduces important prun-
ing principles that prove useful in designing effective approximations.
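The dominance test can be phrased as a small linear program, in the spirit of the pruning used for single-agent POMDPs: a policy can be pruned if there is no generalized belief point at which it is strictly better than every competing policy. The sketch below uses SciPy's linprog; representing each policy by its value vector over (state, other-agent-policy) points is an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import linprog

def is_dominated(v_candidate, v_others, tol=1e-9):
    """LP dominance test used when pruning an agent's policies.

    v_candidate: value of the candidate policy at each generalized belief point.
    v_others:    list of value vectors, one per competing policy of the same agent.
    Returns True if no belief point gives the candidate a strict advantage, in which
    case the candidate can be pruned."""
    v_candidate = np.asarray(v_candidate, dtype=float)
    if not v_others:
        return False
    n = v_candidate.size
    # Variables: a belief b (n entries) and the advantage epsilon (last entry).
    c = np.zeros(n + 1)
    c[-1] = -1.0                                       # maximize epsilon
    A_ub, b_ub = [], []
    for v_other in v_others:
        diff = v_candidate - np.asarray(v_other, dtype=float)
        A_ub.append(np.concatenate([-diff, [1.0]]))    # epsilon <= b . (v_candidate - v_other)
        b_ub.append(0.0)
    A_eq = [np.concatenate([np.ones(n), [0.0]])]       # the belief sums to one
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.success and (-res.fun <= tol)           # -res.fun is the best epsilon
```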
Several improvements of the ExactDP algorithm have been proposed. Since
some regions of the belief space are not reachable in many domains, the point-
based DP (PBDP) algorithm computes policies only for the subset of reachable
belief states [58]. Unlike DP-JESP, PBDP generates a full set of current-step
policies and identifies the reachable beliefs by enumerating all possible top-down
histories. This guarantees optimality with a somewhat more aggressive pruning.
The worst-case complexity is thus doubly exponential due to the large number of
possible policies and histories.
While the above algorithms introduced important ideas, it became clear that
to improve scalability, it is necessary to perform more aggressive pruning and
limit the amount of memory used by solution methods. The memory-bounded
DP (MBDP) algorithm presented a new paradigm that allowed the algorithm to
have linear time and space complexity with respect to the problem horizon [53].
MBDP, shown in Algorithm 11.5, employs top-down heuristics to identify the
most useful belief states and keeps only a fixed number of policies selected based
on these belief states. The number of policies maintained per agent is a constant
called maxTrees. To assure linear space, however, it is not sufficient to limit the
number of policies per agent, because the size of each policy tree grows expo-
nentially with the horizon. To address that, MBDP deploys an efficient method
to reuse subpolicies in a given policy tree. At each level, the new branches of the
tree point to one of the maxTrees policies of the previous level, as illustrated in
Figure 11.15. This memory-bounded policy representation enables the algorithm
to solve much larger problems with essentially arbitrary horizons. A number of al-
gorithmic improvements in MBDP and its variants have made it possible in recent
years to solve effectively larger problems using dozens of maxTrees per level.
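In outline, MBDP alternates an exhaustive one-step backup with a heuristic selection of at most maxTrees policies per agent. The sketch below leaves the backup, the heuristic belief generation, and the selection of the best joint trees at a belief as helper functions, whose signatures are assumptions of this sketch.

```python
def mbdp(horizon, max_trees, one_step_trees, exhaustive_backup,
         heuristic_beliefs, best_joint_trees):
    """Memory-Bounded Dynamic Programming, in outline (helper functions assumed).
    Only max_trees policy trees are kept per agent at every level, chosen as the best
    joint trees at belief states supplied by top-down heuristics; new branches always
    point into the retained trees of the previous level, keeping memory linear in the
    horizon."""
    trees = [agent_trees[:max_trees] for agent_trees in one_step_trees]
    for level in range(2, horizon + 1):
        candidates = exhaustive_backup(trees)             # every one-step extension per agent
        selected = [[] for _ in candidates]
        for belief in heuristic_beliefs(level, max_trees):
            joint = best_joint_trees(candidates, belief)  # best joint policy at this belief
            for i, tree in enumerate(joint):
                if tree not in selected[i] and len(selected[i]) < max_trees:
                    selected[i].append(tree)
        trees = selected
    return trees
```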
the state of the controller changes based on the observation sequence of the agent,
and in turn the agent’s actions are based on the state of its controller. When these
functions are deterministic, optimizing controllers can be tackled using standard
heuristic search methods. When these mappings are stochastic, the search for
optimal controllers becomes a continuous optimization problem. We describe in
this section two approaches for optimizing such stochastic controllers.
While agents do not have access to the local information of other agents in a
DEC-POMDP, they can benefit from performing correlated actions. This can be
achieved using a correlation device – a mechanism that can facilitate coordina-
tion using an additional finite-state controller whose state is accessible by all the
agents [6]. The correlation device mimics a random process that is independent
of the controlled system. Agents use the extra signal from the device to select
actions, but they cannot control the correlation device. Such a mechanism can be
implemented in practice by giving each agent the same stream of random bits.
The definition of a local controller can be extended to consider the shared sig-
nal c provided by a correlation device. The local controller for agent i becomes
a conditional distribution of the form P(ai, qi′ | c, qi, oi). A correlation device together
with the local controllers for each agent forms a joint conditional distribution
P(c′, a, q′ | c, q, o), called a correlated joint controller.
[Figure: A correlated joint controller unrolled over two time steps, showing the agents' controller states q1 and q2, the correlation-device state qc, the world state s, actions a1 and a2, and observations o1 and o2.]
V(s, q, c) = ∑_a P(a | c, q) [ R(s, a) + γ ∑_{s′, o, q′, c′} P(s′, o | s, a) P(q′ | c, q, a, o) P(c′ | c) V(s′, q′, c′) ]
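For fixed (rather than optimized) controller parameters, this equation is linear in V and can be solved directly, which gives a policy-evaluation routine for a correlated joint controller. The sketch below assumes a factorization in which action selection depends on (c, qi) and the node transition on (c, qi, ai, oi), matching the equation above; the model and parameter interfaces are assumptions of the sketch.

```python
import itertools
import numpy as np

def evaluate_correlated_controller(model, act, trans, device, gamma):
    """Solve the linear system V = R + gamma * M * V implied by the value equation.
    Assumed interfaces:
      model.states / model.joint_actions : lists
      model.T(s, a) -> {s': p},  model.O(s_next, a) -> {o: p},  model.R(s, a) -> float
      act[i][(c, q_i)] -> {a_i: p}                 (action selection of agent i)
      trans[i][(c, q_i, a_i, o_i)] -> {q_i': p}    (controller transition of agent i)
      device[c] -> {c': p}                         (correlation-device transition)
    """
    node_sets = [sorted({q for (_, q) in a_i}) for a_i in act]
    device_states = sorted(device)
    X = list(itertools.product(model.states,
                               itertools.product(*node_sets), device_states))
    index = {x: k for k, x in enumerate(X)}
    A = np.eye(len(X))
    b = np.zeros(len(X))
    for (s, q, c), row in index.items():
        for a in model.joint_actions:
            p_a = np.prod([act[i][(c, q[i])].get(a[i], 0.0) for i in range(len(q))])
            if p_a == 0.0:
                continue
            b[row] += p_a * model.R(s, a)
            for s2, p_s in model.T(s, a).items():
                for o, p_o in model.O(s2, a).items():
                    for c2, p_c in device[c].items():
                        for q2 in itertools.product(*node_sets):
                            p_q = np.prod([trans[i][(c, q[i], a[i], o[i])].get(q2[i], 0.0)
                                           for i in range(len(q))])
                            if p_q:
                                A[row, index[(s2, q2, c2)]] -= gamma * p_a * p_s * p_o * p_c * p_q
    V = np.linalg.solve(A, b)
    return {x: V[k] for x, k in index.items()}
```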
on these principles by simply restricting the size of each controller and optimizing
value with a bounded amount of memory. We discuss one such approach below.
trollers given an initial state distribution and the DEC-POMDP model. The vari-
ables for this problem are the action selection and node transition probabilities for
each node of each agent’s controller as well as the joint value of a set of controller
nodes. Hence, these variables are for each agent i, P(ai |qi ) and P(qi |qi , ai , oi ), and
for the set of agents and any state, V (q, s). The NLP objective is to maximize the
value of the initial set of nodes at the initial state distribution. The constraints in-
clude the Bellman constraints and additional probability constraints. The Bellman
constraints, which are non-linear, ensure that the values are correct given the ac-
tion and node transition probabilities. The probability constraints ensure that the
action and node transition values are proper probabilities. It is straightforward to
add a correlation device to the NLP formulation simply by adding a new variable
for the transition function of the correlation device. As expected, a correlation
device can improve the value achieved by the NLP approach, particularly when
each controller is small [1].
6 Multiagent Execution
We conclude this chapter by turning to the actual execution of multiagent planning
and control decisions. To the extent that the knowledge used for decision making
was correct, the actual trajectory of the world state should mimic the expectations
of the agents. Even in cases where actions or observations are uncertain, as is
modeled in DEC-POMDPs, the planning and control decisions should have antic-
ipated the possibilities and formulated responses to the foreseen contingencies.
Of course, the rosy picture above can fail to materialize, when the model of
the world used by agents for making planning and control decisions is incorrect
or incomplete relative to the agents’ true world. In such situations, agents can
find themselves in unanticipated states, and need to decide how to respond in the
near-term, and perhaps also how to update their models so as to make better plan-
ning and control decisions in the future. In this section, we can only scratch the
surface of the challenges posed in multiagent execution, and of some approaches
to overcoming them.
In a single-agent setting, the agent could insert recovery actions to steer itself back onto its anticipated trajectory. Or the agent could treat its new beliefs about its current state
as the starting point for building a new plan to achieve its objectives. See [50] for
techniques of this kind.
In a multiagent system, certainly the same kind of process could occur, but
now recovery is harder: an agent cannot in general inject new actions or replace
its old plan with a new one without coordinating with other agents, resolving any
new interagent faults that its changed plan introduces. Furthermore, it could be
that a better way of recovering from such a deviation would be to have one or more
other agents change their plans, even though their old plans were proceeding as
expected.
Even more problematic are situations where no agent in isolation perceives a
deviation from expectations (each foresaw that its current state might have arisen)
but the agents collectively have reached an unexpected joint state. Detecting such
a deviation requires not only that agents share local information, but that one or
more agents are knowledgeable about which (partial) joint states are expected and
which are not. Effectively, monitoring the execution of a multiagent plan can
amount to a non-trivial collaborative problem-solving effort among the agents.
A simple example of such a response can occur in the case of social laws.
The designer(s) of the laws used the models of goals/rewards to identify states
to avoid, and of actions to identify and prohibit bad precursor state-and-action
combinations. If a state to avoid is reached nonetheless, the agents can update
their transition models to build better plans/laws in the future, or might perform
Q-learning to directly learn what actions not to take in particular states.
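As a rough single-agent illustration of the latter idea, the sketch below runs tabular Q-learning on an abstract chain of locations with a large penalty for entering a state that the social law was meant to rule out; the chain, the penalty, and the learning parameters are arbitrary assumptions rather than anything prescribed in this chapter.

    import random
    from collections import defaultdict

    random.seed(0)
    STATES, ACTIONS = range(5), ["left", "right"]
    BAD_STATES = {4}                           # states the social law was meant to avoid
    ALPHA, GAMMA, EPISODES = 0.5, 0.9, 500

    def step(state, action):
        nxt = max(0, min(4, state + (1 if action == "right" else -1)))
        reward = -10.0 if nxt in BAD_STATES else 0.0   # large penalty for entering a bad state
        return nxt, reward

    Q = defaultdict(float)
    for _ in range(EPISODES):
        s = random.choice([st for st in STATES if st not in BAD_STATES])
        for _ in range(20):
            if random.random() < 0.2:                  # occasional exploration
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r = step(s, a)
            Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            s = s2

    # Moving right from state 3 enters the bad state, so it should now look unattractive.
    print(Q[(3, "right")], "<", Q[(3, "left")])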
A danger in doing such learning, as discussed in Chapter 10, is that if mul-
tiple agents learn simultaneously, their local adaptations might not combine into
a coherent joint adaptation. For example, if mobile robotic agents collide in a
particular location, there is a danger that they all might now avoid that location,
which would make deliveries to that location impossible. Techniques for con-
trolling learning and adaptation in multiagent systems (see Chapter 10) are thus
pertinent in this context.
In distributed interpretation tasks, for example, where phenomena of interest can appear and disappear at the boundaries between the regions that different agents observe, agents can help each other by providing partial interpretations near their boundaries to others, helping them focus their interpretation problem solving on compatible extensions.
Of course, over time new sensor signals can arrive, or data might prove more
noisy and take longer to process than expected. As a result, an agent might change
its plan. If others know of this change, then they might in turn change their plans,
possibly leading to chain reactions (and even cycles) of plan changes. At times
these efforts can lead to significant improvements in joint behavior, but at other
times the improvements might be smaller than the communication and computa-
tion overhead of attaining them. PGP combats the costs and delays in coordinating
responses to such dynamics with various mechanisms, including:
the joint plan should change. A key assumption behind this strategy is that
all agents, given the same information, would formulate the same revised
partial global plan. Hence, unilateral changes to a plan in expectation that
other agents will make complementary changes to their plans is warranted
given sufficient propagation of planning information based on the MLO.
whether an agent should inform others about such changes [3]. Another was for
agents to explicitly model the probabilities of satisfying their contributions to joint
objectives, and updating each other as these probabilities decreased significantly
[43].
7 Conclusions
Multiagent planning and control with more agents, capable of more behaviors, op-
erating in uncertain and partially-observable worlds, introduces and compounds
daunting computational challenges. Research has sought to exploit structure in
problems that allow solutions to be composed from solutions to localized sub-
problems, and this chapter has illustrated various strategies for different types of
problem structures and performance requirements. Significant progress has been
made, and yet substantial challenges remain. We conclude by summarizing other
important past and ongoing work in this area.
Multiagent planning has been studied since the founding of the field of dis-
tributed AI. Some of the earliest work in this area includes that of Georgeff
[26, 27], who developed some of the earliest multiagent plan deconfliction tech-
niques, and of Corkill [14], who developed a distributed version of the NOAH
planner created by Sacerdoti [51]. Corkill and colleagues, especially Lesser, pi-
oneered the use of organizational techniques for multiagent control [16]. Decker
and Lesser generalized techniques for coordinating agent plans in their work on
GPGP, and for representing complex multiagent task networks in their work on
TAEMS [39].
Planning for teams of agents was investigated not only by Tambe [49, 59]
(Section 4.3), but also by Grosz and Kraus [31], building on concepts from Co-
hen and Levesque [12]. Multiagent planning and scheduling, which involves dealing with temporal constraints, also has a rich literature (e.g., [8, 61]). Other work for
coordinating plans that agents largely form separately includes that of Tonino et
al. [60].
Other techniques that have formulated the multiagent planning problem in
decision-theoretic terms include those that solve problems where agents inter-
act through the assignment (and reassignment) of resources [19, 66], and where
agents interact by changing shared state in structured ways that enable each other
[4, 64].
Memory-bounded dynamic programming (MBDP) [53] has been dramatically
improved in recent years by introducing a variety of methods to reduce the number
of observations considered by the policy, and employing efficient pruning tech-
niques. Point-based methods have recently been introduced to cope with the NP-hardness of the backup operation [37]. The resulting algorithm exploits recent advances in the weighted CSP literature to offer a polytime approximation scheme that can handle a much larger number of belief points (maxTrees).
trial-based dynamic programming (TBDP) [65], combines the main advantages of
DP-JESP with MBDP to avoid the expensive backup operations, allowing prob-
lems with much larger state spaces to be tackled.
The locality of agent interaction – the fact that each agent interacts with a small
number of neighboring agents – has proved crucial to the development of DEC-
POMDP algorithms that can handle dozens of agents. Specialized models such
as network distributed POMDPs (ND-POMDPs) have been introduced to capture
structured interactions and develop early algorithms that can exploit such struc-
tures [45]. More recently, the constraint-based dynamic programming (CBDP) algorithm has been shown to provide orders-of-magnitude speedups thanks to its linear complexity in the number of agents [36]. Algorithms for solving loosely-coupled
infinite-horizon problems have also been developed. One promising direction is
based on transforming the policy optimization problem to that of likelihood maxi-
mization in a mixture of dynamic Bayesian networks [38]. Based on this reformu-
lation, the expectation-maximization (EM) algorithm has been used to compute
the policy via a simple message-passing paradigm guided by the agent interaction
graph.
8 Exercises
1. Level 1 Analyze a multiperson problem that you have been involved in solv-
ing. Identify localities in the structure of the problem, and strategies for
composing an overall solution from solutions to the localized subproblems.
2. Level 1 Do you agree with the stance taken in this chapter that multi-
agent planning requires both that the plan formulation process be distributed
among agents and that the resulting plan construct be distributed as well?
If so, justify in your own words why you believe this stance is warranted.
If not, give counterexamples to this stance and justify why they arguably
embody multiagent planning.
(a) For an 8 x 8 gridworld (empty other than robots), flesh out an example
of the social laws that avoid collisions, indicating for each location
where a robot is allowed to go, along with any other laws that the
robots should follow.
(b) Are there gridworld shapes (again, empty other than robots) for which
useful social laws cannot be constructed? If so, give examples, and if
not, explain why not.
(c) Now, say in the 8 x 8 gridworld there is a wall that partitions the envi-
ronment in half except there is one pair of cells on each side that are
connected. How would you build social laws to handle such a case,
assuming that a robot might need to visit locations on both sides of the
world?
(a) Name a real-life example where task announcement makes much more
sense than availability announcement. Justify why.
(b) Now name a real-life example where availability announcement makes
much more sense. Justify why.
(c) Let’s say that you are going to build a mechanism that oversees a dis-
tributed problem-solving system, and can “switch” it to either a task
or availability announcement mode.
i. Assuming communication costs are negligible, what criteria
would you use to switch between modes? Be specific about what
you would test.
ii. If communication costs are high, now what criteria would you
use? Be specific about what you would test.
7. Level 2/3 We noted that task announcing can be tricky: If a manager is too
fussy about eligibility, it might get no bids, but if it is too open it might have
to process too many bids, including those from inferior contractors. Let us
say that the manager has n levels of eligibility specifications from which
it needs to choose one. Describe how it would make this choice based on
a decision-theoretic formulation. How would this formulation change if it
needed to consider competition for contractors from other managers?
9. Level 1 Consider the pursuit task, with four predators attempting to sur-
round and capture a prey. Define an organizational structure for the preda-
tors. What are the roles and responsibilities of each? How does the structure
indicate the kinds of communication patterns (if any) that will lead to suc-
cess?
10. Level 2 Consider the following simple instance of the distributed delivery
task. Robot A is at position α and robot B is at position β. Article X is at
position ξ and needs to go to position ψ, and article Y is at position ψ and
needs to go to ζ. Positions α, β, ξ, ψ, and ζ are all different.
(c) Using the operators, generate the partial-order plan that, when dis-
tributed, will accomplish the deliveries as quickly as possible. Is this
the same plan as in the previous part of this problem? Why or why
not?
11. Level 2 Given the previous problem, include in the operator descriptions
conditions that disallow robots from occupying the same position at the same time
(for example, a robot cannot do a pickup in a location where another is
doing a dropoff). Assuming each robot was given the task of delivering a
different one of the articles, generate the individual plans and then use the
plan modification algorithm to formulate the synchronized plans, including
any synchronization actions into the plans. Show your work.
12. Level 2 Consider the delivery problem given before the previous problem.
Assume that delivery plans can be decomposed into 3 subplans (pickup,
dropoff, and return), and that each of these subplans can further be decom-
posed into individual plan steps. Furthermore, assume that robots should
not occupy the same location at the same time – not just at dropoff/pickup
points, but throughout their travels. Use the hierarchical behavior-space
search algorithm to resolve potential conflicts between the robots’ plans,
given a few different layouts of the coordinates for the various positions
(that is, where path-crossing is maximized and minimized). What kinds
of coordinated plans arise depending on at what level of the hierarchy the
plans’ conflicts are resolved through synchronization?
13. Level 3 Assume that distributed delivery robots are in an environment
where delivery tasks pop up dynamically. When a delivery needs to be
done, the article to be delivered announces that it needs to be delivered,
and delivery agents within a particular distance from the article hear the
announcement.
(a) Assume that the distance from which articles can be heard is small.
What characteristics would an organizational structure among the de-
livery agents have to have to minimize the deliveries that might be
overlooked?
(b) Assume that the distance is instead large. Would an organizational
structure be beneficial anyway? Justify your answer.
(c) As they become aware of deliveries to be done, delivery agents try to
incorporate those into their current delivery plans. But the dynamic
nature of the domain means that these plans are undergoing evolution.
Under what assumptions would partial global planning be a good ap-
proach for coordinating the agents in this case?
(d) Assume you are using partial global planning for coordination in this
problem. What would you believe would be a good planning level for
the agents to communicate and coordinate their plans? How would the
agents determine whether they were working on related plans? How
would they use this view to change their local plans? Would a hill-
climbing strategy work well for this?
14. Level 1 Given a flawed multiagent plan M with more than one unflagged
causal link to adjust, which causal link should the plan modification algo-
rithm prefer to adjust? Justify your (heuristic) selection strategy.
16. Level 2 The DEC-POMDP model (Definition 11.7) does not include ex-
plicit communication. Suppose that each agent can broadcast certain mes-
sages to all the other agents in each action cycle. Define precisely this kind
of DEC-POMDP with two agents and explain why it is not an extension of
the standard model (i.e., show that every DEC-POMDP with explicit com-
munication can be reduced to a standard DEC-POMDP).
17. Level 3 The communication model presented in the previous question al-
lows each agent to broadcast a message to all the other agents in each step.
This means that the space of possible joint messages received by each agent
grows exponentially with the number of agents. Consider a more scalable
communication model that allows only one agent to broadcast a message in
each cycle (e.g., when multiple agents try to broadcast messages simultaneously, this may result either in failure or in only one agent succeeding). Define
precisely one such model and determine whether it is reducible to a standard
DEC-POMDP or not.
18. Level 2 Consider the complete specification of the multiagent tiger problem
shown in Table 11.2.
(a) Derive the values of the deterministic policies for horizons 1–3 shown
in Figure 11.9.
(b) Derive the values of the deterministic finite-state controller policies
shown in Figure 11.11.
19. Level 2 In the multiagent tiger problem, suppose that the reward for opening
the correct door (e.g., <OR,OR> when the state is TR) is increased to 50. Is
the horizon 1 policy in Figure 11.9 still optimal? If not, what is the optimal
policy (and its value)? Repeat the question for horizons 2 and 3.
20. Level 2 In the multiagent tiger problem, the optimal policy is to listen for
several steps before opening any door. If the observation probabilities in-
crease from 0.85 to 0.9, does that change the optimal horizon 1 policy?
What about horizons 2 and 3?
21. Level 2/3 If all agents share their observations with each other at each step,
the problem becomes centralized. In the multiagent tiger problem, what
would the resulting observations (and their probabilities) be for each agent
when observations are shared? How does this change the optimal policies
for horizons 1 and 2? Would the agents ever choose to open different doors?
Is a centralized solution (with shared observations) always guaranteed to
have value at least as high as a decentralized solution?
22. Level 2/3 If the transition and observation probabilities are independent for
each agent and the reward values are additive between the agents, the prob-
lem can be solved as a set of independent problems whose solutions can
be summed together. In the multiagent tiger problem, the observations are
independent, but the transitions and rewards depend on all agents. Consider
the case where the tiger does not transition after a door is opened and each
agent receives a reward of 10 for opening the correct door, −50 for opening
the incorrect door and −1 for listening. What are the optimal horizon 1, 2,
and 3 policies for this case?
23. Level 2/3 Given the same number of nodes, stochastic controllers often al-
low higher-valued policies to be constructed compared to deterministic con-
trollers. Is there a one-node stochastic controller with a higher value than
the optimal one-node deterministic controller in the multiagent tiger prob-
lem? If there is, construct one. Otherwise, prove that this is not possible in
this case.
Acknowledgments
We thank Christopher Amato for invaluable feedback on the presentation of de-
centralized POMDPs, and for his help generating the corresponding examples,
figures, and exercises. Jeffrey Cox’s work on multiagent partial-order causal-link
planning has been drawn on heavily in this chapter as well. We thank all our
former students and collaborators for the many fruitful discussions on the topics
covered in this chapter, which helped sharpen our understanding of multiagent
planning.
References
[1] Christopher Amato, Daniel S. Bernstein, and Shlomo Zilberstein. Optimizing fixed-
size stochastic controllers for POMDPs and decentralized POMDPs. Autonomous
Agents and Multi-Agent Systems, 21(3):293–320, 2010.
[3] Laura Barbulescu, Zack Rubinstein, Stephen Smith, and Terry Lyle Zimmerman.
Distributed coordination of mobile agent teams: The advantage of planning ahead.
In Proceedings of the 9th International Conference on Autonomous Agents and
Multiagent Systems (AAMAS 2010), pages 1331–1338, 2010.
[4] Raphen Becker, Victor Lesser, and Shlomo Zilberstein. Decentralized Markov deci-
sion processes with event-driven interactions. In Proceedings of the Third Interna-
tional Joint Conference on Autonomous Agents and Multi-Agent Systems, volume 1,
pages 302–309, 2004.
[5] Daniel S. Bernstein, Christopher Amato, Eric A. Hansen, and Shlomo Zilberstein.
Policy iteration for decentralized control of Markov decision processes. Journal of
Artificial Intelligence Research, 34:89–132, 2009.
[6] Daniel S. Bernstein, Eric A. Hansen, and Shlomo Zilberstein. Bounded policy it-
eration for decentralized POMDPs. In Proceedings of the Nineteenth International
Joint Conference on Artificial Intelligence (IJCAI-05), pages 1287–1292, 2005.
[7] Daniel S. Bernstein, Shlomo Zilberstein, and Neil Immerman. The complexity of
decentralized control of Markov decision processes. In Proceedings of the Sixteenth
Conference on Uncertainty in Artificial Intelligence (UAI-00), pages 32–37, 2000.
[8] Aurélie Beynier and Abdel-Illah Mouaddib. A polynomial algorithm for decentral-
ized Markov decision processes with temporal constraints. In Proceedings of the
4th International Joint Conference on Autonomous Agents and Multiagent Systems
(AAMAS-05), pages 963–969, 2005.
[9] Craig Boutilier. Planning, learning and coordination in multiagent decision pro-
cesses. In Proceedings of the Conference on Theoretical Aspects of Rationality and
Knowledge (TARK-96), pages 195–210, 1996.
[10] Kathleen M. Carley and Les Gasser. Computational Organization Theory. MIT
Press, 1999.
[11] Bradley J. Clement, Edmund H. Durfee, and Anthony C. Barrett. Abstract reasoning
for planning and coordination. Journal of Artificial Intelligence Research, 28:453–
515, 2007.
[12] Philip R. Cohen and Hector J. Levesque. Intention is choice with commitment.
Artificial Intelligence, 42(2-3):213–261, 1990.
[13] Susan E. Conry, Kazuhiro Kuwabara, Victor R. Lesser, and Robert A. Meyer. Mul-
tistage negotiation for distributed constraint satisfaction. IEEE Transactions on Sys-
tems, Man, and Cybernetics, SMC-21(6):1462–1477, 1991.
[16] Daniel Corkill and Victor Lesser. The use of meta-level control for coordination in a
distributed problem solving network. Proceedings of the Eighth International Joint
Conference on Artificial Intelligence, pages 748–756, 1983.
[18] Randall Davis and Reid Smith. Negotiation as a metaphor for distributed problem
solving. Artificial Intelligence, 20:63–109, 1983.
[19] Dmitri A. Dolgov and Edmund H. Durfee. Resource allocation among agents with
MDP-induced preferences. Journal of Artificial Intelligence Research, 27:505–549,
2006.
[22] Edmund H. Durfee, Victor R. Lesser, and Daniel D. Corkill. Cooperation through
communication in a distributed problem solving network. In M. Huhns, editor, Dis-
tributed Artificial Intelligence, chapter 2. Pitman, 1987.
[23] Eithan Ephrati and Jeffrey S. Rosenschein. Divide and conquer in multi-agent plan-
ning. In Proceedings of the 12th National Conference on Artificial Intelligence
(AAAI-94), pages 375–380, 1994.
[24] Eithan Ephrati, Martha Pollack, and Jeffrey S. Rosenschein. A tractable heuristic
that maximizes global utility through local plan combination. In Proceedings of the
First International Conference on Multiagent Systems (ICMAS-95), pages 94–101,
1995.
[27] Michael P. Georgeff. A theory of action for multiagent planning. In Proceedings of the National Conference on Artificial Intelligence (AAAI-84), pages
121–125, 1984.
[28] Piotr J. Gmytrasiewicz and Prashant Doshi. A framework for sequential planning in
multiagent settings. Journal of Artificial Intelligence Research, 24:49–79, 2005.
[29] Claudia Goldman and Jeffrey S. Rosenschein. Emergent coordination through the
use of cooperative state-changing rules. In Proceedings of the 12th National Con-
ference on Artificial Intelligence (AAAI-94), pages 408–413, 1994.
[31] Barbara J. Grosz and Sarit Kraus. Collaborative plans for complex group action.
Artificial Intelligence, 86(2):269–357, 1996.
[32] Eric A. Hansen, Daniel S. Bernstein, and Shlomo Zilberstein. Dynamic program-
ming for partially observable stochastic games. In Proceedings of the Nineteenth
National Conference on Artificial Intelligence, pages 709–715, 2004.
[34] Subbarao Kambhampati, Mark Cutkosky, Marty Tenenbaum, and Soo Hong Lee.
Combining specialized reasoners and general purpose planners: A case study. In
Proceedings of the 9th National Conference on Artificial Intelligence (AAAI-91),
pages 199–205, 1991.
[36] Akshat Kumar and Shlomo Zilberstein. Constraint-based dynamic programming for
decentralized POMDPs with structured interactions. In Proceedings of the Eighth
International Conference on Autonomous Agents and Multiagent Systems (AAMAS-
09), pages 561–568, 2009.
[37] Akshat Kumar and Shlomo Zilberstein. Point-based backup for decentralized
POMDPs: Complexity and new algorithms. In Proceedings of the Ninth Inter-
national Conference on Autonomous Agents and Multiagent Systems (AAMAS-10),
pages 1315–1322, 2010.
[38] Akshat Kumar and Shlomo Zilberstein. Message-passing algorithms for large struc-
tured decentralized POMDPs. In Proceedings of the Tenth International Conference
on Autonomous Agents and Multiagent Systems (AAMAS-11), pages 1087–1088,
2011.
[40] Victor R. Lesser and Daniel D. Corkill. Functionally accurate, cooperative dis-
tributed systems. IEEE Transactions on Systems, Man, and Cybernetics, 11:81–96,
1981.
[41] Victor R. Lesser and Lee D. Erman. Distributed interpretation: A model and an
experiment. IEEE Transactions on Computers, C-29(12):1144–1163, 1980.
[42] Douglas MacIntosh, Susan Conry, and Robert Meyer. Distributed automated rea-
soning: Issues in coordination, cooperation, and performance. IEEE Transactions
on Systems, Man, and Cybernetics, SMC-21(6):1307–1316, 1991.
[43] Rajiv T. Maheswaran, Craig Milo Rogers, Romeo Sanchez, and Pedro A. Szekely.
Decision-support for real-time multi-agent coordination. In AAMAS’10, pages
1771–1772, 2010.
[44] Ranjit Nair, David Pynadath, Makoto Yokoo, Milind Tambe, and Stacy Marsella.
Taming decentralized POMDPs: Towards efficient policy computation for multi-
agent settings. In Proceedings of the Eighteenth International Joint Conference on
Artificial Intelligence (IJCAI-03), pages 705–711, 2003.
[45] Ranjit Nair, Pradeep Varakantham, Milind Tambe, and Makoto Yokoo. Net-
worked distributed POMDPs: A synthesis of distributed constraint optimization and
POMDPs. In Proceedings of the Twentieth National Conference on Artificial Intel-
ligence (AAAI-05), pages 133–139, 2005.
[46] Raz Nissim, Ronen I. Brafman, and Carmel Domshlak. A general, fully distributed
multi-agent planning algorithm. In Proceedings of the 9th International Conference
on Autonomous Agents and Multiagent Systems (AAMAS 2010), pages 1323–1330,
2010.
[48] David V. Pynadath and Milind Tambe. The communicative multiagent team de-
cision problem: Analyzing teamwork theories and models. Journal of Artificial
Intelligence Research, 16:389–423, 2002.
[49] David V. Pynadath and Milind Tambe. An automated teamwork infrastructure for
heterogeneous software agents and humans. Autonomous Agents and Multi-Agent
Systems, pages 71–100, 2003.
[50] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach.
Prentice-Hall, 2nd edition, 2003.
[51] Earl Sacerdoti. A Structure for Plans and Behavior. Elsevier North-Holland Inc.,
1977.
[52] Sandip Sen and Edmund H. Durfee. A contracting model for flexible distributed
scheduling. Annals of Operations Research, 65:195–222, 1996.
[53] Sven Seuken and Shlomo Zilberstein. Memory-bounded dynamic programming for
DEC-POMDPs. In Proceedings of the Twentieth International Joint Conference on
Artificial Intelligence (IJCAI-07), pages 2009–2015, 2007.
[54] Sven Seuken and Shlomo Zilberstein. Formal models and algorithms for decen-
tralized decision making under uncertainty. Autonomous Agents and Multi-Agent
Systems, 17(2):190–250, 2008.
[55] Yoav Shoham and Moshe Tennenholtz. On the synthesis of useful social laws for ar-
tificial agent societies. In Proceedings of the 10th National Conference on Artificial
Intelligence (AAAI-92), pages 276–281, 1992.
[56] Yoav Shoham and Moshe Tennenholtz. On social laws for artificial agent societies:
Off-line design. Artificial Intelligence, 73:231–252, 1995.
[57] Mark Sims, Daniel Corkill, and Victor Lesser. Automated organization design for
multi-agent systems. Autonomous Agents and Multi-Agent Systems, 16(2):151–185,
2008.
[58] Daniel Szer and Francois Charpillet. Point-based dynamic programming for DEC-
POMDPs. In Proceedings of the Twenty-First National Conference on Artificial
Intelligence (AAAI-06), pages 1233–1238, 2006.
[59] Milind Tambe. Agent architectures for flexible, practical teamwork. In Proceedings
of AAAI/IAAI’97, pages 22–28, 1997.
[60] Hans Tonino, André Bos, Mathijs de Weerdt, and Cees Witteveen. Plan coordination
by revision in collective agent based systems. Artificial Intelligence, 142(2):121–
145, 2002.
[61] Ioannis Tsamardinos, Martha E. Pollack, and John F. Horty. Merging plans with
quantitative temporal constraints, temporally extended actions, and conditional
branches. In Proceedings of the Conference on AI Planning Systems, pages 264–
272, 2000.
[62] Guandong Wang, Weixiong Zhang, Roger Mailler, and Victor Lesser. Analysis of
negotiation protocols by distributed search. In Victor Lesser, Charles Ortiz, and
Milind Tambe, editors, Distributed Sensor Networks: A Multiagent Perspective,
pages 339–361. Kluwer Academic Publishers, 2003.
[64] Stefan J. Witwicki and Edmund H. Durfee. Influence-based policy abstraction for
weakly-coupled Dec-POMDPs. In Proceedings of the 20th International Conference
on Automated Planning and Scheduling (ICAPS 2010), pages 185–192, 2010.
[65] Feng Wu, Shlomo Zilberstein, and Xiaoping Chen. Trial-based dynamic program-
ming for multi-agent planning. In Proceedings of the Twenty-Fourth Conference on
Artificial Intelligence (AAAI-10), pages 908–914, 2010.
[67] Makoto Yokoo, Edmund H. Durfee, Toru Ishida, and Kazuhiro Kuwabara. The
distributed constraint satisfaction problem: Formalization and algorithms. IEEE
Transactions on Knowledge and Data Engineering, 10:673–685, 1998.
[68] Chengqi Zhang. Cooperation under uncertainty in distributed expert systems. Arti-
ficial Intelligence, 56(1):21–69, 1992.
Chapter 12
1 Introduction
Constraints pervade our everyday lives and are usually perceived as elements that
limit solutions to the problems that we face (e.g., the choices we make every day
are typically constrained by limited money or time). However, from a computa-
tional point of view, constraints are key components for efficiently solving hard
problems. In fact, constraints encode knowledge about the problem at hand, and
so restrict the space of possible solutions that must be considered. By doing so,
they greatly reduce the computational effort required to solve a problem.
Against this background, constraint processing can be viewed as a wide and
diverse research area unifying techniques and algorithms that span across many
different disciplines including planning and scheduling, operations research, com-
puter vision, automated reasoning, and decision theory. All these areas deal with
hard computational problems that can be made more tractable by carefully con-
sidering the constraints that define the structure of the problem.
Here we will focus on how constraint processing can be used to address op-
timization problems in multiagent systems. Specifically, we will consider dis-
tributed constraint optimization problems (DCOPs), whereby a set of agents must
come to some agreement, typically via some form of negotiation, about which
action each agent should take in order to jointly obtain the best solution for the
whole system. This framework has been frequently used in the MAS literature
• The main exact solution techniques for DCOPs and the key differences be-
tween them in terms of benefits and limitations.
• Why and when approximate solution techniques should be used for DCOPs
and the main algorithms in this space.
framework and the bounded max-sum approach. Finally, Section 7 concludes the
chapter.
In a DCOP, each agent can control only a subset of the variables Xi ⊆ X, and each variable is assigned to ex-
actly one agent. In other words, the assignment of variables to agents must be a
partition of the set of variables. Agents can control only the variables assigned to
them, meaning that they can observe and change the values of their assigned vari-
ables only. Moreover, agents are only aware of constraints that involve variables
that they can control. Such constraints are usually termed local functions and the
sum of these local functions is the local utility of the agent. Finally, two agents are
considered neighbors if there is at least one constraint that depends on variables
that each controls. Only neighboring agents can directly communicate with each
other.
Within this context, the goal for the agents is to find the optimal solution to the
constraint network, i.e., to find the assignment for all the variables in the system
that optimizes the global function. Thus, in a standard DCOP setting, agents are
assumed not to be self-interested, i.e., their goal is to optimize the global function
and not their local utilities.
Finding an optimal solution for a DCOP is an NP-hard problem, which can be seen by reducing the problem of deciding the 3-colorability of a graph, a problem known to be NP-complete [26], to a DCOP.
In the next section we will present a number of practical problems that can be
addressed using the DCOP framework, as well as some exemplar and benchmark-
ing DCOP instances.
Many real-world applications can be modeled using the DCOP framework, rang-
ing from human-agent organizations to sensor networks and robotics. Here we
focus on two such applications that have been frequently used as motivating sce-
narios for work in the MAS literature.
The problem of scheduling a set of tasks over a set of resources (e.g., scheduling a set of lectures over a set of lecture halls, or assigning a set of jobs to a set of processors)
is a very common and important problem, which can be conveniently formalized
using constraint networks.
A typical example of this is the meeting scheduling problem, which is a very
relevant problem for large organizations (e.g., public administration, private com-
panies, research institutes, etc.), where many people, possibly working in different
departments, are involved in a number of work meetings. In more detail, people
involved in a meeting scheduling problem might have various private preferences
for meeting start times; for example, a given employee might prefer his or her
meetings to start in the afternoon rather than in the morning (to happily combine a late-night social life with work!). Given this, the aim is to agree on a valid
schedule for the meeting while maximizing the sum of the individuals’ private
preferences. To be valid, a schedule must meet obvious hard constraints, for ex-
ample, two meetings that share a participant cannot overlap.
A possible DCOP formalization for the meeting scheduling domain involves
a set of agents representing the people participating in the meeting and a set of
variables that represent the possible starting time of a given meeting according
to a participant. Constraints force equality on variables that represent the start-
ing time of the same meeting across different agents and ensure that variables
that represent the starting times of different meetings for the same agent are non-
overlapping. Finally, preferences can be represented as soft constraints on meeting
starting times and the overall aim is to optimize the sum of all the soft constraints.
Notice that although in this setting we do have private preferences, we are maxi-
mizing the sum of preferences of all the agents, and thus we are still considering a
scenario where agents are fully cooperative, i.e., they are willing to diminish their
own local utility if it will maximize the global utility.
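A toy version of this formalization can be written down directly; the meetings, time slots, and preference values below are illustrative assumptions, and the brute-force search at the end merely stands in for the distributed solution techniques discussed later in this chapter.

    import itertools

    SLOTS = [9, 10, 11, 14, 15]          # candidate start times (hours); meetings last one hour
    HARD_PENALTY = -1000                 # value of violating a hard constraint

    # One variable per (participant, meeting) pair: Alice and Bob share M1, Alice also has M2.
    variables = [("alice", "M1"), ("bob", "M1"), ("alice", "M2")]

    preference = {                       # soft constraints: private preferences over start times
        "alice": lambda t: 1 if t >= 14 else 0,   # Alice prefers afternoons
        "bob":   lambda t: 1 if t < 12 else 0,    # Bob prefers mornings
    }

    def global_utility(assignment):
        total = 0
        # Equality constraint: all participants of M1 must agree on its start time.
        if assignment[("alice", "M1")] != assignment[("bob", "M1")]:
            total += HARD_PENALTY
        # Non-overlap constraint: Alice's two meetings must not start at the same time.
        if assignment[("alice", "M1")] == assignment[("alice", "M2")]:
            total += HARD_PENALTY
        # Soft constraints: sum of the participants' private preferences.
        total += sum(preference[person](t) for (person, _), t in assignment.items())
        return total

    best = max((dict(zip(variables, times)) for times in itertools.product(SLOTS, repeat=3)),
               key=global_utility)
    print(best, global_utility(best))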
While this problem could be easily formalized as a centralized COP, in this
case a distributed approach not only provides a more robust and scalable solution,
but it can also minimize the amount of information agents must reveal to each
other (thus preserving their privacy). This is because, as mentioned above, in a
DCOP, agents are required to be aware only of constraints that they are involved
in. For example, consider a situation whereby Alice must meet Bob and Charles
in two separate meetings. In a centralized approach, Alice would have to reveal
the list of people she has to meet with. On the other hand, in a DCOP only people
involved in any particular meeting will be aware that the meeting is taking place.
Thus in our example, Bob does not need to know that Alice will also meet with
Charles.
Such problems have been addressed by the DCOP community using benchmarking problem
instances inspired by practical applications, such as meeting scheduling and target
tracking.3
In addition, there are also a number of exemplar NP problems that are fre-
quently used to test solution techniques such as propositional satisfiability (SAT)
or graph coloring. Here we focus on the latter problem as it has been widely
used to evaluate the techniques that will be presented later in this chapter and is
particularly useful for illustrative purposes.
The graph coloring problem is an extremely simple problem to formulate and
is attractive, since the computational effort associated with finding the solution
can be easily controlled using a few parameters (e.g., the number of available colors,
and the ratio of number of constraints to the number of nodes). The constraint
satisfaction version of a graph coloring problem can be described as follows: given
a graph of any size and k possible colors, decide whether the nodes of the graph
can be colored with no more than k colors, so that any two adjacent nodes have
different colors.
In the CSP formulation of the graph coloring problem, nodes are variables,
the set of k colors is the variable domain (which is the same for all the variables),
and constraints are not-equal constraints that hold between any adjacent nodes.
An assignment is a map from nodes to colors without constraint violations. The
optimization version is a max-CSP problem where the aim is to minimize the
number of constraint violations. The optimization version of the graph coloring
problem can be generalized in many ways, for example, by assigning different
weights to violated constraints or by making the penalty of a violation depend on the color that causes the conflict (e.g., a penalty of 1 if both nodes are
blue and a penalty of 10 if both nodes are red).
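A small sketch of this weighted variant scores complete colorings of a toy graph, with the penalty of a violation depending on the offending color; the graph and the penalty values are arbitrary assumptions.

    import itertools

    EDGES = [(0, 1), (1, 2), (2, 0), (2, 3)]        # a small graph containing one triangle
    COLORS = ["red", "green", "blue"]
    PENALTY = {"red": 10, "green": 1, "blue": 1}    # conflict cost depends on the shared color

    def cost(coloring):
        # Sum the penalties of all violated not-equal constraints.
        return sum(PENALTY[coloring[i]] for i, j in EDGES if coloring[i] == coloring[j])

    best = min(itertools.product(COLORS, repeat=4), key=cost)
    print(best, cost(best))                         # three colors suffice here, so the cost is 0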
Most of the work related to the ADOPT technique falls within this setting; therefore, in
what follows we embrace these assumptions. Moreover, to maintain a close cor-
respondence with the original ADOPT description we assume that our task here
is a minimization problem. Hence constraints represent costs and agents wish to
find an assignment that minimizes the sum of these costs.
ADOPT is a search-based technique that performs a distributed backtrack
search using a best-first strategy; each agent always assigns to its variable the best
value based on local information. The key components of the ADOPT algorithm
are: (i) local lower-bound estimates, (ii) backtrack thresholds, and (iii) termina-
tion conditions. In particular, each agent maintains a lower-bound estimate for
each possible value of its variable. This lower bound is initially computed based
only on the local cost function, and is then refined as more information is passed
between the agents. Each agent will choose the value of its variable that mini-
mizes this lower bound, and this decision is made asynchronously as soon as the
local lower bound is updated.
Backtrack thresholds are used to speed up the re-exploration of previously explored solutions. Such re-exploration can happen because the search strategy is based on local lower bounds, and thus agents can abandon values before they are proven to be subopti-
mal. Backtrack thresholds are lower bounds that have been previously determined
and can prevent agents from exploring useless branches of the search tree.
Finally, ADOPT uses a bound interval to evaluate the search progress. Specifi-
cally, each agent maintains not only a lower bound but also an upper bound on the
optimal solution. Therefore, when these two values agree, the search process can
terminate and the current solution can be returned. In addition, this feature can be
used to look for solutions that are suboptimal but within a given predefined bound
of the optimal solution. The user can specify a valid error bound (i.e., the distance
between the optimal solution value and an acceptable suboptimal solution) and as
soon as the bound interval becomes less than this value the search process can be
stopped.
Before executing the ADOPT algorithm, agents must be arranged in a depth-
first search (DFS) tree. A DFS tree order is defined by considering direct par-
ent/child relationships between agents. DFS tree orderings have been frequently
used in optimization (see for example [30]) because they have two interesting
properties: (i) agents in different branches of the tree do not share any constraints,
and (ii) every constraint network can be ordered in a DFS tree and this can be
done in polynomial time with a distributed procedure [28]. The fact that agents
in different branches do not share constraints is an important property as it en-
sures that they can search for solutions independently of each other. Figure 12.1
shows an exemplar constraint network. Figure 12.2 reports a possible DFS order,
where solid lines show parent/child relationships and constraints are not repre-
[Figure 12.1: exemplar constraint network over variables x1, x2, x3, x4, with binary constraints F1,2, F1,3, F1,4, and F2,4, each defined by the cost table Fi,j(0,0) = 2, Fi,j(0,1) = 0, Fi,j(1,0) = 0, Fi,j(1,1) = 1.]
[Figure content: agents A1 to A4 in the DFS arrangement exchange Value messages (e.g., x1 = 0, x3 = 0) and Cost messages (e.g., [0, 2, c2], [0, ∞, c2], [0, 0, c3], [0, 0, c4]) under contexts c2 = c3 = {(x1 = 0)}; agent A2 changes x2 from 0 to 1.]
Figure 12.2: Message exchange in the ADOPT algorithm: Value and Cost mes-
sages for one possible trace of execution. Numbers within squares indicate the
(partial) order of the messages.
sented. Given a constraint network, the DFS ordering is not unique and ADOPT’s
performance (in terms of coordination overhead) depends on the actual DFS or-
dering used. Finding the optimal DFS tree is a challenging problem, which the
ADOPT technique does not address.
Given a DFS ordering of the agents, the algorithm proceeds by exchanging
three types of messages: Value, Cost, and Threshold. When the algorithm starts,
all agents choose a random value for their variables, and initialize the lower bound
and upper bound of their variables’ possible values to zero and infinity, respec-
tively. These bounds are then iteratively refined as more information is transmit-
ted across the network. Figure 12.2 reports messages exchanged among the agents
during the first stages of the algorithm. Since the algorithm is asynchronous, we
report here one possible trace of execution, and numbers within squares indicate
the (partial) order of the messages.
In more detail, Value messages are sent by an agent to all its neighbors that
are lower in the DFS tree order than itself, reporting the value that the agent has
assigned to its variable. For example, in Figure 12.2 agent A1 sends three value
messages to A2 , A3 , and A4 , informing them that its current value is 0. Notice
that this message is sent to A4 even though there is no parent/child relationship
between A1 and A4 because A4 is a neighbor of A1 , who is lower in the DFS order.
Cost messages are sent by an agent to its parent, reporting the minimum lower
and upper bound across all the agent’s variable values, and the current context.
The current context is a partial variable assignment, and, in particular, it records
the assignment of all higher neighbors. For example, in Figure 12.2 the current
context for A4 , c4 , is {(x1 , 0), (x2 , 0)}. The minimum lower bound and minimum
upper bound are computed with respect to the current context. To compute the
minimum lower bound each agent evaluates its own local cost for each possible
value of its variable, adding all the lower-bound messages received from children
that are compatible with the current variable value. The local cost for an agent is the sum of the values of the cost functions it shares with all its higher neighbors. For
example, consider the cost message sent by A4 . The minimum lower bound (which
is 0) is computed by finding the minimum between δ(x4 = 0) = 4 and δ(x4 = 1) =
0, where δ(a) is the local cost function when the variable assumes the value a. The
local cost function is computed by summing up the values of the cost functions
for all neighbors higher in the DFS order and by assigning their values according
to the current context. A similar computation is performed for the upper bound.
Cost messages for agents that are not leaves of the DFS tree (e.g., A2 ) also
include the lower and upper bound for each child. For example consider the cost
message sent by A2 to A1 and let LB represent the minimum lower bound across all
of its variable’s values. LB is then computed by finding the minimum between LB(x2 =
0) = δ(x2 = 0) + lb(x2 = 0, x4 ) = 2 and LB(x2 = 1) = δ(x2 = 1) + lb(x2 = 1, x4 ) =
0, resulting in LB = 0. Here lb(a, xl ) is the lower bound for the child variable xl
when the current variable is assigned to a in the current context.
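The numbers in this example can be checked with a few lines of code, assuming every constraint uses the cost table of Figure 12.1 (Fi,j(0,0) = 2, Fi,j(0,1) = 0, Fi,j(1,0) = 0, Fi,j(1,1) = 1); the helper names below are ours and are not part of the ADOPT message protocol.

    F = {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1}    # shared cost table of Figure 12.1

    def delta_A4(x4, context):
        # A4's local cost: its constraints with the higher neighbors x1 and x2.
        return F[(context["x1"], x4)] + F[(context["x2"], x4)]

    c4 = {"x1": 0, "x2": 0}                             # A4's current context
    print(delta_A4(0, c4), delta_A4(1, c4))             # 4 and 0, so A4's minimum lower bound is 0

    def LB_A2(x2, context, lb_from_A4):
        # A2's lower bound for a value of x2: its own cost toward x1 plus the bound from child A4.
        return F[(context["x1"], x2)] + lb_from_A4[x2]

    lb_from_A4 = {0: 0, 1: 0}                           # lb(x2 = v, x4) reported by A4 so far
    print(min(LB_A2(v, {"x1": 0}, lb_from_A4) for v in (0, 1)))   # LB = 0, attained at x2 = 1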
[Figure 12.3: after x1 changes value from 0 to 1 and back to 0, agent A1 sends Threshold messages along the parent/child links to its children A2 and A3; A4 sits below A2.]
Threshold messages are sent from parents to children to update the agent’s
backtrack thresholds. Backtrack thresholds are particularly useful when a previ-
ously visited context is revisited. Each agent stores cost information (e.g., upper
and lower bounds) only for the current context and deletes previously stored in-
formation as soon as the context changes. In fact, if an agent did maintain such
information for every visited context, it would need an exponential amount of mem-
ory. However, since a context might be visited multiple times during the search
process, whenever this happens the agent starts computing cost information from
scratch. Now, since this context was visited before, the agent reported the sum of
cost information to its parent and since the parent has that information stored, it
can now send it back to the agent via a threshold message. The threshold message
is used to set the agent’s threshold to a previous valid lower bound, and propagate
cost information down the tree, avoiding needless computation. Notice that the in-
formation that the parent stores is the accumulated cost information. Therefore, to
propagate information down the tree, the agent must subdivide this accumulated
cost across its children using some heuristic, as the original cost subdivision is
lost, and then correct this subdivision over time as cost feedback is received from
the children. For example, assume that during the search process, x1 changes its
value and then the context with x1 = 0 is visited again. Agent A1 will then send
threshold messages to A2 and A3 as shown in Figure 12.3. Notice that the value
of these messages is the lower-bound value sent by the corresponding child agent
for that context, e.g., the message t(x1 = 0, x2) equals the lower bound sent by A2
to A1 with context {(x1 , 0)}.
Finally, agents asynchronously update a variable’s value whenever the stored
lower bound for the current value exceeds the backtrack threshold and the new
variable’s value is the one that minimizes the stored lower bounds. For example,
consider agent A2 in Figure 12.2, when receiving the cost message from A1 . In this
case, the lower bound for the current value (LB(x2 = 0) = 2) exceeds the threshold
(initially set to 0). Therefore, the agent updates its variable value to the one that
minimizes the lower bound, which in our case is x2 = 1. It then sends cost and
value messages accordingly. When the minimum lower bound for a variable value
is also an upper bound for that value, the agent can stop propagating messages as
that value will be optimal given the current context. When this condition is true at
the root agent, a terminate message is sent to all the children. Agents propagate the
termination message if the termination condition is true for them as well. When
the terminate message has propagated to all the agents, the algorithm stops, and
the optimal solution has been found.
ADOPT is particularly interesting because it is asynchronous and because the
memory usage of each agent is polynomial in the number of variables. Moreover,
messages are all of a fixed size. However, the number of messages that agents
need to exchange is, in the worst case, exponential in the number of variables.
This affects the time required to find the optimal solution. In particular, the
number of message synchronization cycles, defined as all agents receiving incom-
ing messages and sending outgoing messages simultaneously, is exponential in
the number of variables. This is a frequently used measure to evaluate DCOP so-
lution techniques as it is less sensitive to variations in agents’ computation speed
and communication delays than the wall clock. As previously remarked, such ex-
ponential elements are unavoidable in complete approaches and they can severely
restrict the scalability of the approach.
Several works build on ADOPT, attempting to reduce computation time. For
example, Yeoh et al. propose BnB-ADOPT, which is an extension of ADOPT that
consistently reduces computation time by using a different search strategy: depth-
first search with branch and bound instead of best-first search [39]. Moreover,
Ali et al. suggest the use of preprocessing techniques for guiding ADOPT search
and show that this can result in a consistent increase in performance [3]. Finally,
Gutierrez and Meseguer show that many messages that are sent by BnB-ADOPT
are in fact redundant and most of them can be removed, resulting in significant
reduction in communication costs [16].
In the next section we describe a completely different approach based on dy-
namic programming.
[Figure content: panel (a) shows the constraint network with constraints F1,2, F1,3, F1,4, F2,3, and F2,4 over variables x1 to x4; panels (b) and (c) show the corresponding induced graphs under the two orderings.]
Figure 12.4: (a) Exemplar constraint network, with (b) induced graph with DFS
order {x4 , x2 , x3 , x1 }, and (c) induced graph with DFS order {x1 , x2 , x3 , x4 }.
DPOP requires only a linear number of messages, but messages whose size is exponential in the induced width of the DFS tree ordering.
More specifically, DPOP can operate on a pseudo-tree ordering of the con-
straint network. A pseudo-tree ordering is one where nodes that share a constraint
fall in the same branch of the tree. DFS tree ordering is thus a special case of a
pseudo-tree that can be easily obtained with a DFS traversal of the original graph.
Now, the complexity of the DPOP algorithm is strongly related to the DFS ar-
rangement on which the algorithm is run, and, in particular, it is exponential in
the induced width of the DFS tree ordering. Given a graph and an ordering of
its nodes, the width induced by the ordering is the maximum width of a node in the induced graph, which is simply given by how many parents that node has. The
induced graph can be computed by processing the nodes from last to first, and for
each node, adding edges to connect all the parents of that node (i.e., neighbors
that precede the node in the order).
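This construction is mechanical enough to write as a short routine. The sketch below applies it to the constraint network of Figure 12.4(a), discussed next (constraints F1,2, F1,3, F1,4, F2,3, and F2,4), and reproduces the two induced widths stated for the orderings of Figures 12.4(b) and 12.4(c).

    def induced_width(edges, ordering):
        """Process nodes from last to first; a node's parents are its earlier neighbors,
        which are then connected pairwise in the induced graph."""
        adjacency = {v: set() for v in ordering}
        for a, b in edges:
            adjacency[a].add(b)
            adjacency[b].add(a)
        position = {v: i for i, v in enumerate(ordering)}
        width = 0
        for v in reversed(ordering):
            parents = [u for u in adjacency[v] if position[u] < position[v]]
            width = max(width, len(parents))
            for u in parents:                     # add the induced edges among the parents
                for w in parents:
                    if u != w:
                        adjacency[u].add(w)
        return width

    edges = [("x1", "x2"), ("x1", "x3"), ("x1", "x4"), ("x2", "x3"), ("x2", "x4")]
    print(induced_width(edges, ["x4", "x2", "x3", "x1"]))   # 3, as for Figure 12.4(b)
    print(induced_width(edges, ["x1", "x2", "x3", "x4"]))   # 2, as for Figure 12.4(c)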
In particular, Figure 12.4 shows a constraint network and two induced graphs
given by different orderings. The induced width for the graph in Figure 12.4(b)
is 3, while the induced width for the graph in Figure 12.4(c) is 2. Notice the
dashed edge between x3 and x4 in Figure 12.4(b) that was added when building
the induced graph. While there are various heuristics to generate DFS orderings
with small induced width, finding the one with the minimum induced width is
an NP-hard problem. Figure 12.5 reports a DFS arrangement for the constraint
network shown in Figure 12.4(a), along with messages that will be exchanged
during the following phases.

[Figure 12.5: DFS arrangement with A1 as root, A2 as its child, and A3 and A4 as children of A2; Util messages U4→2, U3→2, and U2→1 flow up the tree, and Value messages V1→2, V2→3, and V2→4 flow down.]

Dashed edges represent constraints that are part of
the constraint network but are not part of the DFS tree. These are usually called
back-edges.
Once the variables have been arranged in a DFS tree structure, the Util prop-
agation phase starts. Util propagation goes from leaves, up the tree, to the root
node. Each agent computes messages for its parent considering both the mes-
sages received from its children and the constraints that the agent is involved in.
In general, the Util message Ui→ j that agent Ai sends to its parent A j can be com-
puted according to the following equation:
Ui→j(Sepi) = max_{xi} ( ⊕_{Ak∈Ci} Uk→i ⊕ ⊕_{Ap∈Pi∪PPi} Fi,p )        (12.2)
where Ci is the set of children for agent Ai , Pi is the parent of Ai , PPi is the set
of agents preceding Ai in the pseudo-tree order that are connected to Ai through
a back-edge (pseudo-parents), and Sepi is the set of agents preceding Ai in the
pseudo-tree order that are connected with Ai or with a descendant of Ai .5 The
⊕ operator is a join operator that sums up functions with different but overlap-
ping scopes consistently, i.e., summing the values of the functions for assignments
that agree on the shared variables. For example, considering again Figure 12.5,
agent A3 sends the message U3→2 (x1 , x2 ) = maxx3 (F1,3 (x1 , x3 ) ⊕ F2,3 (x2 , x3 )) be-
cause there are no messages from its children, while agent A2 sends the message
5 This set is called the separator because it is precisely the set of agents that should be removed
to completely separate the subtree rooted at Ai from the rest of the network.
U2→1 (x1 ) = maxx2 (U3→2 (x1 , x2 ) ⊕ U4→2 (x1 , x2 ) ⊕ F1,2 (x1 , x2 )). It is possible to
show that the size of the largest separator in a DFS tree equals the induced width
of the tree, which clarifies the exponential dependence of message size on the induced width.
Finally, the Value message propagation phase builds the optimal assignment
proceeding from root to leaves. Root agent Ar computes xr∗ , which is the argu-
ment that maximizes the sum of the messages received from all its children (plus
all unary relations it is involved in) and sends a message Vr→c = {xr = xr∗ } con-
taining this value to all its children Ac ∈ Cr. The generic agent Ai computes x∗i = arg max_{xi} ( ∑_{Aj∈Ci} Uj→i(x∗p) + ∑_{Aj∈Pi∪PPi} Fi,j(xi, x∗j) ), where x∗p = ∪_{Aj∈Pi∪PPi} {x∗j} is the set of optimal values for Ai’s parent and pseudo-parents received from Ai’s parent. Finally, the generic agent Ai sends a message to each child Aj with value Vi→j = {xi = x∗i} ∪ (∪_{xs∈Sepi∩Sepj} {xs = x∗s}). For example, assume agent A1’s
optimal value is x1∗ = 1, then agent A2 computes x2∗ = arg maxx2 (U3→2 (1, x2 ) ⊕
U4→2 (1, x2 ) ⊕ F1,2 (1, x2 )) and propagates the message {(x1 = 1), (x2 = x2∗ )} to
agents A3 and A4 . Notice that the maximization performed by agent A4 in the
value propagation phase is the same as the one previously done to compute the
Util messages, but now with the aim of finding the value that maximizes the equa-
tion. Hence computation can be reduced by storing the appropriate values during
the Util propagation phase.
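A brute-force trace of the two phases on the DFS arrangement of Figure 12.5 can be sketched as follows. Reusing the Figure 12.1 cost table as a utility table for every constraint is purely an illustrative assumption, since the chapter gives no numeric utilities for this network.

    import itertools

    D = [0, 1]
    def F(xi, xj):
        # Toy utility table shared by all constraints (an assumption for illustration only).
        return {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1}[(xi, xj)]

    # Util phase (leaves to root) on the DFS tree of Figure 12.5: A2 is A1's child, and
    # A3 and A4 are A2's children with back-edges to their pseudo-parent A1.
    U3 = {(x1, x2): max(F(x1, x3) + F(x2, x3) for x3 in D) for x1, x2 in itertools.product(D, D)}
    U4 = {(x1, x2): max(F(x1, x4) + F(x2, x4) for x4 in D) for x1, x2 in itertools.product(D, D)}
    U2 = {x1: max(U3[x1, x2] + U4[x1, x2] + F(x1, x2) for x2 in D) for x1 in D}

    # Value phase (root to leaves): each agent repeats its maximization to extract the argmax,
    # given the optimal values received from its parent and pseudo-parents.
    x1s = max(D, key=lambda x1: U2[x1])
    x2s = max(D, key=lambda x2: U3[x1s, x2] + U4[x1s, x2] + F(x1s, x2))
    x3s = max(D, key=lambda x3: F(x1s, x3) + F(x2s, x3))
    x4s = max(D, key=lambda x4: F(x1s, x4) + F(x2s, x4))
    print({"x1": x1s, "x2": x2s, "x3": x3s, "x4": x4s}, "optimal value:", U2[x1s])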
As discussed above, DPOP message size, and hence the computation that agents need to perform to build each message, is exponential. However, it is only exponential in the
induced width of the DFS tree ordering used, which, in general, is much less than
the total number of variables. Furthermore, there are many extensions of DPOP
that address various possible trade-offs in the approach. In particular, MB-DPOP
exploits the cycle-cut set idea to address the trade-off between the number of mes-
sages used and the amount of memory that each message requires [31]. On the
other hand, A-DPOP addresses the trade-off between message size and solution
quality [29]. Specifically, A-DPOP attempts to reduce message size by optimally
computing only a part of the messages and approximating the rest (with upper
and lower bounds). Given a fixed approximation ratio, A-DPOP can then reduce
message size to meet this ratio, or, alternatively, given a fixed maximum message
size, it propagates only those messages that do not exceed that size.
As a final remark, note that there is a close relationship between DPOP and
the generalized distributive law (GDL) framework, which we shall discuss further
in Section 5.2 [2]. GDL represents a family of techniques frequently used in in-
formation theory for decoding error correcting codes6 [21], and solving graphical
6 Decoding turbo codes is probably the most important representative application for which
GDL techniques are used. See [21], Section 48.4, for details.
DSA can also be used in an asynchronous context and empirical results show that
the algorithm is still effective if the rate of variable change is low with respect to
the communication latency, thus allowing information to be propagated coherently
in the system. Moreover, in most work, the activation probability is not decided
at each optimization step, but is fixed at the beginning of the execution and is the
same for all agents. The main strength of the DSA algorithm is its extremely low
• Send its current value ai to neighbors and receive values from neighbors.
• Choose a value a∗i such that the local gain g∗i is maximized (assuming neigh-
bors do not change value).
• Send the gain g∗i to neighbors and receive gain from neighbors.
• If the gain for the agent is the highest in the neighborhood, update the value
of xi to a∗i .
A common characteristic of both DSA and MGM is that decisions are made
considering local information only, i.e., when deciding the next value for its vari-
able each agent optimizes only with respect to the current assignments of its neigh-
bors. This provides for extremely low cost and scalable techniques; however,
solution quality is strongly compromised by local maxima, which can, in gen-
eral, be arbitrarily far from the optimal solution. In the next section, we present
an algorithm for solving DCOP, based on the generalized distributive law (GDL)
framework, that overcomes this limitation.
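Before moving on, the local search schemes just discussed can be summarized in code. The following is a compact, synchronous sketch of one MGM round following the steps listed above (Python; illustrative names only; the constraint table F is assumed to hold both orderings of every pair, and tie-breaking between equal gains is omitted):

def local_utility(agent, value, values, neighbors, F):
    # Utility of 'agent' taking 'value', given its neighbors' current values.
    return sum(F[(agent, n)](value, values[n]) for n in neighbors[agent])

def mgm_round(values, domains, neighbors, F):
    gains, proposals = {}, {}
    # Each agent computes its best unilateral change and the resulting gain
    # (first two steps above).
    for agent in values:
        current = local_utility(agent, values[agent], values, neighbors, F)
        best = max(domains[agent],
                   key=lambda v: local_utility(agent, v, values, neighbors, F))
        proposals[agent] = best
        gains[agent] = local_utility(agent, best, values, neighbors, F) - current
    # Only an agent whose gain is strictly the highest in its neighborhood
    # actually changes its value (last two steps above).
    new_values = dict(values)
    for agent in values:
        if gains[agent] > 0 and all(gains[agent] > gains[n] for n in neighbors[agent]):
            new_values[agent] = proposals[agent]
    return new_values

A DSA round under the same data layout would differ only in the decision step: each agent switches to its best value with a fixed activation probability, without exchanging gains.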
where αi j is a normalization constant, N(i) is the set of indices for variables that
are connected to xi , and Fi j is the constraint defined over the variables controlled
[Figure 12.6 (diagram): variable x1 with neighbors x2, x3, and x4; incoming messages m2→1(x1), m3→1(x1), and m4→1(x1), and outgoing message m1→2(x2).]
This is then used to obtain the max-sum assignment, x̃, which, for every variable
xi ∈ X is given by:
x̃i = arg max_{xi} zi(xi)                                (12.5)
Figure 12.6 shows the input and output messages for agent A1. In this example, the
message to agent A2 is computed as m1→2(x2) = max_{x1} (F1,2(x1, x2) + m3→1(x1) +
m4→1(x1)) and z1(x1) = m2→1(x1) + m3→1(x1) + m4→1(x1).
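For concreteness, the message update and the computation of zi used in Equation 12.5 can be sketched as follows (Python; illustrative names; messages are assumed to have been initialized, e.g., to all zeros, and the normalization constant is omitted):

def message(i, j, domains, neighbors, F, msgs):
    # m_{i->j}(x_j): maximize over x_i the constraint F_{ij} plus all incoming
    # messages to i except the one coming from j.
    return {xj: max(F[(i, j)](xi, xj)
                    + sum(msgs[(k, i)][xi] for k in neighbors[i] if k != j)
                    for xi in domains[i])
            for xj in domains[j]}

def marginal_and_assignment(i, domains, neighbors, msgs):
    # z_i(x_i) as in the text, and the max-sum assignment of Equation 12.5.
    z = {xi: sum(msgs[(k, i)][xi] for k in neighbors[i]) for xi in domains[i]}
    return z, max(z, key=z.get)

On acyclic graphs, iterating these updates until they stabilize yields the optimal assignment; on cyclic graphs the same updates are simply run for a fixed number of iterations or until they (possibly) converge.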
The max-sum technique is guaranteed to solve the problem optimally on
acyclic structures, but when applied to general graphs that contain loops, only
limited theoretical results hold for solution quality and convergence. Nonetheless,
extensive empirical evidence demonstrates that, despite the lack of convergence
guarantees, the max-sum algorithm does in fact generate good approximate solu-
tions when applied to cyclic graphs in various domains [11, 13, 19]. When the
algorithm does converge, it does not converge to a simple local maximum but
rather to a neighborhood maximum that is guaranteed to be greater than all other
maxima within a particular large region of the search space [38]. Characteriz-
ing this region is an ongoing area of research and to date has only considered
small graphs with specific topologies (e.g., several researchers have focused on
the analysis of the algorithm’s convergence in graphs containing just a single loop
[1, 38]).
The max-sum algorithm is attractive for decentralized coordination of compu-
tationally and communication constrained devices since the messages are small
(they scale with the domain of the variables), and the number of messages ex-
changed typically varies linearly with the number of agents within the system.
Moreover, when constraints are binary, the computational complexity to update
the messages and perform the optimization is polynomial. In the more general
case of n-ary constraints, this complexity scales exponentially with just the num-
ber of variables on which each function depends (which is typically much less
than the total number of variables in the system). However, as with the previously
discussed approximate algorithms, the lack of guaranteed convergence and guar-
anteed solution quality in general cases limits the use of the standard max-sum
algorithm in many application domains.
A possible solution to address this problem is to remove cycles from the con-
straint graph by arranging it into tree-like structures such as junction trees [20] or
pseudo-trees [30]. However, such arrangements result in an exponential element
in the computation of the solution or in the communication overhead. For exam-
ple, DPOP is functionally equivalent to performing max-sum over a pseudo-tree
formed by depth-first search of the constraint graph, and the resulting maximum
message size is exponential with respect to the width of the pseudo-tree. This ex-
ponential element is unavoidable in order to guarantee optimality of the solution
and is tied to the combinatorial nature of the optimization problem. However, as
discussed in the introduction of this chapter, such exponential behavior is undesir-
able in systems composed of devices with constrained computational resources.
To address these issues, low overhead approximation algorithms that can pro-
vide quality guarantees on the solution are a key area of research, and we discuss
the most prominent approaches in this area in the next section.
dressing this trade-off is particularly important in dynamic settings and when the
agents have severe constraints on computational power, memory, or communica-
tion (which is usually the case for applications involving embedded devices, such
as mobile robots or sensor networks). Moreover, having a bound on the quality
of the provided solutions is particularly important for safety-critical applications
(such as disaster response, surveillance, etc.) because pathological behavior of the
system is, in this case, simply unacceptable.
Guarantees that can be provided by approximate algorithms can be broadly di-
vided into two main categories: off-line and online. The former can provide a char-
acterization of the solution quality without running any algorithm on the specific
problem instances. In contrast, the latter can only provide quality guarantees for
a solution after processing a specific problem instance. Off-line guarantees rep-
resent the classical definition of approximation algorithms [8], and they provide
very general results not tied to specific problem instances. In this sense they are
generally preferred to online guarantees. However, online guarantees are usually
much tighter than off-line ones, precisely because they can exploit knowledge on
the specific problem instance, and, thus, better characterize the bound on solution
quality.
Here we present two representative approaches for these two classes: the k-
optimality framework and the bounded max-sum approach.
As mentioned above, Equation 12.6 is valid for every possible constraint net-
work. This is because the bound is the result of a worst-case analysis on a com-
pletely connected graph. If we restrict our attention to specific constraint net-
work topologies it is possible to obtain better bounds. For example, for a network
with a ring topology, where each agent has only two constraints, we have that
F(x̂) ≥ ((k − 1)/(k + 1)) · F(x∗). This is a much better bound as it does not depend on the num-
ber of agents in the system, but applies only to a very specific topology. Similar
considerations hold for assumptions about the reward structure. Specifically, bet-
ter guarantees can be provided assuming some a priori knowledge on the reward
structure. For example, Bowring et al. show that the approximation bounds can
be improved by knowing the ratio between the minimum reward to the maximum
reward [5]. In addition, they also extend the k-optimality analysis to DCOPs that
include hard constraints. However, while they are able to significantly improve
on the accuracy of the bound by exploiting a priori knowledge on the reward, the
bound is still dependent on the number of agents, and decreases as the number of
agents grows; thus, it is of little help for large-scale applications.
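To see how this kind of guarantee behaves for small k, the ring-topology bound above can be tabulated quickly (an illustrative computation, not from the chapter):

# Ring-topology bound: F(x_hat) >= (k - 1) / (k + 1) * F(x*).
for k in (2, 3, 5, 10):
    print(f"k = {k:2d}: guaranteed fraction of the optimum = {(k - 1) / (k + 1):.2f}")
# k =  2: 0.33   k =  3: 0.50   k =  5: 0.67   k = 10: 0.82

so even on this favorable topology, a 2-optimal solution is only guaranteed to achieve a third of the optimal reward.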
1. Define the weight of each constraint as wij = min{w′ij, w″ij}, where
   w′ij = max_{xj} [ max_{xi} Fij − min_{xi} Fij ] and w″ij = max_{xi} [ max_{xj} Fij − min_{xj} Fij ].
   For example, Figure 12.7 reports a constraint network and the weights that
   the BMS algorithm would compute. Specifically, for the constraint be-
   tween x1 and x4 we have w′14 = max_{x4} [ max_{x1} G14 − min_{x1} G14 ] = 3 and
   w″14 = max_{x1} [ max_{x4} G14 − min_{x4} G14 ] = 1, and therefore w14 = min{3, 1} = 1.
[Figure 12.7 (diagram): a loopy constraint network over variables x1, x2, x3, x4 with binary
constraints F12 (x1–x2), F13 (x1–x3), F24 (x2–x4), G23 (x2–x3), and G14 (x1–x4). The F
constraints share the payoff table F(0,0) = 2, F(0,1) = 0, F(1,0) = 0, F(1,1) = 1, and the G
constraints share the payoff table G(0,0) = 4, G(0,1) = 3, G(1,0) = 1, G(1,1) = 2.]
[Figure 12.8 (diagram): the same network annotated with the weights computed by BMS
(weight 2 for F12, F13, and F24; weight 1 for G23 and G14) and with the unary functions
Gm1 (attached to x1) and Gm2 (attached to x2) that replace the dependencies removed to
form the tree; see the caption below.]
Figure 12.8: Tree-like constraint network formed by the BMS algorithm when run
on the loopy constraint network of Figure 12.7. Numbers represent weights for
binary constraints.
the sum of the weights of the removed edges, pursuing the objective of
W = ∑_{cij ∈ Cr} wij                                (12.8)
Gm2 = min_{x3} G23.
All the above steps can be performed using a decentralized approach. In par-
ticular, the computation of the maximum spanning tree can be performed in a
distributed fashion using various message-passing algorithms, such as, for exam-
ple, the minimum spanning tree algorithm by Gallager, Humblet and Spira (GHS),
modified to find the maximum spanning tree [14].
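For illustration, here is a compact, centralized rendering of the weight computation (step 1) and of the spanning-tree step, using a simple Kruskal-style construction in place of the distributed GHS variant mentioned above (Python; names and data layout are invented):

def constraint_weight(F, dom_i, dom_j):
    # w_ij = the minimum of the two one-sided maximum impacts, as in step 1.
    w_i = max(max(F(xi, xj) for xi in dom_i) - min(F(xi, xj) for xi in dom_i)
              for xj in dom_j)
    w_j = max(max(F(xi, xj) for xj in dom_j) - min(F(xi, xj) for xj in dom_j)
              for xi in dom_i)
    return min(w_i, w_j)

def max_spanning_tree(variables, weighted_edges):
    # Kruskal-style maximum spanning tree over (weight, i, j) tuples; returns the
    # kept edges and the total weight W of the removed edges (Equation 12.8).
    parent = {v: v for v in variables}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    kept, removed_weight = [], 0
    for w, i, j in sorted(weighted_edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            kept.append((i, j))
        else:
            removed_weight += w
    return kept, removed_weight

# Reproduces the example in the text: the weight of constraint G14 is min(3, 1) = 1.
G14 = {(0, 0): 4, (0, 1): 3, (1, 0): 1, (1, 1): 2}
assert constraint_weight(lambda a, b: G14[(a, b)], (0, 1), (0, 1)) == 1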
As previously mentioned, this approach is able to provide guarantees on solu-
tion quality that are instance-based. Therefore the algorithm must be run on the
specific problem instance in order to obtain the bound. By exploiting knowledge
of the particular problem instance, BMS is able to provide bounds that are very
accurate. For example, the BMS approach has been empirically evaluated in a
mobile sensor domain, where mobile agents must coordinate their movements to
sample and predict the state of spatial phenomena (e.g., temperature or gas con-
centration). When applied in this domain, BMS is able to provide solutions that
are guaranteed to be within 2% of the optimal solution.
Other data dependent approximation approaches with guarantees have also
been investigated. For example, Petcu and Faltings propose an approximate ver-
sion of DPOP [29], and Yeoh et al. provide a mechanism to trade off solution
quality for computation time for the ADOPT and BnB-ADOPT algorithms [40].
Such mechanisms work by fixing an approximation ratio and reducing computa-
tion or communication overhead as much as possible to meet that ratio.
More specifically, BnB-ADOPT fixes a predetermined error bound for the op-
timal solution, and stops when a solution that meets this error bound is found. In
this approach, the error bound is fixed and predetermined off-line, but the num-
ber of cycles required by the algorithm to converge is dependent on the particular
problem instance, and, in the worst case, remains exponential. The BMS approach
discussed above, in contrast, is guaranteed to converge after a polynomial num-
ber of cycles, but the approximation ratio is dependent on the particular problem
instance.
Similar considerations hold with respect to A-DPOP [29]. A-DPOP attempts
to reduce message size (which is exponential in the original DPOP algorithm in
the width of the pseudo-tree) by optimally computing only a part of the messages,
and approximating the rest (with upper and lower bounds). In this case, given a
fixed predetermined approximation ratio, A-DPOP reduces message size to meet
this ratio. Alternatively, given a fixed maximum message size, A-DPOP propa-
gates only those messages that do not exceed that size. As a result of this, the
computed solution is not optimal, but approximate. If the algorithm is used by
fixing a desired approximation ratio, the message size remains exponential. In
contrast, if we fix the maximum message size, the approximation ratio is depen-
dent on the specific problem instance.
7 Conclusions
The constraint processing research area comprises powerful techniques and algo-
rithms that are able to exploit problem structure, and, thus, solve hard problems
efficiently. In this chapter we focused on the DCOP framework where constraint
processing techniques are used to solve decision-making problems in MAS.
This chapter provides an overview of how DCOPs have been used to ad-
dress decentralized decision making problems by first presenting the mathemat-
ical formulation of DCOPs and then describing some of the practical problems
that DCOPs can successfully address. We detailed exact solution techniques for
DCOPs presenting two of the most representative algorithms in the literature:
ADOPT and DPOP. We then discussed approximate algorithms, including DSA and
MGM, before presenting GDL and the max-sum algorithm. Finally, we presented
recent ongoing work that is attempting to provide quality guarantees for these
approaches.
Overall, the DCOP framework, and the algorithms being developed to solve
such problems, represent an active area of research within the MAS community,
and one that is increasingly being applied within real-world contexts.
8 Exercises
1. Level 1 Consider the coordination problem faced by intervention forces in
a rescue scenario. In particular, consider a set of fire fighting units that
must be assigned to a set of fires in order to minimize losses to buildings,
infrastructure, and civilians. Each fire fighting unit can be assigned to just
one fire. However, if more than one unit works on the same fire at the same
time, they can extinguish it faster (collaboration has a positive synergy).
Furthermore, a fire fighting unit can only be assigned to fires that are within
a given distance from its initial position (due to the travel time required
to reach the fire). As a result, any particular fire fighting unit can only be
assigned to a subset of the fires that exist.
• X = {x1 , .., x6 }
• D = {d1 = d3 = d4 = d6 = {Red, Blue}, d2 = d5 = {Red, Blue, Green}}
• C = {< x1 , x2 >, < x1 , x3 >, < x1 , x6 >, < x2 , x3 >, < x3 , x4 >, < x4 , x5 >,
< x4 , x6 >, < x5 , x6 >}
Assume that every node in the graph is controlled by one agent. Find the
computational complexity of DPOP with the following pseudo-tree order-
ing: o1 = < x1, x2, x3, x4, x5, x6 >. State whether there is a pseudo-tree order-
ing that would result in less computational complexity and, if so, present an
instance of one.
5. Level 1 Show a complete execution of DPOP that solves the max-CSP for-
mulation of the constraint network provided in Exercise 4 (use either o1 or
a pseudo-tree ordering of your choice).
6. Level 1 Give an execution example of MGM where the algorithm finds the
global optimum, and another where it gets trapped in a local minimum.
9. Level 2 Provide a DCOP problem with at least five variables and a solution
that is 3-optimal but not 4-optimal.
10. Level 3 Consider the bound expressed by Equation 12.9 for the bounded
max-sum approach. Assuming that you know the constraint network topol-
ogy, and the maximum and minimum value for all the functions (but not the
actual values of the functions) in a particular DCOP example, modify the
bounded max-sum technique to provide a bound in this setting. Can you
provide some bounds even before running the max-sum algorithm? Can
you extend the analysis assuming you do not know the constraint network
topology?
11. Level 3 Consider the bounded max-sum technique presented in Section 6.2.
Elaborate on how this technique can be applied to problems that include
hard constraints. In particular, how is the approximation ratio affected by
hard constraints? Under which assumptions can this technique still provide
useful bounds?
12. Level 4 Consider the problem of minimizing the running time of a DCOP
solution algorithm when agents have heterogeneous computation and com-
munication. In this setting it might be beneficial to delegate computation to
agents that have additional computation capabilities, or to minimize mes-
sage exchange between agents that are connected by poor communication
links. Formalize this problem and provide an approach that can take such
heterogeneity into account.
References
[1] S. M. Aji, S. B. Horn, and R. J. McEliece. On the convergence of iterative de-
coding on graphs with a single cycle. In Proceedings of the International Symposium
on Information Theory (ISIT), pages 276–282, 1998.
[2] S. M. Aji and R. J. McEliece. The generalized distributive law. IEEE Transactions
on Information Theory, 46(2):325–343, 2000.
[3] S. M. Ali, S. Koenig, and M. Tambe. Preprocessing techniques for accelerating the
DCOP algorithm ADOPT. In Proceedings of the Fourth International Joint Confer-
ence on Autonomous Agents and Multiagent Systems, pages 1041–1048, 2005.
[4] F. Bacchus, X. Chen, P. van Beek, and T. Walsh. Binary vs. non-binary constraints.
Artificial Intelligence, 140(1/2):1–37, 2002.
[6] A. Chapman, A. Rogers, N. R. Jennings, and D. Leslie. A unifying framework for it-
erative approximate best response algorithms for distributed constraint optimisation
problems. The Knowledge Engineering Review, 26(4):411–444, 2011.
[7] A. Chechetka and K. Sycara. No-commitment branch and bound search for dis-
tributed constraint optimization. In Proceedings of the Fifth International Joint Con-
ference on Autonomous Agents and Multiagent Systems, pages 1427–1429, 2006.
[10] R. Dechter and R. Mateescu. AND/OR search spaces for graphical models. Artificial
Intelligence, 171:73–106, 2007.
[13] B. J. Frey and D. Dueck. Clustering by passing messages between data points.
Science, 315(5814):972–976, 2007.
[15] A. Gershman, A. Meisels, and R. Zivan. Asynchronous forward bounding for dis-
tributed COPs. Journal of Artificial Intelligence Research, 34:61–88, 2009.
[17] K. Hirayama and M. Yokoo. Distributed partial constraint satisfaction
problem. In Principles and Practice of Constraint Programming, pages 222–236,
1997.
[19] J. R. Kok and N. Vlassis. Using the max-plus algorithm for multiagent decision
making in coordination graphs. In RoboCup-2005: Robot Soccer World Cup IX,
2005.
[20] F. R. Kschischang, B. J. Frey, and H. A. Loeliger. Factor graphs and the sum-product
algorithm. IEEE Transactions on Information Theory, 47(2):498–519, 2001.
[23] R. Mailler and V. Lesser. Solving distributed constraint optimization problems using
cooperative mediation. In Proceedings of the Third International Joint Conference
on Autonomous Agents and MultiAgent Systems, pages 438–445, 2004.
[24] P. J. Modi. Distributed Constraint Optimization for Multiagent Systems. PhD thesis,
Department of Computer Science, University of Southern California, 2003.
[27] J. P. Pearce and M. Tambe. Quality guarantees on k-optimal solutions for distributed
constraint optimization problems. In Proceedings of the Twentieth International
Joint Conference on Artificial Intelligence, pages 1446–1451, 2007.
[28] A. Petcu. A Class of Algorithms for Distributed Constraint Optimization. PhD the-
sis, Swiss Federal Institute of Technology (EPFL), Lausanne (Switzerland), 2007.
[30] A. Petcu and B. Faltings. DPOP: A scalable method for multiagent constraint opti-
mization. In Proceedings of the Nineteenth International Joint Conference on Arti-
ficial Intelligence, pages 266–271, 2005.
[31] A. Petcu and B. Faltings. MB-DPOP: A new memory-bounded algorithm for dis-
tributed optimization. In Proceedings of the Twentieth International Joint Confer-
ence on Artificial Intelligence, pages 1452–1457, 2007.
[40] W. Yeoh, X. Sun, and S. Koenig. Trading off solution quality for faster computation
in DCOP search algorithms. In Proceedings of the Twenty-First International Joint
Conference on Artificial Intelligence, pages 354–360, 2009.
1 Introduction
The first two chapters of this book covered the general ideas of the agent comput-
ing paradigm, how it differs from traditional approaches such as object orientation,
and what features are expected of multiagent organizations. Chapters 3–12 dis-
cussed other very important aspects of agents: communication, learning, planning,
coalition formation, and many others. In this chapter, we consider the question of
how to effectively program multiagent systems: what are the new techniques and
principles suitable for multiagent systems as opposed to traditional software engi-
neering methods?
All current trends in computing point toward a vision of the future whereby
autonomous software will be needed everywhere, and required to work together
with huge numbers of other autonomous software entities regardless of their lo-
cations. Traditional software engineering and programming languages were not
built with this vision in mind and therefore lack the appropriate methods and tools
to deal with these challenges.
Agent-oriented programming was first introduced by Shoham in 1993 (this is
discussed below in greater detail). While the first decade saw mainly theoretical
approaches, many of which were not yet mature enough for practical program-
ming, the creation of the ProMAS and DALT workshop series (both held with
AAMAS since 2003) and related activity helped to change the picture. With those
efforts, there was a substantial increase in the number of researchers working
on multiagent programming over the years. ProMAS in par-
Although this chapter is self-contained and does not depend on the rest of the
book, some knowledge of Chapter 2 (on organizations) helps to understand the
organizational level of programming. An important part of this chapter is to pro-
vide the program code (in JAC A M O, a combination of JASON, CA RTAG O and
MOISE) of the example given in Chapter 15 about an assembly cell of a manufac-
turing plant. Thus the reader can compare the agent-oriented software engineer-
ing approach to dealing with this example with our solution using a fully-fledged
multiagent programming language. This scenario is also used as an example for
model checking a multiagent system in Chapter 14.
ming individual agents, and very little was available in terms of programming
abstractions covering the social and environmental dimensions of a multiagent
system as well as agents individually. Important work in that direction is [24, 37],
and we will discuss this progression throughout this chapter.
the time the agent is actually about to act. Agent languages often use par-
tially instantiated plans so that not only the details of a plan but also the
particular (sub)plan to be used for each (sub)goal is only chosen when the
agent is about to act on achieving a particular goal.
Dealing with Plan Failure: Even delaying the decision on particular courses of
action might not be enough to ensure that the agent has chosen a suitable
course of action in a dynamic environment. While executing a plan, the
agent may realize a failure has occurred, so agent languages still need to
provide mechanisms to deal with plan failure.
Rational Behavior: In general, agent applications will require that agents be-
have rationally. Of course even for humans the notion of rational behavior
can be arguable, but the BDI literature [53] has pointed to very concrete
aspects of rationality, and agent programming languages facilitate the task
of programmers who want to ensure rational behavior of their autonomous
software. One example is that if an agent has an intention (i.e., is committed
to the goal of achieving a particular state of affairs) we expect it to reason
about how to achieve that intention, and we do not expect the agent to give
up before the intention is believed to have been effectively achieved, unless
there is good reason to believe it will not be possible to achieve it at all, or
the reason why it became an intention in the first place no longer holds.
(again by both humans and agents who can reason about their own organi-
zation). Needless to say, this means there is the potential for truly adaptive
autonomous systems from a practical perspective.
ming was published by Braubach and Pokahr [18], with much work following it,
e.g., by van Riemsdijk [67].
Finally, at the agent level there are two more important (and related) notions:
plan and intention. A plan is a course of action that under specific circumstances
might help the agent handle a particular event (achieving a long-term goal or re-
acting to changes in beliefs, for example, about the environment). An intention
is an instance of a plan that has been chosen to handle a particular event and has
been partially instantiated with information about the event. This intended means
may contain further goals to achieve, but the particular plan among several possi-
ble alternative plans to achieve the same goal will only be chosen at the time the
goal is selected to be achieved in a later reasoning cycle. This ensures the agent
uses information as up-to-date as possible when committing to particular means
to achieve its goals. The agent abstractions will be discussed in more detail in
Section 4.1, using JASON as a concrete example.
4.1 JASON
As seen above, agent-oriented programming is based on various abstractions that
facilitate the development of software that is both autonomous and social. The
[Figure 13.1 (diagram): the goal-plan tree for the Mars rover example, rooted at goal G1: PerformSoilExpAt(A), which is handled by plan P1: SoilExpPlan.]
4.1.1 Beliefs
In JASON, a belief is represented as a predicate, similar to a Prolog fact. For
example,
lander_signal(low).
However, in JASON beliefs are annotated automatically with the source of the
information represented by the belief (the source can be either sensing, commu-
nication, or created by the agent program itself). The belief above would actually
appear in the belief base as:
1 A goal-plan tree is a structure that shows alternative plans to achieve a goal and the goals that
must all be achieved in order for a plan to finish successfully. Such a structure is very useful for
depicting an agent programmed in a BDI-based agent-oriented programming language.
lander_signal(low)[source(percept)]
if the information was obtained by perceiving the state of the environment by the
agent’s own sensing mechanisms. Programmers can also create their own anno-
tations if other meta-level information about individual beliefs is useful in their
particular applications. Examples of meta-level information related to a belief are
the time it was created and the degree of certainty with which the belief is held.
In JASON, beliefs may also include reasoning rules, similar to Prolog, and
beliefs can be stored in databases when useful. Of course there are variations
for different agent programming languages but the essential aspect of beliefs is to
represent part of the system state and, importantly, be able to update such repre-
sentation as often as possible.
4.1.2 Goals
In JASON, achievement goals are written with the ‘!’ operator; for example, the goal

!battery(charged)
means that the agent wishes to bring about a state of affairs in which the battery is
believed to be charged (which normally implies that the agent believes the battery
is not currently charged and therefore this will lead the agent to commit itself to
acting so as to bring about such state of affairs).
As mentioned earlier, the best-known type of goal in agent programming refers
to the notion of declarative achievement goals. This is precisely the type of goal
in the example above: the agent currently believes battery(charged) is not
true and will act so as to bring about a state of the world where it does believe so.
It is up to the programmer to use the right pattern of plans [39] to ensure that the
agent will continue to act rationally towards achieving that goal, even if initial
attempts fail to do so.
In the goal-plan tree example, at least by the way the goals were written, they
would all seem to be perform goals (i.e., the goal to execute a sequence of ac-
tions without further consideration about the state of affairs), although some of
them might be better modeled as achievement goals and some even as mainte-
nance goals. For example, the goal to be at a particular location is typically an
achievement goal, and the plan to transmit results could have used a maintenance
goal for the rover to keep sufficiently close to the lander while transmitting data
to earth (i.e., the goal to maintain a position sufficiently close to the lander).
4.1.3 Plans
An AgentSpeak plan has three parts: the trigger, the context, and the body. The
trigger part of the plan is used to formulate in which kind of events the plan is to
be used (events are additions or deletions of beliefs or goals). The context says
in which circumstances the plan is expected to succeed, so that the best course of
action might be chosen in different circumstances. The body, finally, has a course
of action, containing also further (sub)goals that the agent should commit to in
order to handle the particular event. The syntax is as follows:

triggering_event : context <- body.

The examples below show the plans required for an agent to handle the declar-
ative achievement goal of positioning itself at some particular location.
The simplest JASON agent implementation that would lead to a goal-plan tree
for the goal to do a soil experiment as shown in Figure 13.1 is as follows, where
the actions in the plans have been invented so as to keep the example as simple as
possible. Note that the preconditions for the plans are not shown in the diagram,
so the plans’ contexts are all empty although it is not difficult to guess what they
should be; also, the ‘@’ symbol is used to label (i.e., to name) a plan in JASON,
which is not necessary but included here as it might help with matching the goal-
plan tree to the code.2
@p1_SoilExpPlan
+!performSoilExpAt(A)
<- !moveToLoc(A);
!performSoilAnalysisAt(A);
!transmitResults.
@p2_MoveToPlan(A)
+!moveToLoc(A) <- move_to(A).
2 It is not clear in this example whether the authors meant for plans P6 and P8 to be different
means of achieving the same goal, but we assume in this code excerpt that this is not the case.
@p3_AnalyseSoilPlan(A)
+!performSoilAnalysisAt(A) <- perform_analysis.
@p4_TransmitResultsPlan1
+!transmitResults <- !transmitData.
@p5_TransmitResultsPlan2
+!transmitResults <- !moveCloseToLander; !transmitData.
@p6_TransmitDataPlan
+!transmitData <- transmit_all_data.
@p7_MoveClosePlan
+!moveCloseToLander <- move_to(lander).
4.1.4 Semantics
For most of the BDI-based programming languages that have formal semantics, the
semantics has traditionally been given using structural operational semantics [50].
Operational semantics for AgentSpeak including some of the extended features
made available in JASON has appeared, e.g., in [13, 15, 68], and see also [14]. We
do not aim at showing the semantics of an agent programming language in this
chapter, but we show a couple of rules of the semantics of JASON to give a flavor
of what the semantics of such languages look like. In the references above, full
details of the semantics of AgentSpeak/JASON can be found, and the references
in Section 4.2 can be used to find the semantics of other BDI-based languages
(out of the ones discussed in that section, only G OAL and 2APL also have formal
semantics).
Operational semantics is given by a transition system where a transition rela-
tion on configurations of the system is induced by logical rules. A configuration of
the system is formally defined as representing a state and abstractly also the archi-
tecture of an agent programmed in that particular programming language. As for
rules, the part of the rule below the horizontal line states that an agent in a given
state (i.e., configuration) can transition to another (the transition relation is repre-
sented by a long right arrow) if the conditions above the line are met. Examples
of the semantic rules for the AgentSpeak language are as follows.
An agent’s reasoning cycle starts with the selection of a particular event (rep-
resenting changes in beliefs or goals of the agent) to be handled in that particular
cycle. An event is represented by a plan triggering event te and an intention i. The
rule for event selection below assumes the existence of a (user-defined) selection
function SE that selects events from a set of events E (which is a component of
the configuration element C representing the agent’s current circumstances). The
selected event is removed from E and it is assigned to the ε component of the el-
ement of the configuration T that is used to record temporary information needed
in later stages of the reasoning cycle. Note how the last component of the config-
uration keeps track of the various stages a reasoning cycle goes through: from the
stage where an event is being selected we go to the one where the relevant plans
(plans written to handle that type of event) will be selected. Another rule, SelEv2
(not shown here), skips to the intention execution part of the cycle, in case there
is no event to handle.
                         SE(CE) = ⟨te, i⟩
(SelEv1)  ─────────────────────────────────────────────────
          ⟨ag, C, M, T, SelEv⟩ ⟶ ⟨ag, C′, M, T′, RelPl⟩

          where:  C′E = CE \ {⟨te, i⟩}
                  T′ε = ⟨te, i⟩
The next example we give is from a much later part of the reasoning cycle
where the agent has already selected one of its intentions to execute. The formula
that appears at the beginning of the body of the plan at the top of the intention will
be executed, and there are various rules depending on the type of that particular
formula. The rule below is for the case where an action is to be executed. The
action a in the body of the plan is added to the set of actions A. The action is
removed from the body of the plan and the intention is updated to reflect this re-
moval. It is part of the overall agent architecture to then take that action execution
request from the A component and to associate that request with the particular
agent “effectors” to which the action corresponds.
                         Tι = i[head ← a; h]
(Action)  ─────────────────────────────────────────────────
          ⟨ag, C, M, T, ExecInt⟩ ⟶ ⟨ag, C′, M, T′, ClrInt⟩

          where:  C′A = CA ∪ {a}
                  T′ι = i[head ← h]
                  C′I = (CI \ {Tι}) ∪ {T′ι}
for agents, the agent platform, and an extensive run-time tool suite. It allows
for programming intelligent software agents in XML and Java and can be de-
ployed on different kinds of middleware such as JADE. The developed software
framework is available under the GNU LGPL license, and is continuously evolv-
ing (available on SourceForge.net). We refer the interested reader to http:
//jadex-agents.informatik.uni-hamburg.de/ and [17, 51].
2APL provides programming constructs both (1) to specify a multiagent sys-
tem in terms of a set of individual agents and a set of environments, as well
as (2) to implement cognitive agents based on the BDI architecture (agent’s be-
liefs, goals, plans, actions, events, and a set of rules through which the agent
can decide which actions to perform). 2APL is a modular programming lan-
guage allowing the encapsulation of cognitive components in modules. Its graph-
ical interface, through which a user can load, execute, and debug 2APL multi-
agent programs using different execution modes and several debugging/obser-
vation tools, is shown in Figure 13.3. For further detail we refer the reader to
http://apapl.sourceforge.net/ and [2, 21].
AGENT FACTORY (see Figure 13.4) has at its core a FIPA-standards-based
run-time environment (RTE), which provides support for the deployment of het-
erogeneous agent types, ranging from pure Java-based agents to customizable
agent architectures and agent programming languages. While much work in the
past has focused on the development of the AGENT FACTORY agent program-
ming language (AFAPL), more recent work has resulted in the common language
framework, a suite of components for AGENT FACTORY that are intended to help
simplify the development of diverse logic-based agent programming languages
(APLs). For further references see http://www.agentfactory.com and
[41, 44].
B RAHMS (see Figure 13.5) can be seen both as a programming language
as well as a behavioral modeling language. It allows users to model com-
plex agent organizations to simulate people, objects, and environments. Agents
can deal with time and can be easily integrated with Java agents. A partic-
ularly exciting application is the multiagent system OCAMS that was devel-
oped with B RAHMS and is running continually in NASA’s ISS mission con-
trol: http://ti.arc.nasa.gov/news/ocams-jsc-award/. Refer to
http://www.agentisolutions.com/ and [19, 61, 66] for details.
[Figure 13.5 (diagram): the BRAHMS run-time architecture, with agents executing on the Brahms VM, which itself runs on the Java VM.]
A GOAL agent program is a set of modules that consist of various sections including
knowledge, beliefs, goals, a program section that contains action rules, and action
specifications. Each of these sections is represented in a knowledge representation
language such as Prolog, answer set programming, SQL (or Datalog), or the planning
domain definition language. The listing below illustrates these sections for a
blocks-world agent: the init module initializes the agent, here by defining knowledge,
an initial goal, and action specifications; macro definitions are used to create more
readable code; the main module is used to code the agent’s deliberation using rules
for selecting actions; rules in the event module are used to process the percepts and
messages that the agent receives; further modules, such as adoptgoal, are user-defined.
For details, refer to http://mmi.tudelft.nl/trac/goal and [34, 36].

init module{
  knowledge{
    clear(table) . clear(X) :- block(X), not(on(_, X)), not(holding(X)) .
    …
  }
  % no initial beliefs about block configuration.
  goals{
    on(a,b), on(b,c), on(c,table), on(d,e), on(e,f), on(f,table).
  }
  actionspec{
    pickup(X) { pre{ clear(X), not(holding(_)) } post{ true } }
  }
  …
}

% moving X on top of Y is a constructive move if that move results in X being in position.
#define constructiveMove(X, Y) a-goal(tower([X, Y|T])), … .

main module{
  program{
    if a-goal( holding(X) ) then pickup(X) .
    % put a block you're holding down.
    if bel( holding(X) ) then {
      if constructiveMove(X,Y) then putdown(X, Y) .
      if true then putdown(X, table) .
    }
  }
}

event module{
  program{
    #define inPosition(X) goal-a( tower([X|T]) ) . % block in position if it achieves a goal.
    % rules for processing percepts (assumes full observability).
    forall bel( block(X), not(percept(block(X))) ) do delete( block(X) ) .
    forall bel( percept(block(X)), not(block(X)) ) do insert( block(X) ) .
    …
  }
}

module adoptgoal{
  …
}
The second one is ConGolog, a language based on the situation calculus of first-
order logic (to be more precise it is a dynamic logic combined with ideas from the
situation calculus).
[Figure (diagram): the buyer–seller interaction protocol, showing the exchange of contractProposal, accept, refuse, and acknowledge messages between Buyer and Seller over successive negotiation rounds.]
chandise to the buyer (if no concurrent actions are available, answering and ship-
ping merchandise will be executed sequentially). (5) If there is not enough mer-
chandise in the warehouse or the price is lower or equal to a min value, the seller
agent refuses by sending a refuse message to the buyer. (6) If there is enough mer-
chandise in the warehouse and the price is between min and max, the seller sends a
contractProposal to the buyer with a proposed price evaluated as the mean of the
price proposed by the buyer and max. (7) The merchandise to be exchanged is
oranges, with minimum and maximum price of 1 and 2 euro, respectively. The
initial amount of oranges that the seller possesses is 1,000.
4.3.1 METATEM
The other two temporal logic rules look very similar. They formalize (1) If
there was a previous state where Buyer sent a contractProposal message to seller,
and in that previous state the conditions were not met to accept the Buyer’s pro-
posal, then send a refuse message to Buyer, and (2) If there was a previous state
where Buyer sent a contractProposal message to seller, and in that previous state
the conditions were met to send a contractProposal back to Buyer, then send a
contractProposal message to Buyer with a new proposed price.
ConGolog ([32]) and IndiGolog ([33]) are languages extending Golog, a language
based on the situation calculus introduced by McCarthy. Golog stands for alGOl
in LOGic.
Actions are described as in the classical STRIPS approach: they have precon-
ditions that must be satisfied in order to apply the action. The postcondition then
describes the change of the world. The evolution of the world is described within
the logical language by fluents, which are terms in the language. The effects of an
action are formalized by successor-state axioms: they describe what the successor
state of a given state looks like if an action is applied.
Whereas first-order logic as a specification language using deduction (i.e., a
theorem prover) as the underlying procedural mechanism is often too slow and
difficult to handle for the non-expert, Golog is a programming language that hides
the application of the situation calculus and is thus much more user-friendly. Pro-
cedures in Golog are reduced to primitive actions, which refer to actions in
the real world, such as picking up objects, opening doors, moving from one room
to another, and so on. Golog allows programmers to state procedures of the form
proc seller-life-cycle
if receiving(Buyer, seller,
contractProposal(Merchandise, Required-amount, Price), now)
then
if storing(Merchandise, Amount, now)
∧ Amount ≥ Required-amount
∧ Price ≥ max-price(Merchandise)
then ship(Buyer, Merchandise, Required-amount) .
send(seller, Buyer, accept(Merchandise, Required-amount, Price))
else
if storing(Merchandise, Amount, now) ∧ Amount ≥ Required-amount
∧ min-price(Merchandise) < Price < max-price(Merchandise)
then send(seller, Buyer, contractProposal(Merchandise,
Required-amount, (Price + max-price(Merchandise))/2))
else nil
5.1 Organizations
Organizations for multiagent systems and normative systems for agent societies
have turned into major research topics in multiagent systems in the last few years.
Many different approaches and related frameworks have been developed through
multidisciplinary research. As is common in computer science, some research
work focuses exclusively on theoretical aspects, and only some frameworks are
worked out so as to become sufficiently practical for actual use in the develop-
ment of software for multiagent applications. In this section, we concentrate on
a few approaches available in the literature that have direct relevance to agent
programming. Chapter 2 of this book covers some of the topics in multiagent
organizations.
5.1.1 MOISE
MOISE is one of the best-known approaches for modeling and programming
multiagent organizations. The MOISE organization modeling language explicitly
decomposes the specification of organizations into structural, functional, and nor-
mative dimensions [38, 40]. The modeling language is accompanied by a graph-
ical language (examples of MOISE diagrams will appear in Section 6) and XML
is used to store the organizational specifications. Those specifications are then
managed by an organization management infrastructure at run-time. Because the
organization is managed at run-time from its explicit representation, agents can
in principle reason about their own organization and change it during the system
execution.
The structural dimension specifies the roles, groups, and links (e.g., communi-
cation) that exist within the organization. The definition of roles is such that when
an agent chooses to play some role in a group, it is accepting some behavioral con-
straints and rights related to this role. The functional dimension determines how
the global collective goal should be achieved, i.e., how these goals are decom-
posed (through social plans) and grouped into coherent sets of subgoals (called
missions) to be distributed among the agents. Such decomposition of a global goal
results in a goal tree, called scheme, where the leaf-goals can be achieved individ-
ually by the agents. The normative dimension binds the structural dimension with
the functional one by means of the specification of permissions and obligations
toward missions assigned to particular roles. When an agent chooses to play some
role in a group, it accepts these permissions and obligations.
A mission defines all the goals an agent playing a given role commits to when
participating in the execution of a scheme. The normative specification relates
roles and missions through norms. Note that a norm in MOISE is always an obli-
gation or permission to commit to a mission. Goals are therefore indirectly linked
to roles since a mission is a set of goals. Prohibitions are assumed “by default”
with respect to the specified missions: if the normative specification does not in-
clude a permission or obligation for a particular role-mission pair, it is assumed
that the role does not grant the right to commit to the mission.
We do not give details of MOISE here although some MOISE diagrams appear
in Section 6 where we give an example of a multiagent program written for the
JAC A M O platform. Further details can be found in the references given above.
5.2 Environments
Sometimes an agent is developed for a whole class of applications, not just for a
single one. When an agent should be usable for many applications, it might be sit-
uated in different environments where it should do its duties accordingly. From an
abstract point of view, it would be desirable if agents could be developed as inde-
pendently of a particular environment as possible. That would make it possible to
develop interesting environments and powerful multiagent systems independently
of each other. Unfortunately, this is not normally the case as there is no general
agreement upon what belongs to shared environments and what belongs to the
agent platforms, and how exactly they interact.
In this section we describe two approaches that complement each other:
[Figure 13.8 (diagram): several agent platforms (APL Platform 1 through Platform n), each with an EIS client, connected through the environment interface to an EIS server wrapping the shared environment.]
Figure 13.8: Ideal world: Heterogeneous agents acting within a shared environment.
5.2.1 CA RTAG O
[Figure 13.10 (diagram): a BAKERY workspace containing CLOCK, WHITEBOARD, ARCHIVE, COMMUNICATION CHANNEL, RESOURCE, and TASK SCHEDULER artifacts.]
Figure 13.10: A bakery used as a metaphor to frame the notion of workspaces and
artifacts as resources and tools used by agents to work.
[Figure (diagram): the relationships between agents, workspaces, artifacts, and observable properties: a workspace has artifacts; agents join and quit workspaces, create, dispose of, and link artifacts, observe artifacts, and perceive the observable properties that artifacts generate.]
/* Jason agent creating and using the counter */

!create_and_use.

+!create_and_use : true
   <- makeArtifact("c0","Counter",[],Id);
      inc;
      inc[artifact_id(Id)].

// Jason agent observing the counter

!observe.

+!observe
   <- lookupArtifact("c0",Id);
      focus(Id).

+count(N)
   <- println("perceived new value: ",N).
Figure 13.13: Snippets of two JASON agents creating and using a shared counter
(on the left) and observing the shared counter (on the right).
are then directly perceived as changes in the beliefs related to those properties.
5.2.2 EIS
The overall idea of EIS is to set up a standard interface for connecting agent plat-
forms to environments. As a result, agent platforms that support the interface can
connect to any environment that implements the interface. This will significantly
reduce the effort required from agent and environment programmers: the environ-
ment code needed to implement the interface needs to be written only once for
each platform.
EIS has been crafted based upon a detailed analysis of some agent languages
shown in Table 13.1 (taken from [7]).
Figure 13.14: Components of EIS. The platform containing the agents is separated
from the environment model. The interface layer acts as a link to facilitate the
interaction of the components.
Figure 13.14 presents the meta-model schematically. It also depicts the re-
lations that need to be supported between the various components. One of the
most important relations that should be part of an agent-environment interface
is what we call the agents-entities-relation. This relation associates agents with
controllable entities in the environment. The relation is maintained to provide
basic bookkeeping functionality. This bookkeeping functionality plays a key
role as it determines which agents are allowed to control which entities and also
determines which percepts should be provided to which agents.
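Purely as an illustration of this bookkeeping role (and not of the actual EIS Java API), a toy agents-entities relation might look like the following sketch (Python; all names invented):

class AgentsEntitiesRelation:
    def __init__(self):
        self._controls = {}                 # agent name -> set of entity names

    def register(self, agent, entity):
        # Record that 'agent' is allowed to control 'entity'.
        self._controls.setdefault(agent, set()).add(entity)

    def entities_of(self, agent):
        return self._controls.get(agent, set())

    def deliver_percepts(self, percepts_by_entity):
        # Route each entity's percepts to every agent controlling that entity.
        inbox = {agent: [] for agent in self._controls}
        for agent, entities in self._controls.items():
            for entity in entities:
                inbox[agent].extend(percepts_by_entity.get(entity, []))
        return inbox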
For further details on EIS, we refer the reader to http://sf.net/
projects/apleis/, http://cig.in.tu-clausthal.de/apleis,
[4, 6, 69, 70] and the references therein.
is a lot more than three platforms working together. By offering first-class ab-
stractions for the agent (JASON), social (MOISE), and environment (CA RTAG O)
levels of a multiagent system, it has for the first time in fully working operation
unraveled the full potential of MAOP: it allows the development of fully-fledged
multiagent systems using very high-level programming. JAC A M O can be down-
loaded from http://jacamo.sourceforge.net.
1. robot1 loads an A part into one of the jigs on the rotating table
7. the flipper flips the part over (“BA”) at the same time as robot1 loads a C
part into the jig
10. robot2 joins the C and BA parts, yielding a complete ABC part
[Figure (diagram): MOISE structural specification of the assembly_cell_group, with the roles controller, loader, joiner, rotator, and flipper (related by inheritance to an abstract "simple" role), each with cardinality 1..1 in the group; the legend explains the notation for roles, groups, inheritance, and role cardinalities (min..max). The associated scheme is rooted at the goal of manufacturing an ABC part.]
The plan above says that whenever the agent comes to believe that it has a new
obligation toward an organizational goal Goal (note the use of JASON higher-
order variables here), it just tries to achieve that goal and, if all goes well, the
agent (through an ORA4MAS artifact) tells the organization that the goal it was
obliged to achieve has been achieved (this is important so that the organization
can then delegate further goals to be achieved, possibly by other agents).
In this application, the actual behavior for agents “loader,” “joiner,” and “flip-
per” is to simply adopt its predetermined role (done by the first plan below) and
then do whatever it is asked to do. For example, when the MOISE scheme de-
termines the adoption of goal !a_loaded, the agent should just do the action
a_loaded that activates the loading mechanism in the actual factory. This is
possible because the name of such an operation (i.e., an external action for the
JASON agent that executes the corresponding artifact operation) in the artifact
simulating the manufacturing cell is the same as the goal itself. The second plan
below shows that this can be done in a generic way (through the use of the higher-
order variable G below) for any organizational goal received. The first plan makes
the agent join the ORA4MAS workspace so as to take part in the organization;
These three agents have only the code above, nothing else. The only agent
that requires more complex behavior is the “rotator.” In the MOISE scheme for the
manufacturing process, the rotator is assigned two different goals: to wait for an
empty jig and to get the table rotated. Below we show the various plans needed
to achieve these goals. In the beginning there are two simple Prolog-like rules
used to facilitate the plan contexts. They check the number of instances of the
manufacturing scheme in MOISE so as to check if there are one or two concurrent
orders being manufactured by this cell (each order is handled by one scheme in-
stance; this will be further detailed later in this section). In the code below, note
that the name of the scheme that requested the achievement of a particular goal is
annotated in the new goal events generated by the agent architecture.
// rule to check if we have two concurrent orders (2 Moise schemes)
two_orders :- schemes(L) & .length(L)==2.
// or only one order so far
one_order :- schemes(L) & .length(L)==1.
// avoid conflicts when 2 orders are simultaneously waiting for empty jigs
+!wait_for_empty_jig[scheme(S1)] :
.desire(wait_for_empty_jig[scheme(S2)]) & S1\==S2 <-
.wait(500); // wait a bit
!wait_for_empty_jig[scheme(S1)]. // and try again
// will have to wait until the jig at the loader end is empty
+!wait_for_empty_jig[scheme(S)] <-
.wait({+jig_loader("empty")}); // wait until this event happens
reserve_jig(S); // make sure empty jig is allocated to this order
// if there are pending requests to rotate the table
Chapter 13 627
if (.desire(table_rotated[scheme(S)])) {
// might need reconsidering which plan to use for rotating
.drop_desire(table_rotated[scheme(S)]);
!!table_rotated[scheme(S)];
}.
// Let it rotate if another job needs it and we’re waiting for an empty jig
+!table_rotated :
two_orders & .desire(wait_for_empty_jig) & not jig_loader("empty") <-
table_rotated.
The comments in the code above explain all of the details. It will be noted that
as a new order can be started at any time during the manufacturing of another, it
is a rather difficult synchronization problem that the rotator has to solve.
Finally, the cell manager agent has mostly procedural code to create the sim-
ulation artifacts and initialize the organization. Other than that, it has only a few
plans, the following one being the longest:
// each order generates an instance of the Manufacture scheme
@op1[atomic] // needs to be an atomic operation: changing the no. of schemes
+order(N) :
formationStatus(ok)[artifact_id(GrArtId)]
& schemes(L) & .length(L)<=1 <- // no more than 1 order under way
// wait until empty jig is correctly positioned at loader robot
.concat("order", N, SchemeName);
makeArtifact(SchemeName, "ora4mas.nopl.SchemeBoard",
["src/manufacture-os.xml", manufacture_schm, false, true], SchArtId);
focus(SchArtId); // get all info about this Moise scheme
addScheme(SchemeName)[artifact_id(GrArtId)].
This plan accepts at most two concurrent manufacturing orders, and creates the
necessary ORA4MAS scheme artifact to handle a new (simulated) manufacturing
order. Focusing on the scheme allowed the cell manager to automatically perceive
the state of the scheme; for example, it needs to know when the order has been
completed so that a new one can be accepted. This plan also needs to add the
newly created scheme to the (well-formed) MOISE group.
void init() {
    taskId = 0;
}
6 Available at http://cartago.sourceforge.net.
Note that in this section we only included excerpts, although all the important
parts of the code were covered. Still, we strongly encourage the reader to look
at the complete (fully commented) code and run the system. The working ex-
ample for one manufacturing cell can be downloaded from http://www.inf.
ufrgs.br/~bordini/WeissBookChapter13Ex, and we leave as an ex-
ercise to use the CNP artifacts for extending to multiple cells.
7 Conclusions
The type of programming made available in practice for the first time with
JAC A M O is effectively very new. Although agent-oriented programming has
been around for many years, it was only with the combination of agent-oriented
programming with organization-oriented programming and environment-oriented
programming that the true potential of a programming paradigm inspired by multi-
agent systems was unraveled.
Of course there are many open issues in this programming paradigm. There
are specific issues at the various levels, for example, the issue of encapsulation
of agent code has some proposed solutions but more is required in the way of
experimentation so that the best approaches can be determined. The combination
of the three levels of abstraction, being rather recent, is also likely to require
much further research still. Equally, we should expect much progress in practical
programming tools in order to make the approach usable in industry.
However, with the trends in computing toward a vision of the future where
autonomy and large-scale interaction will be required by so much software, it is
reasonable to expect that MAOP will play an important role in the mainstream
computing industry in that future.
Acknowledgments
We thank Alessandro Ricci for providing us with some of the material that went
into Section 5.2.1. Section 5.2.2 was mainly based on [5, 7] and we thank the
authors for allowing us to use a few excerpts from those papers. We also thank
Koen Hindriks, Maarten Sierhuis, Lars Braubach, Alexander Pokahr, Mehdi Das-
tani, and Rem Collier for providing us with some material that went into Section
4.2. Tristan Behrens, Michael Köster, and Federico Schlesinger proofread parts
of this chapter and gave us useful comments.
Rafael Bordini acknowledges the support of CNPq grant 307924/2009-2. Jür-
gen Dix acknowledges that this work was partly funded by the NTH School for IT
Ecosystems. (NTH [Niedersächsische Technische Hochschule] is a joint univer-
sity consisting of Technische Universität Braunschweig, Technische Universität
Clausthal, and Leibniz Universität Hannover.)
8 Exercises
1. Level 1 Discuss how the programming techniques presented in this chapter
address each of the required features of autonomous systems presented in
Subsection 2.2.
2. Level 1 Create a mind map of all the programming abstractions at the three
different levels of a multiagent system, available in full MAOP as well as
related concepts and ideas.
3. Level 1 Using JASON, give the Mars rover agent used as an example in
Section 4.1 further know-how (for example taking panoramic pictures and
performing spectrometric analysis of rock samples) and write a complex
plan with a particular mission for the rover using both the old and new
know-how.
8. Level 3 Redesign the whole manufacturing system so that the levels of abstraction in JaCaMo take responsibility for different aspects of the appli-
cation. For example, avoid coordination at the organization level and leave
it for direct agent communication, or avoid using contract-net at the level of
the environment. Compare the performance of the system (as in the exercise
above) and, most importantly, compare also the elegance of the new version
of the code with the original solution.
10. Level 3 Design and implement a multiagent system for a complex applica-
tion using JaCaMo.
11. Level 4 In the version of JaCaMo used in this book, there are no programming constructs to create interaction protocols. Propose a general mechanism for this and integrate it within the JaCaMo platform as a concrete
implementation.
13. For the next few exercises, we refer to the multiagent contest site http://www.multiagentcontest.org. There, a whole platform, contest
scenarios, as well as several dummy agents to start with (in several agent
languages) are available. The following exercises can be based on your
favorite agent language.
14. Level 1 In this exercise you learn how to start the server, the monitor, and
some dummy agents. Please note that a Unix shell, preferably bash, is needed. For Windows you can use MSYS (http://www.mingw.org/wiki/MSYS).
23. Level 2 Design a strategy for defending your own zones and attacking the
zones of the opponents.
References
[1] Huib Aldewereld and Virginia Dignum. OperettA: Organization-Oriented Develop-
ment Environment. In Mehdi Dastani, Amal El Fallah-Seghrouchni, Jomi Hübner,
and João Leite, editors, LADS, volume 6822 of Lecture Notes in Computer Science,
pages 1–18. Springer, 2010.
[2] Natasha Alechina, Mehdi Dastani, Brian Logan, and John-Jules Ch. Meyer. Rea-
soning about Agent Deliberation. Autonomous Agents and Multi-Agent Systems,
22(2):356–381, 2011.
[3] J. L. Austin. How to Do Things with Words. Oxford University Press, London, 1962.
[4] Tristan Behrens. Towards Building Blocks for Agent-Oriented Programming. PhD
thesis, Department of Computer Science, Clausthal University of Technology, Ger-
many, 2012.
[5] Tristan Behrens, Rafael Bordini, Lars Braubach, Mehdi Dastani, Jürgen Dix, Koen
Hindriks, Jomi Hübner, and Alexander Pokahr. An Interface for Agent-Environment
Interaction. In Rem Collier, Jürgen Dix, and Peter Novák, editors, Proceedings of
the 8th International Workshop on Programming Multi-agent Systems (ProMAS),
volume 6599 of LNCS, pages 170–185, Heidelberg, Germany, 2011. Springer Ver-
lag.
[6] Tristan Behrens, Mehdi Dastani, Jürgen Dix, and Peter Novák. The Multi-agent
Programming Contest from 2005–2010. Annals of Mathematics and Artificial Intel-
ligence, 59(3–4):277–311, 2010.
[7] Tristan Behrens, Koen Hindriks, and Jürgen Dix. Towards an Environment Interface
Standard for Agent Platforms. Annals of Mathematics and Artificial Intelligence,
61(1–2):3–38, 2011.
[9] Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, and Amal El Fallah-Seghrouchni,
editors. Multi-agent Programming: Languages, Platforms and Applications, vol-
ume 15 of Multiagent Systems, Artificial Societies, and Simulated Organizations.
Springer, 2005.
[10] Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, and Amal El Fallah-Seghrouchni.
Preface – Special Issue on Programming Multiagent Systems. International Journal
of Agent-Oriented Software Engineering, 1(3/4), 2007.
[11] Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, and Amal El Fallah-Seghrouchni,
editors. Multi-agent Programming: Languages, Tools and Applications. Springer,
2009.
[12] Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, and Amal El Fallah-Seghrouchni.
Preface. Autonomous Agents and Multi-Agent Systems, 23(2):155–157, 2011.
[13] Rafael H. Bordini and Jomi Fred Hübner. Semantics for the JASON Variant of
AgentSpeak (Plan Failure and some Internal Actions). In Helder Coelho, Rudi
Studer, and Michael Wooldridge, editors, ECAI, volume 215 of Frontiers in Arti-
ficial Intelligence and Applications, pages 635–640. IOS Press, 2010.
[14] Rafael H. Bordini, Jomi Fred Hübner, and Michael Wooldridge. Programming
Multi-agent Systems in AgentSpeak Using JASON. Wiley Series in Agent Tech-
nology. John Wiley & Sons, 2007.
[15] Rafael H. Bordini and Álvaro F. Moreira. Proving BDI Properties of Agent-Oriented
Programming Languages. Annals of Mathematics and Artificial Intelligence, 42(1-
3):197–226, 2004.
[16] Michael E. Bratman. Intentions, Plans and Practical Reason. Harvard University
Press, Cambridge, MA, 1987.
[18] Lars Braubach, Alexander Pokahr, Daniel Moldt, and Winfried Lamersdorf. Goal
Representation for BDI Agent Systems. In Rafael H. Bordini, Mehdi Dastani, Jürgen
Dix, and Amal El Fallah-Seghrouchni, editors, ProMAS, volume 3346 of Lecture
Notes in Computer Science, pages 44–65. Springer, 2004.
[19] William J. Clancey, Maarten Sierhuis, Charis Kaskiris, and Ron van Hoof. Ad-
vantages of Brahms for Specifying and Implementing a Multiagent Human-Robotic
Exploration System. In Ingrid Russell and Susan M. Haller, editors, Proc. FLAIRS
Conference, pages 7–11. AAAI Press, 2003.
[20] Rem Collier, Jürgen Dix, and Peter Novák, editors. Programming Multi-agent Sys-
tems, Revised Selected and Invited Papers of ProMAS 2010, volume 6599 of Lecture
Notes in Computer Science. Springer, 2011.
[22] Mehdi Dastani, Amal El Fallah-Seghrouchni, João Leite, and Paolo Torroni, edi-
tors. Languages, Methodologies, and Development Tools for Multi-agent Systems,
Second International Workshop, LADS 2009, Torino, Italy, September 7-9, 2009, Re-
vised Selected Papers, volume 6039 of Lecture Notes in Computer Science. Springer,
2010.
[24] Mehdi Dastani, Davide Grossi, John-Jules Ch. Meyer, and Nick A. M. Tinnemeier.
Normative Multi-agent Programs and Their Logics. In John-Jules Ch. Meyer and Jan
Broersen, editors, KRAMAS, volume 5605 of Lecture Notes in Computer Science,
pages 16–31. Springer, 2008.
[25] Jürgen Dix and Michael Fisher. Where Logic and Agents Meet. Annals of Mathe-
matics and Artificial Intelligence, 61(1):15–28, 2011.
[26] Jürgen Dix and Joao Leite. Special Issue: Selected Papers of CLIMA 2010. Annals
of Mathematics and Artificial Intelligence, 62(1/2), 2011.
[27] Michael Fisher. Temporal Semantics for Concurrent MetateM. Journal of Symbolic
Computation, 22(5/6):627–648, 1996.
[28] Michael Fisher. A Normal Form for Temporal Logics and its Applications in
Theorem-Proving and Execution. Journal of Logic and Computation, 7(4):429–456,
1997.
[29] Michael Fisher, Rafael H. Bordini, Benjamin Hirsch, and Paolo Torroni. Computa-
tional Logics and Agents: A Road Map of Current Technologies and Future Trends.
Computational Intelligence, 23(1):61–91, 2007.
[30] Jakub Gemrot, Rudolf Kadlec, Michal Bída, Ondrej Burkert, Radek Píbil, Jan
Havlícek, Lukás Zemcák, Juraj Simlovic, Radim Vansa, Michal Stolba, Tomás Plch,
and Cyril Brom. Pogamut 3 Can Assist Developers in Building AI (Not Only) for
Their Videogame Agents. In Frank Dignum, Jeffrey M. Bradshaw, Barry G. Silver-
man, and Willem A. van Doesburg, editors, AGS, volume 5920 of Lecture Notes in
Computer Science, pages 1–15. Springer, 2009.
[31] Michael P. Georgeff and Amy L. Lansky. Reactive Reasoning and Planning. In
AAAI, pages 677–682, 1987.
[33] Giuseppe De Giacomo, Yves Lespérance, Hector J. Levesque, and Sebastian Sardina. IndiGolog: A High-Level Programming Language for Embedded Reasoning Agents. In R. H. Bordini, M. Dastani, J. Dix, and A. El Fallah-Seghrouchni, editors, Multi-agent Programming: Languages, Tools and Applications, pages 31–72. Springer, 2009.
[35] Koen V. Hindriks, Frank S. de Boer, Wiebe van der Hoek, and John-Jules Ch.
Meyer. Agent Programming in 3APL. Autonomous Agents and Multi-Agent Sys-
tems, 2(4):357–401, 1999.
[36] Koen V. Hindriks and Tijmen Roberti. GOAL as a Planning Formalism. In Lars
Braubach, Wiebe van der Hoek, Paolo Petta, and Alexander Pokahr, editors, MATES,
volume 5774 of Lecture Notes in Computer Science, pages 29–40. Springer, 2009.
[37] Jomi Fred Hübner, Olivier Boissier, and Rafael H. Bordini. From Organisation Spec-
ification to Normative Programming in Multi-agent Organisations. In Jürgen Dix,
João Leite, Guido Governatori, and Wojtek Jamroga, editors, CLIMA XI, volume
6245 of Lecture Notes in Computer Science, pages 117–134. Springer, 2010.
[38] Jomi Fred Hübner, Olivier Boissier, Rosine Kitio, and Alessandro Ricci. Instru-
menting Multi-agent Organisations with Organisational Artifacts and Agents. Au-
tonomous Agents and Multi-Agent Systems, 20(3):369–400, 2010.
[39] Jomi Fred Hübner, Rafael H. Bordini, and Michael Wooldridge. Plan Patterns for
Declarative Goals in AgentSpeak. In Hideyuki Nakashima, Michael P. Wellman,
Gerhard Weiss, and Peter Stone, editors, AAMAS, pages 1291–1293. ACM, 2006.
[40] Jomi Fred Hübner, Jaime Simão Sichman, and Olivier Boissier. Developing Organ-
ised Multiagent Systems using the MOISE. IJAOSE, 1(3/4):370–395, 2007.
[41] Howell R. Jordan, Jennifer Treanor, David Lillis, Mauro Dragone, Rem W. Collier,
and Gregory M. P. O’Hare. AF-ABLE in the Multi-Agent Programming Contest
2009. Annals of Mathematics and Artificial Intelligence, 59(3-4):389–409, 2010.
[42] John E. Laird. Extending the SOAR Cognitive Architecture. In Pei Wang, Ben
Goertzel, and Stan Franklin, editors, AGI, volume 171 of Frontiers in Artificial In-
telligence and Applications, pages 224–235. IOS Press, 2008.
[43] Hector J. Levesque, Raymond Reiter, Yves Lespérance, Fangzhen Lin, and
Richard B. Scherl. GOLOG: A Logic Programming Language for Dynamic Do-
mains. J. Log. Program., 31(1-3):59–83, 1997.
[44] David Lillis, Rem W. Collier, Mauro Dragone, and Gregory M. P. O’Hare. An
Agent-Based Approach to Component Management. In Proc. 8th International Joint
Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 529–
536, 2009.
[45] Viviana Mascardi, Maurizio Martelli, and Leon Sterling. Logic-Based Specification
Languages for Intelligent Software Agents. Theory and Practice of Logic Program-
ming, 4(4):429–494, 2004.
[46] Jörg P. Müller, Michael Wooldridge, and Nicholas R. Jennings, editors. Intelli-
gent Agents III, Agent Theories, Architectures, and Languages, ECAI ’96 Workshop
(ATAL), Budapest, Hungary, August 12-13, 1996, Proceedings, volume 1193 of Lec-
ture Notes in Computer Science. Springer, 1997.
[47] Andrea Omicini, Alessandro Ricci, and Mirko Viroli. Artifacts in the A&A Meta-
model for Multi-agent Systems. Autonomous Agents and Multi-Agent Systems,
17(3), 2008.
[48] Andrea Omicini, Sebastian Sardina, and Wamberto Weber Vasconcelos, editors.
Declarative Agent Languages and Technologies VIII - 8th International Workshop,
DALT 2010, Toronto, Canada, May 10, 2010, Revised, Selected and Invited Papers,
volume 6619 of Lecture Notes in Computer Science. Springer, 2011.
[49] Samuel J. Partington and Joanna Bryson. The Behavior Oriented Design of an Un-
real Tournament Character. In IVA, pages 466–477, 2005.
[51] Alexander Pokahr, Lars Braubach, and Winfried Lamersdorf. Jadex: A BDI Reason-
ing Engine. In Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, and Amal El Fallah-
Seghrouchni, editors, Multi-Agent Programming, volume 15 of Multiagent Systems,
Artificial Societies, and Simulated Organizations, pages 149–174. Springer, 2005.
[52] Anand S. Rao. AgentSpeak(L): BDI Agents Speak Out in a Logical Computable
Language. In Walter van de Velde and John W. Perram, editors, Proc. 7th European
Workshop on Modelling Autonomous Agents in a Multi-agent World (MAAMAW),
volume 1038 of Lecture Notes in Computer Science, pages 42–55. Springer, 1996.
[53] Anand S. Rao and Michael P. Georgeff. BDI Agents: From Theory to Practice. In
Victor R. Lesser and Les Gasser, editors, Proc. First International Conference on
Multiagent Systems (ICMAS), pages 312–319. The MIT Press, 1995.
[54] Alessandro Ricci, Michele Piunti, Daghan L. Acay, Rafael H. Bordini, Jomi Fred
Hübner, and Mehdi Dastani. Integrating Heterogeneous Agent Programming Plat-
forms within Artifact-based Environments. In Proc. 7th International Joint Con-
ference on Autonomous Agents and Multiagent Systems (AAMAS), pages 225–232.
IFAAMAS, 2008.
[55] Alessandro Ricci, Michele Piunti, Mirko Viroli, and Andrea Omicini. Environment Programming in CArtAgO. In Rafael H. Bordini, Mehdi Dastani, Jürgen Dix,
and Amal El Fallah-Seghrouchni, editors, Multi-agent Programming: Languages,
Platforms and Applications, Vol. 2, pages 259–288. Springer, 2009.
[56] Stuart Russell and Peter Norvig. Artificial Intelligence, A Modern Approach (2nd
ed.). Prentice Hall, 2003.
[61] Richard Stocker, Maarten Sierhuis, Louise A. Dennis, Clare Dixon, and Michael
Fisher. A Formal Semantics for Brahms. In Proc. 12th International Workshop
on Computational Logic in Multi-agent Systems (CLIMA), volume 6814 of Lecture
Notes in Computer Science, pages 259–274. Springer, 2011.
[62] V.S. Subrahmanian, Piero Bonatti, Jürgen Dix, Thomas Eiter, Sarit Kraus, Fatma
Özcan, and Robert Ross. Heterogeneous Active Agents. MIT Press, 2000.
[63] John Thangarajah, James Harland, David N. Morley, and Neil Yorke-Smith. Op-
erational Behaviour for Executing, Suspending, and Aborting Goals in BDI Agent
Systems. In Omicini et al. [48], pages 1–21.
[64] John Thangarajah, Lin Padgham, and Michael Winikoff. Detecting & Exploiting
Positive Goal Interaction in Intelligent Agents. In AAMAS, pages 401–408. ACM,
2003.
[65] Nick A. M. Tinnemeier, Mehdi Dastani, and John-Jules Ch. Meyer. Roles and
Norms for Programming Agent Organizations. In Carles Sierra, Cristiano Castel-
franchi, Keith S. Decker, and Jaime Simão Sichman, editors, Proc. AAMAS, pages
121–128. IFAAMAS, 2009.
[66] Bart-Jan van Putten, Virginia Dignum, Maarten Sierhuis, and Shawn R. Wolfe.
OperA and Brahms: A Symphony? In Proc. 9th International Workshop on Agent-
Oriented Software Engineering (AOSE), volume 5386 of Lecture Notes in Computer
Science, pages 257–271. Springer, 2008.
[67] Birna van Riemsdijk, Mehdi Dastani, and John-Jules Ch. Meyer. Semantics of
Declarative Goals in Agent Programming. In Proc. 4th International Joint Con-
ference on Autonomous Agents and Multiagent Systems (AAMAS), pages 133–140.
ACM, 2005.
[68] Renata Vieira, Álvaro F. Moreira, Michael Wooldridge, and Rafael H. Bordini. On
the Formal Semantics of Speech-Act Based Communication in an Agent-Oriented
Programming Language. Journal of Artificial Intelligence Research, 29:221–267,
2007.
[71] Danny Weyns, Andrea Omicini, and James J. Odell. Environment as a First-class
Abstraction in Multi-agent Systems. Autonomous Agents and Multi-Agent Systems,
14(1):5–30, Feb 2007. Special Issue on Environments for Multi-agent Systems.
Chapter 14
1 Introduction
As we have seen from the previous chapter, there are many ways to implement a
multiagent system. Although certainly not trivial, the task of producing a working
multiagent system is relatively straightforward. However, producing a multiagent
system that always “works” is much more challenging. But how can we assess
this? How can we describe exactly what we want our system to do? And then how
can we ensure that any system we build actually conforms to this description? A
further aspect that is increasingly becoming important is how we can assess that a
multiagent system built by someone else conforms to our requirements.
In this chapter we will address these important problems. Specifically, we will
show how formal logics and logical procedures can form the basis for the compre-
hensive analysis of multiagent systems with respect to their formal requirements.
While we aim to focus on the analysis, through verification, of systems, we must
naturally explore ways of describing our requirements in a formal way. Thus, we
begin by considering the formal specification of agents and multiagent systems.
From this we move on to the formal verification of these systems, examining the
many different ways that designs, models, and programs can be exhaustively as-
sessed with respect to a formal specification.
Although this can be quite a technical topic, we will present only the basic
ideas and exhibit them through simple examples. In addition, there are many
existing verification systems that we will point the reader toward as we proceed
through the chapter. As this area is still very much at the forefront of research
activity, many of the tools are prototypes under active development. As such, they
are not so refined or stable; nevertheless, we encourage you to try some of these
(leading edge) tools and aid their development.
The importance of verifying computer systems is underlined by a number of famous, and costly, failures, for example:
AT&T Telephone Network Outage (1990): There was a 9-hour outage, originating in New York City, of large parts of the US telephone network [90]. It cost several hundred million US$, and the cause was the incorrect interpretation of a break statement in the C programming language [73].
Pentium FDIV Bug (1994): Problems occurred with the floating-point division unit (FDIV) of Pentium chips [91, 130]. Under certain circumstances, a flaw in this unit produced incorrect results. This cost an estimated 500 million US$ and seriously damaged the company's image. (Following this, chip designers invested heavily in formal verification techniques.)
Ariane 5 Disaster (1996): This is the famous crash of the Ariane-5 rocket [56, 127]. The source of the crash is believed to lie in the conversion of a 64-bit floating-point number to a 16-bit signed integer. It cost over 500 million US$ and had a very negative impact on the perceived reliability of space technology.
These, along with several other less high profile problems, have led to companies
increasingly employing formal verification techniques to build confidence, find
bugs early, improve efficiency, etc. Yet another important aspect is that verified software is likely to inspire more public confidence, leading to increased "trust" in computational solutions and so to greater uptake of software in general. In the particular case of multiagent systems, this is especially
important. Since autonomy is a central aspect of agent-based systems, public
confidence will be quickly eroded if we cannot guarantee that the autonomous
choices made by an agent are both safe and secure.
Consequently, we require some specification to describe our requirements and
some verification process to match these requirements against a system to be ana-
lyzed. We use logic as our basis for both specification and verification. But why?
1. Formal logic provides a clear, concise, and unambiguous notation for de-
scribing systems and scenarios. Compare this to a functional documenta-
tion sheet, which is often written in natural language and prone to all sorts
of misunderstandings.
2. The formal properties of logical descriptions are well understood. For ex-
ample, the expressive capabilities of various logics (what can be described
at all, what can be easily described in the logic, and what cannot) have been
comprehensively explored.
3. Along with a formal logic comes a range of logical procedures that we can
utilize in verification, such as decision procedures, proof systems, model
checkers, etc. Again, most of these have well-established complexity and
completeness results and often have implementations that we can take ad-
vantage of.
4. Finally, one aspect of formal logics that we find particularly useful when
considering multiagent systems is the flexibility of logic. There are very
many different logics, capturing many different aspects. Logics have been
developed to capture time, space, belief, desire, wishes, cooperation, inten-
tion, probability, etc. [15, 43, 122]. Indeed, we can design our own logics
to capture relevant, new activities.
These four properties make logic a suitable candidate. In particular, they help to
devise an appropriate logic providing a level of abstraction close to the key con-
cepts of the multiagent system. In addition, we can combine logics together to
represent multifaceted systems. As we have seen already in this book, agents are
often multifaceted and their description typically requires combinations of logics.
For example, the BDI theory combines several modal logics (for beliefs, desires,
and intentions) with a temporal (or dynamic) logic describing the underlying sys-
tem evolution.
2 Agent Specification
In this section we first provide a description and the basic terminology to talk
about formal, concise, and relevant specifications of agent-based and multiagent
systems. While Section 2.1 deals with logics and specifications in general, we
introduce in Section 2.2 several variants of temporal logics. We briefly describe
some approaches based on dynamic logics in Section 2.3. An important aspect
is to combine temporal logics with other logics, such as logics of knowledge and
belief; we elaborate on this in Section 2.4. Finally, the running example from the
last chapter is discussed in Section 2.5.
A typical approach involves a logical basis that is, in turn, a combination of log-
ics [42, 52]. Primarily this is a logic describing the core abstract “state” of an
agent, combined with a logic showing how such abstract states change or evolve
dynamically.
As an example, let us imagine that our agent is primarily concerned with ac-
cessing, selecting, and distributing information. We might decide that the best way
to formalize such an agent is in terms of a logic of knowledge. This will allow us
to describe what the agent knows, what it knows that it knows, what it knows that
others know, and so on. A standard approach to representing such knowledge is to
use a multimodal logic of the S5 style. Here, the modal necessity operator, Ki,
can be parametrized by an agent, i. Thus, KJürgen ϕ means that "Jürgen knows ϕ",
while KMichael ψ means that "Michael knows ψ". With such a logic we can describe
a variety of interesting agent behaviors concerning knowledge.
We can also consider schemata of the form KJürgen Φ → KMichael Φ for all formulae Φ.
This means that whatever Jürgen knows, Michael knows and so Michael knows at
least as much as Jürgen.
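To give a standard illustration of the S5 properties alluded to above (these schemata are textbook axioms rather than formulas specific to this chapter): Ki ϕ ⇒ ϕ (what is known is true), Ki ϕ ⇒ Ki Ki ϕ (positive introspection: knowing implies knowing that one knows), and ¬Ki ϕ ⇒ Ki ¬Ki ϕ (negative introspection).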
This gives quite a strong mechanism for describing static agent knowl-
edge [40], but it is not yet enough. We need to add a more dynamic dimension,
allowing our agent “state” to evolve or change to a new “state.” This might be due
to some action occurring, some time passing, or any other dynamic event. Thus
we need a logic that captures such aspects; the primary candidates for this are
dynamic logic [59] or temporal logic [36, 47, 62, 115].
Continuing our example, let us use a simple linear temporal logic [51] to de-
scribe dynamic change. Recall that typical connectives in such a logic are "◯",
meaning "at the next moment in time," and "♦", meaning "at some future moment
in time.” Such operators allow us to navigate between states at distinct moments
in time. Once we combine this logic with our logic of knowledge (technically a
fusion of the two logics [42]), we can describe statements that combine knowledge and time, for instance that whatever Jürgen knows now, Michael will know at some future moment: KJürgen ϕ ⇒ ♦KMichael ϕ.
A specification of an agent therefore typically comprises:
1. a dynamic or temporal dimension, describing how the agent's state evolves;
2. a logical dimension describing the information the agent has, for example,
a logic of belief or logic of knowledge (as above); and
3. a logical dimension describing the agent's motivations, for example, a logic
of desires, goals, or intentions.
For example, the logical basis for the BDI approach comprises a temporal logic, a
modal logic of belief (for the information dimension), and modal logics of desire
and intention (for the motivational dimensions) [106]. Alternatively, the KARO
framework (for Knowledge, Abilities, Results, and Opportunities) [88, 123] com-
bines a dynamic logic basis with a modal logic of knowledge (information) and a
modal logic of wishes (motivation).
Figure 14.1: Two robots and a carriage: a schematic view (left) and a transition
system M0 that models the scenario (right).
In this section we introduce the logics that we investigate later more thoroughly.
We do not give precise semantics here (we refer to Chapter 16 and to [18]).
We illustrate these logics with the following example.
Example 14.1 (Robots and Carriage) Two robots push a carriage from opposite
sides (Figure 14.1). As a result, the carriage can move clockwise or counter-
clockwise, or it can remain in the same place – depending on who pushes with
more force (and, perhaps, who refrains from pushing). We identify 3 different
positions of the carriage, and associate them with states q0 , q1 , and q2 . The
arrows in transition system M0 indicate how the state of the system can change in
a single step. We label the states with propositions pos0 , pos1 , pos2 , to refer to the
current position of the carriage.
Definition 14.1 (Kripke Model, Path) A Kripke model (or unlabeled transition
system) is given by M = ⟨St, R, Π, π⟩, where St is a non-empty set of states (or
possible worlds), R ⊆ St × St is a serial transition relation on states, Π is a set of
atomic propositions, and π : Π → 2^St is a valuation of propositions. A path λ (or
computation) in M is an infinite sequence of states that can result from subsequent
transitions, and refers to a possible course of action. For q ∈ St we use ΛM(q) to
denote the set of all paths of M starting in q, and we define ΛM as ⋃q∈St ΛM(q).
The subscript “M” is often omitted when clear from the context.
2.2.1 LTL
Definition 14.2 (Language LLTL [99]) The language LLTL is given by all formulae generated by the following grammar, where p ∈ Π is a proposition: ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | ϕ U ϕ | ◯ϕ.
The logic is termed linear-time because formulae are interpreted over infinite lin-
ear orders of states. It allows us to reason about a particular computation of a
system: there is always exactly one next time moment. A model is an infinite
sequence of states, such as
q0 → q1 → q2 → q0 → q1 → q2 → · · ·   (labeled pos0, pos1, pos2, pos0, pos1, pos2, . . . )
This describes a computation that consists of the same 3 states occurring again
and again: the carriage moves forever in a cycle from position 0 via position 1 to
position 2.
What about the formulae ◯pos1, ♦pos2, and □♦pos2? They are all true in
this model (we evaluate these formulae in the first state of the sequence).
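For reference (the precise semantics is deferred to Chapter 16 and [18]), the standard reading of these operators on a path λ = q0 q1 q2 . . . is as follows: ◯ϕ holds on λ iff ϕ holds on the path starting at q1; ϕ U ψ holds on λ iff ψ holds at some position i ≥ 0 and ϕ holds at all positions before i; and ♦ and □ are the usual abbreviations, ♦ϕ ≡ ⊤ U ϕ ("eventually ϕ") and □ϕ ≡ ¬♦¬ϕ ("always ϕ").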
There are several important classes of property that can be expressed easily in
LTL:
Reachability: A particular state is reachable from the present state.
Liveness: A (good) property will eventually be satisfied by some state in the fu-
ture.
LTL can be viewed in many ways [48], for example, as a decidable fragment of
first-order logic (see one of the exercises).
There is one important observation to make here: models of LTL are infinite
paths (see one of the exercises where we show how to make finite paths infinite
by adding loops). How can we represent such an infinite path? Clearly we need a
finite representation of it. In fact, we use the Kripke model M (see Definition 14.1)
and define
M, q |=LTL ϕ
to mean that ϕ is true on all resulting paths starting in q in the model M. Note
again that LTL formulae are always evaluated on one individual path: in the
definition just presented we consider, in addition, all possible paths the system
could take.
Clearly, ◯pos1 is not true at q0 in M as there is a path from q0 on which pos1 does not hold at the next moment (using the transition that leads into q0 or the one that leads into q2).
Definition 14.3 (Language LCTL∗ [38]) The language LCTL∗ is given by all for-
mulae generated by the following grammar: ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | Eγ where γ ::=
ϕ | ¬γ | γ ∧ γ | γ U γ | ◯γ and p ∈ Π. Formulae ϕ (respectively, γ) are called state
(respectively, path) formulae.
For example, E♦ϕ states that there is at least one path on which ϕ holds at some (future) moment in time. The universal path quantifier A is its dual, Aγ ≡ ¬E¬γ, stating that γ holds on all paths.
We now consider the formulae E♦pos1 , A♦pos1 , and ¬A♦pos2 and evaluate
them in our Kripke model and in state q0 . While the first one is true (from state
q0 it is possible to reach position 1 in the future via a computation), the second is
not (because this is not possible for all paths). The third formula is also true (as
its negation is false).
Finally, we mention that there is a fragment2 of CTL*, called CTL [23],
which is strictly less expressive but has significantly better computational prop-
erties. The language is restricted so that each temporal operator must be directly
preceded by a path quantifier. For example, A□E◯p is an LCTL-formula whereas
A□♦p is not. The complexity of these logics is investigated in Section 4.3.
2 To be precise, it is not just a fragment, i.e., a subset, of the language. In order to make sure
that all operators are definable, the definition of the language requires some care.
So CTL formulae are directly evaluated in a Kripke model and we can express
"on all paths it is always true that there exists a path such that . . . ": A□E□pos1.
This cannot be expressed in LTL.
Thus we can formulate statements of the form it is possible that a certain group
of agents is able to bring about a certain formula ϕ. This is to be understood as,
whatever all other agents are doing, this group can make sure that ϕ holds. The
recursive definition of the language syntax is given below.
Definition 14.4 (Language LATL∗ [6]) The language LATL∗ is given by all
formulae generated by the following grammar: ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | ⟨⟨A⟩⟩γ
where γ ::= ϕ | ¬γ | γ ∧ γ | γ U γ | ◯γ, A ⊆ Agt, and p ∈ Π. Formulae ϕ
(respectively, γ) are called state (respectively, path) formulae.
For example, the formula ⟨⟨A⟩⟩□♦p expresses the statement that coalition A can
guarantee that p is satisfied infinitely often (over and over again in the future).
The semantics for LATL∗ is defined over a variant of transition systems where
transitions are labeled with combinations of actions, one per agent.
So, it is assumed that all the agents execute their actions synchronously: the com-
bination of the actions, together with the current state, determines the transition to
the next state of the system.
Figure 14.2: Two robots and a carriage: a refined version of our example and a
concurrent game structure (CGS).
Example 14.2 (Robots and Carriage, ctd.) Consider the modified version of
our example, as shown in Figure 14.2. What about the following formulae?
1. pos0 → ⟨⟨1⟩⟩□¬pos1
2. pos0 → ⟨⟨1, 2⟩⟩◯pos1
3. pos0 → ¬⟨⟨2⟩⟩◯pos1
The first one expresses that when one is in position 0, then agent 1 alone can
ensure that position 1 will never be reached in the future. Indeed, agent 1 should
not push while in position 0, but should do so in position 2 (otherwise it might end
up in position 1 if agent 2 does not push). The appropriate strategy is s1 (q0 ) =
wait, s1 (q2 ) = push (the action that we specify for q1 is irrelevant). The second
formula above says that both agents can make sure that they reach position 1 in
the next step. Indeed, agent 1 should push and agent 2 should refrain from doing
so. Clearly they need to work together. The final formula above says that, in
position 0, it is not possible for agent 2 on its own to ensure that the carriage ends
up in position 1 in the next time-point. Indeed, the next position of the carriage
depends on the action carried out by agent 1.
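To make memoryless strategies and the ⟨⟨1⟩⟩□¬pos1 check concrete, here is a small, self-contained Java sketch (ours, not part of the chapter's code; the class and method names are invented). It encodes the carriage dynamics described above – equal efforts leave the carriage in place, a single pusher moves it one position in that robot's direction – without the halt refinement of Figure 14.2, and brute-forces all eight memoryless strategies of robot 1 to report those that keep the carriage away from position 1 forever, whatever robot 2 does.

public class CarriageCheck {
    static final int STATES = 3;               // q0, q1, q2; state i means "posi" holds
    static final int WAIT = 0, PUSH = 1;

    // Joint transition: equal actions keep the carriage in place; if only robot 1
    // pushes it moves "clockwise" (q0 -> q1 -> q2 -> q0), if only robot 2 pushes
    // it moves the other way round.
    static int step(int q, int a1, int a2) {
        if (a1 == a2) return q;
        return (a1 == PUSH) ? (q + 1) % STATES : (q + 2) % STATES;
    }

    // Does robot 1's memoryless strategy s1 keep the carriage away from q1
    // forever, against every behavior of robot 2? We simply check that q1 is
    // not reachable when robot 1 follows s1.
    static boolean avoidsPos1(int[] s1, int start) {
        boolean[] seen = new boolean[STATES];
        java.util.ArrayDeque<Integer> frontier = new java.util.ArrayDeque<>();
        seen[start] = true;
        frontier.add(start);
        while (!frontier.isEmpty()) {
            int q = frontier.poll();
            if (q == 1) return false;                  // pos1 reached: s1 fails
            for (int a2 = WAIT; a2 <= PUSH; a2++) {    // robot 2 may do anything
                int next = step(q, s1[q], a2);
                if (!seen[next]) { seen[next] = true; frontier.add(next); }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int bits = 0; bits < 8; bits++) {         // all 2^3 memoryless strategies
            int[] s1 = { bits & 1, (bits >> 1) & 1, (bits >> 2) & 1 };
            if (avoidsPos1(s1, 0))
                System.out.printf("winning: s1(q0)=%s, s1(q1)=%s, s1(q2)=%s%n",
                        s1[0] == PUSH ? "push" : "wait",
                        s1[1] == PUSH ? "push" : "wait",
                        s1[2] == PUSH ? "push" : "wait");
        }
    }
}

Running it reports exactly the strategies with s1(q0) = wait and s1(q2) = push (the choice at q1 is irrelevant), matching the discussion above.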
2.4 Combinations
As we have said already, the key logical foundation for agent specification is the
combination of logical domains [78]. Typically, formalisms for agent specifica-
tion consist of a temporal/dynamic basis combined at least with a logic of in-
formation (e.g., knowledge or belief) and usually at least one logic of motivation
(e.g., goals, intentions, desires, wishes). In the following subsections we highlight
useful and popular combinations.
2.4.1 BDI
As the BDI approach to representing and implementing rational agents [107] is the
predominant one within the area, this is described in detail in several other chap-
ters. We will not recap all this but just mention how the formal BDI framework fits
in with this section. As described above, the core of any agent formalism is some
dynamic base, and there are variations of the BDI approach using either dynamic
logics or temporal logics. On top of this, we usually need a logical framework
for the information the agent has and, again, the BDI approach uses beliefs cap-
tured logically by KD45 modal logic. For rational agents that must have some
explicit motivation for making their choices, an additional logical dimension for
this is required. Indeed, in the BDI approach, there are two varieties of motiva-
tion: desires, representing long-term goals; and intentions, representing goals that
the agent is actively undertaking. Both of these are formally represented by (dis-
tinct) KD modal logics. The combination of all these logical dimensions provides
a logical basis for specifying BDI agents [106], in particular, a basis upon which
the key aspect of deliberation can be described. In the BDI approach, delibera-
tion consists of two aspects: deciding which desires will be selected to become
intentions; and deciding the best way (plan) to achieve these intentions.
2.4.2 KARO
The KARO approach (Knowledge, Abilities, Results, and Opportunities) [88, 123]
is based on dynamic logic. Essentially it is a formal system aimed at specifying
and reasoning about the behavior of rational agents.
In the basic framework [123], an agent has knowledge, usually expressed
through an S5 modal logic. The dynamic logic basis provides the action lan-
guage, thus allowing analysis of whether the agent is able to perform a certain
654 Chapter 14
action or has the opportunity to perform it. Beyond this the motivational aspect of
a rational agent is described via a KD modal logic of wish.
For instance, for a seller agent we might (naively) require that any received offer is immediately accepted:
received(offer) ⇒ ◯accept(offer) .
However, this is quite a strong requirement. More likely, we will require the agent
to accept one of the reasonable offers, and so, using some additional first-order
syntax, something like
∃O. (reasonable(O) ∧ ♦accept(O)) .
More typically, within an agent working in the real world, we will not have cer-
tainty about information or environmental constraints, and so will use belief. In
addition, since we often do not know what might stop the agent from accepting an
offer, we often require that the seller agent intends to accept the best offer. So:
BSeller best(offer) ⇒ ISeller accept(offer) .
Separately, we might require that
(ISeller accept(x) ∧ no_problem_with(x)) ⇒ ♦accept(x) .
Now, we can also specify the multiagent interaction, for example
send(seller, offer) ⇒ ♦KSeller received(offer)
and by combining all the specifications together, we can (ideally) describe the
chain of steps to achieve
∃O. ♦accept(O) .
[ Krobot2 in_front_of(robot2, C) ∧ Krobot2 in_front_of(robot2, AB) ]  ⇒  ◯ Krobot2 in_front_of(robot2, ABC)

[ Krobot2 in_front_of(robot2, ABC) ∧ unload(ABC) ]  ⇒  ♦ unloaded(ABC)
And so on. An important aspect to notice here is that the overall system re-
quirement is dependent on the appropriate actions of the environment. Specifi-
cally, that "table_rotates" is true at relevant times. Typically, we might require
□♦table_rotates, i.e., that the table rotates infinitely often.
This leads us to consider the cooperation between the robots required in order
to achieve the (completed and) unloaded part “ABC”. Intuitively, we might expect
a requirement such as
⟨⟨robot1, robot2⟩⟩ ♦ (completed(ABC) ∧ unloaded(ABC)) .
However, as we have seen, this is not the case and we must take into account the
movements of the table. Actually, we can consider the table to be a separate agent
since it can choose to rotate when it likes. Given this, we might require
⟨⟨robot1, robot2, table⟩⟩ ♦ (completed(ABC) ∧ unloaded(ABC))
and so the two robots and the table can together ensure that the "ABC" part is
completed and unloaded. Finally, we note that the table can, in principle, stop the
part being made:
⟨⟨table⟩⟩ □ ¬completed(ABC)
since the table can choose to move at exactly the wrong places.
Verification will be the main focus of our discussion in Section 4 and beyond. Before that, however, we look at several ways to generate an agent implementation from a logical specification.
3.2 Refinement
Given some logical specification of agent behavior, ϕS , then we might choose
to refine this to a new, perhaps more detailed, specification. As we move to-
ward a “real” implementation we would like to increasingly be specific about the
agent’s behavior. So, while ϕS might be quite vague and high level, we would aim
xxxxxxx for the refined specification, say ϕR , to describe behaviors that still cor-
respond to some of those in ϕS but likely remove some of those we now consider
irrelevant. Thus, it is typical (and expected) that ϕR ⇒ ϕS . So, we know that all
implementations satisfying ϕR will also still satisfy ϕS , though there may well be
some implementations allowed by ϕs that are now disallowed by ϕR . Two things
are important here:
1. whatever logical properties we established for ϕS can, because we know that
ϕR ⇒ ϕS , also be established for ϕR ; and
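As a small worked illustration (reusing the seller formulas from Section 2.5, not a refinement taken from the chapter): let ϕS be received(offer) ⇒ ♦accept(offer) and ϕR be received(offer) ⇒ ◯accept(offer). Since anything true at the next moment is true at some future moment, ◯accept(offer) ⇒ ♦accept(offer), and hence ϕR ⇒ ϕS: every implementation that accepts immediately also satisfies the weaker "eventually accept" specification, while implementations that delay acceptance by one or more steps are allowed by ϕS but excluded by ϕR.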
3.3 Synthesis
Ideally, we would like to automatically synthesize an agent program directly from
an agent specification, especially if we can then guarantee that the agent will
definitely implement its specification. This is, of course, a very
appealing direction in traditional formal methods but has some underlying diffi-
culties [84]. A typical approach is to synthesize a finite-state automaton from a
logical (usually temporal) specification [100, 101]; and though in some cases this
can be automatic and effective, in many situations the complexity of undertak-
ing this is very large. Thus, the underlying synthesis problem even for a system
of two very simple agents may well be quite complex (2-EXPTIME). So, as yet,
such approaches are impractical. However, current work looking both at reduced
scenarios and at bounded search for an implementation [98, 109] has promise
in both the non-agent and agent cases.
An alternative is to execute the specification directly: a model for the agent specification is built from the beginning; the model is constructed step by step, starting from the initial state. In the basic case this is complete in that the temporal specification for an agent can be executed if, and only if, the specification is satisfiable.
As we have seen, however, a temporal specification on its own is not enough
and, consequently, METATEM has been extended and developed over the years.
The basic specification is extended with beliefs, which provide the information
the agent decides upon. An interesting aspect of these beliefs is that a form of
resource-bounding can be captured by considering the depth of nesting allowed
in reasoning about such beliefs [44]. In addition, motivations are developed. In-
deed two varieties are considered: the temporal “♦” modality, which provides a
very strong motivation since the semantics of “♦g” require that g will definitely
happen; and the combination “B♦”, where “B” is the belief operator, which pro-
vides a weaker motivation for the agent. Overlaying all this is a framework, called
Concurrent METATEM, which takes a set of such agents, each executing its own
formal specification asynchronously, and allows them to communicate, cooperate,
and self-organize [48]. See Chapter 13 for more details.
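As a rough illustration of this execution style (the two formulas are invented for this sketch, not taken from the METATEM literature): given a specification containing start ⇒ requested and □(requested ⇒ ♦served), the interpreter makes requested true in the initial state and records the outstanding eventuality ♦served; in each subsequently constructed state it attempts to make served true, postponing the eventuality (but never forgetting it) whenever satisfying it now would contradict other parts of the specification.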
4 Formal Verification
Once we decide to analyze a system with respect to a formal property, there are
a number of ways to achieve this. One particularly popular approach is to carry
out testing [8, 16, 61]. Here, the system/program is executed under a specific set
of conditions and the execution produced is compared to an expected outcome.
The skill in testing is to carry this out for enough different conditions so that the
developer can be relatively confident that the program/system is indeed correct.
While testing is, of course, very useful, it only examines a subset of all the
possible executions. What if we want to be sure that the logical specification is
met, whichever way the program/system executes? Assessing whether this is the
case or not is the core of formal verification.
Figure 14.3: The product of the model of the system with the model of the "bad" paths.
If we can find an execution of the system that fails to satisfy the specification, then this gives us an execution that violates the formal requirement.
The typical way of visualizing this is in terms of finite-state automata, in par-
ticular Büchi automata. The essential idea here is to capture all the possible exe-
cutions of the system to be verified as a Büchi automaton (a finite-state automaton with infinite runs) and generate a separate Büchi automaton describing all
bad runs, i.e., executions that do not satisfy the property being verified. Then we
take the synchronous product of these two Büchi automata [113, 125] (see Fig-
ure 14.3). If the product automaton is empty, then there is no sequence that is a
legal run of the system while at the same time satisfying the “bad” property. How-
ever, if the product automaton is non-empty, then there is indeed a sequence that
is a legal run of the system while at the same time satisfying our “bad” property.
This highlights a failing run of the system.
The model-checking approach has been extremely successful, not only in ana-
lyzing hardware systems [70] and protocols [64], but increasingly in software sys-
tems [12, 126]. While the basic idea is quite simple, the success of the technology
is, to a large part, due to the improvements in implementation and efficiency that
have occurred over the last 25 years. As well as a characterization in terms of
automata [113], on-the-fly [53], symbolic [87], and SAT-based [102] techniques
have all improved the efficacy of model checkers.
The “on-the-fly” approach will be particularly interesting with respect to our
later descriptions, and so we will say a little more about this here. Recall from
Figure 14.3 that the basic automata-theoretic view of model checking involves
constructing the product of two Büchi automata. In many practical cases, this
product turns out to be much too large to realistically construct. So, rather than
constructing the actual product automaton, the idea with the “on-the-fly approach”
is to explore paths through this product automaton without actually constructing
it! This is achieved by exploring the two automata in parallel (see Figure 14.4).
Figure 14.4: On-the-fly model checking: parallel exploration of the model of the system and the model of the "bad" paths.
To see how this works, recall that a run of the product automaton must, simul-
taneously, be a run of each of the automata separately. Thus, we explore the “sys-
tem” automaton, ensuring that every transition we take is mirrored by a transition
in the “bad” automaton. We keep exploring this pair synchronously until either
a path has been found that satisfies both, or until the exploration of the “system”
automaton can go no further. In the former case, we have found our “bad” path;
in the latter case, we roll back our execution to any previous choice point in the
“system” automaton and continue exploration. If we have explored all possible
paths through the “system” automaton and none of them have yielded a run of
the “bad” automaton, then we can assert that no execution has the “bad” property.
Notice that in order to be able to achieve this, the model-checking implementation
needs to have (a) a way to synchronously step through two representations, and
(b) a mechanism for backtracking the execution. The predominant model checker
exhibiting this technology is the S PIN model checker [64], though we will see
how this approach is used in Section 4.4.
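The following Java sketch shows the shape of such a parallel exploration (ours, and heavily simplified: it only searches for a reachable product state in which the "bad" automaton accepts, i.e., it treats the property as a safety property; full LTL checking additionally requires detecting accepting cycles, e.g., by a nested depth-first search, and this is not exactly how SPIN is implemented). The interfaces Kripke and BadAutomaton are placeholders introduced for the example.

import java.util.*;

interface Kripke {                       // the system: a state-labeled transition system
    Set<Integer> initial();
    Set<Integer> succ(int q);
    Set<String> label(int q);            // atomic propositions true in q
}

interface BadAutomaton {                 // finite automaton accepting violating prefixes
    Set<Integer> initial();
    Set<Integer> next(int b, Set<String> letter);
    boolean accepting(int b);
}

class OnTheFly {
    // Explore the (never explicitly built) product of the system and the bad automaton.
    static boolean violationFound(Kripke sys, BadAutomaton bad) {
        Deque<int[]> stack = new ArrayDeque<>();
        Set<Long> visited = new HashSet<>();
        for (int s : sys.initial())
            for (int b : bad.initial())
                stack.push(new int[]{s, b});
        while (!stack.isEmpty()) {
            int[] p = stack.pop();                     // one state of the virtual product
            long key = ((long) p[0] << 32) | (p[1] & 0xffffffffL);
            if (!visited.add(key)) continue;           // backtrack: pair already explored
            Set<String> letter = sys.label(p[0]);      // the bad automaton reads the
            for (int b2 : bad.next(p[1], letter)) {    // label of the current system state
                if (bad.accepting(b2)) return true;    // a system run matches a bad prefix
                for (int s2 : sys.succ(p[0]))          // every system move is mirrored
                    stack.push(new int[]{s2, b2});
            }
        }
        return false;                                  // no execution has the bad property
    }
}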
Finally, while we have described model checking as a technique for analyzing
finite-state systems, there has been considerable work in providing coherent ab-
straction mechanisms to reduce infinite-state systems down to a finite-state form
suitable for model checking (see [20, 21]).
Figure: Parallel monitoring of the execution of the system against the model of the "bad" paths.
1. For our particular agent, what logic should “Ag” be described in, and how
do we actually generate “Ag”?
2. What logic should Req be described in, and can we be sure this is sufficient
to allow us to say what we want?
Some of these are, of course, quite difficult and fundamental questions, but let us
start to describe answers to some of the above, beginning with (1). For any formal
method, we need some variety of formal semantics that provides a formal (often
logical) representation of all the behaviors of the agent. If agents are described
in terms of enhanced finite-state machines, then this is fairly straightforward. If,
however, we have an agent program, then we require a semantics for the agent
programming language. In the case of deductive verification we consider here,
we specifically need a logical semantics for the agent programming language.
As with traditional formal methods, other varieties of formal semantics, notably
operational semantics, are more popular. Indeed, there are few agent languages
with logical semantics; some exceptions are [4, 33, 46, 116, 117]. In general, it
is easier to develop an operational semantics for an agent programming language
(especially since such an operational semantics can form the basis for language
implementation) than to develop a logical semantics; consequently, much more
verification work has occurred via operational semantics. Nevertheless, there are
quite a few areas where agent verification based on some form of proof has been
achieved, and we will mention a selection of these below.
Concerning some of the other questions mentioned above, the decision about
what logical basis to use must clearly be driven by the requirements of both the
logical semantics (i.e., what logic the semantics is provided in) and the formal
requirements (i.e., what logic allows us to state the questions we wish to ask). As
we saw earlier, some logical basis combining a temporal/dynamic dimension with
at least a knowledge/belief dimension and probably a motivational dimension is
often used.
2APL, 3APL: In [4] the authors consider a fragment of 3APL and define a
series of propositional dynamic logics that can be used to prove safety and live-
ness properties of programs in this fragment under different deliberation strate-
gies. This is done by relating the operational semantics of programs to models
in the appropriate logic. It requires, among other things, the axiomatization of fully
interleaved strategies.
IMPACT: Agents are specified in IMPACT through agent programs. These have
the form of rules and are treated as clauses with negation in logic programming.
Therefore, the semantics is given by the well-known fixpoint semantics (the least
Herbrand model in the case of Horn clauses or stable semantics in the case of rules
with negation-as-failure). Whereas the basic language of IMPACT does not allow
Abductive logic programming [72]: Here, standard logic programs are ex-
tended with abducible predicates. These are predicates whose values can be set
in such a way as to explain certain observations. Thus, given a program and a set
of observations, an abduction process is used to suggest which of our abducible
predicates explain the observations. This is particularly useful in “intelligent”
agent computation, where agents often have only partial knowledge of their environment and so must work out the most reasonable explanation for the things they perceive. Importantly, for our purposes, an abductive proof procedure is
used as part of this process.
KGP and SCIFF: The KGP agent approach is based on logic programming but
extended with specific agent aspects: Knowledge, Goals, and Plans [108]. Ab-
ductive logic programming is used via the SCIFF procedure for interaction veri-
fication [3]. SCIFF was originally developed to verify the compliance of agents
to interaction protocols; it uses (1) abducibles to represent hypotheses about agent
behavior, (2) CLP constraints, and (3) existentially quantified variables in integrity
constraints.
Action logics: In a series of papers [11, 54, 55], the authors tackle the prob-
lem of specifying and verifying systems of communicating agents and interaction
protocols (e.g., verification of a priori conformance to the agreed upon protocol).
This applies to the case where protocols are specified with finite-state automata
or when the policies can be implemented in DYLOG, a computational logic. The
last approach [55] is based on a dynamic linear-time temporal logic.
5.4 Example
Recall the example of two robots working together to manufacture an artifact,
introduced in Chapters 13 and 15. We considered some of the requirements of
such a scenario in Section 2.5. Now, if we wish to apply deductive verification to
assess some of these requirements, we need a logical description of the system in
question. Typically, this would contain logical representations of all the steps of
the robots, for example
[ Krobot1 in_front_of(robot1, A) ∧ Krobot1 in_front_of(robot1, B) ∧ do(robot1, load(A, B)) ]  ⇒  ◯ in_front_of(robot1, AB)
Once we have a suitable specification of the system (say Sys), possibly comprising
formulae such as the above, then we can verify this with respect to some of the
formal requirements (say Req) from Section 2.5 in the way described earlier, i.e.,
Sys ⇒ Req
Of course, we require suitable, preferably automated proof systems for the rele-
vant logics. For example, the above will need at least a proof in temporal logics
of knowledge [40].
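To illustrate how such step formulas chain together in a proof, suppose (purely hypothetically – this second formula is ours, not the chapter's) that Sys also contains

[ in_front_of(robot1, AB) ∧ do(robot1, pass(AB)) ]  ⇒  ◯ in_front_of(robot2, AB) .

Reading the formulas of Sys as invariants (implicitly prefixed by □) and using the fact that ◯ distributes over conjunction, if robot1 knows it is in front of A and B, performs load(A, B) now, and performs pass(AB) at the next moment, we obtain ◯◯ in_front_of(robot2, AB); weaker requirements such as ♦ in_front_of(robot2, AB) then follow, since whatever holds two steps from now holds at some future moment.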
M, q |=L Φ.
This variant of model checking is called local, because we evaluate the formula
in a given state q. Global model checking, on the other hand, is the problem of
computing all states q such that the above relation holds (where M and a property Φ
in a logic L are given).
In Section 6.1, we deal with the problem of how exactly to measure the in-
put of a model-checking problem. Subsection 6.2 extends the logics introduced
in Section 2.1 along two dimensions: taking into account the past history of the
agent; and taking into account that the perceptions of an agent are not perfect. In
Subsection 6.3 we introduce an approach with a highly compact representation of
a model. Finally, in Subsection 6.4 we present a table highlighting the complexi-
ties of the logics considered so far.
Highly compact: For many systems, some symbolic and thus very compact rep-
resentations are possible. The model can be defined in terms of a compact
high-level representation, plus an unfolding procedure that defines the pre-
cise relationship between representations and explicit models of the logic.
Of course, unfolding a higher-level description to an explicit model usually
involves an exponential blowup in its size.
We are now ready to tackle the questions at the beginning of this subsection. Taking only the number of states into account would give a misleading measure. Let n
be the number of states in a concurrent game structure M, let k denote the number
of agents, and d the maximal number of available decisions (moves) per agent per
state. Then the number of transitions is m = O(n·d^k).
Thus, if we consider explicit models, the size of the input is measured as n·d^k.
If we consider, however, implicit models, then the size of the input is viewed as a
function of n and k. Therefore, many model-checking algorithms (e.g., from [7])
are polynomial in n·d^k but they run in exponential time if the number of agents is
a parameter of the problem (implicit models).
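As a small numerical illustration (the numbers are ours, chosen only to make the orders of magnitude visible): with n = 100 states, k = 5 agents, and d = 4 actions per agent per state, an explicit model may have up to n·d^k = 100·4^5 = 102,400 transitions; measured implicitly (in n and k), the same model looks small, but any algorithm that enumerates the d^k = 1,024 joint actions per state is already exponential in k. For modular representations the gap widens further, since the number of global states n is itself typically exponential in the number of agents (roughly the product of the agents' local state spaces).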
Perfect Information: Agents have perfect information about the current state.
All states can be distinguished and all agents know the current state.
However, often agents do not perceive their environment perfectly. Some
are able to distinguish certain states, whereas others might do so for differ-
ent states. This requires that each agent only knows an equivalence relation
of the set of states. And then the strategy of an agent has to be compatible
with this relation. We refer to Example 14.4.
Imperfect Recall: Agents base their decisions only on the current state. This
means that whenever an agent gets back to this state, its decision must be
the same. Thus a strategy is defined as sa : St → Act where sa (q) ∈ da (q), a
memoryless strategy.
A more flexible (and more powerful) method would be to base the decision not
only on the current state, but on the whole history of events until now. A strategy
with perfect recall is then a mapping from finite sequences of states to actions,
sa : St+ → Act.
Figure 14.6: Two robots and a carriage: a schematic view (left) and an imperfect
information concurrent game structure M2 that models the scenario (right).
IR, Ir, iR, ir: Combining the two dimensions mentioned above gives us four dif-
ferent logics. ATLIR : ATL with perfect information and perfect recall;
ATLIr : ATL with perfect information and imperfect recall; ATLiR : ATL
with imperfect information and perfect recall; and ATLir : ATL with im-
perfect information and imperfect recall.
We note that ATLIr and ATLIR are equivalent, which does not hold for the ATL∗
version. Consider Figure 14.2 and the formula ⟨⟨1, 2⟩⟩(♦pos1 ∧ ♦halt). Does this
formula hold in state q0? In order to make it true, we have to first go to q1 and
then back to q0 to finally switch to the halting state. But this is only possible when
we have perfect recall! This shows that ATL∗IR and ATL∗Ir are different.
But the formula just considered does not belong to the language of ATL. How
can one show that ATLIr and ATLIR are equivalent? One can show, by induction
on the structure of a formula, that the following holds. If a formula Φ is true in
a model, then there is a strategy that leads to a certain (infinite) path. So there
must be a prefix in which, for the first time, some state occurs twice; thus it is of
the form q0, . . . , q, . . . , q. Then there must be a strategy (maybe a different one)
that makes Φ true on this prefix. This implies that perfect recall is not needed,
because everything depends only on this first prefix: no history is required (unlike
in our counterexample above).
Example 14.4 (Robots and Carriage, ctd.) We refine the scenario from Exam-
ples 14.1 and 14.2 by restricting the perception of the robots. We assume that
robot 1 is only able to observe the color of the surface on which it is standing,
and robot 2 perceives only the texture (cf. Figure 14.6). As a consequence, the
first robot can distinguish between position 0 and position 1, but positions 0 and 2
look the same to it. Likewise, the second robot can distinguish between positions
0 and 2, but not 0 and 1. We also assume imperfect recall.
With these observational capabilities, no agent can make the carriage reach
or avoid any selected states single-handedly. E.g., we have that M2, q0 |=ir
¬⟨⟨1⟩⟩□¬pos1. Note, in particular, that strategy s1 from Example 14.2 cannot
be used here because it is not uniform (indeed, the strategy tells robot 1 to wait in
q0 and push in q2 but both states look the same to the robot). The robots cannot
even be sure to achieve the task together: M2, q0 |=ir ¬⟨⟨1, 2⟩⟩◯pos1 (when in q0,
robot 2 considers it possible that the current state of the system is q1, in which case
all hope is gone). So, do the robots know how to play to achieve anything? Yes,
for example they know how to make the carriage reach a particular state eventually: M2, q0 |=ir ⟨⟨1, 2⟩⟩♦pos1, etc. – it suffices that one of the robots pushes all
the time and the other waits all the time. Still, M2, q0 |=ir ¬⟨⟨1, 2⟩⟩♦□posx (for
x = 0, 1, 2): there is no memoryless strategy for the robots to bring the carriage to
a particular position and keep it there forever.
and abilities. For the latter kind of analysis, we need to allow for more sophisti-
cated interferences between agents’ actions (and enable modeling agents that play
synchronously).
A modular interpreted system (MIS) is defined as a tuple M = ⟨Agt, env, Act, In⟩, where Agt = {a1, . . . , ak} is a set of agents, env is the environment, Act is a set of actions, and In is a set of symbols called the interaction alphabet.
Each agent has the following internal structure: ai = ⟨Sti, di, outi, ini, oi, Πi, πi⟩.
The unfolding of a MIS M to a concurrent game structure is naturally induced by the synchronous product of the agents (and the environment) in M,
with interaction symbols being passed between local transition functions at every
step. The unfolding can also determine indistinguishability relations as follows:
⟨q1, . . . , qk, qenv⟩ ∼i ⟨q1′, . . . , qk′, qenv′⟩ iff qi = qi′, thus yielding a full iCGS.
ATL model checking for such higher-order representations was first analyzed
in [118] over a class of simple reactive modules, based on the synchronous prod-
uct of local models. However, such reactive modules do not allow us to model
interference between agents’ actions.
Note. This section is taken from [18], and we refer to the work of Jamroga and
Ågotnes for further details [68]. It is inspired by interpreted systems [40] and reactive
modules [5], and is in many respects similar to ISPL specifications [104]. A
recent application is [74].
Table 14.1: Overview of the complexity results; most are completeness results. The
table compares the model-checking problems for LTL, CTL, CTL*, ATLIr,IR, ATLir,
ATLiR, ATL*Ir, ATL*IR, ATL*ir, and ATL*iR against three input measures: explicit
models measured by the number of transitions (m, l), explicit models measured by the
number of states and agents (n, k, l), and modular interpreted systems (nlocal , k, l).
The entries range from P-complete [24] (e.g., CTL over explicit models) and P-complete [7]
(ATLIr,IR over transitions), through ΔP2-completeness [69, 111] (ATLir), ΔP3-completeness
[69, 79] (ATLIr,IR over states and agents), PSPACE-completeness [67, 77, 111],
EXPTIME-completeness [118] (ATLIr,IR over modular interpreted systems) and
EXPTIME-hardness over modular interpreted systems, up to 2EXPTIME-completeness [7]
(ATL*IR) and undecidability (†) for ATLiR and ATL*iR.
The first results refer to explicit models measured by their number of transitions. Then, we get the same complex-
ity class against explicit models measured using the number of states and agents.
Finally, model checking imperfect information turns out to be easier than model
checking perfect information for modular interpreted systems. Why is that so?
The number of available strategies (relative to the size of input parameters)
is the crucial factor here. It is exponential in the number of global states. For
uniform strategies, there are usually fewer of them but still exponentially many in
general. Thus, the fact that perfect information strategies can be synthesized in-
crementally has a substantial impact on the complexity of the problem. However,
measured in terms of local states and agents, the number of all strategies is dou-
ble exponential, while there are “only” exponentially many uniform strategies –
which settles the results in favor of imperfect information.
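A rough count makes this concrete (a sketch only, assuming each agent has at most d actions available in any state, n global states, k agents, and at most n_local local states per agent, so that n may be as large as (n_local)^k):

\[
|\Sigma^{\text{memoryless}}_a| \;\le\; d^{\,n} \;\le\; d^{\,(n_{\mathrm{local}})^{k}}
\qquad\text{whereas}\qquad
|\Sigma^{\text{uniform}}_a| \;\le\; d^{\,n_{\mathrm{local}}},
\]

since a uniform memoryless strategy can only condition its choice on the agent's own local state (observation). Measured in n_local and k, arbitrary memoryless strategies are thus doubly exponential in number, while uniform ones are "only" exponentially many.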
A comment on the two rows that state that the model-checking problems for
ATLiR and ATL*iR are undecidable: this had been open for some time (although
the result had been stated without proof), and a (complicated and not very insightful) proof
has only recently been presented in [32, 57]. The main point is that perfect re-
call helps to distinguish histories that are, in the case of ATLir , indistinguishable.
These can then be used to encode arbitrary runs of Turing machines and thus to
encode undecidable problems such as the halting problem.
MCMAS. An important system here is MCMAS [82, 83, 86], which builds on
work on model checking temporal logics of knowledge and the efficient, symbolic
verification of interpreted systems [81, 97, 105]. MCMAS has been evaluated on
real systems, in particular, to verify properties of underwater autonomous vehi-
cles [39].
between these configurations. In this way, it provides an abstract view of the po-
tential execution (i.e., sequence of configuration changes) of any program. Now,
given a specific program, we can work through the program and by examining the
operational semantics, can build a model of all the potential configurations that
the particular program can generate. This model can then be checked against a
logical requirement.
This approach has been used often, and in the following some selected exam-
ples are given.
To PROMELA and SPIN: In [133], simple agent programs were verified via
a translation to SPIN. In [17], AgentSpeak programs were translated to the
PROMELA language, and the SPIN model checker was then used to verify their proper-
ties. Note that subsequent work translated to JAVA and used JPF (see Section 7.4).
GOAL: In [71], the operational semantics of the GOAL agent programming lan-
guage is used to describe all the possible executions of a specific GOAL program.
On-the-fly algorithmic verification techniques are then used to explore all these
potential executions. This provides quite an efficient verification mechanism for
GOAL programs.
Rewriting: Given that the formal semantics of an agent language is often given
in terms of rewrite rules (especially if it is an operational semantics), then an al-
ternative way to tackle verification would be to base it on some underlying rewrite
system. This clearly has some link to the use of an underlying logic programming
system (as in Section 5.3) as well as a link to the model-checking approaches
based on operational semantics that we consider here.
The predominant rewrite system is MAUDE, which provides an efficient and
flexible rewriting basis [25]. Indeed, the operational semantics of several agent
languages have been translated to MAUDE input [41, 124].
In [9], a programming language is defined that facilitates the implementa-
tion of coordination artifacts, which are used to regulate the behavior of individ-
ual agents. This language provides constructs inspired by social and organiza-
tional concepts and allows different operational semantics (vis-à-vis the schedul-
ing mechanism of such constructs). They show that a particular semantics can be
prototyped in MAUDE. As an example, they define certain properties, enforce-
ment and regimentation, and verify them using the MAUDE LTL model checker.
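The properties verified in these approaches are typically temporal statements about an agent's mental state and behavior. A generic, purely illustrative example (not taken from any of the cited papers; Bel and Int stand for the belief and intention modalities of agent ag) is

\[ \Box\bigl(\mathit{Bel}_{ag}\,\mathit{emergency} \;\rightarrow\; \Diamond\,\mathit{Int}_{ag}\,\mathit{respond}\bigr), \]

i.e., whenever the agent believes there is an emergency, it eventually forms the intention to respond.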
Throughout this section we have assumed that the models analyzed, and often
produced through the language’s operational semantics, exactly match the agent
with tools such as SPIN or NUSMV. In spite of this, agent program verification is
clearly very useful.
For example, a simple rule for adding a belief, b, might take the form

⟨Beliefs, Intentions, . . .⟩  −−add_belief(b)−−→  ⟨Beliefs ∪ {b}, Intentions, . . .⟩

where the set of beliefs is updated with the new belief, “b,” to generate a new
configuration. We must generate many, usually more complex, rules in order to
provide the operational semantics of our language.
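As an illustration of what "encoding the operational semantic rules" might look like in practice, here is a minimal Java sketch (deliberately not using AIL's actual classes or method names; the configuration holds only beliefs and intentions, and the belief "jigEmpty(1)" is a made-up example):

```java
import java.util.HashSet;
import java.util.Set;

// A deliberately tiny agent configuration: just beliefs and intentions.
// Real agent languages carry much more (plans, events, goal stacks, ...).
final class Configuration {
    final Set<String> beliefs;
    final Set<String> intentions;

    Configuration(Set<String> beliefs, Set<String> intentions) {
        this.beliefs = beliefs;
        this.intentions = intentions;
    }
}

// One operational-semantics rule as executable code: it maps a configuration
// to the next configuration, mirroring the add_belief(b) rule above
// (only the belief set changes; everything else is copied unchanged).
final class AddBeliefRule {
    Configuration apply(Configuration c, String b) {
        Set<String> newBeliefs = new HashSet<>(c.beliefs);
        newBeliefs.add(b);
        return new Configuration(newBeliefs, c.intentions);
    }
}

class TransitionDemo {
    public static void main(String[] args) {
        Configuration c0 = new Configuration(new HashSet<>(), new HashSet<>());
        Configuration c1 = new AddBeliefRule().apply(c0, "jigEmpty(1)");
        System.out.println(c1.beliefs); // prints [jigEmpty(1)]
    }
}
```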
There are then two particular ways in which we might use the operational se-
mantics. The first is to provide an implementation: since such an operational
semantics essentially describes a language interpreter, the language can be
implemented simply by encoding the operational semantic rules. The second, as we have
seen in previous sections, is to use the operational semantics as the basis
for a model checker. However, every time we tackle a new language, we must
go through this process again, defining configurations, transitions, a practical im-
plementation, and model-checking procedures. A particularly awkward aspect is
defining how the model-checking procedure accesses and evaluates beliefs, intentions,
etc., within the agent execution. Finally, since many agent languages are actually
very similar, there is surely scope for some reuse of the above aspects.
The Agent Infrastructure Layer (AIL) is essentially a toolkit that aids the de-
velopment of all the above aspects for BDI-like, JAVA-based, agent programming
languages [29].
The idea is as follows. When you have an idea for a new agent programming
language, you can access the AIL toolkit to build an operational semantics for
the language. Once such a semantics is built, the AIL toolkit naturally provides a
JAVA implementation (since the semantic elements are all objects/classes within
JAVA) and also provides ways in which a special model checker (called AJPF) can
access the components of the semantics. Although AIL provides a wide range
of “ready made” semantic components and rules corresponding to typical BDI
language features, the developer still has the capability to write new semantic
rules (so long as they respect the interfaces and interactions required).
(Figure: overview of the AIL/AJPF (MCAPL) framework. A multi-agent program written in an
AIL-supported language (AgentSpeak, 3APL, Jadex, MetateM, GOAL, Gwendolen, SAAPL, ...)
runs as a JAVA interpretation of the agent program, with optional AIL data/scheduling. AJPF,
the model checker built on JPF, takes this as its verification target and, using a property
checker, a search listener observing the (possibly parallel) exploration of the possible
execution paths, and a listener object encapsulating a model of the "bad" paths, checks the
program against the property.)
7.5 Example
Recall again the example of two robots working together to manufacture an arti-
fact, introduced in Chapters 13 and 15. We considered some of the requirements
of such a scenario in Section 2.5. Now, if we wish to apply algorithmic verifica-
tion techniques to assess some of these requirements, we need either a model of
all possible executions in the system, or a program for the system. If we have a
model, for example, generated through the operational semantics, then we can use
traditional model checking as in Section 6. Alternatively, if we have a program,
such as the one described in Chapter 13, we might apply program verification
techniques as described above. Note again, however, that such direct program
verification is particularly slow.
Finally, in this section, we note that the MCAPL (i.e., AIL+AJPF) framework is
increasingly used for verifying non-trivial agent-based systems. As well as the
heterogeneous agent system from [27], the ORWELL normative agent language
has been verified through AIL. On a more practical level, in [131], this approach
is used to verify key parts of the control for an unmanned air vehicle.
8 Conclusions
As agents are being found to be useful in more and more application areas,
the need for formal specification and verification techniques specifically devised
for agents becomes more acute. Agents are not only being used in “harmless”
software for Internet search and user interfaces, but are increasingly used in
business-critical areas. Here, the viability of businesses depends on the reliabil-
ity and effectiveness of the agent systems. Yet it is safety-critical applications
for which agent reliability will have the most impact. Sophisticated autonomous,
pervasive, and ubiquitous systems are being developed and deployed, with many
incorporating some form of “intelligent decision making” encapsulated within an
agent. Systems such as robots, space probes, intelligent homes, medical monitor-
ing, and unmanned vehicles typically involve agents of some form or other. The
critical nature of all of these, often with human life at risk, means that it is vital
to have techniques for comprehensively analyzing the reliability of the underlying
agent software.
As we have seen in this chapter, the use of formal logics is central to research
involved in providing sophisticated analysis tools. The flexibility and range of
logics available allows us to specify the properties we require of our agents; and
the variety of techniques available allows us to verify (often automatically) the
properties of our agent systems. A great deal of important and interesting research
continues to be produced, but it should be clear that these areas
are still under active investigation. There are increasingly practical verification
tools for agents, such as MCAPL [29, 85] and MCMAS [82, 86], but these es-
sentially remain prototypes. In spite of this, however, they are beginning to be
tried out in industrial situations. This is because the speed of development of the
autonomous, pervasive, and autonomic systems described above is increasing, yet
there are concerns about the reliability of the agents within them. For example,
many aerospace companies are developing unmanned air vehicles (UAVs) for use
in civilian applications, yet few have a clear idea about the reliability of the “in-
telligent” decision-making agents that are often at the heart of such vehicles. So,
the need for comprehensive analysis, preferably through formal verification, is
acute [131].
As much of the work described in this chapter, particularly that on agent veri-
fication, is leading-edge research, there are clearly many open research issues. We
have already mentioned some of these within the text, but will recap a few of them
here. While there has been significant work on the model checking of temporal
logics, and indeed of temporal logics of knowledge, we have seen that this is not
enough. Rational agents incorporate explicit representations of the motivations for
their choices, and these are typically captured through goals, intentions, desires,
etc. So, it is crucial to be able to model check combinations of time, knowledge,
and goals. There has been some work in this direction, but much more is required.
Another obvious direction, especially when real systems are being targeted,
is to address the uncertain and continuous nature of real-world interactions. If
our agents must deal with environmental sensors, then such sensors will never be
infallible or precise. If our agents deal with physical processes or control systems,
then these are typically represented as continuous systems. And so on. So, if we
wish to verify the behavior of agents in such real systems, then we are likely to
need to incorporate probabilistic and hybrid verification. Again, while some work
on this has been carried out, there is much left to do.
Acknowledgments
We thank Nils Bulling, Wojtek Jamroga, and Thomas Ågotnes for the use of
some material from joint papers, in particular Example 14.1, which went into Sec-
tion 2.2 and Section 6. Nils Bulling also proofread parts of this chapter and helped
us to improve an earlier version. Jürgen Dix acknowledges that this work was
partly funded by the NTH School for IT Ecosystems. (NTH [Niedersächsische
Technische Hochschule] is a joint university consisting of Technische Universität
Braunschweig, Technische Universität Clausthal, and Leibniz Universität Han-
nover.)
9 Exercises
1. Level 1 State a formula in LTL that expresses deadlock freedom.
2. Level 1 Show that LTL can be seen as a fragment of first-order logic. Trans-
form an LTL formula into first-order logic extended by the natural numbers
and including the binary predicate “≤”.
3. Level 1 The models of LTL are infinite paths. Show that we can also handle
Kripke models that contain finite paths.
9. Level 2 Work out the differences between the ∗ versions and their restricted
versions for ATL, CTL, and LTL. Show that the ∗ versions are really more
expressive by giving example formulae and explaining how these cannot be
expressed in the restricted variety.
a) iK p → K ip,
b) Bq → Kq,
c) Dr → Ir,
d) Is → B s.
11. Level 3 Check the undecidable entries in Table 14.1 by finding the appro-
priate references, and work through the proofs.
12. Level 3 Check the last column in Table 14.1, and work through the proofs
(by consulting the appropriate literature) for modular interpreted systems.
13. Level 4 Consider the cognitive agent language CASL and the verification
environment for it. Try to formalize the example in this chapter (or the one
in Chapter 13) in this language, and verify it using the PVS verification
system.
14. Level 4 Choose a BDI-based agent programming language that you have
an operational semantics for. Then implement (at least the core of) this
semantics within the AIL. Define the semantic configurations and syntax,
recast the operational semantic rules in terms of AIL primitives, and then
test and evaluate the semantics.
References
[1] M. Abadi and Z. Manna. Temporal Logic Programming. Journal of Symbolic
Computation, 8: 277–295, 1989.
[2] Marco Alberti, Federico Chesani, Marco Gavanelli, Evelina Lamma, Paola Mello,
and Paolo Torroni. Compliance Verification of Agent Interaction: A Logic-based
Tool. Applied Artificial Intelligence, 20(2-4):133–157, February-April 2006.
[3] Marco Alberti, Marco Gavanelli, Evelina Lamma, Paola Mello, and Paolo Tor-
roni. The SCIFF Abductive Proof Procedure. In Advances in Artificial Intelligence
(AI*IA), volume 3673 of Lecture Notes in Artificial Intelligence, pages 135–147,
Heidelberg, Germany, 2005. Springer-Verlag.
[4] Natasha Alechina, Mehdi Dastani, Brian Logan, and John-Jules Ch. Meyer. Rea-
soning about Agent Deliberation. Autonomous Agents and Multi-Agent Systems,
22(2):356–381, 2011.
[5] R. Alur and T.A. Henzinger. Reactive Modules. Formal Methods in System Design,
15(1):7–48, 1999.
[8] Paul Ammann and Jeff Offutt. Introduction to Software Testing. Cambridge Uni-
versity Press, 2008.
[9] Lacramioara Astefanoaei, Mehdi Dastani, John-Jules Meyer, and Frank S. de Boer.
On the Semantics and Verification of Normative Multi-agent Systems. Journal of
Universal Computer Science, 15(13):2629–2652, 2009.
[10] Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT Press,
2008.
[11] Matteo Baldoni, Cristina Baroglio, Alberto Martelli, and Viviana Patti. Verification
of Protocol Conformance and Agent Interoperability. In Proc. Sixth International
Workshop on Computational Logic in Multi-agent Systems (CLIMA), volume 3900
of Lecture Notes in Computer Science, pages 265–283. Springer-Verlag, 2005.
[12] T. Ball and S.K. Rajamani. The SLAM Toolkit. In Proc. 13th International Con-
ference on Computer Aided Verification (CAV), volume 2102 of LNCS, pages 260–
264. Springer, 2001.
[15] Patrick Blackburn, Johan van Benthem, and Frank Wolter, editors. Handbook of
Modal Logic. Elsevier, 2006.
[17] Rafael H. Bordini, Michael Fisher, Carmen Pardavila, and Michael Wooldridge.
Model Checking AgentSpeak. In Proc. 2nd International Joint Conference on
Autonomous Agents and Multi-Agent Systems (AAMAS-2003), 2003.
[18] Nils Bulling, Jürgen Dix, and Wojciech Jamroga. Model Checking Logics of
Strategic Ability: Complexity. In Mehdi Dastani, Koen V. Hindriks, and John-
Jules Ch. Meyer, editors, Specification and Verification of Multi-agent Systems,
pages 125–158. Springer, 2010.
[19] K.M. Chandy and Jayadev Misra. An Example of Stepwise Refinement of Dis-
tributed Programs: Quiescence Detection. ACM Trans. Programming Languages
and Systems, 8(3):326–343, 1986.
[20] Edmund M. Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith.
Counterexample-Guided Abstraction Refinement for Symbolic Model Checking.
Journal of the ACM, 50(5):752–794, 2003.
[21] Edmund M. Clarke, Orna Grumberg, and David E. Long. Model Checking
and Abstraction. ACM Transactions on Programming Languages and Systems,
16(5):1512–1542, 1994.
[22] Edmund M. Clarke, Orna Grumberg, and Doron Peled. Model Checking. MIT
Press, 1999.
[23] E.M. Clarke and E.A. Emerson. Design and Synthesis of Synchronization Skele-
tons Using Branching-Time Temporal Logic. In Proc. Logics of Programs Work-
shop, volume 131 of Lecture Notes in Computer Science, pages 52–71, 1981.
[24] E.M. Clarke, E.A. Emerson, and A.P. Sistla. Automatic Verification of Finite-State
Concurrent Systems Using Temporal Logic Specifications. ACM Transactions on
Programming Languages and Systems, 8(2):244–263, 1986.
[25] Manuel Clavel, Francisco Durán, Steven Eker, Patrick Lincoln, Narciso Martí-
Oliet, José Meseguer, and Carolyn Talcott. The Maude 2.0 System. In Robert
Nieuwenhuis, editor, Rewriting Techniques and Applications (RTA 2003), number
2706 in Lecture Notes in Computer Science, pages 76–87. Springer-Verlag, June
2003.
[26] F. S. de Boer, K. V. Hindriks, W. van der Hoek, and J.-J. Ch. Meyer. A Verification
Framework for Agent Programming with Declarative Goals. J. Applied Logic,
5(2):277–302, 2007.
[28] Louise A. Dennis and Berndt Farwer. Gwendolen: A BDI Language for Verifiable
Agents. In Benedikt Löwe, editor, Proc. AISB’08 Workshop on Logic and the
Simulation of Interaction and Reasoning, Aberdeen, 2008. AISB.
[29] Louise A. Dennis, Michael Fisher, Matthew Webster, and Rafael H. Bordini.
Model Checking Agent Programming Languages. Automated Software Engineer-
ing, 19(1):5–63, 2012.
[32] Catalin Dima and Ferucio Laurentiu Tiplea. Model-checking ATL under Imperfect
Information and Perfect Recall Semantics is Undecidable. CoRR, abs/1102.4225,
2011.
[34] Jürgen Dix, Sarit Kraus, and V. S. Subrahmanian. Heterogeneous Temporal Prob-
abilistic Agents. ACM Trans. Comput. Log., 7(1):151–198, 2006.
[36] E. A. Emerson. Temporal and Modal Logic. In J. van Leeuwen, editor, Handbook
of Theoretical Computer Science, pages 996–1072. Elsevier, 1990.
[37] E. Allen Emerson and Chin-Laung Lei. Modalities for Model Checking: Branching
Time Logic Strikes Back. Science of Computer Programming, 8(3):275–306, June
1987.
[38] E.A. Emerson and J.Y. Halpern. “Sometimes” and “Not Never” Revisited: On
Branching versus Linear Time Temporal Logic. Journal of the ACM, 33(1):151–
178, 1986.
[39] Jonathan Ezekiel, Alessio Lomuscio, Levente Molnar, and Sandor M. Veres. Ver-
ifying Fault Tolerance and Self-Diagnosability of an Autonomous Underwater Ve-
hicle. In Toby Walsh, editor, IJCAI, pages 1659–1664. IJCAI/AAAI, 2011.
[41] Berndt Farwer and Louise A. Dennis. Translating into an Intermediate Agent
Layer: A Prototype in Maude. In Proc. International Workshop on Concurrency,
Specification and Programming (CS&P), Lagow, Poland, September 2007.
[42] Marcelo Finger and Dov M. Gabbay. Combining Temporal Logic Systems. Notre
Dame Journal of Formal Logic, 37(2):204–232, 1996.
[46] Michael Fisher. Temporal Semantics for Concurrent MetateM. Journal of Symbolic
Computation, 22(5/6):627–648, 1996.
[47] Michael Fisher. Temporal Representation and Reasoning. In Frank van Harmelen,
Bruce Porter, and Vladimir Lifschitz, editors, Handbook of Knowledge Represen-
tation, pages 513–550. Elsevier Press, 2007.
[49] Michael Fisher and Anthony Hepple. Executing Logical Agent Specifications. In
Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, and Amal El Fallah-Seghrouchni,
editors, Multi-agent Programming: Languages, Tools and Applications, pages 1–
27. Springer, 2009.
[51] D. Gabbay, A. Pnueli, S. Shelah, and J. Stavi. The Temporal Analysis of Fair-
ness. In Proc. 7th ACM Symposium on the Principles of Programming Languages
(POPL), pages 163–173. ACM Press, 1980.
[53] Rob Gerth, Doron Peled, Moshe Y. Vardi, and Pierre Wolper. Simple On-the-
fly Automatic Verification of Linear Temporal Logic. In Proc. 15th Workshop on
Protocol Specification Testing and Verification (PSTV), pages 3–18. Chapman &
Hall, 1995.
[54] Laura Giordano and Alberto Martelli. Verifying Agents’ Conformance with Mul-
tiparty Protocols. In Michael Fisher, Fariba Sadri, and Michael Thielscher, edi-
tors, CLIMA, volume 5405 of Lecture Notes in Computer Science, pages 17–36.
Springer, 2008.
[55] Laura Giordano, Alberto Martelli, and Camilla Schwind. Specifying and Verify-
ing Interaction Protocols in a Temporal Action Logic. Journal of Applied Logic,
5(2):214–234, 2007.
[56] James Gleick. A Bug and a Crash — Sometimes a Bug Is More Than a Nuisance,
1996. http://www.around.com/ariane.html.
[57] Dimitar P. Guelev, Catalin Dima, and Constantin Enea. An Alternating-Time Tem-
poral Logic with Knowledge, Perfect Recall and Past: Axiomatisation and Model-
Checking. Journal of Applied Non-Classical Logics, 21(1):93–131, 2011.
[59] David Harel, Dexter Kozen, and Jerzy Tiuryn. Dynamic Logic. MIT Press, Cam-
bridge, MA, USA, 2000.
[60] Klaus Havelund and Grigore Rosu. Monitoring Programs Using Rewriting. In
Proc. 16th IEEE International Conference on Automated Software Engineering
(ASE), pages 135–143. IEEE Computer Society Press, 2001.
[61] Robert M. Hierons, Kirill Bogdanov, Jonathan P. Bowen, Rance Cleaveland, John
Derrick, Jeremy Dick, Marian Gheorghe, Mark Harman, Kalpesh Kapoor, Paul
Krause, Gerald Lüttgen, Anthony J. H. Simons, Sergiy A. Vilkomir, Martin R.
Woodward, and Hussein Zedan. Using Formal Specifications to Support Testing.
ACM Comput. Surv., 41(2):1–76, 2009.
[63] Gerard J. Holzmann. Logic Verification of ANSI-C Code with SPIN. In Proc.
7th International SPIN Workshop on Model Checking of Software (SPIN), volume
1885 of LNCS, pages 131–147. Springer, 2000.
[64] Gerard J. Holzmann. The Spin Model Checker: Primer and Reference Manual.
Addison-Wesley, 2003.
[65] Gerard J. Holzmann and Margaret H. Smith. A Practical Method for Verifying
Event-Driven Software. In Proc. International Conference on Software Engineer-
ing (ICSE), pages 597–607, 1999.
[66] Gerard J. Holzmann and Margaret H. Smith. Software Model Checking. In Proc.
Formal Description Techniques (FORTE), pages 481–497, 1999.
[69] W. Jamroga and J. Dix. Model Checking Abilities of Agents: A Closer Look.
Theory of Computing Systems, 42(3):366–410, 2008.
[70] G.L.J. M. Janssen. Hardware Verification using Temporal Logic: A Practical View.
In Formal VLSI Correctness Verification, VLSI Design Methods-II. Elsevier Sci-
ence Publishers, 1990.
[71] Sung-Shik T. Q. Jongmans, Koen V. Hindriks, and M. Birna van Riemsdijk. Model
Checking Agent Programs by Using the Program Interpreter. In Proc. 11th In-
ternational Workshop on Computational Logic in Multi-agent Systems (CLIMA),
volume 6245 of LNCS, pages 219–237. Springer, 2010.
[73] Brian W. Kernighan and Dennis Ritchie. The C Programming Language, Second
Edition. Prentice-Hall, 1988.
[74] Michael Köster and Peter Lohmann. Abstraction for model checking modular in-
terpreted systems over ATL. In Liz Sonenberg, Peter Stone, Kagan Tumer, and
Pinar Yolum, editors, AAMAS, pages 1129–1130. IFAAMAS, 2011.
[79] Francois Laroussinie, Nicolas Markey, and Ghassan Oreiby. On the Expressiveness
and Complexity of ATL. LMCS, 4:7, 2008.
[80] Chuchang Liu, Mehmet A. Orgun, and Kang Zhang. A Parallel Execution Model
for Chronolog. Computer Systems: Science & Engineering, 16(4):215–228, 2001.
[81] Alessio Lomuscio, Wojciech Penczek, and Hongyang Qu. Partial Order Reduc-
tions for Model Checking Temporal-epistemic Logics over Interleaved Multi-agent
Systems. Fundamenta Informaticae, 101(1-2):71–90, 2010.
[82] Alessio Lomuscio, Hongyang Qu, and Franco Raimondi. MCMAS: A Model
Checker for the Verification of Multi-agent Systems. In Proc. 21st International
Conference on Computer Aided Verification (CAV), volume 5643 of LNCS, pages
682–688. Springer, 2009.
[83] Alessio Lomuscio and Franco Raimondi. MCMAS: A Model Checker for Multi-
agent Systems. In Proc. 12th International Conference on Tools and Algorithms for
the Construction and Analysis of Systems (TACAS), volume 3920 of LNCS, pages
450–454. Springer, 2006.
[84] Zohar Manna and Richard J. Waldinger. Toward Automatic Program Synthesis.
ACM Communications, 14(3):151–165, 1971.
[88] J.J. Meyer, W. van der Hoek, and B. van Linder. A Logical Approach to the Dy-
namics of Commitments. Artificial Intelligence, 113(1-2):1–40, 1999.
[89] Ali Mili, Jules Desharnais, and Jean Raymond Gagné. Formal Models of Stepwise
Refinements of Programs. ACM Computer Surveys, 18(3):231–276, 1986.
[90] Peter G. Neumann. Cause of AT&T Network Failure. The Risks Digest, 9(62),
1990. http://catless.ncl.ac.uk/Risks/9.62.html#subj2.1.
[92] M.A. Orgun and W. Wadge. Theory and Practice of Temporal Logic Program-
ming. In L. Fariñas del Cerro and M. Penttonen, editors, Intensional Logics for
Programming. Oxford University Press, 1992.
[93] Mehmet A. Orgun and William W. Wadge. Towards a Unified Theory of Inten-
sional Logic Programming. Journal of Logic Programming, 13(1–4):413–440,
1992.
[94] Sam Owre, John Rushby, N. Shankar, and David Stringer-Calvert. PVS: An Ex-
perience Report. In Dieter Hutter, Werner Stephan, Paolo Traverso, and Markus
Ullman, editors, Applied Formal Methods, volume 1641 of Lecture Notes in Com-
puter Science, pages 338–345. Springer, 1998.
[96] L.C. Paulson. A Generic Theorem Prover, volume 828 of Lecture Notes in Com-
puter Science. Springer, 1994.
[98] Nir Piterman, Amir Pnueli, and Yaniv Sa’ar. Synthesis of Reactive(1) Designs. In
Proc. 7th International Conference on Verification, Model Checking, and Abstract
Interpretation (VMCAI), volume 3855 of LNCS, pages 364–380. Springer, 2006.
[100] A. Pnueli and R. Rosner. On the Synthesis of a Reactive Module. In Proc. 16th
ACM Symposium on the Principles of Programming Languages (POPL), pages
179–190. ACM Press, 1989.
[102] M.R. Prasad, A. Biere, and A. Gupta. A Survey of Recent Advances in SAT-
based Formal Verification. International Journal on Software Tools for Technology
Transfer, 7(2):156–173, 2005.
[103] V. Pratt. Dynamic Logic. In J. W. DeBakker and J. van Leeuwen, editors, Com-
puter Science III, Part 2: Languages, Logic, Semantics, volume 109 of Mathemat-
ical Centre Tracts, pages 53–83. (Mathematical Centre Tracts 109), Mathematisch
Centrum, Amsterdam, 1979.
[104] F. Raimondi. Model Checking Multi-agent Systems. PhD thesis, University College
London, United Kingdom, 2006.
[108] F. Sadri and F. Toni. A Formal Analysis of KGP Agents. In Proc. European Con-
ference on Logics in Artificial Intelligence (JELIA), volume 4160 of Lecture Notes
in Artificial Intelligence, pages 413–425, Heidelberg, Germany, 2006. Springer-
Verlag.
[109] Sven Schewe and Bernd Finkbeiner. Bounded Synthesis. In Proc. 5th Interna-
tional Symposium on Automated Technology for Verification and Analysis (ATVA),
volume 4762 of LNCS, pages 474–488. Springer, 2007.
[110] K. Schild. On the Relationship Between BDI Logics and Standard Logics of Con-
currency. Journal of Autonomous Agents and Multi-Agent Systems, 3(3):259–283,
2000.
[112] S. Shapiro, Y. Lespérance, and H.J. Levesque. The Cognitive Agents Specifica-
tion Language and Verification Environment for Multiagent Systems. In Proc.
1st International Joint Conference on Autonomous Agents and Multiagent Systems
(AAMAS), pages 19–26, New York, NY, USA, 2002. ACM Press.
[113] A.P. Sistla, M. Vardi, and P. Wolper. The Complementation Problem for Büchi
Automata with Applications to Temporal Logic. Theoretical Computer Science,
49:217–237, 1987.
[114] Leon Sterling and Ehud Shapiro. The Art of Prolog. MIT Press, 1987.
[116] V.S. Subrahmanian, Piero Bonatti, Jürgen Dix, Thomas Eiter, Sarit Kraus, Fatma
Özcan, and Robert Ross. Heterogenous Active Agents. MIT Press, 2000.
[118] W. van der Hoek, A. Lomuscio, and M. Wooldridge. On the Complexity of Practi-
cal ATL Model Checking. In Proc. AAMAS, pages 201–208, 2006.
[119] H. van Ditmarsch, W. van der Hoek, and B. Kooi. Playing Cards with Hin-
tikka — An Introduction to Dynamic Epistemic Logic. Australasian Journal of
Logic, 3:108–134, 2005. http://www.philosophy.unimelb.edu.au/ajl/2005/2005_8.pdf.
[120] H. van Ditmarsch, W. van der Hoek, and B. Kooi. Dynamic Epistemic Logic,
volume 337 of Synthese Library Series. Springer, 2007.
[121] Hans van Ditmarsch, Wiebe van der Hoek, and Barteld Kooi. Public Announce-
ments and Belief Expansion. In Proc. 5th International Conference on Advances
in Modal Logic (AiML), pages 335–346. King’s College Publications, 2005.
[122] Frank van Harmelen, Bruce Porter, and Vladimir Lifschitz, editors. Handbook
of Knowledge Representation, volume 2 of Foundations of Artificial Intelligence.
Elsevier Press, 2007.
[123] B. van Linder, W. van der Hoek, and J.J. Ch. Meyer. Formalising Abilities and
Opportunities of Agents. Fundamenta Informaticae, 34(1-2):53–101, 1998.
[124] Birna van Riemsdijk, Frank S. de Boer, Mehdi Dastani, and John-Jules Ch. Meyer.
Prototyping 3APL in the Maude Term Rewriting Language. In Proc. 5th Interna-
tional Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS),
pages 1279–1281. ACM, 2006.
[125] Moshe Y. Vardi and Pierre Wolper. Reasoning About Infinite Computations. Infor-
mation and Computation, 115(1):1–37, 1994.
[126] Willem Visser, Klaus Havelund, Guillaume P. Brat, Seungjoon Park, and Flavio
Lerda. Model Checking Programs. Automated Software Engineering, 10(2):203–
232, 2003.
[127] Website, ESA. Ariane 5 Flight 501 Failure — Report by the Inquiry Board, 1996.
http://esamultimedia.esa.int/docs/esa-x-1819eng.pdf.
[129] Website, The SLAM Project: Debugging System Software via Static Analysis.
http://research.microsoft.com/slam.
[131] M. Webster, M. Fisher, N. Cameron, and M. Jump. Model Checking and the Cer-
tification of Autonomous Unmanned Aircraft Systems. In Proc. 30th International
Conference on Computer Safety, Reliability and Security (SAFECOMP), 2011.
[133] M. Wooldridge, M. Fisher, M-P. Huget, and S. Parsons. Model Checking for Multi-
agent Systems: The MABLE Language and its Applications. International Journal
of Artificial Intelligence Tools, 15(2):195–225, 2006.
Chapter 15
Agent-Oriented Software
Engineering
1 Introduction
Increasingly, software is called upon to operate successfully in complex and
dynamic environments, and to be adaptable, flexible, and robust. This can be
achieved by software designed as a collection of agents: software entities that op-
erate autonomously within their environment, and are able to proactively achieve
goals, while responding to changes in the environment [132]. For example, in a
transport logistics application [46], autonomous agents negotiate with each other
to schedule deliveries, and renegotiate in the event of delays.
There have been many demonstrated applications of agents including [88,
105]: production scheduling, simulation in a range of domains, energy production
and distribution, transport logistics [46], crisis management [113], flexible man-
ufacturing [59], air traffic control [82], and business process management [16].
There is certainly anecdotal evidence that agent technology leads to much faster
and more modular development of somewhat complex applications, particularly
those operating in dynamic domains. Unfortunately this is difficult to verify sci-
entifically or to quantify objectively. The only substantial study we are aware of
was written by an agent technology provider, based on a study done by one of
its customers. This study indicated productivity gains of up to 350% [7]. The
reason for substantial efficiency gains results at least in part from the fact that
the execution engine manages plan selection (based on context evaluation) and
While there are similarities between agents and objects, there are also key differences (e.g., autonomy, proactive-
ness), and these differences are sufficiently significant to justify the development
and use of agent-specific design methodologies. For example, when designing
an agent system, which, by definition, exhibits proactiveness, and which tends to
be conceptualized and implemented using goals, it is important to identify and
model the goals in the system. This activity, and resulting model, are not covered
by existing OO design methodologies.
However, as mentioned, there are similarities between agents and objects, and
as we will see, AOSE methodologies in general do adopt and adapt various el-
ements (techniques, notations, models, processes) from OO design where it is
applicable.
Furthermore, the phases of software development do not change because we
are using agents: we still need to identify the purpose and scope of the system-
to-be (“requirements”), plan the system’s overall structure (“design”), flesh out
the details of parts of the system (“detailed design”), implement the system, and
test and debug it. Furthermore, as with non-agent development, these phases are
typically performed in an iterative fashion, not in a strict waterfall-like sequence.
Thus, the high-level process followed by an agent-oriented methodology is similar
to any methodology in that it includes activities that are concerned with defining
the purpose of the system, designing the system (with varying degrees of detail),
and implementing, testing, and refining the system. We also note that in many
cases a system being developed will not be purely agent based, but may include
parts that are best conceptualized, designed, and implemented in terms of objects,
or in terms of procedural code. However, in this chapter we focus on those aspects
of a system which are agent based.
The aims of this chapter are, first, to give a feeling for what an AOSE method-
ology looks like, without going into full details (which would require a whole
book!), and, second, to give a sense of the current state of work in the field: what
has been done and what the outstanding challenges are. Most of the chapter (Sec-
tions 4–9) describes the activities that one might find in a typical AOSE method-
ology – that is, requirements, design, detailed design, implementation, assurance
(e.g., testing), as well as software maintenance. Each section begins by discussing
the common “core,” i.e., activities and models that are common to a number of
methodologies. Each section then goes on to discuss some of the interesting vari-
ations, i.e., particular activities or models that are unique to a small number of
methodologies. In this way we hope to give a sense of where there is general
agreement in the field on the use of particular models and activities, and where
there are differences between the various proposed methodologies. Sections 4–9
are preceded by a discussion of the foundational concepts (Section 2) and by an
introduction to the running example (Section 3). The presentation is followed by a
Year Methodologies
1995 DESIRE
1996 AAII, MAS-CommonKADS
1999 MaSE
2000 Gaia (v1), Tropos
2001 MESSAGE, Prometheus
2002 PASSI, INGENIAS
2003 Gaia (v2)
2005 ADEM
2007 O-MaSE
fluential, and hence is clearly significant, but, like UML, it is a notation, not a
methodology.
Roughly speaking, we can see the history of AOSE methodologies in terms
of three generations. The first generation of methodologies emerged in the mid
to late 1990s. They can be characterized as being generally briefly described
(e.g., a single brief paper), lacking tool support, and sometimes not covering all
of the core activities of analysis, design, and detailed design. This first generation
includes1 DESIRE [14], AAII [78], MAS-CommonKADS [70], and Gaia [131].
The second generation of methodologies emerged in the late 1990s and early
2000s. They can be characterized as having detailed descriptions (multiple or
longer papers, and in the case of Prometheus a text book), having tool support,
and covering all the core activities from analysis through to implementation. The
second generation of methodologies includes MaSE [37, 40], Tropos [15, 89],
MESSAGE [31, 54], and Prometheus [94, 95, 130]. Tropos is interesting in that
for a long time it didn’t have tool support. The extended version of Gaia [137]
can also be viewed as a second-generation methodology, although for a long time
it too lacked tool support.
A third generation of methodologies emerged in the mid to late 2000s. Com-
pared with the second generation, these third-generation methodologies (PASSI,
INGENIAS, and ADEM) can be characterized as having an increased focus on
compatibility with UML as a notation and/or a focus on model-driven develop-
ment. They also tend to be more complex than second-generation methodologies.
Note that although initial papers on PASSI [17] and INGENIAS [58] appeared
in 2002, the INGENIAS methodology didn’t really crystalize until the mid 2000s
[100], and, similarly, the definitive PASSI paper appeared in 2005 [29]. It is
also notable that some of the INGENIAS developers were involved in developing
MESSAGE. The ADEM methodology was first described in 2005 [24], and the
associated notation (AML) was first described in 2004 [25]. However, the defini-
tive description of AML is a 2007 book [23]. ADEM and AML are influenced by
many earlier methodologies, including most of the methodologies in Figure 15.1,
and also by UML, OCL, and RUP.
There are two key observations that can be made. First, there is limited recent
work on developing new AOSE methodologies: the prominent and significant
second- and third-generation methodologies are well-developed and supported,
and it is hard to justify developing yet another methodology.
The second key observation concerns diversity and convergence. In the early
years of work on AOSE methodologies, there was a lot of diversity and dozens
of methodologies. Over the years, most of these methodologies have faded away,
1 Note that citations given are sometimes the definitive description, rather than the earliest paper
available.
and a smaller number of methodologies, which have seen significant work – and
typically the development of tool support – have remained active and prominent.
More recently, there is increasing awareness of the drawback of diversity, and the
need to standardize (e.g., [60]) or, at least, to try and reduce unnecessary differ-
ences between methodologies (e.g., [96]). Looking forward, we might expect the
future of AOSE research to focus on consolidation and standardization, rather than
on the development of more methodologies. See Section 11 for further discussion
of future directions.
2 Agent Concepts
As discussed earlier in this book (see Chapter 1), agents are defined as having
certain properties, such as being proactive, and being situated in an environment.
In order to design systems of agents that have these properties, we need to use
certain design concepts. For example, one way of designing agents that display
proactive behavior is to model, design, and implement them in terms of the con-
cept of goals. We now consider in turn each of the defining properties of agents,
and which concepts can be used to support the design and implementation of
agents that possess a given property. The properties and concepts used to support
them are summarized in Figure 15.2.
The first, and most basic, property of agents and agent systems is that they
are situated in an environment. In order to design agents and agent systems
that inhabit environments, we need to model the environment in some way. At
a minimum we need to capture the interface between the agent system and its
environment. This can be done in terms of the ways in which agents affect their
environment (“actions”), and the ways in which the agent system is affected by
its environment, typically by receiving information from the environment (“per-
cepts”). For example, a manufacturing robot may have certain actions that it can
perform (“load a part,” “join two parts”), and may be able to perceive certain infor-
mation from outside the system (“manufacturing request,” “table malfunction”).
In describing actions and percepts one also considers properties of the environ-
ment, such as [112]: do actions have predictable outcomes? Is the environment
fully visible? Can actions fail? For example, a robot’s actions may be sub-
ject to failure, but may nonetheless have predictable outcomes: an action either
succeeds or reports an error, in which case it has no effect. Finally, although mod-
eling the environment in terms of its interface with the agent system is sufficient
for many systems, in some situations a richer model of the environment can be
valuable (e.g., [109], and see Chapter 13).
A key property of agents that distinguishes them from objects is that they are
expected to behave in a way that balances the pursuit of goals (“proactive”) with
responding to significant changes in their situation (“reactive”). In order to design
and implement agents that exhibit proactive behavior we use the concept of goals.
A goal is a certain condition that the agent persistently strives to bring about [63].
Although the literature discusses a range of goal types (e.g., [13, 36, 47, 62, 123]),
in practice it is often sufficient to consider so-called “achievement” goals. These
are goals that are described in terms of a condition that is required to hold at a
single point in time (for example, having completed manufacturing a part). Since
goals are persistently pursued by the agent, they result in autonomous behavior:
the agent continues to actively pursue its goals without requiring external guidance
or stimulus.
In order to design and implement agents that are able to be reactive (i.e., re-
spond in a timely manner to changes in the situation), we design them using the
concept of events. An event is some change of status that is significant to the
agent. For example, a machine breaking down, or the arrival of a new order re-
quest. Events can arise from the receipt of messages from other agents, or from
internal changes. They can also arise from percepts, if the information from the
environment is significant.
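As a small illustration of how these concepts can show up at the design and implementation level, the following Java sketch declares a goal type and two event types; the names are ours, loosely based on the examples above, and are not prescribed by any particular methodology:

```java
// An achievement goal: a condition the agent persistently tries to bring about.
record AchievementGoal(String condition) { }

// An event: a significant change of status that the agent should react to.
// Events may arise from percepts, from messages, or from internal changes.
interface Event { }
record MachineBreakdown(String machineId) implements Event { }
record NewOrderRequest(String orderId, String productType) implements Event { }

class ConceptsDemo {
    public static void main(String[] args) {
        AchievementGoal g = new AchievementGoal("part manufactured");
        Event e = new NewOrderRequest("order-17", "ABC");
        System.out.println(g + " / " + e);
    }
}
```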
Finally, agent systems are comprised of a number of agents that interact (i.e.,
are social). There are many concepts and approaches that could be used to design
interacting agents, including norms, social commitments, institutions, and agent
society models (see Section 11 for further discussion, and also see Chapters 2 and
13). However, the minimal concept that is almost invariably used to support the
design of social agents is messages. When designing message-based interaction,
it is often useful to consider a collection of related messages together, which is
usually done by grouping related messages in an interaction protocol (“protocol”
for short).2 It is worth noting that it is also possible to design agents that interact
without messages, by making changes to the environment which other agents observe [97].
This approach, which is called “stigmergy,” is often used with systems
that comprise a large number of very simple agents, and in which goals are not
represented in the agents, but are hardwired into their behavior. In the remainder
of this chapter we will focus on the so-called cognitive approach, in which agents
are more coarse-grained, use some form of (limited) deliberation to select their
actions, and communicate directly using messages.
2 These protocols are specifications of the message types exchanged in a particular interaction,
or conversation, and their ordering, not the details of the low-level exchange mechanisms and their
transfer protocols.
3 Running Example
This chapter uses a running example to illustrate the design of a multiagent sys-
tem. The example used is that of a holonic manufacturing system. We now briefly
introduce this example. Since this chapter is about agent-oriented software engi-
neering and not holonic manufacturing per se, our coverage of holonic manufac-
turing is very brief, and we refer the interested reader to the literature for further
information (e.g., [74]).
Traditional approaches to manufacturing tend to use a fixed layout and pro-
cess, which works well when manufacturing jobs are consistent and identical, but
are not well suited to flexible manufacturing where jobs may vary in details (e.g.,
packing different items into a box), and the sizes of orders may be relatively small.
Holonic manufacturing has been proposed as a means of realizing flexible manu-
facturing by conceptualizing a manufacturing process in terms of a collection of
autonomous entities that interact to realize system goals.
A holon (from the Greek word “holos,” whole, and the suffix “-on,” part-of)
is an independent entity that does not exist in isolation, but is part of something
larger. Specifically, holons are viewed as being part of a holarchy, which is a
hierarchy of holons. Although proposed independently, it is clear (and recognized)
that there is much similarity between holons and autonomous agents [19]: both
agents and holons are autonomous entities that interact with each other to realize
design goals. One difference is that holons are viewed as existing in a hierarchical
structure, whereas agents may or may not exist in a hierarchical structure.
The simple manufacturing scenario that we use is based on the assembly cell
described by Jarvis et al. [74]. There are three parts – labeled A, B, and C – and
the assembly unit needs to assemble these into combined parts, which may be
“ABC” parts or “AB” parts, using the following process (see Figure 15.3):
1. robot1 loads an A part into one of the jigs on the rotating table
2. robot1 loads a B part on top of it
3. the table rotates so the A and B parts are at robot2
4. robot2 joins the parts together, yielding an “AB” part
(Figure 15.3 shows the assembly cell: a rotating table with two jigs, robot1 (loads and unloads)
with the A, B, and C part buffers on one side, robot2 (joins) on the other side, and a flipper.)
The equipment provides the following actions:
• robot1:
– load(part), which loads a particular part type onto the jig at R1’s posi-
tion (E(ast) in this example).
– unload(), which unloads the part at R1’s position.
– moveToFlipper(), which moves the part on the jig at R1’s position to
the flipper.
– moveFromFlipper(), which moves the part on the flipper, to the jig at
R1’s position.
• robot2:
– join(jig), which joins the bottom part at specified jig to the top part
(which may be a composite part).
• flipper:
– flip(), which flips the part it holds over.
• table:
– rotateTo(jig, pos), which rotates the table so that the given jig is at the given position.
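These equipment capabilities can be written down directly as interfaces. The sketch below is purely illustrative: the method names follow the actions listed above, while the Java types and the Position enum are our own assumptions, not part of any real control API for the cell.

```java
// Positions around the rotating table: E(ast) is where robot1 works, W(est) where robot2 works.
enum Position { E, W }

// robot1: loads and unloads parts, and moves parts to and from the flipper.
interface Robot1 {
    void load(String partType);   // load a part type ("A", "B", or "C") onto the jig at E
    void unload();                // unload the part at E
    void moveToFlipper();         // move the part on the jig at E to the flipper
    void moveFromFlipper();       // move the part on the flipper back to the jig at E
}

// robot2: joins the bottom part at the specified jig to the part on top of it.
interface Robot2 {
    void join(int jig);
}

// flipper: turns the part it holds over.
interface Flipper {
    void flip();
}

// table: rotates so that the specified jig ends up at the specified position.
interface Table {
    void rotateTo(int jig, Position pos);
}
```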
4 Requirements
The requirements activity is concerned with defining the required functionality of
the agent system-to-be. Our description will focus on techniques, notations, and
general processes that are used in a number of AOSE methodologies. We will
finish this section with a brief discussion of some techniques that are specific to
certain AOSE methodologies. The sections on design and detailed design will also
follow this pattern of discussing the “commonalities” first, and then highlighting
interesting differences.
There are three commonly used activities in agent-oriented requirements:
• specifying instances of desired system behavior using scenarios;
pickAndPlacer: this role is responsible for moving parts in and out of the jig
when it is located on the East side of the table. Associated actions are: load,
moveToFlipper, moveFromFlipper, unload.
Scenario: manufacturePart(ABC)
Type Name Roles
G build2 manager, pickAndPlacer, fastener
G decideParts manager
G loadPart pickAndPlacer
A load(A) pickAndPlacer
G loadPart pickAndPlacer
A load(B) pickAndPlacer
G fastenParts fastener, transporter
A rotateTo(1,W) transporter
A join(1) fastener
G addPart manager, pickAndPlacer, fastener
G decideNext manager
G flipOver manager
A rotateTo(1,E) transporter
A moveToFlipper() pickAndPlacer
A flip() flipper
G loadPart pickAndPlacer
A load(C) pickAndPlacer [in parallel with flip]
A moveFromFlipper() pickAndPlacer
G fastenParts fastener, transporter
A rotateTo(1,W) transporter
A join(1) fastener
G complete manager
G assess manager
A rotateTo(1,E) transporter
A unload() pickAndPlacer
fastener: this role is responsible for joining parts together. Associated action:
join.
transporter: this role is responsible for transporting items by rotating the table.
Associated action: rotateTo.
flipper: this role is responsible for flipping parts using the “flip” action.
Note that actions are always associated with the role that performs the action.
Goals need to be associated with roles. Typically, low-level goals are associ-
ated with a single role, whereas high-level goals are associated with multiple roles.
For example, the goal “fastenParts” has two roles that are jointly responsible for
the achievement of the goal. However, in some cases we may choose to assign
a high-level goal to a single role, which initiates, or has overall responsibility for
the goal. For example, the goal “complete” is assigned to the “manager” role only,
even though its achievement involves other roles as well.
The notation used in Figure 15.4 also indicates the “nesting” by using inden-
tation of the step type. The first step (“build2”) has its type, G, unindented, rep-
resenting a top-level goal. The second step (“decideParts”) is indented to indicate
that decideParts is a subgoal of build2. Similarly, the load(A) action is indented
to indicate that it is a part of the achievement of the loadPart goal. The structure
of goals implied by the indentation is based on the goal model, which is derived
in parallel with the scenario. However, as explained below, it may be the case
that subgoals of different higher-level goals are interleaved, so that a subgoal may
not belong to its nearest outer goal, but to something earlier. Finally, note that
a scenario only captures one possible execution trace, rather than capturing all
possibilities.
The second commonly used activity in requirements is to capture the goals of
the system using a goal model. The goal model is complementary to the use case
scenarios [110] in that it captures the different system functionalities and their
relationships, and is not specific to a given execution trace. It is worth noting
that the use of goals for requirements is motivated both by the fact that agents are
defined in terms of goals, but also by evidence (from non-agent work) that goals
are a good way to model requirements [122].
Creating a goal model can be done by identifying certain goals from the sce-
narios, and then refining them. One technique for refinement is asking “why” a
particular goal is achieved, which identifies its parent goal, and “how” a particular
goal is achieved, which identifies its subgoals [122]. Another technique is to con-
sider how goals influence other goals. In the case of the holonic manufacturing
system, the goal model is actually quite simple. One possible goal model might
be defined by beginning with a top-level goal (“manufacturePart”) and asking how
the goal is achieved, leading to the identification of subgoals. Asking why regard-
ing an identified goal can lead to identification of motivating goals, which in turn
leads to identification of additional subgoals. For example, asking why for manu-
facturePart could lead to identifying a goal of filling orders, which in turn leads to
identification of subgoals to obtain orders and prioritize orders.
Some methodologies capture the goal model using a tree structure where each
goal has as children its subgoals. Other methodologies use more sophisticated
models, which can capture the influences between goals, e.g., that one goal in-
hibits or supports the achievement of another goal [15]; or that particular goals
are triggered by certain events [53]. Figure 15.5 shows a simple goal model for
the holonic manufacturing system. This goal model does not show dependencies
between goals, other than the parent-child relationship (depicted by an arrow from
the parent goal to the subgoal). Note that Figure 15.5 also depicts actions, which
can be helpful in understanding the design, but is actually not normally done by
methodologies.
Finally, as discussed earlier, since the agent system is situated in an environ-
ment, the requirements need to include information on the environment, which can
be done by specifying the interface with the environment in terms of actions and
percepts. In some systems the interface is prescribed by existing hardware or sys-
tems. For instance, in the holonic manufacturing example, we have assumed that
each robot and piece of equipment has specific invocable actions (e.g., load(part),
rotateTo(jig,pos), etc.) In some systems the boundary between the agent system
and its environment is more flexible, and can be fully specified by the designer. In
others it may be that there is a defined low-level interface, but the designer chooses
to specify the agent interface at a higher level and implement separately the low-
level controls. For example, it may be that the interface to a robot is specified in
terms of low-level primitives such as “open-pincer,” and “move-to-position,” but
that the designer chooses a more abstract interface (e.g., load()) for developing the
agent specification. Particularly with robotic or vision systems, it is common that
a separate layer is needed to abstract from the lowest level of sensor input (e.g.,
pixel maps) and effector actions (which often need feedback control mechanisms
at the low level, and do not benefit from an agent level abstraction).
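A thin wrapper of the following shape can provide such an abstraction layer. This is a sketch only: the low-level primitives open-pincer and move-to-position are the ones mentioned in the text, while the coordinates, the bufferFor helper, and everything else are hypothetical.

```java
// Low-level robot control primitives, as exposed by the hardware/driver layer.
interface LowLevelArm {
    void openPincer();
    void closePincer();
    void moveToPosition(double x, double y, double z);
}

// Agent-level action layer: exposes the abstract load() action used in the agent design
// and hides the sequence of low-level (feedback-controlled) motions that realize it.
class LoaderActions {
    private final LowLevelArm arm;
    // Hypothetical drop-off coordinates for the jig at the East position.
    private static final double[] JIG_E = {0.40, 0.00, 0.20};

    LoaderActions(LowLevelArm arm) { this.arm = arm; }

    // The agent-level action "load(part)": fetch a part from its buffer and place it on the jig.
    void load(String partType) {
        double[] buffer = bufferFor(partType);
        arm.moveToPosition(buffer[0], buffer[1], buffer[2]);
        arm.closePincer();                                   // grasp the part
        arm.moveToPosition(JIG_E[0], JIG_E[1], JIG_E[2]);
        arm.openPincer();                                    // release it onto the jig
    }

    // Hypothetical: each part type is fetched from its own buffer location.
    private double[] bufferFor(String partType) {
        switch (partType) {
            case "A":  return new double[]{0.10, 0.55, 0.20};
            case "B":  return new double[]{0.10, 0.65, 0.20};
            default:   return new double[]{0.10, 0.75, 0.20};
        }
    }
}
```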
There is overlap between these three models – scenarios, goals, and environ-
ment interface – for instance, scenarios may include goals, actions, and percepts.
This overlap means that each of the three models influences the others. For ex-
ample: the goals in the scenario may be a starting point for the goal model; the
refined goal model may suggest additional goals for the use case scenarios; and
the actions identified in the use case scenario need to be defined as part of the
environmental model.
Having discussed the common activities, we now consider some variations and
differences. One significant variation is the use of an early requirements phase. As
the name suggests, this phase, which exists in Tropos [15], based on earlier work
on i* [136], is performed before the requirements phase. The aim of the early
requirements phase is to capture the organizational context in which the system-
to-be will exist. This is done by identifying the stakeholders (e.g., users, owners,
and other relevant organizations), their goals, and the ways in which they depend
on each other and on the system-to-be. The early requirements phase allows the
requirements of the system-to-be to be motivated in relation to its organizational
context. One key benefit of doing early requirements is that the resulting models
allow for systematic consideration of how changes in the organizational context
affect the requirements of the system. Another benefit is that it is possible for
alternatives to be investigated and assessed.
5 Design
The design phase in AOSE methodologies aims to define the overall structure of
the system. It addresses the following key questions:
• What agent types exist, and what (roles and) goals do they incorporate?
• What are the communication pathways between agents?
• How do agents interact to achieve the system’s goals?
The answers to these questions are captured in two key models: a static view of
the structure of the system, and a model that captures the dynamic behavior of the
system. Further details of data, in particular, shared data, are also determined at
this stage, although this is not specifically agent-oriented.
As in the previous section, we begin by presenting certain models and pro-
cesses that are common to a number of methodologies, and thus can be regarded
as “core.” We then briefly describe some interesting differences between method-
ologies.
The first question that needs to be answered in defining the system’s structure
is, which agent types should exist? A common technique for identifying agent
types is to consider smaller “chunks” of functionality (e.g., capabilities in Tropos,
roles in Prometheus), and to make a trade-off design decision based on the various
factors that support certain chunks being grouped together. For instance, when
two chunks have a related purpose or make use of common data, then there is a
force pulling them together; in other words, there is a reason for grouping them
together in a single agent type. Some of the factors that might be considered are:
• The degree of coupling between agents. Having a system design where
each agent type needs to interact with every other agent type is undesirable
because it implies a high level of coupling: any change that is made to an
agent type may require consequent changes in some or even all other agent
types.
• The cohesiveness of agent types. A system design is easier to understand if
each agent type has a clearly defined purpose. On the other hand, if there are
agent types that do a number of unrelated things, then they become complex
and harder to understand and work with.
• Whether there are reasons to keep certain chunks in separate agents. There
are a number of reasons why it might be a good idea not to have two partic-
ular chunks in the same agent type. One reason is that we may require the
two chunks to exist on different hardware, for instance, in the holonic man-
ufacturing example, the pickAndPlacer role and the transporter role interact
Note that in some cases the decision as to which agent types to use is already
given, or is quite clear. For instance, in a simulation where agents represent ex-
isting entities in a human organization, we would simply map each human or
organizational entity into an agent type. Similarly, in the holonic manufactur-
ing scenario it is natural to have each robot represented as a separate agent type.
Indeed, this is basically the approach that we adopt. We also include the manager
role in Robot1, since the manager makes the decisions about what to load and
when to unload, which are closely associated with the pickAndPlacer role. If
we were including an agent that coordinates the whole system of cells, we might
have decided to put the manager role in that agent. However, in our small example,
we assign the manager role to the Robot1 agent. Assigning goals to agent types
can be done based on the assignment of goals to roles, or may be done during the
process of determining agent types. In this case, the assignment is based on the
assignment of goals to roles (see Figure 15.4). Where a goal with subparts is
assigned to an agent that does not itself perform all of the subparts, the responsible
agent will need to request other agents to carry out the remaining subparts. An example
is the goal flipOver, which is assigned to Robot1, who will need to request as-
sistance from Table and FlipperRobot to do the actions required for this subgoal.
Similarly, the goal fastenParts, which is assigned to both the fastener and trans-
porter role, is assigned to a single agent (Robot2, which is assigned the fastener
role).
We also need to define how the agents interact with the environment; that is,
which actions and percepts each agent deals with. If these have been assigned to
roles, then they are simply assigned to the appropriate agent along with the role.
In our example the actions were predefined based on the hardware, as specified
on page 704. We must also ensure that each percept is handled by some agent. In
3 Actions are shown in italics.
[Figure 15.6: The Lock(Jig) protocol between a robot and the Table: the robot sends lock-at(Jig, Pos); the alternatives are a lockFailed reply, or an optional rotateTo(Jig,Pos) action followed by a locked-at reply.]
[Figure: The ManufacturePart protocol between Robot1, Robot2, Table, and FlipperRobot, composed from the Lock, Fasten, and AddPart subprotocols together with the load, unload, and unlock steps.]
[Figure 15.8: The AddPart protocol between Robot1, Table, and FlipperRobot: a Lock(R1,jigE) subprotocol and the moveToFlipper action, then, in parallel, an optional flipRequest/flip/flipped exchange and a load action, followed by moveFromFlipper and unlock.]
not rotate. An alternative to this design could be to have a central coordinator that
oversees and coordinates all aspects. However, it is simpler and more robust to
allow each agent to manage its own part of the process. We note that the rotateTo
action in the Lock protocol is actually optional: if the jig is already in the desired
position, then no action needs to be taken.
One can verify that with these protocols in place it is possible to obtain the
scenario described earlier. It is left as an exercise for the reader to verify that with
this design it is not possible for a rotation to be performed at the same time as
another action.
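To illustrate the behavior that the Lock protocol implies for the Table, here is a rough Java sketch of how the Table might serve lock-at requests; the method and message names are ours and are not tied to any particular agent platform.

```java
// Sketch of the Table's side of the Lock protocol: refuse if already locked,
// otherwise rotate only when the jig is not yet at the requested position, then
// confirm. Synchronizing the handlers is one simple way to prevent a rotation
// from overlapping with another table operation. Names are illustrative.
public class TableLockHandler {

    private boolean locked = false;
    private String currentPos = "east";   // assumed initial jig position for this sketch

    /** Returns the reply name; a real agent would send it back to the requester. */
    public synchronized String onLockAt(String jig, String requestedPos) {
        if (locked) {
            return "lockFailed";
        }
        locked = true;
        if (!currentPos.equals(requestedPos)) {
            rotateTo(jig, requestedPos);   // the optional step in the protocol
        }
        return "locked-at";
    }

    public synchronized void onUnlock() {
        locked = false;
    }

    private void rotateTo(String jig, String pos) {
        // placeholder for the real table-rotation action
        currentPos = pos;
    }
}
```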
Finally, in addition to communication between agents and other agents, and
between agents and the environment, we also define data. Often a good design will
not have data that is shared between agents (which is the case in the holonic man-
ufacturing example), so that data definition is done in the detailed design phase,
when the internals of each agent type are designed in detail. Again, to the extent
that the requirements phase defined the data used, this can be used as a basis for
defining data in the design phase.
[Figure 15.9: System overview diagram for the holonic manufacturing example, showing the agent types (Robot1, Robot2, Table, FlipperRobot), the manufacture percept, the agents' actions (load, unload, moveToFlipper, moveFromFlipper), and the AddPart, Lock, and Fasten protocols connecting the agents.]
The basic structure of the system can now be captured with a model that spec-
ifies the agent types, any shared data that is used, and the communication links
between agents. For example, in Prometheus a system overview diagram is used
to capture the system’s (static) structure. Figure 15.9 shows the system overview
model for the holonic manufacturing example.4 It shows the agent types, which
agents perform which actions, which agent handles the manufacture percept, and
which agents communicate with which other agents. In many methodologies com-
munication pathways are shown in terms of individual messages between agents,
or just as links between agents that communicate. In the system overview diagram
in Figure 15.9 we instead use interaction protocol nodes (e.g., “AddPart”), each of
which hides a number of messages.
To summarize, the two key models that result from the design phase are:
1. Some type of overview model, which shows the (static) structure of the sys-
tem, including the agent types, communication paths between them, and
shared data (if any). In some methodologies (e.g., Prometheus and O-
MaSE), the overview model also includes the interface between the system
and its environment.
2. Some sort of model of the dynamic behavior of the system. Most often, this
takes the form of interaction protocols specifying the possible sequences of
messages exchanged between agents.
4 Although there have been limited attempts to standardize notation, there is no widely accepted
standard. Consequently, we have used a set of simple graphical symbols that are not specific to
any given methodology.
6 Detailed Design
We now need to describe the internals of agents, using as a basis the static and
dynamic behavior of the system as a whole, as captured in the system overview
diagram and the protocols. That is, we need to specify how agents operate to
achieve their goals, and how they respond to messages in order to achieve the
desired interactions. The aim of the detailed design phase is to define the internal
structure of agents in sufficient detail to allow implementation to be done.
In order to define the details of agent internals in a way that supports im-
plementation of the agents, it is necessary to know the kind of implementation
platform that will be used. Will the agents be implemented in terms of finite-state
machines, Petri nets, collections of event-triggered plans, or JADE behaviors?
The more that is known about the implementation platform, the more detailed and
specific the design can be. For example, both Tropos and Prometheus assume that
agents will be implemented in terms of event-triggered plans, making them well
suited to BDI agent platforms [108] such as JACK [18, 126], Jason [12], Jadex
[103], and others. On the other hand, O-MaSE [38] assumes that agent inter-
nals are defined in terms of finite-state machines, and uses finite-state automata
to model the internal behavior of agents. Similarly, PASSI [29] uses activity dia-
grams or state charts to describe the behavior of individual agents.
[Figure 15.10: The Fasten protocol between Robot1, Robot2, and the Table: a fasten(jig) request, a Lock(R2,jigW) subprotocol, the join(jig) action, and the fastened(jig) and unlock messages.]
In this section we will describe (briefly) the basic idea of how detailed design
is done, and then illustrate it using the holonic manufacturing scenario. We will
consider two cases: first, where the agents are implemented using a BDI program-
ming language (see Chapter 13); and second, where agents’ behaviors are defined
in terms of a finite-state machine.
Regardless of the agent architecture that is used, the process of detailed de-
sign begins with the interface of each agent. In the design phase, each agent was
defined as being able to receive and send certain messages, to perform certain
actions, to deal with certain percepts, and to realize certain goals. These are the
starting points for detailed design. For example, consider Robot1 in the overview
diagram (Figure 15.9). We see that it participates in the AddPart protocol (Fig-
ure 15.8), the Lock protocol (Figure 15.6), and the Fasten protocol (Figure 15.10);
and that it sends and receives messages to and from Robot2, the FlipperRobot,
and the Table. We also know that it has actions load, unload, moveToFlipper, and
moveFromFlipper, and that it receives a percept manufacture(composite) (or if we
have a coordinator managing whole orders, this may be a message). We also know
(see page 711) that it has goals loadPart, decideParts, decideNext, and flipOver.
This implies (from the goal hierarchy, Figure 15.5) that it participates in:
• the goal build2 by achieving subgoals decideParts and loadPart (the latter
of which requires the action “load”)
• the goal “complete” by achieving the subgoal “assess” (as well as doing the
action “unload”)
We focus here on the part Robot1 plays in the manufacturePart goal, although
in general there may be multiple high-level goals an agent participates in, some
of which may share subgoals. Each subgoal represents an identified chunk of ac-
tivity, which will need to be triggered either by an internally posted goal, by a
percept from the environment, or by a message from another agent. Both the sce-
narios and the protocols provide guidance about the ordering of activities, which
also inform the detailed design.
Just as for earlier stages, we need some notation for capturing and commu-
nicating our detailed design. As discussed earlier, which notation is appropriate
depends on the underlying agent architecture that will be used to implement the
agents. If the design notation and the implementation approach are well aligned,
then implementation maps each design entity to an implementation entity. For
example, when implementing a UML class diagram using an object-oriented lan-
guage, each UML class is mapped to a class in the programming language. Simi-
larly, if the design of an agent’s internals is done in terms of event-triggered plans,
and the implementation is done using a BDI agent programming language, then
each design entity (e.g., event or plan) is mapped to a corresponding implementa-
tion entity.
[Figure: Initial plan diagram for the internals of Robot1: the manufacture percept is handled by the manufacturePartPlan, which posts the build2, addPart, and complete subgoals, handled by build2Plan, addPartPlan, and completePlan.]
[Figure: The plan diagram extended with the messages and actions identified so far: the fasten(jig), unlock, and lock-at messages, the load(X) action, and the decideParts subgoal handled by decidePartsPlan.]
7 We note that if looking at the goal hierarchy alone, we would not add moveToFlipper and
moveFromFlipper here, but rather in the flipOverPlan. However, the protocol is a more precise
description, and there we see that in fact these actions need to happen outside the flipOverPlan in
order to obtain the desired parallelism of flipping and load actions.
Figure 15.13: Developing the addPartPlan and completePlan for Robot1 internals.
We now proceed to finalize the design. We need to check to ensure that all mes-
sages that Robot1 can expect to receive are handled properly, and that it generates
all messages required of it by the protocols. The messages sent by Robot1 are
lock-at, fasten, flipRequest, and unlock, whereas those received are lockFailed,
locked-at, fastened, and flipped. We see that all sent messages are already present
in Figure 15.13, but that the received messages lockFailed, locked-at, and fastened
are not present.
The locked-at and lockFailed messages are not yet shown in Figure 15.13. We
decide to use a modularization construct, often referred to as a capability, to ab-
stract out the process of potentially receiving lockFailed messages and retrying
lock-at until receiving a successful locked-at message. Because designs are typi-
cally much larger than our simple example, some ability to package related plans
and events into abstract entities is important for understandability. These mod-
ules/packages/capabilities may often map back to the roles of the requirements
phase. In Figure 15.14, we see that the lock-at message has now become an in-
ternal subgoal to be handled by this capability, which will send and receive the
relevant messages (not shown in Figure 15.14). Capabilities must then have their
internal structure defined in a similar way to an agent. Capabilities can be nested
within each other as required.
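As a rough illustration of what such a lock capability might encapsulate, the Java sketch below packages the retry loop described above behind a single achieveLock call; the Messaging interface is a hypothetical stand-in for the platform's messaging API.

```java
// Illustrative sketch of a "lock capability": it hides the retry loop of sending
// lock-at and handling lockFailed until a locked-at reply arrives. The Messaging
// interface below is an assumption, standing in for the platform's messaging API.
public class LockCapability {

    private final Messaging messaging;   // assumed platform messaging facade

    public LockCapability(Messaging messaging) {
        this.messaging = messaging;
    }

    /** Keeps requesting a lock on the table until it succeeds. */
    public void achieveLock(String jig, String pos) throws InterruptedException {
        while (true) {
            messaging.send("Table", "lock-at", jig + "," + pos);
            String reply = messaging.waitForReply();   // "locked-at" or "lockFailed"
            if ("locked-at".equals(reply)) {
                return;                                // lock obtained
            }
            Thread.sleep(500);                         // back off briefly before retrying
        }
    }
}

/** Hypothetical messaging facade assumed for this sketch. */
interface Messaging {
    void send(String receiver, String performative, String content);
    String waitForReply() throws InterruptedException;
}
```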
[Figure 15.14: The completed internal design of Robot1, including the continue plan triggered by the fastened(jig) message and the lockCapability, which handles the lock-at(jig,pos) subgoal.]
We also do not yet have the fastened message shown, so we add this to Fig-
ure 15.14 as an incoming message, and consider what should happen when this
is received. Looking at our ManufacturePart protocol, we see that this can be re-
ceived either when we have joined our first two parts, or after we have added a
part. In both cases what we want to do is ascertain whether the part is now com-
plete (in which case we would do completePlan), or whether we need to add a
further part (i.e., generate the subgoal addPart). We add a plan called “continue,”
to allow us to generate the addPart subgoal in the latter case. We have already
provided a context condition that will make completePlan applicable only if the
part is completed. We should then add a context condition to the “continue” plan,
to ensure that it is applicable only when the part still requires additional compo-
nents. In this way, the context conditions of the two plans relevant for responding
to the fastened message will be chosen appropriately, depending on the situation.
If our current composite matches the part to be made, we will choose the com-
pletePlan. If not, we choose the continue plan and repeat the process of adding a
suitable part. This illustrates a situation in which an event (in this case a message
event) can have multiple plans potentially responding to it, with selection gov-
erned by the context condition. We can now see the completed design for Robot1
in Figure 15.14.
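The following Java sketch (ours; a BDI platform would express this far more directly) illustrates the mechanism: two plans are relevant to the fastened message, and the one whose context condition holds in the current situation is executed.

```java
import java.util.List;

// Minimal sketch of context-condition-based plan selection. The completePlan
// applies when the composite matches the target part; the continue plan applies
// when more components are still needed. The Plan interface and the selection
// loop are illustrative, not the API of any particular BDI platform.
public class FastenedHandler {

    interface Plan {
        boolean context(BeliefBase beliefs);   // context condition
        void body(BeliefBase beliefs);         // plan body
    }

    static class BeliefBase {
        String currentComposite;   // e.g., "AB"
        String targetPart;         // e.g., "CABD"
    }

    private final List<Plan> relevantPlans = List.of(
        new Plan() {   // completePlan: the part is finished, so unload it
            public boolean context(BeliefBase b) { return b.currentComposite.equals(b.targetPart); }
            public void body(BeliefBase b) { System.out.println("unload finished part"); }
        },
        new Plan() {   // continue plan: more components needed, post the addPart subgoal
            public boolean context(BeliefBase b) { return !b.currentComposite.equals(b.targetPart); }
            public void body(BeliefBase b) { System.out.println("post subgoal addPart"); }
        }
    );

    /** Called when a fastened(jig) message arrives: run the first applicable plan. */
    public void onFastened(BeliefBase beliefs) {
        for (Plan p : relevantPlans) {
            if (p.context(beliefs)) { p.body(beliefs); return; }
        }
    }
}
```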
The ability of a given event to be handled by more than one plan is one of the
strengths of the BDI architecture. This can be used to readily add additional flexi-
bility into the system. For example, in our system, we may sometimes need to flip
a composite part before adding to it, whereas in other cases we may not. Let us as-
sume now that we are manufacturing a CABD part, and that we already have built
AB. The Robot1 agent now has a choice for the next part of the process: to simply
move away AB, load D beneath it, and then arrange joining (yielding ABD), or,
alternatively, to move away and flip AB (giving BA), load C, and again arrange
joining (yielding BAC, which can be flipped to yield CAB). This decision may be
driven by the immediate availability of C and D parts to load. We can envisage two
alternative plans to add a part: SimpleAdd and FlipAndAdd. SimpleAdd would
have the context condition that the bottom part is needed and available, and would
simply move the composite aside, load the bottom part, then move the compos-
ite back and request a fasten. FlipAndAdd would have the context condition that
the top part is needed and available and would include requesting a flip. If both
plans are applicable, a default could be used – perhaps trying SimpleAdd first, on
the basis that the less that has to be done, the less there is to go wrong.
Alternatively, further reasoning could be done to choose between the alternatives.
In addition to facilitating modular encoding of variations for different situa-
tions, the use of multiple plans to respond to an event also facilitates robustness
when coupled with a failure recovery mechanism that is common in most BDI
platforms. In the above example, suppose that we have selected SimpleAdd, but
while obtaining a lock, some other process has taken the required C part, and
when we go to do load(C) the action fails as there is no C part available. At this
point, rather than having to fail the whole process, we could potentially select the
FlipAndAdd plan, in order to achieve the goal of AddPart in an alternative way.8
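A rough sketch of this failure-recovery behavior, again in plain Java rather than any particular BDI platform: when the selected plan fails, the remaining applicable plans for the same goal are tried in turn, and the goal fails only when none is left. The plan names and the exception-based failure signal are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of BDI-style failure recovery: if the selected plan fails,
// another applicable plan for the same goal is tried (e.g., FlipAndAdd after
// SimpleAdd fails because the required part has been taken).
public class GoalExecutor {

    interface Plan {
        boolean applicable();              // context condition
        void execute() throws PlanFailedException;
    }

    static class PlanFailedException extends Exception {
        PlanFailedException(String msg) { super(msg); }
    }

    /** Achieve a goal by trying each applicable plan until one succeeds. */
    public boolean achieve(List<Plan> plans) {
        List<Plan> remaining = new ArrayList<>(plans);
        while (!remaining.isEmpty()) {
            Plan plan = remaining.remove(0);
            if (!plan.applicable()) {
                continue;                  // context condition does not hold
            }
            try {
                plan.execute();            // e.g., SimpleAdd: lock, load(C), fasten
                return true;               // goal achieved
            } catch (PlanFailedException e) {
                // e.g., load(C) failed because no C part was available;
                // fall through and try the next applicable plan (e.g., FlipAndAdd)
            }
        }
        return false;                      // no applicable plan succeeded
    }
}
```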
One can also envision a situation in which at a higher level it is not possible
to add the next part in building a particular composite (perhaps no next part is
available). Rather than stalling the system and waiting for availability, we could
add a plan to deal with this situation. Such a plan could temporarily remove the
partially built composite and watch for the availability of the required part in order
to continue. These kinds of plans to provide additional robustness are often added
after the core functionality is built.
8 Implementation would need to address the fact that if the composite part had already been
moved to the flipper, then the goal to move the composite part succeeds without any action re-
quired. This is straightforward to do in a clean goal-oriented way. The goal is for the composite
to be at the flipper. If that is already achieved, then the goal succeeds trivially.
Typically, each agent will have a single finite-state automaton, which describes its
behavior. For example, the behavior of the Robot1 agent can be described by the
automaton of Figure 15.15.
Figure 15.15 was derived from the interaction protocols by identifying the
possible states of the interaction (corresponding to the gaps between messages in
the protocol) and considering them as states of the system. Messages correspond
to transitions between interaction states. For the finite-state machine (FSM) of a
given agent, we compress interactions that do not involve that agent. For example,
Figure 15.15 describes the behavior of Robot1, and so the interaction between
Robot2 and the Table (to Lock the Table, as part of the Fasten subprotocol) is not
shown.
One issue that arose in this example is that whereas the AUML notation sup-
ports subprotocols (such as “Lock”), the FSM notation used by O-MaSE does not
support subprotocols. This means that subprotocols need to be expanded in the
FSM. However, in this case, this would result in multiple copies of the Lock pro-
tocol. In order to avoid a needlessly complex FSM, the protocol was modified
by lifting out the Lock protocol, and positioning it as a common prefix, which is
then followed by either loading two parts, adding a part, or unloading the final
product – where loading two parts as well as adding a part are both followed by
fastening the parts together. This gives slightly different possible behaviors from
the original interaction protocol. For instance, it does not require that the initial
loading (“load(A)” and “load(B)”) only occur at the start of the interaction.
[Figure 15.15: Finite-state machine describing the behavior of Robot1, with Wait and Assess states and transitions labeled by received messages (e.g., receive(locked-at), receive(fastened(jig)), receive(flipped)), sent messages (e.g., /send(lock-at(Jig,Pos)), /send(fasten(jig)), /send(unlock)), and the load, unload, moveToFlipper, and moveFromFlipper actions.]
However, although the FSM permits the agent to load(A) and load(B) in the middle of a
manufacturing process, in practice the Assess condition would be defined in such
a way as to prevent this from occurring.
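As a concrete, simplified illustration of this FSM style, the Java fragment below encodes a few of the states and transitions suggested by Figure 15.15; the state names, the string encoding of events, and the internal partsLoaded event are assumptions of this sketch.

```java
// Simplified, illustrative fragment of an FSM-style agent: the states are an enum
// and each received event (message or percept, encoded here as a string) drives a
// transition. Only a few of the transitions suggested by Figure 15.15 are shown.
public class Robot1StateMachine {

    enum State { IDLE, WAIT_LOCK, LOADING, WAIT_FASTENED, ASSESS }

    private State state = State.IDLE;

    public void onEvent(String event) {
        switch (state) {
            case IDLE:
                if (event.startsWith("manufacture")) {
                    send("lock-at(jigE)");
                    state = State.WAIT_LOCK;
                }
                break;
            case WAIT_LOCK:
                if (event.equals("locked-at")) {
                    state = State.LOADING;
                } else if (event.equals("lockFailed")) {
                    send("lock-at(jigE)");         // retry the lock request
                }
                break;
            case LOADING:
                if (event.equals("partsLoaded")) { // internal event assumed for this sketch
                    send("fasten(jig)");
                    state = State.WAIT_FASTENED;
                }
                break;
            case WAIT_FASTENED:
                if (event.startsWith("fastened")) {
                    state = State.ASSESS;          // decide whether to add a part or unload
                }
                break;
            case ASSESS:
                // assessment logic omitted in this sketch
                break;
        }
    }

    private void send(String message) {
        System.out.println("send: " + message);
    }
}
```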
The implementation must ensure proper behavior, even when planned sequences of (in-
ter)actions are interleaved with the parallel pursuit of other tasks (or additional
instances of the same task). It is left as an exercise for the reader to ensure that
two instances of the ManufacturePart protocol can in fact be interleaved, without
causing problems.
As in previous sections, having described the common core of the various
methodologies, we now briefly touch on a few interesting differences. One feature
that is unique to MaSE (but that does not appear to have been retained in O-MaSE)
is the use of a deployment diagram to capture the run-time location of agents.
Another difference that has been mentioned earlier is that methodologies differ
in the notations used to capture behaviors (e.g., informal pseudocode vs. UML
activity diagram).
7 Implementation
While work in AOSE tends to focus on software engineering aspects (e.g., require-
ments, design), it is clearly necessary for a design to be implemented, and, there-
fore, there is a close relationship between AOSE and the related field of agent-
oriented programming languages (and other support tools for implementing agent
systems).
The result of detailed design is intended to be easily mapped to an implementa-
tion. Clearly, there needs to be an alignment between the type of implementation
platform used and the implementation platform type assumed by the methodol-
ogy. For example, if a methodology’s detailed design phase assumes a BDI-style
implementation, then the results of the detailed design will be expressed in terms
of agents that have event-triggered plans, and these will map naturally to a BDI-
style implementation platform (see Chapter 13). On the other hand, if the detailed
design phase assumes agents that are message-exchanging black boxes, and spec-
ifies the behavior of each agent using, say, a finite-state machine, then the results
of the detailed design will map more naturally to an agent platform such as JADE
[6].
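For the second case, the detailed design maps naturally onto message-driven behaviours. The sketch below assumes the standard JADE classes (Agent, CyclicBehaviour, ACLMessage) and only shows the general shape of such code; the message contents and handler methods are placeholders of our own.

```java
import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.lang.acl.ACLMessage;

// Sketch of a JADE-style agent whose behavior is driven by incoming messages,
// roughly in the spirit of an FSM-based detailed design. The message contents
// ("locked-at", "fastened", ...) follow the holonic manufacturing example; the
// handler methods are placeholders.
public class Robot1Agent extends Agent {

    @Override
    protected void setup() {
        addBehaviour(new CyclicBehaviour(this) {
            @Override
            public void action() {
                ACLMessage msg = myAgent.receive();
                if (msg == null) {
                    block();               // wait until a message arrives
                    return;
                }
                String content = msg.getContent();
                if (content.startsWith("locked-at")) {
                    handleLockedAt();
                } else if (content.startsWith("fastened")) {
                    handleFastened();
                }
            }
        });
    }

    private void handleLockedAt() { /* load parts, etc. */ }
    private void handleFastened() { /* decide whether to add a part or unload */ }
}
```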
Mapping design to implementation is generally done manually, with some as-
sistance from tools in specifying skeleton code, which is then fleshed out. This is
similar to the way that a UML design is mapped to an object-oriented language,
resulting in class definitions, where the internal details of methods need to be com-
pleted. Several of the tools associated with AOSE methodologies provide support
for implementation in the form of production of skeleton code. Specifically, the
PDT tool (supporting Prometheus) can generate JACK code, the TAOM4E tool
(supporting Tropos) can generate Jadex code, the IDK tool (supporting INGE-
NIAS) can generate JADE code, agentTool III (supporting O-MaSE) can generate
JADE code, and the PTK tool (supporting PASSI) can generate JADE code using
AgentFactory [29]. However, there is still room for improvement in tool support
for the transition from design to implementation [9], and one area in particular
is to support “round trip” engineering, where changes to generated code can be
reflected in the design. This is partially supported by PDT, but, as far as we are
aware, not by any other tools.9
One approach to improving the link between design and implementation is
to use model-driven development [84] of agent systems (e.g., [75, 100]). The
aim is to produce a complete executable system directly from the design model.
This requires that the design models are specified with enough detail that this
is possible. The benefits of this approach are that the implementation phase is
eliminated and that it is no longer possible for the implementation and design
to diverge over time. However, the cost is that the design must contain a lot of
additional detail, which makes it harder to develop: in effect the implementation
work is shifted into the detailed design phase. Furthermore, the model is more
complex, which tends to reduce its value as a means of understanding the system.
Nevertheless, for some systems and user targets, this can be a good approach. The
work of Jayatilleke et al. [75] showed that this approach could facilitate the work
of domain experts who are not programmers in extending and refining a system
through the design models.
8 Assurance
Whereas support for the “core” activities of specification, design, and implemen-
tation is now well developed in AOSE methodologies, support for other activities,
such as testing, debugging, and software maintenance, is less well developed. This
section briefly summarizes the state of the art in the assurance of agent systems:
how can we be confident that a developed multiagent system meets its specifica-
tion? This is usually accomplished through testing and debugging (Section 8.1).
However, due to the characteristics of agent systems, it has been argued that test-
ing (at least if done manually) is not likely to be adequate, and so there is also work
on assurance of agent systems using formal methods, which we briefly survey in
Section 8.2. The topic of support for the ongoing maintenance and modification
of agent software is covered in the next section (Section 9).
Much of the work on testing and debugging takes advantage of the existence of
structured design models. These models can potentially be used in a comprehen-
sive approach to support testing of agent systems (ideally in an automated way).
9 INGENIAS provides support for repeated generation of code from the design, but not for code
For example, it is possible to monitor the execution of an agent system and auto-
matically detect behaviors that contradict information in the design models, such
as two agents communicating when they are not supposed to do so. An additional
advantage of using design models in testing is that this helps to ensure that the
design models and the code remain in sync, i.e., it assists in avoiding divergence
between design and code.
can readily be executed, and has been shown to isolate complex sporadically oc-
curring bugs in existing programs, which had defied manual debugging [138].
As the entire process is automated,11 an oracle is required to determine whether
output produced is correct or not. The detailed design models produced by the
methodology are used for this purpose [139]. Prometheus also has preliminary
work on testing system requirements or acceptance testing, using scenario specifi-
cations [119]. For a scenario to pass the acceptance test, all test cases must result
in some valid sequence of inputs and outputs according to the scenario definition.
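One simple way to picture such an acceptance check is to treat the scenario as an ordered list of expected steps and verify that each recorded test run contains those steps in order. The Java sketch below is our own illustration of this idea, not the Prometheus tooling.

```java
import java.util.List;

// Illustrative acceptance-test oracle: a test run (a recorded trace of percepts,
// messages, and actions) passes if the steps named in the scenario occur in the
// same order, possibly with other steps interleaved. This is a simplification of
// what a real scenario-based testing tool would check.
public class ScenarioOracle {

    public static boolean satisfies(List<String> trace, List<String> scenarioSteps) {
        int next = 0;                       // index of the next expected scenario step
        for (String event : trace) {
            if (next < scenarioSteps.size() && event.equals(scenarioSteps.get(next))) {
                next++;
            }
        }
        return next == scenarioSteps.size();    // all scenario steps were observed in order
    }

    public static void main(String[] args) {
        List<String> scenario = List.of("manufacture", "load(A)", "load(B)", "fasten", "unload");
        List<String> run = List.of("manufacture", "lock-at", "load(A)", "load(B)", "fasten", "unlock", "unload");
        System.out.println(satisfies(run, scenario));   // true: scenario steps appear in order
    }
}
```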
An important question in any testing regime is whether the test cases have
adequately tested the system. A common testing concept here is that of “cover-
age.” In traditional systems it is common to consider “statement coverage” (each
statement is executed at least once), “branch coverage” (each branch is executed
at least once), and “path coverage” (each combination of branches is executed). In
general, path coverage is not possible due to loops and recursion, though in prac-
tice these are capped. Low et al. [83] propose a representation of plans in terms
of arcs and nodes, and then provide some coverage criteria that involve these arcs
and nodes, as well as the plans. They explore the subsumption hierarchy of the
coverage criteria, showing that this is more complex than for traditional cover-
age where path coverage subsumes branch coverage, which subsumes statement
coverage.
Zheng and Alagar [140] also explore coverage criteria for agent system testing,
although this is not implemented, and appears to be based on a finite-state machine
representation, which would involve a computational explosion for many agent
systems, making it impractical.
The notion of coverage is further explored by Miller et al. [85], where it is
applied to the testing of agent interactions. In this approach the simplest level of
coverage is one in which each message in a protocol is sent and received. The
most complete is plan-path coverage, which addresses the notion of paths that
traverse different plans to produce the messages, and ensures that each such path
is traversed. To our knowledge, none of the implemented agent testing tools ex-
plicitly use notions of coverage, although the Prometheus unit testing implicitly
includes some coverage notions in that it provides warnings where sets of test
cases do not show generation of an event, or execution of a plan, which would be
expected at some point according to the design.
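To make the notion of coverage reporting concrete, here is a small illustrative Java sketch that records which plans were executed across a set of test runs and reports those that were never exercised; it is a toy illustration, not any of the tools discussed above.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative plan-coverage tracker: the design model lists the plans that exist,
// test execution reports the plans that actually ran, and the difference is the
// set of plans never exercised by the test suite.
public class PlanCoverage {

    private final Set<String> plansInDesign = new HashSet<>();
    private final Set<String> plansExecuted = new HashSet<>();

    public void declarePlan(String planName)     { plansInDesign.add(planName); }
    public void recordExecution(String planName) { plansExecuted.add(planName); }

    /** Plans present in the design that no test case ever executed. */
    public Set<String> uncoveredPlans() {
        Set<String> uncovered = new HashSet<>(plansInDesign);
        uncovered.removeAll(plansExecuted);
        return uncovered;
    }

    public static void main(String[] args) {
        PlanCoverage coverage = new PlanCoverage();
        for (String p : new String[] {"build2Plan", "addPartPlan", "completePlan", "flipOverPlan"}) {
            coverage.declarePlan(p);
        }
        coverage.recordExecution("build2Plan");
        coverage.recordExecution("addPartPlan");
        System.out.println("Uncovered plans: " + coverage.uncoveredPlans());
        // prints the two plans never executed: completePlan and flipOverPlan
    }
}
```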
In summary, all the major current methodologies include some support for
testing within their tools, with the exception of O-MaSE. However, this is primarily
support for automated execution (often using JUnit) of test cases; the correct
results need to be manually specified. Only Prometheus and Tropos have tools
for a fully automated testing process, and in both cases these are limited to specific
11 It is also possible to manually add test cases if desired.
aspects: in the case of Prometheus, that of unit testing, and in the case of Tropos,
that of receipt of messages. Both do allow execution of a very large number of test
cases in the particular testing subspace. Clearly, testing of agent-based systems is
an area where there is a need for substantial additional work.
agent systems. Another more recent example is the work of Alechina et al. [1, 2,
3], which takes agent programs written in a simple agent language (SimpleAPL)
and translates them into propositional dynamic logic. These translations, along
with the agent’s starting state, can be used to prove safety and liveness properties
using an existing PDL theorem prover. Their key contribution is the ability to
model the agent’s execution strategy, and prove properties that may rely on a
given execution strategy (e.g., interleaved vs. non-interleaved execution of plans).
The approach appears to have only been applied to toy programs (single agent, a
couple of plans, and propositional beliefs).
To summarize, current work on formal methods for verifying agent systems
(see [35]) tends to be applicable to toy programs only. We would expect the
efficiency of tools to improve over time. However, the logics typically used
for stating desired properties have a high computational complexity for
model checking (see Chapter 14). Theorem proving is, in general, less promising
than model checking, since it is less amenable to full automation. Additionally,
model checking has a significant advantage in that it provides a counterexample
to a failed property [27].
Finally, all formal verification work is concerned with showing that all execu-
tions of an agent program P will satisfy a specification ϕ. One issue that needs
to be considered is – where does the specification ϕ come from? In particular, it
is possible for a specification to be incomplete. Indeed, it has been argued that
it is quite likely that specifications are incomplete due to assumptions about the
interface between software and its environment [76], or due to easily-made im-
plicit assumptions about the execution model that allow proofs to be incorrect.
For example, Jackson [72, p. 87] describes a bug that was found in a widely-used
implementation of a binary search algorithm (which had been proved correct).
Winikoff [128] presents a proof of correctness for a simple waste disposal robot,
and then goes on to describe various errors in the simple program that exist despite
it having been proved to be correct!
More broadly, it has been argued that “Problems with requirements and us-
ability dwarf the problems of bugs in code, suggesting that the emphasis on cod-
ing practices and tools, both in academia and industry, may be mistaken” [72,
pp. 86–87]. Based on these observations, a number of authors (e.g., [72, 111])
have argued for the use of safety cases to provide direct end-to-end evidence of
correctness, that is, an end-to-end argument that provides evidence that the system
exhibits desired properties [72], and which establishes which (explicit) assump-
tions need to be made in order for certain properties to follow from the system.
Jackson also argues that properties should be expressed in real-world terms rather
than in software terms: for example, specifying a safety property in terms of the
radiation dose received by a patient, rather than in terms of the software comput-
ing a correct dose. These ideas have yet to be explored in the context of agent
systems.
Quality assurance is unequivocally a very important issue, especially in com-
plex systems, which is often where an agent paradigm gives the most benefit. Con-
sequently, a broad range of techniques that contribute to this endeavor – from au-
tomated testing to formal verification, as well as approaches such as safety cases –
are likely to be of value.
9 Software Maintenance
Once software has been designed, implemented, assured, and deployed, it is sub-
ject to ongoing maintenance to fix bugs (“corrective maintenance”), adapt to
changes in the application’s environment (“adaptive maintenance”), or adapt to
changes in user requirements (“perfective maintenance”). The ongoing mainte-
nance of existing software is an important issue, since maintenance activities can
account for the majority of the costs of software (as much as two-thirds [124]).
Software maintenance (also termed “software evolution”) is an active area of re-
search for non-agent-based software.
The agent-oriented approach to design and implementation is inherently ad-
vantageous for maintenance, especially adaptive or perfective maintenance, due
to its modular nature. This has been demonstrated to some extent by Bartish and
Thevathayan [5] and by Jayatilleke et al. [75]. The former compared the effort
and amount of code required to extend and modify a game implemented as both
a finite-state machine and as an agent system, finding that the agent version required
considerably less effort and code to modify. The latter extended an aircraft weather alerting system based
on temperature and wind predictions to include alerts involving volcanic ash. This
extension was built and integrated extremely efficiently and with little modifica-
tion required to existing code.
However, perhaps due to the relative youth of the field, little has been done
specifically on software maintenance of agent-based software. In fact, the only
work we are aware of on this topic is by Dam et al. [32, 33].
Dam et al. deal with change propagation in design models. Change prop-
agation is concerned with the issue that making changes to a software system
involves making some initial changes, but these initial changes almost invariably
have consequences, and require additional secondary changes to be made. Change
propagation tools aim to support the software developer in identifying and making
these secondary changes.
The approach used by Dam et al. is to focus on design models, rather than
code, and to define consistency constraints (expressed in OCL) which capture
consistency requirements of the design – for example, that when a message type
is defined to be received by a certain agent type, then that agent type must have a
plan that deals with the message type. Change propagation is then performed by
repairing violations of these constraints, caused by primary changes. For example,
creating a new message and linking it to an agent violates the constraint above,
and a possible secondary change is to define a new plan in the agent, and have this
new plan handle the new message type.
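The flavor of such a constraint and its repair can be sketched in a few lines of Java; this is our own toy illustration, whereas the actual approach expresses constraints in OCL and repairs them with BDI-style plans.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the change-propagation idea: a design-model constraint
// ("every message received by an agent type is handled by one of its plans") is
// checked, and violations are repaired by adding a skeleton plan. This is a toy
// stand-in for the OCL-constraint/repair-plan machinery described in the text.
public class DesignModel {

    private final Map<String, List<String>> receivedMessages = new HashMap<>();  // agent -> messages
    private final Map<String, List<String>> handledMessages = new HashMap<>();   // agent -> messages with plans

    public void linkMessageToAgent(String agent, String message) {
        receivedMessages.computeIfAbsent(agent, a -> new ArrayList<>()).add(message);
    }

    public void addHandlingPlan(String agent, String message) {
        handledMessages.computeIfAbsent(agent, a -> new ArrayList<>()).add(message);
        System.out.println("Created plan in " + agent + " handling " + message);
    }

    /** Repair step: add a skeleton plan for every received-but-unhandled message. */
    public void propagateChanges() {
        for (Map.Entry<String, List<String>> entry : receivedMessages.entrySet()) {
            for (String message : entry.getValue()) {
                List<String> handled = handledMessages.getOrDefault(entry.getKey(), List.of());
                if (!handled.contains(message)) {
                    addHandlingPlan(entry.getKey(), message);   // secondary change
                }
            }
        }
    }

    public static void main(String[] args) {
        DesignModel model = new DesignModel();
        model.linkMessageToAgent("Robot1", "fastened");   // primary change
        model.propagateChanges();                          // repairs the violated constraint
    }
}
```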
The framework and techniques proposed by Dam et al. are generic, and have
been applied to both agent-oriented designs (Prometheus) and object-oriented de-
signs (UML). Interestingly, the underlying change propagation engine uses ab-
stract repair plans expressed as BDI event-triggered plans. These plans are derived
directly from the OCL constraints, and are provably sound and complete.
Evaluation has shown that the approach is effective in performing a significant
proportion (around two-thirds) of secondary changes, including some that would
be likely to be missed because they concern parts of the design that would not
normally be considered by the designer (e.g., updating analysis-related design
artifacts when making changes to the detailed design). In terms of efficiency
and scalability, the approach is viable for small to medium design models, but
further opportunities for efficiency improvement exist, since constraint checking
is a bottleneck, and there exist known techniques for making constraint checking
“instant” [48].
10 Comparing Methodologies
As noted in Section 1.1, the early years of work on agent-oriented software engi-
neering saw the development of a large number of methodologies. This raised a
question: given many competing methodologies, how could one select a method-
ology to use? This question saw the appearance of a body of work that focused on
comparing different methodologies. Roughly speaking, this body of work devel-
oped around 2001–2003 [22, 34, 115, 118]. By the middle of the decade the area
of AOSE methodologies had begun to consolidate, resulting in fewer method-
ologies being serious contenders for adoption, and therefore there was reduced
interest in comparing methodologies, although there was some continued work
(e.g., [121]).
All of the work in this area adopted a feature-based comparison approach.
This approach compares methodologies by first identifying a set of features (e.g.,
[117]), such as which phases the methodology covered, whether it supported the
design of proactive agents, whether it provided good traceability support, whether
its notation was intuitive, and whether the methodology was well-documented.
Each methodology being compared was then evaluated against the (typically fairly
11 Conclusions
The bulk of this chapter was devoted to describing the state of the art in agent-
oriented software engineering. It is now time to look ahead and try to answer
the question: What’s next? What are the current areas of research? The big
challenges?
We consider two broad areas for future work: those that require research, and
those that address hurdles to wider adoption but are not research topics as such.
One area where we believe there is an urgent need for research is in under-
standing the benefits of the agent paradigm. Although agent technologies have
been around for a while now, our understanding of the benefits that they offer, and
of the context in which these benefits may be realized, is still, in essence, a col-
lection of well-documented anecdotes in the form of case studies (e.g., [88, 105]).
What is largely missing is measurement of the costs and benefits of agent tech-
nologies. Although some early results have been reported by Benfield et al. [7],
they lack detail, and are only a small number of data points. Another possible area
for work is conducting experiments comparing the development of systems using
agent technologies, and using traditional approaches, along the lines of Prechelt’s
work comparing programming languages [104].
Another area for research concerns designing flexible interactions. As was dis-
cussed earlier in this chapter, most methodologies use interaction protocols that
are message-centric (using a notation such as AUML’s sequence diagrams). The
issue is that this approach is not a good match for agent systems: message-centric
interaction protocols define precisely and explicitly the legal message sequences,
and tend to result in overly prescriptive designs that do not always support max-
imum flexibility, robustness, or autonomy [26]. There is ongoing work on de-
veloping alternative methods and models for designing agent interaction (e.g.,
[51, 52, 79, 80, 81, 134, 135], and see Chapter 3), but at present none of the ap-
proaches proposed have been widely accepted, and it could be argued that they
are not yet ready for use on real systems.
More broadly, it has been observed [45] that there are different types of agent
systems, and that in fact many of the well-known methodologies have focused
on supporting the design of particular types of systems, typically those that are
relatively closed, and which involve coarse-grained agents with a certain degree
of cognitive capabilities (such as BDI agents). However, there are three types of
agent systems that are not well-supported by current methodologies.
One such type of system is one in which there is a need to model a complex
organizational structure, which may change during execution. There has been
considerable work in recent years on organizational modeling (see Chapter 2),
and this is gradually finding its way into methodologies (e.g., [42]).
Another type of system which is not well supported by current AOSE work
is one involving large numbers of very simple agents that rely on emergence of
desired behavior. Although there has been some work in this area (e.g., [98]), one
issue is that the work on designing emergent agent systems is proceeding inde-
pendently of work on the design of cognitive agent systems. However, ultimately,
what are needed are integrated methodologies that can support the design of sys-
tems with both emergent and cognitive aspects [99].
A third type of system is one in which agents are developed by different people
and organizations, and in which agents join an “open” society of agents. Again,
there has been some work on methodologies to support the design of open agent
systems (e.g., [4, 116]), but more work is needed.
Thus, a significant longer-term challenge for the AOSE community is to de-
velop methodologies (along with tools) that support the engineering of a wider
range of types of agent systems, including emergent systems, open systems, and
systems with complex and dynamic organizational aspects.
Finally, as noted in Section 8.2, obtaining assurance that an agent system will
never malfunction is a significant challenge, given that agent systems are en-
gineered to be adaptive, and to exhibit a complex range of possible behaviors.
Munroe et al. [88, section 3.7.2] note that “. . . validation through extensive tests
was mandatory” but go on to observe that “. . . the task proved challenging for
several reasons. First, agent-based systems explore realms of behavior outside
people’s expectations and often yield surprises.” Similarly, Hall et al. identify the
question of how “. . . the aggregate behavior of the agent-based system be guar-
anteed to meet all the system requirements?” [59] as being a key obstacle to the
adoption of agent technology in manufacturing.
As argued earlier, the complex range of behaviors that can be exhibited by
agent systems makes manual testing an ineffective means of obtaining assurance,
and so there is a need for research in this area. There are a number of approaches
to be explored, including (fully) automated testing, formal verification, the use
of safety cases, and end-to-end evidence. The issue of assurance is probably one
of the most crucial areas for further development if multiagent systems are to be
extensively used by industry. While there is some substantial work in some areas,
much remains to be done to ensure that agent systems can be extensively and
easily tested, giving some reasonable assurance of robustness in the face of the
myriad of possible executions.
The areas of work described next are not major research challenges, but pri-
marily involve the engineering, development, and integration of software (such as
design tools). However, the following challenges are significant, and do need to
be addressed in order to enable wider adoption of agent technologies.
One area of work concerns standardization. As noted earlier, the number of
viable methodologies competing for adoption has shrunk in recent years, but there
are still a number of methodologies; and while these methodologies are in some
ways quite similar, there are still differences, some of which are significant, and
others merely gratuitous. One area of work, therefore, is to work toward the de-
velopment of a standardized methodology [39, 60, 96]. It is recognized that in
general there won’t be a single methodology that will suit all users, but it should
be possible to define a core methodology along with a number of variations or
customizations for different purposes and settings.
One approach that needs to be mentioned with regard to standardization is
method engineering [61]. Briefly, the idea is that methodologies are broken up
into “fragments,” which describe either a process or a product resulting from car-
rying out a design process. These fragments are captured within a framework that
includes a metamodel (e.g., SPEM12 ) defining the notions of work product, work
process, etc. The key idea of method engineering is that a methodology instance is
created by assembling it from a collection of fragments, held in a repository. We
believe that there may be contexts in which method engineering is valuable and
applicable, but we argue that in many cases, it is more appropriate to begin with
a complete methodology and customize it as needed, rather than beginning with
a repository of fragments. This is because assembling an effective and sensible
methodology out of fragments is not only a significant effort, but doing so requires
expertise and experience in AOSE development. This explains why method engi-
neering has not seen widespread adoption [50]. For further information on method
engineering in an AOSE context, see the recent survey by Cossentino et al. [30].
The wider industrial adoption of agent technologies is also hindered by a num-
ber of other factors [21, 55, 56, 125, 127]. One factor is that mainstream (including
object-oriented) practices, standards, and tools need to be integrated13 with agent-
oriented practices, standards, and tools. Companies and software engineers have
a great deal invested in current (especially object-oriented) approaches to design
and development. The more that agent-oriented approaches can leverage this ex-
isting expertise and be consistent with it when possible – while being clear about
where agent-oriented design is unique and different – the more likely it is that in-
dustry will be willing to use and invest in agent technology. Also, real applications
are typically not standalone agent systems, but involve a multiagent system that is
part of a broader system, which includes object-oriented software, databases, and
other components. So there is a real need for tools and methodologies that incor-
porate both agent and non-agent aspects. This line of work is arguably best driven
by companies, rather than by academic researchers. However, if researchers can
clearly articulate the value proposition of agent technologies, then companies may
be more likely to invest in the integration of these technologies with existing tool
sets and approaches.
12 Exercises
1. Level 1 Extend and modify the presented design for the cell system to in-
clude a coordinator agent that will make the decision as to which part should
be manufactured by the cell. This decision should be based on order priority:
orders arrive dynamically, and if a higher
priority order comes in, the coordinator should shift to manufacturing parts
for that order, once parts currently in production on the cell are completed.
2. Level 1 Design a smart printer system that will monitor paper levels in
printers, sending a message to refill as needed; and intelligently re-route
jobs to a nearby printer if a given printer is overloaded, out of paper, or not
functioning.
6. Level 2 Extend the holonic manufacturing example with the ability to re-
cover from failures, such as an attempt to join two parts failing, or the table
jamming. Assume that when an action fails, the failure is reported to the
system.
8. Level 3 Find a way of convincing a skeptical but rational critic that your
holonic manufacturing implementation will never misbehave, i.e., that the
table never moves while loading, unloading, or joining are being done. Are
the synchronization constraints sufficient?
9. Level 2 Consider the following version of the Lock protocol. If this proto-
col is used instead of the one in Figure 15.6 (on page 713), is it possible for
a rotation to occur at the same time as another action?
[Alternative Lock(Jig) protocol for this exercise: the robot sends lock-at(Jig, Pos); the table performs the rotateTo(Jig,Pos) action; then the alternatives are a locked-at or a lockFailed reply.]
10. Level 2 Assume that we add the possibility of resting a partially finished
composite piece by having the unload() action do either an unload(bin) or
an unload(shelf) action. The load(part) action can then load a composite part
(from the shelf) as well as single parts. How will this change the design?
Extend the existing design, making as few changes as possible to what is
there, to allow this extension. Include in your design some mechanism for
deciding when it makes sense to do this.
11. Level 3 Expand the simple cell design discussed in this chapter to capture
a hierarchical organization of agents, where the bottom level contains the
agents described in this chapter that control specific machines; then there
would be a layer of cell coordinators that each manage their cell, and, above
this, a supervisor agent that does the prioritization of orders. Investigate
some of the approaches to team and organizational structures, and suggest
how the basic methodology (process and notation) described in this chapter
could be extended to incorporate the development of organizational or team
design.
12. Level 3 Develop a smart meeting scheduler that schedules meetings be-
tween participants based on their availability and the availability of rooms.
The system should be able to automatically reschedule a meeting if a pre-
viously booked meeting room or participant is subsequently required for a
more important meeting, and if the new, more important meeting cannot
be scheduled within the required timeframe without modifying some other
13. Level 4 Design a multiagent system for managing the logistics of a courier
system that must continuously take in and assign jobs in an efficient manner,
where jobs have a pickup and delivery location, and also a priority level of
1, 2, or 3. Compare the resulting approach with state-of-the-art approaches
within logistics.
[Example AUML sequence diagram, "Example Protocol": the User sends a Query message to the System, and the System replies with a Response message.]
The primary way that AUML allows alternatives, parallelism, etc., to be spec-
ified is using boxes. A box is a region within the sequence diagram that contains
messages and may contain nested boxes. Each box has a tag that describes the
type of box (for example, Alternative, Parallel, Option, etc.). A box can affect the
interpretation of its contents in a range of ways, depending on its type. For exam-
ple, an Option box (abbreviated “opt”) indicates that its contents may be executed
as normal, or may not be executed at all (in which case the interpretation of the
sequence diagram continues after the Option box).
Whether or not the box is executed can be specified by guards. A guard,
denoted by text in square brackets, indicates a condition that must be true in order
for the Option box to execute.
Most box types can be divided into regions, indicated by heavy horizontal
dashed lines. For example, an Alternative box (abbreviated “alt”) can have a num-
ber of regions (each with its own guard) and exactly one region will be executed.
The example below shows an example of nested boxes. The Option box indicates
that nothing happens if the system is not operational. If the system is operational,
then we have two alternatives (separated by a horizontal heavy dashed line). The
first alternative is that the System sends the user a Response message. The second
alternative is that the System indicates that the user’s query was not understood.
[Nested-box example, "sd Example Protocol": the User sends Query; an Option box guarded by [System Operational] contains an Alternative box with two regions, a Response message and a Not Understood message from the System.]
The following are some of the box types that are defined by AUML (and in-
clude all of the box types that we use):
• Alternative: Specifies that one of the box’s regions occurs. One of the
regions may have “else” as the guard.
• Option: Can only have a single region. Specifies that this region may or
may not occur.
• Parallel: Specifies that each of the regions takes place simultaneously and
the sequence of messages is interleaved.
• Loop: Can only have a single region. Specifies that the region is repeated
some number of times. The tag gives the type (“Loop”) and also an indica-
tion of the number of repetitions, which can be a fixed number (or a range)
or a Boolean condition.
• Ref: This box type is a little different in that it doesn’t contain subboxes
or messages. Instead, it contains the name of another protocol. This is
basically a form of procedure call – the interpretation of the Ref box is
obtained by replacing it with the protocol it refers to.
Finally, in our use of the AUML sequence diagram notation to capture inter-
action protocols, we find it useful to be able to indicate the places in a protocol
where actions are performed. Since the AUML sequence diagram notation does
not provide a way to do this, we have extended the notation to do so. We place an
action indicator, the name of the action in angled brackets (“<action>”), on an
agent’s lifeline to indicate that the agent performs the action.
References
[1] N. Alechina, M. Dastani, B.S. Logan, and J.-J. Ch. Meyer. A logic of agent pro-
grams. In Proceedings of the Twenty-Second AAAI Conference on Artificial Intel-
ligence (AAAI), pages 795–800, 2007.
[2] N. Alechina, M. Dastani, B.S. Logan, and J.-J. Ch. Meyer. Reasoning about agent
deliberation. In Gerhard Brewka and Jérôme Lang, editors, Proceedings, Eleventh
International Conference on Principles of Knowledge Representation and Reason-
ing, pages 16–26, 2008.
[3] Natasha Alechina, Mehdi Dastani, Brian Logan, and John-Jules Ch. Meyer. Rea-
soning about agent execution strategies (short paper). In Autonomous Agents and
Multi-Agent Systems (AAMAS), pages 1455–1458, 2008.
[4] Josep Lluís Arcos, Marc Esteva, Pablo Noriega, Juan A. Rodríguez-Aguilar, and
Carles Sierra. Engineering open environments with electronic institutions. Eng.
Appl. of AI, 18(2):191–204, 2005.
[5] Arran Bartish and Charles Thevathayan. BDI agents for game development. In AA-
MAS ’02: Proceedings of the First International Joint Conference on Autonomous
Agents and Multiagent Systems, pages 668–669, 2002.
[6] Fabio Luigi Bellifemine, Giovanni Caire, and Dominic Greenwood. Developing
multi-agent systems with JADE. Wiley, 2007.
[7] Steve S. Benfield, Jim Hendrickson, and Daniel Galanti. Making a strong busi-
ness case for multiagent technology. In Peter Stone and Gerhard Weiss, editors,
Autonomous Agents and Multi-Agent Systems (AAMAS), pages 10–15. ACM Press,
2006.
[8] Federico Bergenti, Marie-Pierre Gleizes, and Franco Zambonelli, editors. Method-
ologies and Software Engineering for Agent Systems. Kluwer Academic Publishing
(New York), 2004.
[9] Rafael Bordini, Mehdi Dastani, and Michael Winikoff. Current issues in multi-
agent systems development. In Post-proceedings of the Seventh Annual Interna-
tional Workshop on Engineering Societies in the Agents World., volume 4457 of
LNAI, pages 38–61, 2007.
[10] Rafael H. Bordini, Michael Fisher, Carmen Pardavila, and Michael Wooldridge.
Model checking AgentSpeak. In Autonomous Agents and Multiagent Systems (AA-
MAS), pages 409–416, 2003.
[11] Rafael H. Bordini, Michael Fisher, Willem Visser, and Michael Wooldridge. Ver-
ifying multi-agent programs by model checking. Journal of Autonomous Agents
and Multi-Agent Systems (JAAMAS), 12:239–256, 2006.
[12] Rafael H. Bordini, Jomi Fred Hübner, and Michael Wooldridge. Programming
Multi-Agent Systems in AgentSpeak using Jason. Wiley, 2007.
[13] Lars Braubach, Alexander Pokahr, Daniel Moldt, and Winfried Lamersdorf. Goal
representation for BDI agent systems. In Programming Multiagent Systems, Sec-
ond International Workshop (ProMAS’04), volume 3346 of LNAI, pages 44–65.
Springer, Berlin, 2005.
[15] Paolo Bresciani, Paolo Giorgini, Fausto Giunchiglia, John Mylopoulos, and Anna
Perini. Tropos: An agent-oriented software development methodology. Journal of
Autonomous Agents and Multi-Agent Systems, 8:203–236, May 2004.
[16] Birgit Burmeister, Michael Arnold, Felicia Copaciu, and Giovanni Rimassa. BDI-
agents for agile goal-oriented business processes. In Proceedings of the 7th In-
ternational Joint Conference on Autonomous Agents and Multiagent Systems (AA-
MAS), Industry Track, pages 37–44. IFAAMAS, 2008.
[18] Paolo Busetta, Ralph Rönnquist, Andrew Hodgson, and Andrew Lucas. JACK
Intelligent Agents – Components for Intelligent Agents in Java. Technical report,
Agent Oriented Software Pty. Ltd, Melbourne, Australia, 1998. Available from
http://www.agent-software.com.
[20] G. Caire, M. Cossentino, A. Negri, A. Poggi, and P. Turci. Multi-agent systems im-
plementation and testing. In Fourth International Symposium: From Agent Theory
to Agent Implementation, pages 14–16, 2004.
[21] Monique Calisti and Giovanni Rimassa. Opportunities to support the widespread
adoption of software agent technologies. Int. J. Agent-Oriented Software Engineer-
ing, 3(4):411–415, 2009.
[22] L. Cernuzzi and G. Rossi. On the evaluation of agent oriented modeling meth-
ods. In Proceedings of Agent Oriented Methodology Workshop, Seattle, November
2002.
[23] Radovan Cervenka and Ivan Trencansky. The Agent Modeling Language AML:
A Comprehensive Approach to Modeling Multi-Agent Systems. Birkhäuser, 2007.
ISBN 978-3-7643-8395-4.
[24] Radovan Cervenka, Ivan Trencanský, and Monique Calisti. Modeling social as-
pects of multi-agent systems: The AML approach. In Jörg P. Müller and Franco
Zambonelli, editors, AOSE, volume 3950 of Lecture Notes in Computer Science,
pages 28–39. Springer, 2005.
[25] Radovan Cervenka, Ivan Trencanský, Monique Calisti, and Dominic A. P. Green-
wood. AML: Agent modeling language toward industry-grade agent-based mod-
eling. In James Odell, Paolo Giorgini, and Jörg P. Müller, editors, AOSE, volume
3382 of Lecture Notes in Computer Science, pages 31–46. Springer, 2004.
[26] Christopher Cheong and Michael Winikoff. Hermes: Designing flexible and robust
agent interactions. In Virginia Dignum, editor, Multi-Agent Systems – Semantics
and Dynamics of Organizational Models, chapter 5, pages 105–139. IGI, 2009.
[27] Edmund M. Clarke, E. Allen Emerson, and Joseph Sifakis. Model checking: Algo-
rithmic verification and debugging. Communications of the ACM, 52(11):74–84,
2009.
[28] Edmund M. Clarke, Orna Grumberg, and Doron A. Peled. Model Checking. The
MIT Press, 2000. ISBN 978-0-262-03270-4.
[29] Massimo Cossentino. From requirements to code with the PASSI methodology. In
Brian Henderson-Sellers and Paolo Giorgini, editors, Agent-Oriented Methodolo-
gies, pages 79–106. Idea Group Inc., 2005.
[30] Massimo Cossentino, Marie-Pierre Gleizes, Ambra Molesini, and Andrea Omicini.
Processes engineering and AOSE. In Marie-Pierre Gleizes and Jorge J. Gómez-
Sanz, editors, Post-proceedings of Agent-Oriented Software Engineering (AOSE
2009), volume 6038 of LNCS, pages 191–212, 2011.
[31] Wim Coulier, Francisco Garijo, Jorge Gomez, Juan Pavon, Paul Kearney, and
Philip Massonet. MESSAGE: a methodology for the development of agent-based
applications. In Bergenti et al. [8], chapter 9.
[32] Hoa Khanh Dam and Michael Winikoff. An agent-oriented approach to change
propagation in software maintenance. Journal of Autonomous Agents and Multi-
Agent Systems, 23(3):384–452, 2011.
[33] Khanh Hoa Dam. Supporting Software Evolution in Agent Systems. PhD thesis,
RMIT University, Australia, 2008.
[34] Khanh Hoa Dam and Michael Winikoff. Comparing agent-oriented methodologies.
In Paolo Giorgini, Brian Henderson-Sellers, and Michael Winikoff, editors, AOIS,
volume 3030 of Lecture Notes in Computer Science, pages 78–93. Springer, 2003.
[35] Mehdi Dastani, Koen V. Hindriks, and John-Jules Ch. Meyer, editors. Specification
and Verification of Multi-agent systems. Springer, Berlin/Heidelberg, 2010.
[36] Mehdi Dastani, M. Birna van Riemsdijk, and John-Jules Ch. Meyer. Goal types in
agent programming. In Proceedings of the 17th European Conference on Artificial
Intelligence 2006 (ECAI’06), volume 141 of Frontiers in Artificial Intelligence and
Applications, pages 220–224. IOS Press, 2006.
[39] Scott A. DeLoach. Moving multi-agent systems from research to practice. Int. J.
Agent-Oriented Software Engineering, 3(4):378–382, 2009.
[40] Scott A. DeLoach, Mark F. Wood, and Clint H. Sparkman. Multiagent systems
engineering. International Journal of Software Engineering and Knowledge Engi-
neering, 11(3):231–258, 2001.
[41] Louise A. Dennis, Berndt Farwer, Rafael H. Bordini, and Michael Fisher. A flexible
framework for verifying agent programs. In Autonomous Agents and Multi-Agent
Systems (AAMAS), pages 1303–1306. IFAAMAS, 2008.
[44] Virginia Dignum and Frank Dignum. The knowledge market: Agent-mediated
knowledge sharing. In Proceedings of the Third International/Central and Eastern
European Conference on Multi-Agent Systems (CEEMAS 03), pages 168–179, June
2003.
[45] Virginia Dignum and Frank Dignum. Designing agent systems: State of the prac-
tice. International Journal of Agent-Oriented Software Engineering, 4(3):224–243,
2010.
[46] Klaus Dorer and Monique Calisti. An adaptive solution to dynamic transport opti-
mization. In Michal Pěchouček, Donald Steiner, and Simon G. Thompson, editors,
4th International Joint Conference on Autonomous Agents and Multiagent Systems
(AAMAS 2005), July 25-29, 2005, Utrecht, The Netherlands – Special Track for
Industrial Applications, pages 45–51. ACM, 2005.
[47] Simon Duff, James Harland, and John Thangarajah. On proactive and maintenance
goals. In Autonomous Agents and Multi-Agent Systems (AAMAS), pages 1033–
1040. ACM, 2006.
[48] Alexander Egyed. Instant consistency checking for the UML. In ICSE ’06: Pro-
ceedings of the 28th International Conference on Software Engineering, pages
381–390, New York, NY, USA, 2006. ACM.
[49] Jonathan Ezekiel and Alessio Lomuscio. Combining fault injection and model
checking to verify fault tolerance in multi-agent systems. In Autonomous Agents
and Multi-Agent Systems (AAMAS), pages 113–120, 2009.
[50] Brian Fitzgerald, Nancy L. Russo, and Tom O’Kane. Software development
method tailoring at Motorola. Commun. ACM, 46:64–70, April 2003.
[51] Roberto A. Flores and Robert C. Kremer. A pragmatic approach to build conversa-
tion protocols using social commitments. In Autonomous Agents and Multi-Agent
Systems (AAMAS), pages 1242–1243, July 2004.
[52] Roberto A. Flores and Robert C. Kremer. A pragmatic approach to build conversa-
tion protocols using social commitments. In Nicholas R. Jennings, Carles Sierra,
Liz Sonenberg, and Milind Tambe, editors, Autonomous Agents and Multi-Agent
Systems (AAMAS), pages 1242–1243. ACM Press, 2004.
[53] Juan C. Garcia-Ojeda, Scott A. DeLoach, Robby, Walamitien H. Oyenan, and Jorge
Valenzuela. O-MaSE: A customizable approach to developing multiagent devel-
opment processes. In M. Luck and L. Padgham, editors, Agent-Oriented Software
Engineering VIII, volume 4951 of Lecture Notes in Computer Science (LNCS),
pages 1–15. Springer-Verlag, 2008.
[54] Francisco Garijo, Jorge J. Gomez-Sanz, Juan Pavon, and Phillipe Massonet. Multi-
agent system organization: An engineering perspective. In Proceedings of Mod-
elling Autonomous Agents in a Multi-Agent World, 10th European Workshop on
Multi-Agent Systems (MAAMAW’2001), pages 101–108, May 2001.
[55] Michael Georgeff. The gap between software engineering and multi-agent systems:
Bridging the divide. Int. J. Agent-Oriented Software Engineering, 3(4):391–396,
2009.
[56] Aditya Ghose. Industry traction for MAS technology: Would a rose by any other
name smell as sweet? Int. J. Agent-Oriented Software Engineering, 3(4):397–401,
2009.
[57] Jorge J. Gómez-Sanz, Juan Botía, Emilio Serrano, and Juan Pavón. Testing and
Debugging of MAS Interactions with INGENIAS. In Michael Luck and Jorge J.
Gomez-Sanz, editors, Agent-Oriented Software Engineering IX, pages 199–212,
Berlin, Heidelberg, 2009. Springer-Verlag.
[59] Kenwood H. Hall, Raymond J. Staron, and Pavel Vrba. Experience with holonic
and agent-based control systems and their adoption by industry. In V. Mařík, R.W.
Brennan, and M. Pěchouček, editors, Holonic and Multi-Agent Systems for Man-
ufacturing, Proceedings of the Second International Conference on Industrial Ap-
plications of Holonic and Multi-Agent Systems (HoloMAS’05), volume 3593 of
Lecture Notes in Artificial Intelligence (LNAI), pages 1–10, 2005.
[61] Brian Henderson-Sellers and Jolita Ralyté. Situational method engineering: State-
of-the-art review. J. UCS, 16(3):424–478, 2010.
[62] Koen V. Hindriks, Wiebe van der Hoek, and M. Birna van Riemsdijk. Agent pro-
gramming with temporally extended goals. In Autonomous Agents and Multi-Agent
Systems (AAMAS), pages 137–144. IFAAMAS, 2009.
[63] K.V. Hindriks, F.S. de Boer, W. van der Hoek, and J.-J. Ch. Meyer. Agent program-
ming with declarative goals. In Intelligent Agents VI – Proceedings of the 7th Inter-
national Workshop on Agent Theories, Architectures, and Languages (ATAL’2000).
Springer Verlag, 2001.
[65] J.F. Hübner, J.S. Sichman, and O. Boissier. Developing organised multiagent sys-
tems using the MOISE+ model: Programming issues at the system and agent lev-
els. International Journal of Agent-Oriented Software Engineering, 1(3/4):370–
395, 2007.
[66] Jomi Hübner, Olivier Boissier, and Rafael Bordini. From organisation specification
to normative programming in multi-agent organisations. In Jürgen Dix, João Leite,
Guido Governatori, and Wojtek Jamroga, editors, Computational Logic in Multi-
Agent Systems, volume 6245 of Lecture Notes in Computer Science, pages 117–
134. Springer Berlin / Heidelberg, 2010.
[67] Marc-Philippe Huget and James Odell. Representing agent interaction protocols
with agent UML. In Proceedings of the Fifth International Workshop on Agent-
Oriented Software Engineering (AOSE), July 2004.
[68] Marc-Philippe Huget, James Odell, Øystein Haugen, Mariam “Misty” Nodine,
Stephen Cranefield, Renato Levy, and Lin Padgham. FIPA modeling: Interaction
diagrams. On www.auml.org under Working Documents, 2003. FIPA Working
Draft (version 2003-07-02).
[69] Carlos Iglesias, Mercedes Garrijo, and José Gonzalez. A survey of agent-oriented
methodologies. In Jörg Müller, Munindar P. Singh, and Anand S. Rao, editors,
Proceedings of the 5th International Workshop on Intelligent Agents V: Agent The-
ories, Architectures, and Languages (ATAL-98), volume 1555, pages 317–330.
Springer-Verlag: Heidelberg, Germany, 1999.
[70] Carlos A. Iglesias, Mercedes Garijo, José C. González, and Juan R. Velasco.
A methodological proposal for multiagent systems development extending Com-
monKADS. In Proceedings of the Tenth Knowledge Acquisition for Knowledge-
Based Systems Workshop, 1996.
[71] Carlos Argel Iglesias, Mercedes Garijo, José C. González, and Juan R. Velasco.
Analysis and design of multiagent systems using MAS-CommonKADS. In Agent
Theories, Architectures, and Languages, pages 313–327, 1997.
[75] Gaya Jayatilleke, Lin Padgham, and Michael Winikoff. A model driven develop-
ment toolkit for domain experts to modify agent based systems. In Lin Padgham
and Franco Zambonelli, editors, Agent-Oriented Software Engineering VII: 7th In-
ternational Workshop, AOSE 2006, Hakodate, Japan, May 2006, Revised and In-
vited Papers, 2006.
[76] Cliff B. Jones, Ian J. Hayes, and Michael A. Jackson. Deriving specifications
for systems that are connected to the physical world. In Cliff B. Jones, Zhiming
Liu, and Jim Woodcock, editors, Formal Methods and Hybrid Real-Time Systems:
Essays in Honour of Dines Bjørner and Zhou Chaochen on the Occasion of Their
70th Birthdays, volume 4700 of Lecture Notes in Computer Science (LNCS), pages
364–390. Springer, 2007.
[77] T. Juan, A. Pearce, and L. Sterling. ROADMAP: Extending the Gaia methodology
for complex open systems. In Proceedings of the First International Joint Confer-
ence on Autonomous Agents and Multi-Agent Systems (AAMAS 2002), pages 3–10.
ACM Press, 2002.
[78] David Kinny, Michael Georgeff, and Anand Rao. A methodology and modelling
technique for systems of BDI agents. In Rudy van Hoe, editor, Seventh European
Workshop on Modelling Autonomous Agents in a Multi-Agent World, 1996.
[79] Sanjeev Kumar and Philip R. Cohen. STAPLE: An agent programming language
based on the joint intention theory. In Proceedings of the Third International Joint
Conference on Autonomous Agents & Multi-Agent Systems (AAMAS 2004), pages
1390–1391. ACM Press, July 2004.
[80] Sanjeev Kumar, Philip R. Cohen, and Marcus J. Huber. Direct execution of team
specifications in STAPLE. In Proceedings of the First International Joint Confer-
ence on Autonomous Agents & Multi-Agent Systems (AAMAS 2002), pages 567–
568. ACM Press, July 2002.
[81] Sanjeev Kumar, Marcus J. Huber, and Philip R. Cohen. Representing and executing
protocols as joint actions. In Proceedings of the First International Joint Confer-
ence on Autonomous Agents and Multi-Agent Systems, pages 543 – 550, Bologna,
Italy, 15 – 19 July 2002. ACM Press.
[82] Magnus Ljungberg and Andrew Lucas. The OASIS air-traffic management system.
In Proceedings of the Second Pacific Rim International Conference on Artificial
Intelligence (PRICAI ’92), Seoul, Korea, 1992.
[83] Chi Keen Low, Tsong Yueh Chen, and Ralph Rönnquist. Automated test case gen-
eration for BDI agents. Autonomous Agents and Multi-Agent Systems, 2(4):311–
332, 1999.
[84] Stephen J. Mellor, Anthony N. Clark, and Takao Futagami. Guest editors’ intro-
duction: Model-driven development. IEEE Software, 20(5):14–18, 2003.
[85] T. Miller, L. Padgham, and J. Thangarajah. Test coverage criteria for agent inter-
action testing. In Danny Weyns and Marie-Pierre Gleizes, editors, Proceedings of
the 11th International Workshop on Agent-Oriented Software Engineering, pages
1–12, 2010.
[86] Ambra Molesini, Enrico Denti, and Andrea Omicini. Agent-based conference man-
agement: A case study in SODA. IJAOSE, 4(1):1–31, 2010.
[87] Jörg P. Müller and Bernhard Bauer. Agent-Oriented Software Technologies: Flaws
and Remedies. In Fausto Giunchiglia, James Odell, and Gerhard Weiß, editors,
Agent-Oriented Software Engineering III: Third International Workshop, AOSE
2002, Revised Papers and Invited Contributions, volume 2585 of LNCS, pages 21–
227. Springer-Verlag, 2003.
[90] Cu D. Nguyen, Anna Perini, and Paolo Tonella. Goal-Oriented Testing for MASs.
International Journal of Agent-Oriented Software Engineering, 4(1):79–109, 2010.
[91] J. Odell, H. Parunak, and B. Bauer. Extending UML for agents. In Proceedings of
the Agent-Oriented Information Systems Workshop at the 17th National Conference
on Artificial Intelligence, 2000.
[93] Andrea Omicini. SODA: Societies and Infrastructures in the Analysis and Design
of Agent-Based Systems. In AOSE, pages 185–193, 2000.
[94] Lin Padgham and Michael Winikoff. Prometheus: A methodology for developing
intelligent agents. In Third International Workshop on Agent-Oriented Software
Engineering, July 2002.
[95] Lin Padgham and Michael Winikoff. Developing intelligent agent systems: A prac-
tical guide. John Wiley & Sons, Chichester, 2004. ISBN 0-470-86120-7.
[96] Lin Padgham, Michael Winikoff, Scott DeLoach, and Massimo Cossentino. A
unified graphical notation for AOSE. In Michael Luck and Jorge J. Gomez-Sanz,
editors, Proceedings of the Ninth International Workshop on Agent-Oriented Soft-
ware Engineering, pages 116–130, Estoril, Portugal, May 2008.
[97] H. Van Dyke Parunak. “Go to the Ant”: Engineering Principles from Natural
Multi-Agent systems. Annals of Operations Research, 75:69–101, 1997.
[98] H. Van Dyke Parunak and Sven A. Brueckner. Engineering swarming systems. In
Bergenti et al. [8], chapter 17, pages 341–376.
[99] H. Van Dyke Parunak, Paul Nielsen, Sven Brueckner, and Rafael Alonso. Hy-
brid multi-agent systems: Integrating swarming and BDI agents. In Sven Brueck-
ner, Salima Hassas, Márk Jelasity, and Daniel Yamins, editors, Engineering Self-
Organising Systems, 4th International Workshop, ESOA 2006, Hakodate, Japan,
May 9, 2006, Revised and Invited Papers, volume 4335 of Lecture Notes in Com-
puter Science, pages 1–14. Springer, 2007.
[100] Juan Pavón, Jorge J. Gómez-Sanz, and Rubén Fuentes-Fernández. The INGENIAS
Methodology and Tools, article IX, pages 236–276. Idea Group Publishing, 2005.
[101] Gauthier Picard and Marie-Pierre Gleizes. The ADELFE methodology: Designing
adaptive cooperative multi-agent systems. In Bergenti et al. [8], chapter 8.
[103] Alexander Pokahr, Lars Braubach, and Winfried Lamersdorf. Jadex: A BDI reasoning engine. In Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, and Amal El Fallah-Seghrouchni, editors, Multi-Agent Programming: Languages, Platforms and Applications. Springer, 2005.
[105] Michal Pěchouček and Vladimír Mařík. Industrial deployment of multi-agent tech-
nologies: review and selected case studies. Journal of Autonomous Agents and
Multi-Agent Systems (JAAMAS), 17:397–431, 2008.
[106] Iyad Rahwan, Liz Sonenberg, Nicholas R. Jennings, and Peter McBurney. Stra-
tum: A methodology for designing heuristic agent negotiation strategies. Applied
Artificial Intelligence, 21(6):489–527, 2007.
[108] Anand S. Rao and Michael P. Georgeff. BDI agents: From theory to practice. In
Victor R. Lesser and Les Gasser, editors, Proceedings of the First International
Conference on Multiagent Systems, June 12-14, 1995, San Francisco, California,
USA, pages 312–319. The MIT Press, 1995.
[109] Alessandro Ricci, Mirko Viroli, and Andrea Omicini. Give agents their artifacts:
The A&A approach for engineering working environments in MAS. In Proceed-
ings of the 6th International Joint Conference on Autonomous Agents and Multi-
agent Systems, pages 613–615, New York, NY, USA, 2007. ACM.
[110] Collette Rolland, Georges Grosz, and Régis Kla. Experience with goal-scenario
coupling in requirements engineering. In Proceedings of the Fourth IEEE Interna-
tional Symposium on Requirements Engineering (RE’99), 1999.
[111] John Rushby. A safety-case approach for certifying adaptive systems. In AIAA
Infotech@Aerospace Conference, April 2009.
[112] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pren-
tice Hall, 2nd edition, 2003.
[113] Nathan Schurr, Janusz Marecki, John P. Lewis, Milind Tambe, and Paul Scerri.
The DEFACTO system: Coordinating human-agent teams for the future of disaster
response. In Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, and Amal El Fallah-
Seghrouchni, editors, Multi-Agent Programming: Languages, Platforms and Ap-
plications, volume 15 of Multiagent Systems, Artificial Societies, and Simulated
Organizations, pages 197–215. Springer, 2005.
[114] S. Shapiro, Y. Lespérance, and H.J. Levesque. The cognitive agents specification
language and verification environment for multiagent systems. In Proceedings of
the First International Joint Conference on Autonomous Agents and Multiagent
Systems: Part 1, pages 19–26. ACM New York, NY, USA, 2002.
[115] Onn Shehory and Arnon Sturm. Evaluation of modeling techniques for agent-based
systems. In Jörg P. Müller, Elisabeth Andre, Sandip Sen, and Claude Frasson,
editors, Proceedings of the Fifth International Conference on Autonomous Agents,
pages 624–631. ACM Press, May 2001.
[116] Carles Sierra, John Thangarajah, Lin Padgham, and Michael Winikoff. Design-
ing institutional multi-agent systems. In Lin Padgham and Franco Zambonelli,
editors, Agent-Oriented Software Engineering VII: 7th International Workshop,
AOSE 2006, Hakodate, Japan, May 2006, Revised and Invited Papers, volume
4405, pages 84–103. Springer LNCS, 2007.
[117] Arnon Sturm and Onn Shehory. A framework for evaluating agent-oriented
methodologies. In Paolo Giorgini and Michael Winikoff, editors, Proceedings of
the Fifth International Bi-Conference Workshop on Agent-Oriented Information
Systems, pages 60–67, Melbourne, Australia, 2003.
[118] Jan Sudeikat, Lars Braubach, Alexander Pokahr, and Winfried Lamersdorf. Eval-
uation of agent-oriented software methodologies: Examination of the gap between
modeling and platform. In Paolo Giorgini, Jörg Müller, and James Odell, editors,
Agent-Oriented Software Engineering (AOSE), 2004.
[119] John Thangarajah, Gaya Jayatilleke, and Lin Padgham. Scenarios for system re-
quirements traceability and testing. In Tumer, Yolum, Sonenberg, and Stone, edi-
tors, Proceedings of the 10th International Conference on Autonomous Agents and
Multi-Agent Systems (AAMAS), pages 285–292. IFAAMAS, 2011.
[120] Ali Murat Tiryaki, Sibel Öztuna, Oguz Dikenelli, and Riza Cenk Erdur. SUNIT: A
Unit Testing Framework for Test Driven Development of Multi-Agent Systems. In
AOSE, pages 156–173, 2006.
[121] Quynh-Nhu Numi Tran and Graham C. Low. Comparison of ten agent-oriented
methodologies. In Brian Henderson-Sellers and Paolo Giorgini, editors, Agent-
Oriented Methodologies, chapter XII, pages 341–367. Idea Group Publishing,
2005.
[123] M. Birna van Riemsdijk, Mehdi Dastani, and Michael Winikoff. Goals in agent systems: A unifying framework. In Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2008.
[124] Hans van Vliet. Software Engineering: Principles and Practice. John Wiley &
Sons, Inc., 2nd edition, 2001. ISBN 0471975087.
[125] Danny Weyns, Alexander Helleboogh, and Tom Holvoet. How to get multi-
agent systems accepted in industry? Int. J. Agent-Oriented Software Engineering,
3(4):383–390, 2009.
[127] Michael Winikoff. Future directions for agent-based software engineering. Int. J.
Agent-Oriented Software Engineering, 3(4):402–410, 2009.
[128] Michael Winikoff. Assurance of Agent Systems: What Role should Formal Ver-
ification play? In Mehdi Dastani, Koen V. Hindriks, and John-Jules Ch. Meyer,
editors, Specification and Verification of Multi-agent systems, chapter 12, pages
353–383. Springer, 2010.
[129] Michael Winikoff and Stephen Cranefield. On the testability of BDI agent sys-
tems. Information Science Discussion Paper Series 2008/03, University of Otago,
Dunedin, New Zealand, 2008.
[130] Michael Winikoff, Lin Padgham, and James Harland. Simplifying the development
of intelligent agents. In Markus Stumptner, Dan Corbett, and Mike Brooks, editors,
AI2001: Advances in Artificial Intelligence. 14th Australian Joint Conference on
Artificial Intelligence, pages 555–568. Springer, LNAI 2256, December 2001.
[131] M. Wooldridge, N.R. Jennings, and D. Kinny. The Gaia methodology for agent-
oriented analysis and design. Autonomous Agents and Multi-Agent Systems, 3(3),
2000.
[132] Michael Wooldridge. An Introduction to MultiAgent Systems. John Wiley & Sons
(Chichester, England), 2002. ISBN 0-471-49691-X.
[133] Michael Wooldridge, Michael Fisher, Marc-Philippe Huget, and Simon Parsons.
Model checking multi-agent systems with MABLE. In Autonomous Agents and
Multi-Agent Systems (AAMAS), pages 952–959, 2002.
[134] Pınar Yolum and Munindar P. Singh. Flexible protocol specification and execution:
Applying event calculus planning using commitments. In Proceedings of the 1st
Joint Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages
527–534, 2002.
[135] Pınar Yolum and Munindar P. Singh. Reasoning about commitments in the event
calculus: An approach for specifying and executing protocols. Annals of Mathe-
matics and Artificial Intelligence (AMAI), Special Issue on Computational Logic in
Multi-Agent Systems, 2004.
[136] E. Yu. Modelling Strategic Relationships for Process Reengineering. PhD thesis,
University of Toronto, Department of Computer Science, 1995.
[138] Zhiyong Zhang. Automated Unit Testing of Agent Systems. PhD thesis, RMIT
University, Melbourne, Australia, 2011.
[139] Zhiyong Zhang, John Thangarajah, and Lin Padgham. Automated testing for intel-
ligent agent systems. In AOSE, pages 66–79, 2009.
[140] M. Zheng and V.S. Alagar. Conformance testing of BDI properties in agent-based
software systems. In Proceedings of the 12th Asia-Pacific Software Engineering
Conference (APSEC), pages 457–464. IEEE Computer Society Press, 2005.
Part VI
Technical Background
Chapter 16
1 Introduction
If one wants to reason about an agent or about a multiagent system, then logic
can provide a convenient and powerful tool. First of all, logics provide a language
with which to specify properties: properties of the agent, of other agents, and of the
environment. Ideally, such a language also provides a means to then implement an
agent or a multiagent system, either by somehow executing the specification, or by
transforming the specification into some computational form. Second, given that
such properties are expressed as logical formulas that are part of some inference
system, they can be used to deduce other properties. Such reasoning can be part
of an agent’s own capabilities, but it can also be done by the system designer,
the analyst, or the potential user of (one of) the agents. Third, logics provide a
formal semantics in which the sentences from the language get a precise meaning:
if one manages to come up with a semantics that closely models (part of) the
system under consideration, one then can verify properties of a particular system
(model checking). This, in a nutshell, sums up the three main characteristics
of any logic (language, deduction, semantics), as well as the three main roles
that logics play in system development (specification, execution, and verification)
(see also Chapter 14 for a discussion on the role of logic in specification and
verification of agent systems).
If the role and value of logic for multiagent systems (MAS) is clear, then why
are there so many logics for MAS, with new variants proposed at almost every
multiagent conference? What most of these logics compete for is a proper balance
between expressiveness, on the one hand, and complexity on the other. What kinds
of properties are interesting for the scenarios of interest, and how can they be
“naturally" and concisely expressed? Then, how complex are the formalisms, in
terms of how easily the key relevant properties can be expressed and grasped by
human users, and how costly is it to use the formalism when doing verification or
reasoning with it? Of course, the complexity of the logic under consideration will
be related to the complexity of the domain it tries to formalize.
In multiagent research, this complexity often depends on a number of issues.
Let us illustrate this with a simple example, say the context of traffic. If there
is only one agent involved, the kinds of things we would like to represent to
model the agent’s sensing, planning, and acting could probably be done in a sim-
ple propositional logic, using atoms like gn (light n is green), ok (gate k is open),
and ei,m (agent i enters through gate m). However, taking the agent perspective
seriously, one quickly realizes that we need more: there might be a difference
between what actually is the case and what the agent believes is the case, and
also between what the agent believes to hold and what the agent would like to be
true – otherwise there would be no reason to act! So, we would like to be able to
say things like ¬ok ∧ Bi ok ∧ Di (ok → ei,k ) (although gate k is closed, the agent believes it is open and desires that, if it is indeed open, it enters through it).
Things get more interesting when several agents enter the scene. Not only does
our agent i need a model of its operational environment, but also a model of j’s
mental state, the latter involving a model of i’s mental state. We can then express
properties like Bi B j g j → stopi (if i believes that j believes that the latter’s light is
green, then i will stop). Higher-order information enters the picture, and there is
no a priori level where this would stop (this is for instance important in reasoning
about games: see the discussion on common knowledge for establishing a Nash
equilibrium [12]). In our simple traffic scenario,1 assume that i is a pedestrian who
approaches a crossing without traffic lights while motorist j advances as well. It
might well be that both believe that j is advancing (Bi ad j ∧ B j ad j ) so, being the
more vulnerable party, one would expect that i will wait. However, if i has a strong
desire to not lose time with waiting, it may try to make j stop for i by “making j
believe that i is not aware of j" (Bi ad j ∧ B j ad j ∧ Di Bi ¬B j Bi ad j ), i.e., it is i’s desire
to be convinced that j does not believe that i is aware of j advancing (i can avoid
eye contact, for instance, and approach the crossing in a determined way, the aim
of this being that j prefers no accident over being involved in one and hence will
stop). In other words, i plans to act contrary to its own beliefs.
Another dimension that makes multiagent scenarios (and hence their log-
ics) complex is their dynamics: the world changes (this is arguably the goal of
the whole enterprise) and the information, the desires, and the goals of agents
1 This example is adapted from one given in a talk by Rohit Parikh.
change as well. So, we need tools to reason either about time, or else about ac-
tions explicitly. A designer of a crossing is typically interested in properties like
A□¬(gi ∧ g j ) (this is a safety property requiring that in all computations, it is always the case that no two lights are green) and cli → A♦gi (this is a liveness
property expressing that if there is a car in lane i, then no matter how the system
evolves, eventually light i will be green). Combining the aspects of multiagents
and dynamics is where things really become intriguing: there is not just “a fu-
ture,” or a “possible future depending on an agent’s choice": what the future looks
like will depend on the choices of several agents at the same time. We will come
across languages in which one can express ¬⟨⟨i⟩⟩♦s ∧ ¬⟨⟨ j⟩⟩♦s ∧ ⟨⟨i, j⟩⟩♦s (although i cannot bring about that eventually everybody has safely crossed the road, and neither can j, by cooperating i and j can guarantee that they both cross in a safe manner).
Logics for MAS are often some variant of modal logic; to be more precise,
they are all intensional, contrary to propositional and first-order logic, which are
extensional. A logic is extensional if the truth-value of a formula is completely
determined given the truth-value of all its constituents, the sub-formulas. If we
know the truth-value of p and q, we also know that of (p ∧ q), and of ¬p → (q →
p). For logics of agency, extensionality is often not realistic. It might well be that
“rain in Utrecht” and “rain in Liverpool” are both true, while our agent knows one
without the other. Even if one is given the truth value of p and of q, one is not
guaranteed to be able to tell whether Bi (p∨q) (agent i believes that p∨q), whether
♦(p ∧ q) (eventually, both p and q), or whether Bi □(p → Bh q) (i believes that it is always the case that as soon as p holds, agent h believes that q).
These examples make clear why intensional logics are so popular for multiagent systems. However, the most compelling argument for using modal logics for
modeling the scenarios we have in mind lies probably in the semantics of modal
logics. They are built around the notion of a “state,” which can be the state of
a system, of a processor, or a situation in a scenario. Considering several states
at the same time is then rather natural, and usually, many of them are “related”:
some because they “look the same” for a given agent (they define its beliefs), some
because they are very attractive (they comprise its desires), or some of them may
represent some state of affairs in the future (they model possible evolutions of the
system). Finally, some states are reachable only when certain agents make certain
decisions (those states determine what coalitions can achieve).
In the remainder of this section, we demonstrate some basic languages, infer-
ence systems, and semantics that are foundational for logics of agency. The rest
of the chapter is then organized along two main streams, reflecting two trends in
multiagent systems research when it comes to representing and reasoning about
environments:
[Figure: The axioms and inference rules of the epistemic logic S5m.]

Knowledge Axioms
  Kn1  ϕ, where ϕ is a propositional tautology
  Kn2  Ki (ϕ → ψ) → (Ki ϕ → Ki ψ)
  Kn3  Ki ϕ → ϕ
  Kn4  Ki ϕ → Ki Ki ϕ
  Kn5  ¬Ki ϕ → Ki ¬Ki ϕ

Rules of Inference
  MP   ϕ, (ϕ → ψ) ⇒ ψ
  Nec  ϕ ⇒ Ki ϕ
Cognitive models of rational action: The first main strand of research in repre-
senting multiagent systems focuses on the issue of representing the attitudes
of agents within the system: their beliefs, aspirations, intentions, and the
like. The aim of such formalisms is to derive a model that predicts how a
rational agent would go from its beliefs and desires to actions. Work in
this area builds largely on research in the philosophy of mind. The logical
approaches presented in Section 2 focus on this trend.
Models of the strategic structure of the system: The second main strand of re-
search focuses on the strategic structure of the environment: what agents
can accomplish in the environment, either together or alone. Work in this
area builds on models of effectivity from the game theory community, and
the models underpinning such logics are closely related to formal games. In
Section 3 we present logics that deal with this trend.
The axioms above characterize knowledge. Kn1 and Kn2 ensure that an agent’s knowledge is closed under logical consequence, and Kn3 is the truth axiom: what is known is true. It is Kn3 that is typically dropped in a logic for belief (where “i believes ϕ” is usually written as Bi ϕ). Finally, Kn4 and Kn5 denote positive and negative introspection, respectively: they indicate that an agent knows what it knows, and knows what it is ignorant of. Modus Ponens (MP) is a standard logical rule, and Necessitation (Nec) guarantees that it is derivable that agents know all tautologies.
Moving on to the semantics of such a logic, models for epistemic logic are
tuples M = ⟨S, {Ri }i∈Ag ,V ⟩ (also known as Kripke models), where S is a set of states, Ri ⊆ S × S is a binary relation for each agent i, and V : At → 2S gives for each
atom p the states V (p) where p is true. Truth of ϕ in a model M with state s,
written as M, s |= ϕ, is standard for the classical connectives (cf. Figure 16.2, left),
and the clause M, s |= Ki ϕ means that for all t with Ri st, M,t |= ϕ holds. In other
words, in state s agent i knows ϕ iff ϕ is true in all states t that look similar to s
for i. Ki is called the necessity operator for Ri . M |= ϕ means that for all states
s ∈ S, M, s |= ϕ. So the states describe some atomic facts p about some situation,
and Ri st means that for agent i, the states s and t look the same, or, given its
information, are indistinguishable. Let S5m be all models in which each Ri is an
equivalence relation. Let S5m |= ϕ mean that in all models M ∈ S5m , we have
M |= ϕ. The system S5m is complete for the validities in S5m , i.e., for all ϕ we
have S5m ⊢ ϕ iff S5m |= ϕ.
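As an illustration (not part of the original text), here is a small Python sketch of the truth definition just given, for a finite Kripke model; the state names, the single agent i, and the tuple-based formula encoding are assumptions made only for the example.

# A minimal sketch (not from the chapter): deciding M, s |= phi over a finite
# Kripke model M = (S, {R_i}, V), with formulas written as nested tuples.

def sat(model, s, formula):
    S, R, V = model                       # states, one relation per agent, valuation V: atom -> set of states
    op = formula[0]
    if op == 'atom':
        return s in V.get(formula[1], set())
    if op == 'not':
        return not sat(model, s, formula[1])
    if op == 'and':
        return sat(model, s, formula[1]) and sat(model, s, formula[2])
    if op == 'K':                         # M, s |= K_i phi  iff  M, t |= phi for all t with R_i s t
        agent, sub = formula[1], formula[2]
        return all(sat(model, t, sub) for t in S if (s, t) in R[agent])
    raise ValueError('unknown operator: %r' % (op,))

# Agent i cannot distinguish s1 from s2; p holds in both states, q only in s1.
S = {'s1', 's2'}
R = {'i': {(x, y) for x in S for y in S}}        # an equivalence relation, as in S5
V = {'p': {'s1', 's2'}, 'q': {'s1'}}
M = (S, R, V)
print(sat(M, 's1', ('K', 'i', ('atom', 'p'))))   # True:  p holds in every state i considers possible
print(sat(M, 's1', ('K', 'i', ('atom', 'q'))))   # False: q fails in s2, which i cannot rule out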
Whereas in epistemic logics the binary relation on the set of states represents
the agents’ ignorance, in temporal logics it represents the flow of time. In its most
simple appearance, time has a beginning, and advances linearly and discretely
into an infinite future: this is linear-time temporal logic (LTL, [84]). So a simple
model for time is obtained by taking as the set of states the natural numbers N, and
for the accessibility relation the successor relation, i.e., R = {(n, n + 1) | n ∈ N}, and V , the valuation, can be used to specify which atomic properties hold in each state. In the language, we then would typically see operators for the “next state” (◯), for “all states in the future” (□), and for “some time in the future” (♦). The truth conditions for those operators, together with an axiom system for them, are given in Figure 16.2. Note that □ is interpreted over the reflexive transitive closure of R, and ♦ is its dual: ♦ϕ = ¬□¬ϕ.
To give a simple example, suppose that atom p is true in exactly the prime
numbers, e is true in all even numbers, and o in all odd numbers. In the nat-
ural numbers, in state 0 we then have □♦p (there are infinitely many primes), ♦(p ∧ e ∧ ◯□¬(p ∧ e)) (there is a number that is even and prime, and for which all greater numbers are not both even and prime), and □(♦(e ∧ ◯♦o)) (for every number, one can find a number at least as big that is even and for which one can find a bigger number that is odd).
Often, one wants a more expressive language, adding, for instance, an operator for until: ϕ U ψ holds if ψ holds at some point in the future and ϕ holds at every point until then.
[Figure 16.2: Truth conditions (left) and an axiom system (right) for LTL.]

Truth Conditions
  M, n |= p      iff n ∈ V (p)
  M, n |= ¬ϕ     iff not M, n |= ϕ
  M, n |= ϕ ∧ ψ  iff M, n |= ϕ and M, n |= ψ
  M, n |= ◯ϕ     iff M, n + 1 |= ϕ
  M, n |= □ϕ     iff ∀m ≥ n, M, m |= ϕ
  M, n |= ♦ϕ     iff ∃m ≥ n, M, m |= ϕ

LTL Axioms
  LTL1  ϕ, where ϕ is a propositional tautology
  LTL2  ◯(ϕ → ψ) → (◯ϕ → ◯ψ)
  LTL3  ¬◯ϕ ↔ ◯¬ϕ
  LTL4  □ϕ → (ϕ ∧ ◯□ϕ)

Rules of Inference
  MP   ϕ, (ϕ → ψ) ⇒ ψ
  Nec  ϕ ⇒ ◯ϕ
  Ind  ϕ → ψ, ϕ → ◯ϕ ⇒ ϕ → □ψ
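Again as an illustration rather than anything from the original text, the following sketch evaluates the truth conditions of Figure 16.2 over the natural-numbers model. To keep the quantifiers ∀m ≥ n and ∃m ≥ n decidable, the valuation is assumed to be ultimately periodic (a finite prefix followed by a repeating loop), which suffices for examples like the even/odd one above; the encoding and function names are invented for the example.

# A minimal sketch (not from the chapter): the LTL truth conditions over the
# naturals, for a valuation given by a finite prefix and a repeating loop.

def holds(formula, n, prefix, loop):
    """Decide M, n |= formula; the set of atoms true at position m is
    prefix[m] for m < len(prefix), and repeats through `loop` afterwards."""
    def atoms_at(m):
        if m < len(prefix):
            return prefix[m]
        return loop[(m - len(prefix)) % len(loop)]

    # Beyond this bound the valuation only repeats, so when quantifying over
    # m >= n it is enough to inspect one further full loop.
    bound = len(prefix) + len(loop)
    op = formula[0]
    if op == 'atom':
        return formula[1] in atoms_at(n)
    if op == 'not':
        return not holds(formula[1], n, prefix, loop)
    if op == 'and':
        return holds(formula[1], n, prefix, loop) and holds(formula[2], n, prefix, loop)
    if op == 'next':                       # M, n |= (next) phi  iff  M, n+1 |= phi
        return holds(formula[1], n + 1, prefix, loop)
    if op == 'box':                        # for all m >= n
        return all(holds(formula[1], m, prefix, loop)
                   for m in range(n, max(n, bound) + len(loop)))
    if op == 'diamond':                    # for some m >= n
        return any(holds(formula[1], m, prefix, loop)
                   for m in range(n, max(n, bound) + len(loop)))
    raise ValueError(formula)

# e holds at even positions, o at odd ones (empty prefix, loop of length 2).
prefix, loop = [], [{'e'}, {'o'}]
print(holds(('box', ('diamond', ('atom', 'o'))), 0, prefix, loop))   # True:  always eventually odd
print(holds(('box', ('atom', 'e')), 0, prefix, loop))                # False: position 1 is odd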
A rational agent deliberates about its choices, and to represent those, branching-
time seems a more appropriate framework than linear-time. To understand
branching-time operators, though, an understanding of linear-time operators is
still of benefit. Computation tree logic (CTL) (see [29]) is a branching-time logic
that uses pairs of operators; the first quantifies over paths, and the second is an LTL
operator over those paths. Let us demonstrate this by mentioning some properties
that are true in the root ρ of the branching-time model M of Figure 16.3. Note that
on the highlighted path, in ρ, the formula ¬q is true. Hence, on the branching-
model M, ρ, we have E ¬q, saying that in ρ, there exists a path through it, on
which q is always false. A♦ϕ means that on every path starting in ρ, there is some
future point where ϕ is true. So, in ρ, A♦¬p holds. Likewise, Ep U q is true in
ρ because there is a path (the path “up,” for example) in which p U q is true. We
leave it to the reader to check that in ρ, we have E♦(p ∧ A◯¬p). In CTL∗ , the
requirement that path and tense operators need to occur together is dropped: CTL∗ formulas true in ρ are, for instance, A(□♦p ∧ □♦¬p) (all paths have infinitely many p states and infinitely many ¬p states), and AEp (for all paths, there is a path such that p).
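The fixpoint characterizations behind such formulas can be made concrete. The following sketch is an illustration only, not the chapter's own algorithm or model: it computes the set of states satisfying E◯ (here EX), E♦ (EF), and A♦ (AF) formulas on a small finite transition system by the standard fixpoint iteration used in CTL model checking.

# A minimal sketch (not from the chapter): checking CTL formulas on a finite
# transition system by computing, for each formula, the set of states where
# it holds; E-eventually and A-eventually are least fixpoints.

def states_satisfying(f, states, succ, label):
    op = f[0]
    if op == 'atom':
        return {s for s in states if f[1] in label[s]}
    if op == 'not':
        return states - states_satisfying(f[1], states, succ, label)
    if op == 'and':
        return states_satisfying(f[1], states, succ, label) & states_satisfying(f[2], states, succ, label)
    if op == 'EX':                                   # on some path, f[1] holds at the next state
        good = states_satisfying(f[1], states, succ, label)
        return {s for s in states if succ[s] & good}
    if op in ('EF', 'AF'):                           # least fixpoints for E-diamond / A-diamond
        good = set(states_satisfying(f[1], states, succ, label))
        while True:
            if op == 'EF':
                step = {s for s in states if succ[s] & good}
            else:
                step = {s for s in states if succ[s] and succ[s] <= good}
            new = step - good
            if not new:
                return good
            good |= new
    raise ValueError(f)

# A small model (not the one of Figure 16.3): from the root r one branch loops
# forever on a p-state, the other eventually reaches a q-state.
states = {'r', 'a', 'b', 'c'}
succ = {'r': {'a', 'b'}, 'a': {'a'}, 'b': {'c'}, 'c': {'c'}}
label = {'r': {'p'}, 'a': {'p'}, 'b': {'p'}, 'c': {'q'}}
print('r' in states_satisfying(('EF', ('atom', 'q')), states, succ, label))  # True:  E-eventually q holds at r
print('r' in states_satisfying(('AF', ('atom', 'q')), states, succ, label))  # False: the run r,a,a,... never sees q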
Rather than giving an axiom system for CTL here, we now describe frame-
works where change is not imposed by nature (i.e., by passing of time), but where
we can be more explicit about how change is brought about. By definition, an
agent is supposed to act, so rather than thinking of the flow of time as the main
driver for change, transitions between states can be labeled with actions, or, more
generally, a pair (i, α), where i is an agent and α an action. The formalism dis-
cussed below is based on dynamic logic [41, 42], which is again a modal logic. On
page 772 we will see how temporal operators can be defined in terms of dynamic
ones.
Actions in the set Ac are either atomic actions (a, b, . . . ) or composed (α, β, . . . )
by means of testing of formulas (ϕ?), sequencing (α; β), conditioning (if ϕ then
α else β), and repetition (while ϕ do α). The informal meaning of such constructs is as follows: ϕ? tests whether ϕ currently holds (and changes nothing else); α; β means doing α and then doing β; if ϕ then α else β means doing α if ϕ holds and doing β otherwise; and while ϕ do α means repeating α as long as ϕ holds.
Here, the test must be interpreted as a test by the system; it is not a so-called
“knowledge-producing action” (like observations or communication) that can be
used by the agent to acquire knowledge.
These actions α can then be used to build new formulas to express the possible
result of the execution of α by agent i (the formula [doi (α)]ϕ denotes that ϕ is a
result of i’s execution of α), the opportunity for i to perform α (⟨doi (α)⟩⊤), and i’s capability of performing the action α (Ai α). The formula ⟨doi (α)⟩ϕ is shorthand for ¬[doi (α)]¬ϕ, thus expressing that some possible execution of α by i results in ϕ.
In the Kripke semantics, we then assume relations Ra for individual actions,
where the relations for compositions are then recursively defined: for instance,
Rα;β st iff for some state u, Rα su and Rβ ut. Indeed, [doi (α)] is then the neces-
sity operator for Rα . Having epistemic and dynamic operators, one has already
a rich framework to reason about an agent’s knowledge about doing actions. For
instance, a property like perfect recall,

Ki [doi (α)]ϕ → [doi (α)]Ki ϕ

which semantically implies some grid structure on the set of states: if Rα st and Ri tu, then for some v, we also have Ri sv and Rα vu. For temporal epistemic logic, perfect recall is captured in the axiom Ki ◯ϕ → ◯Ki ϕ, while its converse, no learning, is ◯Ki ϕ → Ki ◯ϕ. It is exactly this kind of interaction property that
can make a multiagent logic complex, both conceptually and computationally.
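To make the recursive definition of the relations for composed actions concrete, here is an illustrative sketch (not from the original text) that computes Rα over a finite state space for test, sequencing, conditional, and while actions; the toy formula checker and the "toggle" example are assumptions made only for the illustration.

# A minimal sketch (not from the chapter): computing the relation R_alpha for
# a composed action over a finite state space. Sequencing is relational
# composition, as in the clause for alpha;beta above; test, if-then-else and
# while are handled in the standard PDL way (while phi do alpha is iteration
# of (phi?; alpha) followed by a final (not phi)? test).

def compose(r1, r2):
    return {(s, u) for (s, t) in r1 for (t2, u) in r2 if t == t2}

def rel(action, states, atomic, holds):
    op = action[0]
    if op == 'atomic':                      # an atomic action a with a given relation R_a
        return atomic[action[1]]
    if op == 'test':                        # phi? : stay in place if phi holds
        return {(s, s) for s in states if holds(s, action[1])}
    if op == 'seq':                         # alpha ; beta
        return compose(rel(action[1], states, atomic, holds),
                       rel(action[2], states, atomic, holds))
    if op == 'if':                          # if phi then alpha else beta
        phi, a, b = action[1:]
        return (compose(rel(('test', phi), states, atomic, holds),
                        rel(a, states, atomic, holds)) |
                compose(rel(('test', ('not', phi)), states, atomic, holds),
                        rel(b, states, atomic, holds)))
    if op == 'while':                       # while phi do alpha
        phi, a = action[1:]
        body = compose(rel(('test', phi), states, atomic, holds),
                       rel(a, states, atomic, holds))
        closure = {(s, s) for s in states}  # reflexive-transitive closure of the loop body
        while True:
            bigger = closure | compose(closure, body)
            if bigger == closure:
                break
            closure = bigger
        return compose(closure, rel(('test', ('not', phi)), states, atomic, holds))
    raise ValueError(action)

def holds(s, phi):                          # a toy formula checker: atoms are just state names
    if isinstance(phi, tuple) and phi[0] == 'not':
        return not holds(s, phi[1])
    return s == phi

# Toggling a light while it is on: from either state, execution ends with the light off.
states = {'on', 'off'}
atomic = {'toggle': {('on', 'off'), ('off', 'on')}}
print(rel(('while', 'on', ('atomic', 'toggle')), states, atomic, holds))
# contains exactly ('on', 'off') and ('off', 'off')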
In studying the way that actions and knowledge interact, Robert Moore argued
that one needs to identify two main issues. The first is that some actions produce
knowledge, and therefore their effects must be formulated in terms of the epistemic
states of participants. The second is that of knowledge preconditions: what an
agent needs to know in order to be able to perform an action. A simple example is
that in order to unlock a safe, one must know the combination for the lock. Using
these ideas, Moore formalized a notion of ability. He suggested that in order for an agent to be able to achieve some state of affairs ϕ, the agent must either:

• know the identity of an action (i.e., know which action it is) that it can perform and that will bring about ϕ; or

• know the identity of an action that it can perform and that will result in it knowing the identity of an action that it can perform to bring about ϕ.
The point about “knowing the identity” of an action is that in order for me to be
able to become rich, it is not sufficient for me simply to know that there exists
some action I could perform that would make me rich. I must either know what
that action is (the first clause above), or else be able to perform some action that
would furnish me with the information about which action to perform in order to
make myself rich. This subtle notion of knowing an action is rather important, and
it is related to the distinction between knowledge de re (which involves knowing
the identity of a thing) and de dicto (which involves knowing that something ex-
ists) [31, p. 101]. In the example of the safe, most people would have knowledge
de dicto to open the safe, but only a few would have knowledge de re. We will
see later, when we review more recent work on temporal logics of ability, that this
distinction also plays an important role there.
It is often the case that actions are ontic: they bring about a change in the
world, like assigning a value to a variable, moving a block, or opening a door.
However, dynamic epistemic logic (DEL) [113] studies actions that bring about
mental change: change of knowledge in particular. So in DEL, the actions them-
selves are epistemic. A typical example is announcing ϕ in a group of agents:
[ϕ]ψ would then mean that after announcement of ϕ, it holds that ψ. Surprisingly enough, the formula [ϕ]Ki ϕ (after the announcement that ϕ, agent i knows that ϕ) is not a validity, a counterexample being the infamous Moore sentences [74] ϕ = (¬Ki p ∧ p): “although i does not know it, p holds”.
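As an illustration (not from the original text), the following sketch implements announcement as model restriction and reproduces the observation about Moore sentences: announcing ϕ = p ∧ ¬Ki p at a state where it is true makes it false afterwards. The model and names are invented for the example.

# A minimal sketch (not from the chapter) of public announcement: announcing
# phi restricts the model to the states where phi holds, and [phi]psi is read
# as "if phi holds here, then psi holds in the restricted model".

def holds(model, s, f):
    S, R, V = model                               # states, relations per agent, valuation atom -> states
    op = f[0]
    if op == 'atom':
        return s in V.get(f[1], set())
    if op == 'not':
        return not holds(model, s, f[1])
    if op == 'and':
        return holds(model, s, f[1]) and holds(model, s, f[2])
    if op == 'K':
        return all(holds(model, t, f[2]) for t in S if (s, t) in R[f[1]])
    if op == 'announce':                          # [f1] f2
        if not holds(model, s, f[1]):
            return True
        return holds(restrict(model, f[1]), s, f[2])
    raise ValueError(f)

def restrict(model, f):
    S, R, V = model
    keep = {s for s in S if holds(model, s, f)}
    return (keep,
            {i: {(s, t) for (s, t) in rel if s in keep and t in keep} for i, rel in R.items()},
            {p: truth & keep for p, truth in V.items()})

# p holds only in s1, and agent i cannot tell s1 and s2 apart.
S = {'s1', 's2'}
R = {'i': {(x, y) for x in S for y in S}}
V = {'p': {'s1'}}
M = (S, R, V)
moore = ('and', ('atom', 'p'), ('not', ('K', 'i', ('atom', 'p'))))
print(holds(M, 's1', moore))                                        # True at s1
print(holds(M, 's1', ('announce', moore, ('K', 'i', moore))))       # False: announcing it makes it false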
Let us make one final remark in this section. We already indicated that things
become interesting and challenging when one combines several notions into one
framework (like knowledge and action, or knowledge and time). In fact it already
becomes interesting if we stick to one notion, and take the aspect of having a
multiagent system seriously. For instance, interesting group notions of knowledge
that S5m gives rise to are
• Eϕ (“everybody knows ϕ”, i.e., K1 ϕ ∧ · · · ∧ Km ϕ),
• Dϕ (“it is distributed knowledge that ϕ”, i.e., if you would pool all the knowledge of the agents together, ϕ would follow from it, as in ((Ki (ϕ1 → ϕ2 ) ∧ K j ϕ1 ) → Dϕ2 )), and
• Cϕ (“it is common knowledge that ϕ”: this is axiomatized such that it re-
sembles the infinite conjunction Eϕ ∧ EEϕ ∧ EEEϕ ∧ . . . ).
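These group notions can be computed directly on a finite Kripke model. The sketch below is illustrative only, with invented state names: it checks E and C by quantifying over, respectively, one step and finitely many steps along the union of the agents' relations.

# A minimal sketch (not from the chapter): "everybody knows" and common
# knowledge over a finite Kripke model, where a formula is represented simply
# by the set of states at which it is true.

def everybody_knows(S, R, s, truth):
    """E phi at s: phi holds at every state some agent considers possible from s."""
    union = set().union(*R.values())
    return all(t in truth for t in S if (s, t) in union)

def common_knowledge(S, R, s, truth):
    """C phi at s: phi holds at every state reachable from s by one or more
    steps along any agent's relation (so E phi, EE phi, EEE phi, ... all hold)."""
    union = set().union(*R.values())
    frontier = {t for t in S if (s, t) in union}
    seen = set()
    while frontier:
        t = frontier.pop()
        if t not in seen:
            seen.add(t)
            frontier |= {u for u in S if (t, u) in union}
    return all(t in truth for t in seen)

# p holds in s1 and s2; agent 1 confuses s1 with s2, agent 2 confuses s2 with s3.
S = {'s1', 's2', 's3'}
R = {'1': {('s1', 's1'), ('s1', 's2'), ('s2', 's1'), ('s2', 's2'), ('s3', 's3')},
     '2': {('s1', 's1'), ('s2', 's2'), ('s2', 's3'), ('s3', 's2'), ('s3', 's3')}}
p_true = {'s1', 's2'}
print(everybody_knows(S, R, 's1', p_true))    # True:  at s1, every state either agent considers possible satisfies p
print(common_knowledge(S, R, 's1', p_true))   # False: s3 is reachable via s1 -1- s2 -2- s3, and p fails there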
In DEL for instance, this gives rise to the question of which formulas are success-
ful, i.e., formulas for which [ϕ]Cϕ (after ϕ is announced, it is public knowledge) is
valid [114]. Different questions arise when taking the multiagent aspect seriously
in the context of actions. What happens with the current state if all agents take
some action? How to “compute” the result of those concurrent choices? This lat-
ter question will in particular be addressed in Section 3. First, we focus on logics
that amplify aspects of the mental state of agents. The next two sections heavily
borrow from [111].
abstraction tool. If we accept the usefulness of the intentional stance for charac-
terizing the properties of rational agents, then the next step in developing a formal
theory of such agents is to identify the components of an agent’s state. There are
many possible mental states that we might choose to characterize an agent: be-
liefs, goals, desires, intentions, commitments, fears, and hopes are just a few. We
can identify several important categories of such attitudes, for example:
Information attitudes: those attitudes an agent has toward information about its
environment. The most obvious members of this category are knowledge
and belief.
Pro-attitudes: those attitudes an agent has that tend to lead it to perform actions.
The most obvious members of this category are goals, desires, and inten-
tions.
Moreover, when modeling agents there is also a social dimension, which includes:

Normative attitudes: those attitudes that concern the norms governing an agent’s interactions with others, such as obligations, permissions, and commitments.
We will not say much about normative attitudes in this chapter other than giving
some pointers for further reading in Section 4.
Much of the literature on developing formal theories of agency has been taken
up with the relative merits of choosing one attitude over another, and investigating
the possible relationships between these attitudes.
Following [15, 16], Cohen and Levesque identify seven specific properties that
must be satisfied by a reasonable theory of intention:
1. Intentions pose problems for agents, who need to determine ways of achiev-
ing them.
2. Intentions provide a “filter” for adopting other intentions, which must not
conflict.
3. Agents track the success of their intentions, and are inclined to try again if
their attempts fail.
4. Agents believe their intentions are possible.

5. Agents do not believe they will not bring about their intentions.
6. Under certain circumstances, agents believe they will bring about their in-
tentions.
7. Agents need not intend all the expected side effects of their intentions.
Given these criteria, Cohen and Levesque adopt a two-tiered approach to the prob-
lem of formalizing a theory of intention. First, they construct the logic of rational
agency, “being careful to sort out the relationships among the basic modal opera-
tors” [22, p. 221]. On top of this framework, they introduce a number of derived
constructs, which constitute a “partial theory of rational action” [22, p. 221]; in-
tention is one of these constructs.
Syntactically, the logic of rational agency is a many-sorted, first-order, multi-
modal logic with equality, containing four primary modalities (see Table 16.1).
The semantics of Bel and Goal are given via possible worlds, in the usual way:
each agent is assigned a belief accessibility relation and a goal accessibility rela-
tion. The belief accessibility relation is Euclidean, transitive, and serial, giving a
belief logic of KD45. The goal relation is serial, giving a conative logic KD. It
is assumed that each agent’s goal relation is a subset of its belief relation, imply-
ing that an agent will not have a goal of something it believes will not happen.
A world in this formalism is a discrete sequence of events, stretching infinitely
into the past and future. The system is only defined semantically, and Cohen and Levesque do not give a complete axiomatization for it.
[Table 16.1: The four primary modalities in Cohen and Levesque’s logic.]

  Operator        Meaning
  (Bel i ϕ)       agent i believes ϕ
  (Goal i ϕ)      agent i has goal of ϕ
  (Happens α)     action α will happen next
  (Done α)        action α has just happened
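The semantic conditions just mentioned are easy to check computationally. The following illustrative sketch (not part of the original text) tests whether a belief accessibility relation is serial, transitive, and Euclidean (KD45), whether a goal relation is serial (KD), and whether the goal relation is a subset of the belief relation; the worlds and relations are invented for the example.

# A minimal sketch (not from the chapter): checking the frame conditions for
# KD45 belief and KD goal accessibility relations over a finite set of worlds.

def is_serial(W, R):
    return all(any((w, v) in R for v in W) for w in W)

def is_transitive(W, R):
    return all((w, u) in R for (w, v) in R for (v2, u) in R if v == v2)

def is_euclidean(W, R):
    return all((v, u) in R for (w, v) in R for (w2, u) in R if w == w2)

# A belief relation B (serial, transitive, Euclidean: KD45) and a goal
# relation G that is a serial subset of B (KD plus the subset condition).
W = {'w1', 'w2'}
B = {('w1', 'w2'), ('w2', 'w2')}
G = {('w1', 'w2'), ('w2', 'w2')}
print(is_serial(W, B), is_transitive(W, B), is_euclidean(W, B))   # True True True
print(is_serial(W, G) and G <= B)                                 # True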
A temporal precedence operator, (Before p q), can also be derived, and holds if p
holds before q. An important assumption is that all goals are eventually dropped:

♦¬(Goal i (Later p))

The first derived construct is the notion of a persistent goal:

(P-Goal i p) ≙ (Goal i (Later p)) ∧ (Bel i ¬p) ∧
               (Before ((Bel i p) ∨ (Bel i □¬p)) ¬(Goal i (Later p)))
So, an agent has a persistent goal of p if:

1. It has a goal that p eventually becomes true, and believes that p is not currently true.

2. Before it drops the goal, one of the following conditions must hold: either it believes the goal has been satisfied, or it believes the goal will never be satisfied.
Intention is then defined as a kind of persistent goal: to intend α is to have a persistent goal of having done α, having first believed that α was about to happen:

(Intend i α) ≙ (P-Goal i [Done i ((Bel i (Happens α))?; α)])
Cohen and Levesque go on to show how such a definition meets many of Brat-
man’s criteria [15] for a theory of intention (outlined above). In particular, by
basing the definition of intention on the notion of a persistent goal, Cohen and
Levesque are able to avoid overcommitment or undercommitment. An agent will
only drop an intention if it believes that the intention has either been achieved, or
is unachievable.
A critique of Cohen and Levesque’s theory of intention is presented in [103];
space restrictions prevent a discussion here.
• A philosophical foundation
The BDI model is based on a widely respected theory of rational action in
humans, developed by the philosopher Michael Bratman [15].
• A software architecture
The BDI model of agency does not prescribe a specific implementation. The
model may be realized in many different ways, and indeed a number of different implementations of it have been developed.

• A logical formalization
The third component of the BDI model is a family of logics that allow us to reason about BDI agents; this component is discussed below.
Intuitively, an agent’s beliefs correspond to information the agent has about the
world. These beliefs may be incomplete or incorrect. An agent’s desires represent
states of affairs that the agent would, in an ideal world, wish to be brought about.
(Implemented BDI agents require that desires be consistent with one another, al-
though human desires often fail in this respect.) Finally, an agent’s intentions
represent desires that it has committed to achieving. The intuition is that an agent
will not, in general, be able to achieve all its desires, even if these desires are con-
sistent. Ultimately, an agent must therefore fix upon some subset of its desires and
commit resources to achieving them. These chosen desires, to which the agent has
some commitment, are intentions [22]. The BDI theory of human rational action
was originally developed by Michael Bratman [15]. It is a theory of practical
reasoning – the process of reasoning that we all go through in our everyday lives,
deciding moment by moment which action to perform next.
The BDI model has been implemented several times. Originally, it was real-
ized in IRMA, the intelligent resource-bounded machine architecture [17]. IRMA
was intended as a more or less direct realization of Bratman’s theory of practical
reasoning. However, the best-known implementation is the procedural reasoning
system (PRS) [37] and its many descendants [26, 33, 55, 87]. In the PRS, an agent
has data structures that explicitly correspond to beliefs, desires, and intentions. A
PRS agent’s beliefs are directly represented in the form of PROLOG-like facts [21,
p. 3]. Desires and intentions in PRS are realized through the use of a plan library.2
A plan library, as its name suggests, is a collection of plans. Each plan is a recipe
that can be used by the agent to achieve some particular state of affairs. A plan
in the PRS is characterized by a body and an invocation condition. The body of a
plan is a course of action that can be used by the agent to achieve some particular
state of affairs. The invocation condition of a plan defines the circumstances under which the agent should “consider” the plan.

2 In this description of the PRS, we have modified the original terminology somewhat to be more in line with contemporary usage; we have also simplified the control cycle of the PRS slightly.

Control in the PRS proceeds by the
agent continually updating its internal beliefs, and then looking to see which plans
have invocation conditions that correspond to these beliefs. The set of plans made
active in this way corresponds to the desires of the agent. Each desire defines a
possible course of action that the agent may follow. On each control cycle, the
PRS picks one of these desires, and pushes it onto an execution stack for subse-
quent execution. The execution stack contains desires that have been chosen by
the agent, and thus corresponds to the agent’s intentions.
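The control cycle just described can be caricatured in a few lines of code. The following toy sketch is emphatically not the PRS itself, and all names in it are invented; it simply follows the steps above: update beliefs, collect the plans whose invocation conditions hold as desires, commit to one of them, and execute it step by step from an intention stack.

# A toy sketch (not the PRS) of the control cycle described above.

from collections import namedtuple

Plan = namedtuple('Plan', 'name invocation body')   # invocation: set of beliefs; body: list of action names

def control_cycle(beliefs, plans, perceive, act, steps):
    intentions = []                                  # execution stack: [plan, program counter]
    for _ in range(steps):
        beliefs |= perceive()                        # 1. update beliefs
        desires = [p for p in plans                  # 2. active plans = desires
                   if p.invocation <= beliefs]
        if desires and not intentions:               # 3. commit to one desire (here simply the first)
            intentions.append([desires[0], 0])
        if intentions:                               # 4. execute one step of the top intention
            plan, pc = intentions[-1]
            act(plan.body[pc])
            if pc + 1 < len(plan.body):
                intentions[-1][1] = pc + 1
            else:
                intentions.pop()

plans = [Plan('open_door', {'door_closed'}, ['walk_to_door', 'push_door'])]
control_cycle(set(), plans, perceive=lambda: {'door_closed'},
              act=lambda a: print('executing', a), steps=2)
# executing walk_to_door
# executing push_door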
The third and final aspect of the BDI model is the logical component, which
gives us a family of tools that allow us to reason about BDI agents. There have
been several versions of BDI logic, starting in 1991 and culminating in Rao and
Georgeff’s 1998 paper on systems of BDI logics [88, 90, 91, 92, 93, 94, 95]; a
book-length survey was published as [121]. We focus on [121].
Syntactically, BDI logics are essentially branching-time logics (CTL or CTL∗,
depending on which version one is reading about), enhanced with additional
modal operators Bel, Des, and Intend, for capturing the beliefs, desires, and in-
tentions of agents, respectively. The BDI modalities are indexed with agents, so
for example the following is a legitimate formula of BDI logic:

(Bel i (Intend j A♦p)) → (Bel i (Des j A♦p))
This formula says that if i believes that j intends that p is inevitably true even-
tually, then i believes that j desires p is inevitable. Although they share much
in common with Cohen-Levesque’s intention logics, the first and most obvious
distinction between BDI logics and the Cohen-Levesque approach is the explicit
starting point of CTL-like branching-time logics. However, the differences are ac-
tually much more fundamental than this. The semantics that Rao and Georgeff
give to BDI modalities in their logics are based on the conventional apparatus of
Kripke structures and possible worlds. However, rather than assuming that worlds
are instantaneous states of the world, or even that they are linear sequences of
states, it is assumed instead that worlds are themselves branching temporal struc-
tures: thus each world can be viewed as a Kripke structure for a CTL-like logic.
While this tends to rather complicate the semantic machinery of the logic, it makes
it possible to define an interesting array of semantic properties, as we shall see be-
low.
Before proceeding, we summarize the key semantic structures in the logic.
Instantaneous states of the world are modeled by time points, given by a set T ;
the set of all possible evolutions of the system being modeled is given by a binary
relation R ⊆ T × T . A world (over T and R) is then a pair ⟨T ′ , R′ ⟩, where T ′ ⊆ T is a non-empty set of time points, and R′ ⊆ R is a branching-time structure on T ′ . Let W be the set of all worlds over T . A pair ⟨w,t⟩, where w = ⟨Tw , Rw ⟩ ∈ W and t ∈ Tw , is a situation.
The primary focus of Rao and Georgeff’s early work was to explore the possible
interrelationships between beliefs, desires, and intentions from the perspective of
semantic characterization. In order to do this, they defined a number of possible
interrelationships between an agent’s belief, desire, and intention accessibility re-
lations. The most obvious relationships that can exist are whether one relation is a
subset of another: for example, if Dtw (i) ⊆ Itw (i) for all i, w,t, then we would have
as an interaction axiom (Intend i ϕ) → (Des i ϕ). However, the fact that worlds
themselves have structure in BDI logic also allows us to combine such properties
with relations on the structure of worlds themselves. The most obvious structural
relationship that can exist between two worlds – and the most important for our
purposes — is that of one world being a subworld of another. Intuitively, a world w is said to be a subworld of world w′ if w has the same structure as w′ but has fewer paths and is otherwise identical. Formally, if w, w′ are worlds, then w is a subworld of w′ (written w ⊑ w′ ) iff paths(w) ⊆ paths(w′ ) and w, w′ agree on the interpretation of predicates and constants in common time points.
The first property we consider is the structural subset relationship between accessibility relations. We say that accessibility relation R is a structural subset of accessibility relation R̄ if for every R-accessible world w, there is an R̄-accessible world w′ such that w is a subworld of w′. Formally, if R and R̄ are two accessibility relations, then we write R ⊆sub R̄ to indicate that if w ∈ Rtw(i), then there exists some w′ ∈ R̄tw(i) such that w ⊑ w′. If R ⊆sub R̄, then we say R is a structural subset of R̄.
Similarly, we write R ⊆sup R̄ to indicate that if w ∈ Rtw(i), then there exists some w′ ∈ R̄tw(i) such that w′ ⊑ w. If R ⊆sup R̄, then we say R is a structural superset of R̄. In other words, if R is a structural superset of R̄, then for every R-accessible world w, there is an R̄-accessible world w′ such that w′ is a subworld of w.
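To see what these conditions amount to in a small finite setting, here is a minimal Python sketch (ours; it identifies a world with its set of paths, ignores the requirement that worlds agree on the interpretations of predicates and constants, and uses made-up world and time-point names) that checks the subworld relation and the structural subset condition:

def is_subworld(w, w_prime):
    # w ⊑ w': every path of w is also a path of w' (agreement on interpretations omitted)
    return w <= w_prime

def structural_subset(R_acc, R_bar_acc):
    # R ⊆sub R̄ at a fixed situation: every R-accessible world is a
    # subworld of some R̄-accessible world
    return all(any(is_subworld(w, wb) for wb in R_bar_acc) for w in R_acc)

# Hypothetical worlds, each identified with a frozenset of paths (tuples of time points).
w_small = frozenset({("t0", "t1")})
w_large = frozenset({("t0", "t1"), ("t0", "t2")})

intention_worlds = {w_small}   # playing the role of Itw(i)
desire_worlds = {w_large}      # playing the role of Dtw(i)

print(structural_subset(intention_worlds, desire_worlds))   # True
print(structural_subset(desire_worlds, intention_worlds))   # False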
Table 16.2: Systems of BDI logic. (Source: [90, p. 321].) In the first six rows, the
corresponding formulas of type A → B → C are shorthand for (A → B) ∧ (B → C).
2.3 Discussion
Undoubtedly, formalizing the informational and motivational attitudes in a context
with evolving time, or where agents can do actions, has greatly helped to improve
our understanding of complex systems. At the same time, admittedly, there are
many weaknesses and open problems with such approaches.
To give one example of how a formalization can help us to become clearer about the interrelationship between the notions defined here, recall that Rao and Georgeff assume the notion of belief-goal compatibility, saying

Goali ϕ → Bi ϕ

whereas Cohen and Levesque impose a realism constraint, which amounts to

Bi ϕ → Goali ϕ

By analyzing the framework of Cohen and Levesque more closely, it appears that they have a much weaker property in mind, which is

Goali ϕ → ¬Bi ¬ϕ
To mention just one aspect in which the approach described here is still far from complete, we recall that the three frameworks allow one to reason about many agents, but are in essence still one-agent systems. Although notions such as distributed and common knowledge are well-understood epistemic notions in multiagent systems, their motivational analogues seem to be much harder and are as yet only partially understood (see Cohen and Levesque [23], Tambe [105], or Dunin-Kȩplicz and Verbrugge [28] on teamwork).
2.4.1 Specification
Consider, as an example specification formula,

(Bel i Open(valve32)) → (Intend i (Bel j Open(valve32)))

This formula says that if i believes valve 32 is open, then i should intend that j believes valve 32 is open. A rational agent i with such an intention can select a speech act to perform in order to inform j of this state of affairs. It should be intuitively clear how a system specification might be constructed using such formulae, to define the intended behavior of a system.
One of the main desirable features of a software specification language is that
it should not dictate how a specification should be satisfied by an implementation.
It should be clear that the specification above has exactly these properties. It does
not dictate how agent i should go about making j aware that valve 32 is open. We
simply expect i to behave as a rational agent given such an intention.
There are a number of problems with the use of such logics for specification.
The most worrying of these is with respect to their semantics. As we set out in
Section 1, the semantics for the modal operators (for beliefs, desires, and inten-
tions) are given in the normal modal logic tradition of possible worlds [19]. There
are several advantages to the possible worlds model: it is well studied and well un-
derstood, and the associated mathematics of correspondence theory is extremely
elegant. These attractive features make possible worlds the semantics of choice
for almost every researcher in formal agent theory. However, there are also a
number of serious drawbacks to possible worlds semantics. First, possible worlds
semantics imply that agents
• are logically perfect reasoners (in that their deductive capabilities are sound and complete; this follows, for instance, from the axiom Kn2 and the rule Nec that we gave in Figure 16.1 for knowledge, and, moreover, this axiom and inference rule are part of any modal axiomatization of an agent’s attitudes) and
• have infinite resources available for reasoning (see axioms Kn1 and Kn2 and rule Nec again).
2.4.2 Implementation
Once given a specification, we must implement a system that is correct with re-
spect to this specification. The next issue we consider is this move from abstract
specification to concrete computational system. There are at least two possibilities for achieving this transformation that we consider here: to somehow directly execute or animate the abstract specification, or to somehow translate or compile the specification into a concrete computational form using an automatic translation technique.
Agents in IMPACT are programmed by using rules that incorporate deontic modalities (permitted, forbidden, obliged [73]). These rules can be interpreted to determine the actions that an agent should perform at any given moment [104, p. 171].
Note that executing Concurrent METATEM agent specifications is possible primarily because the models upon which the Concurrent METATEM temporal logic is based are comparatively simple, with an obvious and intuitive computational interpretation. However, agent specification languages in general (e.g., the BDI formalisms of Rao and Georgeff [89]) are based on considerably more complex logics. In general, possible worlds semantics do not have a computational interpretation in the way that Concurrent METATEM semantics do. Hence it is
not clear what “executing” a logic based on such semantics might mean.
In response to this issue, a number of researchers have attempted to develop
executable agent specification languages with a simplified logical basis that has a
computational interpretation. An example is Rao’s AGENT S PEAK (L) language,
which although essentially a BDI system, has a simple computational seman-
tics [87]. The 3APL project [44] is also an attempt to have an agent program-
ming language with a well-defined semantics, based on transition systems. One
advantage of having a thorough semantics is that it enables one to compare differ-
ent agent programming languages, such as AGENT S PEAK (L) with 3APL [43] or
AGENT0 with 3APL [45]. One complication in bridging the gap between the
agent programming paradigm and the formal systems of Sections 2.1–2.2 is that
the former usually takes goals to be procedural (a plan), whereas goals in the latter
are declarative (a desired state). A programming language that tries to bridge the
gap in this respect is the language GOAL [107].
GOLOG [64, 96] and its multiagent sibling CONGOLOG [63] represent another
rich seam of work on logic-oriented approaches to programming rational agents.
Essentially, GOLOG is a framework for executing a fragment of the situation calcu-
lus; the situation calculus is a well-known logical framework for reasoning about
action [70]. Put crudely, writing a GOLOG program involves expressing a logical
theory of what action an agent should perform, using the situation calculus; this
theory, together with some background axioms, represents a logical expression
of what it means for the agent to do the right action. Executing such a program
reduces to constructively solving a deductive proof problem, broadly along the
lines of showing that there is a sequence of actions representing an acceptable
computation according to the theory [96, p. 121]; the witness to this proof will be
a sequence of actions, which can then be executed.
2.4.3 Verification
Once we have developed a concrete system, we need to show that this system
is correct with respect to our original specification. This process is known as
verification, and it is particularly important if we have introduced any informality
into the development process. We can divide approaches to the verification of
systems into two broad classes: (1) axiomatic; and (2) semantic (model checking).
Axiomatic approaches to program verification were the first to enter the main-
stream of computer science, with the work of Hoare in the late 1960s [47]. Ax-
iomatic verification requires that we can take our concrete program, and from this
program systematically derive a logical theory that represents the behavior of the
program. Call this the program theory. If the program theory is expressed in the
same logical language as the original specification, then verification reduces to a proof problem. Proofs are hard enough, even in classical logic; the addition of temporal
and modal connectives to a logic makes the problem considerably harder. For this
reason, more efficient approaches to verification have been sought. One particu-
larly successful approach is that of model checking [20]. As the name suggests,
whereas axiomatic approaches generally rely on syntactic proof, model checking
approaches are based on the semantics of the specification language. The model
checking problem, in abstract, is quite simple: given a formula ϕ of language L,
and a model M for L, determine whether or not ϕ is valid in M, i.e., whether or
not M |=L ϕ. Model checking-based verification has been studied in connection
with temporal logic. The technique once again relies upon the close relationship
between models for temporal logic and finite-state machines. Suppose that ϕ is
the specification for some system, and π is a program that claims to implement
ϕ. Then, to determine whether or not π truly implements ϕ, we take π, and from
it generate a model Mπ that corresponds to π, in the sense that Mπ encodes all
the possible computations of π. We then determine whether or not Mπ |= ϕ, i.e.,
whether the specification formula ϕ is valid in Mπ ; the program π satisfies the
specification ϕ just in case the answer is “yes.” The main advantage of model
checking over axiomatic verification is in complexity: model checking using the
branching-time temporal logic CTL [20] can be done in polynomial time, whereas
the proof problem for most modal logics is quite complex.
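To make the two-step recipe concrete, the following minimal Python sketch (ours; the three-state model, its labeling, and all names are hypothetical, and only an existential fragment of CTL is handled) evaluates formulae over an explicit finite Kripke structure using the standard fixpoint characterizations of E(ϕ U ψ) and EG ϕ:

def sat(states, trans, label, f):
    # Return the set of states satisfying the CTL formula f (given as a nested tuple).
    op = f[0]
    if op == "true":
        return set(states)
    if op == "ap":                       # atomic proposition
        return {s for s in states if f[1] in label[s]}
    if op == "not":
        return set(states) - sat(states, trans, label, f[1])
    if op == "and":
        return sat(states, trans, label, f[1]) & sat(states, trans, label, f[2])
    if op == "EX":                       # some successor satisfies the subformula
        target = sat(states, trans, label, f[1])
        return {s for s in states if trans[s] & target}
    if op == "EU":                       # least fixpoint for E(f1 U f2)
        s1 = sat(states, trans, label, f[1])
        result = sat(states, trans, label, f[2])
        while True:
            new = result | {s for s in s1 if trans[s] & result}
            if new == result:
                return result
            result = new
    if op == "EG":                       # greatest fixpoint for EG f1
        result = sat(states, trans, label, f[1])
        while True:
            new = {s for s in result if trans[s] & result}
            if new == result:
                return result
            result = new
    raise ValueError("unknown operator: %s" % op)

# Hypothetical three-state model in which s2 is a fail state.
states = {"s0", "s1", "s2"}
trans = {"s0": {"s1", "s2"}, "s1": {"s0"}, "s2": {"s2"}}
label = {"s0": set(), "s1": set(), "s2": {"fail"}}

# AG ¬fail holds at s0 iff s0 does not satisfy E(true U fail).
ef_fail = sat(states, trans, label, ("EU", ("true",), ("ap", "fail")))
print("s0" in ef_fail)   # True, so AG ¬fail does not hold at s0

An industrial model checker performs essentially this computation symbolically rather than by explicit enumeration, but the fixpoint structure is the same.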
In [94], Rao and Georgeff present an algorithm for model checking BDI logic.
More precisely, they give an algorithm for taking a logical model for their (propo-
sitional) BDI agent specification language, and a formula of the language, and
determining whether the formula is valid in the model. The technique is closely
based on model checking algorithms for normal modal logics [40]. They show
that despite the inclusion of three extra modalities (for beliefs, desires, and in-
tentions), into the CTL branching-time framework, the algorithm is still quite ef-
ficient, running in polynomial time. So the second step of the two-stage model
checking process described above can still be done efficiently. However, it is not
clear how the first step might be realized for BDI logics. Where does the logical
model characterizing an agent actually come from – can it be derived from an ar-
bitrary program π, as in mainstream computer science? To do this, we would need
to take a program implemented in, say, JAVA, and from it derive the belief, desire,
and intention accessibility relations that are used to give a semantics to the BDI
component of the logic. Because, as we noted earlier, there is no clear relationship
between the BDI logic and the concrete computational models used to implement
agents, it is not clear how such a model could be derived.
One approach to this problem was presented in [122], which describes an imperative programming language called MABLE with an explicit BDI semantics. Model checking for the language was implemented by mapping the language
to the input language for the SPIN model checking system [54], and by reducing
formulae in a restricted BDI language to the linear temporal logic format required
by SPIN. Here, for example, is a sample claim that may be made about a MABLE
system, which may be automatically verified by model checking:
claim
[]
((believe agent2
(intend agent1
(believe agent2 (a == 10))))
->
<>(believe agent2 (a == 10))
);
This claim says that it is always ([]) the case that if agent 2 believes that agent
1 intends that agent 2 believes that variable a has the value 10, then subsequently
(<>), agent 2 will itself believe that a has the value 10. MABLE was developed
primarily as a testbed for exploring possible semantics for agent communication,
and was not intended for large-scale system verification.
Several model checkers for logics combining knowledge, time, and other
modalities have been developed in recent years. For example, using techniques
similar to those used for CTL model checkers [20], Raimondi and Lomuscio im-
plemented MCMAS, a model checker that supports a variety of epistemic, tempo-
ral, and deontic logics [67, 86]. Another recent approach to model checking multi-
agent systems is [48], which involves model checking temporal epistemic logics
by reducing the model checking problem to a conventional LTL model checking
problem.
The work of Moore and Morgenstern also informed later attempts to integrate a logic of ability into more general logics of rational action in autonomous agents [121, 124] (see [123] for a survey of such logics).
In a somewhat parallel thread of research, researchers in the philosophy of
action developed a range of logics underpinned by rather similar ideas and moti-
vations. A typical example is that of Brown, who developed a logic of individual
ability in the mid-1980s [18]. Brown’s main claim was that modal logic was a
useful tool for the analysis of ability, and that previous – unsuccessful – attempts
to characterize ability in modal logic were based on an oversimple semantics.
Brown’s account of the semantics of ability was as follows [18, p. 5]:
[An agent can achieve A] at a given world iff there exists a relevant
cluster of worlds, at every world of which A is true.
Notice the ∃∀ pattern of quantifiers in this account. Brown immediately noted that
this gave the resulting logic a rather unusual flavor, neither properly existential nor
properly universal [18, p. 5]:
Cast in this form, the truth condition [for ability] involves two met-
alinguistic quantifiers (one existential and one universal). In fact, [the
character of the ability operator] should be a little like each.
More recently, there has been a surge of interest in logics of strategic ability, which
has been sparked by two largely independent developments: Pauly’s development
of coalition logic [80, 81, 82, 83], and the development of the alternating-time
temporal logic (ATL) by Alur, Henzinger, and Kupferman [9, 27, 38]. Although
these logics are very closely related, the motivation and background of the two
systems is strikingly different.
The language of coalition logic extends propositional logic with cooperation modalities of the form [C]ϕ, where C is a set of agents (i.e., a subset of the grand coalition Ag) and ϕ is a sentence; the intended reading is that “C can cooperate to ensure that ϕ.”
The semantics of cooperation modalities are given in terms of an effectivity
function, which defines for every coalition C the states that C can cooperate to
bring about. The effectivity function E : S → (2^Ag → 2^(2^S)) gives for any state
t and coalition C a set of sets of end-states EC (t), with the intended meaning of
S ∈ EC (t) that C can enforce the outcome to be in S (although C may not be able to
pinpoint the exact outcome that emerges with this choice; this generally depends
on the choices of agents outside C, or “choices” made by the environment). This
effectivity function comes on a par with a modal operator [C] with truth definition

t |= [C]ϕ iff there is some S ∈ EC(t) such that S = [|ϕ|]

In words: coalition C is effective for, or can enforce, ϕ if there is a set of states S that
it is effective for, i.e., which it can choose and which is exactly the denotation of
ϕ: S = [|ϕ|]. It seems reasonable to say that C is also effective for ϕ if it can choose
a set of states S that “just” guarantees ϕ, i.e., for which we have S ⊆ [|ϕ|]. This
will be taken care of by imposing monotonicity on effectivity functions: we will
discuss constraints on effectivity at the end of this section.
In games and other structures for cooperative and competitive reasoning, ef-
fectivity functions are convenient when one is interested in the outcomes of the
game or the encounter, and not so much about intermediate states, or how a cer-
tain state is reached. Effectivity is also a level at which one can decide whether
two interaction scenarios are the same. The two games G1 and G2 in Figure 16.4
are “abstract” in the sense that they do not lead to payoffs for the players but
rather to states that satisfy certain properties, encoded with propositional atoms p,
q, and u. Such atoms could refer to which player is winning, but also denote other
properties of an end-state, such as some distribution of resources, or “payments.”
Both games are two-player games: in G1, player A makes the first move, which it
chooses from L (Left) and R (Right). In that game, player E is allowed to choose
between l and r, respectively, but only if A plays R; otherwise the game ends after
one move in the state satisfying p. In game G2, both players have the same reper-
toire of choices, but the order in which the players choose is different. It looks as
if in G1, player A can hand over control to E, whereas the converse seems to be
true for G2. Moreover, in G2, the player that is not the initiator (i.e., player A)
will be allowed to make a choice, regardless of the choice of its opponent.
Despite all these differences between the two games, when we evaluate them
with respect to what each coalition can achieve, they are the same! To be a little
more precise, let us define the powers of a coalition in terms of effectivity func-
tions E. In game G1 , player A’s effectivity gives EA (ρ1 ) = {{a}, {c, d}}. Similarly,
player E’s effectivity yields {{a, c}, {a, d}}: E can enforce the game to end in a or
Figure 16.4: Two games G1 and G2 that are the same in terms of effectivity. H is an imperfect information game (see Section 3.3).
c (by playing l), but it can also enforce the end-state among a and d (by playing r).
Obviously, we also have E{A,E} (ρ1 ) = {{a}, {c}, {d}}: players A and E together
can enforce the game to end in any end-state. When reasoning about this, we have
to restrict ourselves to the properties that are true in those end-states. In coalition
logic, what we have just noted semantically would be described as:
G1 |= [A]p ∧ [A](q ∨ u) ∧ [E](p ∨ q) ∧ [E](p ∨ u) ∧ [A, E]p ∧ [A, E]q ∧ [A, E]u
Being equipped with the necessary machinery, it now is easy to see that the
game G2 verifies the same formula. Indeed, in terms of what propositions can
be achieved, we are in a similar situation as in the previous game: E is effective
for {p, q} (by playing l) and also for {p, u} (by playing r). Likewise, A is effective
for {p} (play L) and for {q, u} (play R). The alert reader will have recognized the
logical law (p ∧ (q ∨ u)) ≡ ((p ∧ q) ∨ (p ∧ u)) resembling the “equivalence” of the two games: (p ∧ (q ∨ u)) corresponds to A’s power in G1, and ((p ∧ q) ∨ (p ∧ u)) to A’s power in G2. Similarly, the equivalence of E’s powers is reflected by the logical equivalence (p ∨ (q ∧ u)) ≡ ((p ∨ q) ∧ (p ∨ u)).
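The powers just listed can also be checked mechanically. The following Python sketch (ours; the encoding of the effectivity function simply transcribes the sets given above for G1, and all names are illustrative) evaluates [C]ϕ under the monotone reading of effectivity:

def can_enforce(effectivity, coalition, denotation):
    # [C]phi holds iff C is effective for some S with S ⊆ [|phi|]
    # (the monotone reading of effectivity discussed above)
    return any(S <= denotation for S in effectivity[coalition])

effectivity = {
    frozenset({"A"}): [{"a"}, {"c", "d"}],
    frozenset({"E"}): [{"a", "c"}, {"a", "d"}],
    frozenset({"A", "E"}): [{"a"}, {"c"}, {"d"}],
}
# End-state denotations of the atoms: a |= p, c |= q, d |= u.
p, q, u = {"a"}, {"c"}, {"d"}

print(can_enforce(effectivity, frozenset({"A"}), p))        # True:  [A]p
print(can_enforce(effectivity, frozenset({"E"}), p | q))    # True:  [E](p ∨ q)
print(can_enforce(effectivity, frozenset({"A", "E"}), q))   # True:  [A,E]q
print(can_enforce(effectivity, frozenset({"A"}), q))        # False: A alone cannot enforce q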
At the same time, the reader will have recognized the two metalinguistic quan-
tifiers in the use of the effectivity function E, laid down in its truth-definition
above. A set of outcomes S is in EC iff for some choice of C, we will end up in
S, under all choices of the complement of C (the other agents). This notion of
so-called α-effectivity uses the ∃∀-order of the quantifiers: what a coalition can
establish through the truth-definition mentioned above, their α-ability, is some-
times also called ∃∀-ability. Implicit within the notion of α-ability is the fact that
C have no knowledge of the choice that the other agents make; they do not see the choice of C̄ (i.e., the complement of C) and then decide what to do, but rather
they must make their decision first. This motivates the notion of β-ability (i.e.,
“∀∃”-ability): coalition C is said to have the β-ability for ϕ if for every choice of the agents outside C, there is a choice for C such that the outcome satisfies ϕ.
(⊥) ¬[C]⊥
(N) ¬[∅]¬ϕ → [Ag]ϕ
(M) [C](ϕ ∧ ψ) → [C]ψ
(S) ([C1]ϕ1 ∧ [C2]ϕ2) → [C1 ∪ C2](ϕ1 ∧ ϕ2), where C1 ∩ C2 = ∅
(MP) from ϕ and ϕ → ψ infer ψ
(Nec) from ϕ infer [C]ϕ
The first point to note is that we can naturally axiomatize these requirements using
coalition logic:
[A, B]x x ∈ {p, q}
¬[A, B](p ∧ q)
¬[x]p x ∈ {A, B}
¬[x]q x ∈ {A, B}
It should be immediately obvious how these axioms capture the requirements as
stated above. Now, given a particular voting procedure, a model checking algo-
rithm can be used to check whether or not this procedure implements the specifi-
cation correctly. Moreover, a constructive proof of satisfiability for these axioms
might be used to synthesize a procedure; or else to announce that no implementa-
tion exists.
CTL-like logics are of limited value for reasoning about multiagent systems, in which system components (agents) cannot be assumed to be benevolent, but may have competing or conflicting goals. The kinds of properties we wish to express of such systems are the powers that the system components have. For example, we might wish to express the fact that “agents 1 and 2 can cooperate to ensure that the system never enters a fail state.” It is not possible to capture such statements using CTL-like logics. The best one can do is either state that something will inevitably happen (as in the CTL formula A□¬fail, “on all paths, the system never fails”), or else that it may possibly happen: CTL-like logics have no notion of agency.
Alur, Henzinger, and Kupferman developed ATL in an attempt to remedy this deficiency. The key insight in ATL is that path quantifiers can be replaced by cooperation modalities: the ATL expression ⟨⟨C⟩⟩ϕ, where C is a group of agents, expresses the fact that the group C can cooperate to ensure that ϕ. (Thus the ATL expression ⟨⟨C⟩⟩ϕ corresponds to Pauly’s [C]ϕ.) So, for example, the fact that agents 1 and 2 can ensure that the system never enters a fail state may be captured in ATL by the following formula: ⟨⟨1, 2⟩⟩□¬fail. An ATL formula true in the root ρ1 of game G1 of Figure 16.4 is ⟨⟨A⟩⟩◯⟨⟨E⟩⟩◯q: A has a strategy (i.e., play R in ρ1) such that at the next time, E has a strategy (play l) to enforce q.
Note that ATL generalizes CTL because the path quantifiers A (“on all paths. . . ”) and E (“on some paths. . . ”) can be simulated in ATL by the cooperation modalities ⟨⟨∅⟩⟩ (“the empty set of agents can cooperate to. . . ”) and ⟨⟨Ag⟩⟩ (“the grand coalition of all agents can cooperate to. . . ”).
One reason for the interest in ATL is that it shares with its ancestor CTL the
computational tractability of its model checking problem [20]. This led to the
development of an ATL model checking system called MOCHA [7, 10]. With
MOCHA, one specifies a model against which a formula is to be checked, using
a model definition language called REACTIVE MODULES [8]. REACTIVE MODULES is a guarded command language, which provides a number of mechanisms for the structured specification of models, based upon the notion of a “module,” which is basically the REACTIVE SYSTEMS terminology for an agent. Interestingly, however, it is ultimately necessary to define for every variable in a REACTIVE MODULES system which module (i.e., agent) controls it. The powers of
agents and coalitions then derive from the ability to control these variables: and
this observation was a trigger for [53] to develop a system for propositional con-
trol, CL-PC, as a system in its own right. We will come briefly back to this idea in
Section 3.4.
ATL has begun to attract increasing attention as a formal system for the spec-
ification and verification of multiagent systems. Examples of such work include
formalizing the notion of role using ATL [99], the development of epistemic ex-
tensions to ATL [49, 50, 51], and the use of ATL for specifying and verifying
S, q |= ⊤;
S, q |= ¬ϕ iff S, q ⊭ ϕ;
S, q |= ϕ ∨ ψ iff S, q |= ϕ or S, q |= ψ;
is required, and, on the other hand, actions may add to an agent’s knowledge.
We have already mentioned knowledge preconditions in Section 3. We can for-
mulate knowledge preconditions quite naturally using ATEL and its variants, and
the cooperation modality naturally and elegantly allows us to consider knowledge
preconditions for multiagent plans. The requirement that in order for an agent
a to be able to eventually bring about state of affairs ϕ, it must know ψ, might,
as a first attempt, be specified in ATEL as ⟨⟨a⟩⟩♦ϕ → Ka ψ. This intuitively says
that knowing ψ is a necessary requirement for having the ability to bring about ϕ.
However, this requirement is usually too strong. For instance, in order to be able
to ever open the safe, I don’t necessarily in general have to know the key right
now. A slightly better formulation might therefore be ⟨⟨a⟩⟩◯ϕ → Ka ψ. As an
overall constraint of the system, this property may help the agent to realize that
it has to possess the right knowledge in order to achieve ϕ. But taken as a local
formula, it does not tell us anything about what the agent should know if it wants
to bring about ϕ the day after tomorrow, or “sometime” for that matter. Taken
as a local constraint, a necessary knowledge condition to bring about ϕ might be
(¬⟨⟨i⟩⟩◯ϕ) U Ki ψ. This expresses that our agent is not able to open the safe until
it knows its key. The other way around, an example of an ability that is generated
by possessing knowledge is the following, expressing that if Bob knows that the
combination of the safe is s, then he is able to open it (⟨⟨b⟩⟩◯o), as long as the
combination remains unchanged.
Ki ψ → ⟨⟨i⟩⟩◯Ki Ki ψ (16.3)
As a final example, in security protocols where agents i and j share some com-
mon secret (a key Si j , for instance), what one typically wants is (16.4), expressing
that i can send private information to j, without revealing the message to another
agent h:
3.4 CL-PC
Both ATL and coalition logic are intended as general purpose logics of cooperative
ability. In particular, neither has anything specific to say about the origin of the
powers that are possessed by agents and the coalitions of which they are a mem-
ber. These powers are just assumed to be implicitly defined within the effectivity
structures used to give a semantics to the languages. Of course, if we give a spe-
cific interpretation to these effectivity structures, then we will end up with a logic
with special properties. In [53], a variation of coalition logic was developed that
was intended specifically to reason about control scenarios, as follows. The basic
idea is that the overall state of a system is characterized by a finite set of variables,
which for simplicity are assumed to take Boolean values. Each agent in the sys-
tem is then assumed to control some (possibly empty) subset of the overall set of
variables, with every variable being under the control of exactly one agent. Given
this setting, in the coalition logic of propositional control (CL-PC), the operator ♦C ϕ means that there exists some assignment of values that the coalition C can give to the variables under its control such that, assuming everything else in the system remains unchanged, ϕ would be true if C made this assignment. The box dual □C ϕ is defined in the usual way with respect to the diamond
ability operator ♦C . Here is a simple example:
Suppose the current state of the system is that variables p and q are
false, while variable r is true, and further suppose that agent 1 con-
trols p and r, while agent 2 controls q. Then in this state, we have
for example: ♦1 (p ∧ r), ¬♦1 q, and ♦2 (q ∧ r). Moreover, for any
satisfiable propositional logic formula ψ over the variables p, q, and
r, we have ♦1,2 ψ.
The cooperative (α-)ability of a coalition C can then be defined in terms of these operators:

⟨⟨C⟩⟩α ϕ =̂ ♦C □C̄ ϕ
Notice the quantifier alternation pattern ∃∀ in this definition, and compare this to
our discussion regarding α- and β-effectivity on page 788.
One of the interesting aspects of CL-PC is that by using this logic, it becomes
possible to explicitly reason in the object language about who controls what. Let
i be an agent, and let p be a system variable; let us define ctrl(i, p) as follows:
ctrl(i, p) =̂ (♦i p) ∧ (♦i ¬p)
Thus, ctrl(i, p) means that i can assign p the value true, and i can also assign
p the value false. It is easy to see that if ctrl(i, p) is true in a system, then
this means that the variable p must be under the control of agent i. Starting
from this observation, a more detailed analysis of characterizing control of ar-
bitrary formulae was developed, in terms of the variables controlled by individual
agents [53]. In addition, [53] gives a complete axiomatization of CL-PC, and
shows that the model checking and satisfiability problems for the logic are both
PSPACE -complete. Building on this basic formalism, [52] investigates extensions
into the possibility of dynamic control, where variables can be “passed” from one
agent to another.
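Because states in CL-PC are Boolean valuations and every variable is controlled by exactly one agent, ♦C ϕ can be evaluated by brute-force enumeration of the assignments available to C. The following Python sketch (ours; the scenario reproduces the two-agent example above, and all helper names are illustrative) does exactly this, and also checks the derived ctrl(i, p) operator:

from itertools import product

def diamond(state, control, coalition, phi):
    # ♦C phi: some assignment to C's variables (all others unchanged) makes phi true
    vars_c = [v for v in state if control[v] in coalition]
    for values in product([False, True], repeat=len(vars_c)):
        new_state = {**state, **dict(zip(vars_c, values))}
        if phi(new_state):
            return True
    return False

def ctrl(state, control, agent, var):
    # ctrl(i, p): agent i can make p true and can also make p false
    return (diamond(state, control, {agent}, lambda s: s[var]) and
            diamond(state, control, {agent}, lambda s: not s[var]))

# Running example: p and q are false, r is true; agent 1 controls p and r, agent 2 controls q.
state = {"p": False, "q": False, "r": True}
control = {"p": 1, "r": 1, "q": 2}

print(diamond(state, control, {1}, lambda s: s["p"] and s["r"]))   # True:  ♦1 (p ∧ r)
print(diamond(state, control, {1}, lambda s: s["q"]))              # False: ¬♦1 q
print(diamond(state, control, {2}, lambda s: s["q"] and s["r"]))   # True:  ♦2 (q ∧ r)
print(ctrl(state, control, 1, "p"))                                # True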
There is currently a flurry of activity in logics to reason about games (see [110]
for an overview paper) and modal logics for social choice (see [24] for an exam-
ple). Often these are logics that refer to the information of the agents (“players,”
in the case of games), and their actions (“moves” and “choices,” respectively).
The logics for such scenarios are composed from the building blocks described
in this chapter, often with an added logical representation of preferences [106] or expected utility [59].
5 Exercises
1. Level 1 S5 Axioms:
(a) To what extent do you believe that each of the 5 axioms for knowledge
captures realistic properties of knowledge as we understand it in its
everyday sense?
(b) Now consider a god-like omniscient entity, able to reason perfectly,
although with an incomplete view of its environment (i.e., not able to
completely perceive its environment). To what extent do the axioms
make sense for such “idealized” reasoners?
(a) A◯p ∧ E◯¬q
(b) A(p U (q ∧ ¬r))
(c) (A□p) ∧ (E♦¬q)
(a) Argue that for temporal epistemic logic, perfect recall implies that Ki □ϕ → □Ki ϕ. Hint: use the fact that □ψ is equivalent to (ψ ∧ ◯□ψ).
(b) What would be wrong if we replaced perfect recall with Ki ϕ → ◯Ki ϕ? Hint: think about statements that refer to “now,” or statements that refer to ignorance, i.e., ¬Ki ψ. See also [31, p. 130].
(c) Discuss the converse of perfect recall, i.e., ◯Ki ϕ → Ki ◯ϕ. What does it express? Discuss situations where this makes sense, and situations where it does not apply.
References
[1] T. Ågotnes, W. van der Hoek, J. A. Rodríguez-Aguilar, C. Sierra, and M.
Wooldridge. Multi-modal CTL: Completeness, complexity, and an application.
Studia Logica, 92(1):1–26, 2009.
[2] T. Ågotnes, W. van der Hoek, and M. Wooldridge. On the logic of coalitional
games. In Proceedings of the Fifth International Joint Conference on Autonomous
Agents and Multiagent Systems (AAMAS-2006), Hakodate, Japan, 2006.
[3] T. Ågotnes, W. van der Hoek, and M. Wooldridge. Temporal qualitative coalitional
games. In Proceedings of the Fifth International Joint Conference on Autonomous
Agents and Multiagent Systems (AAMAS-2006), Hakodate, Japan, 2006.
[4] T. Ågotnes, W. van der Hoek, and M. Wooldridge. Quantified coalition logic. In
Proceedings of the Twentieth International Joint Conference on Artificial Intelli-
gence (IJCAI-07), Hyderabad, India, 2007.
[5] H. Aldewereld, W. van der Hoek, and J.-J. Ch. Meyer. Rational teams: Logical
aspects of multi-agent systems. Fundamenta Informaticae, 63(2-3):159–183, 2004.
[6] J. F. Allen, J. Hendler, and A. Tate, editors. Readings in Planning. Morgan Kauf-
mann Publishers: San Mateo, CA, 1990.
[8] R. Alur and T. A. Henzinger. Reactive modules. Formal Methods in System Design,
15(1):7–48, July 1999.
[14] P. Blackburn, J. van Benthem, and F. Wolter, editors. Handbook of Modal Logic.
Elsevier, Amsterdam, 2006.
[15] M. E. Bratman. Intention, Plans, and Practical Reason. Harvard University Press:
Cambridge, MA, 1987.
[20] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. The MIT Press:
Cambridge, MA, 2000.
[24] Tijmen R. Daniëls. Social choice and the logic of simple games. Journal of Logic and Computation, 2010.
[25] D. C. Dennett. The Intentional Stance. The MIT Press: Cambridge, MA, 1987.
[29] E. A. Emerson. Temporal and modal logic. In J. van Leeuwen, editor, Handbook
of Theoretical Computer Science Volume B: Formal Models and Semantics, pages
996–1072. Elsevier Science Publishers B.V.: Amsterdam, The Netherlands, 1990.
[32] R. E. Fikes and N. Nilsson. STRIPS: A new approach to the application of theorem
proving to problem solving. Artificial Intelligence, 2:189–208, 1971.
[34] M. Fisher. A survey of Concurrent METATEM — the language and its applications.
In D. M. Gabbay and H. J. Ohlbach, editors, Temporal Logic — Proceedings of
the First International Conference (LNAI Volume 827), pages 480–505. Springer-
Verlag: Berlin, Germany, July 1994.
[38] V. Goranko. Coalition games and alternating temporal logics. In J. van Benthem,
editor, Proceedings of the Eighth Conference on Theoretical Aspects of Rationality
and Knowledge (TARK VIII), pages 259–272, Siena, Italy, 2001.
[40] J. Y. Halpern and M. Y. Vardi. Model checking versus theorem proving: A mani-
festo. In V. Lifschitz, editor, AI and Mathematical Theory of Computation — Pa-
pers in Honor of John McCarthy, pages 151–176. The Academic Press: London,
England, 1991.
[41] D. Harel. First-Order Dynamic Logic (LNCS Volume 68). Springer-Verlag: Berlin,
Germany, 1979.
[42] D. Harel, D. Kozen, and J. Tiuryn. Dynamic Logic. The MIT Press: Cambridge,
MA, 2000.
[43] K. Hindriks, F. S. de Boer, W. van der Hoek, and J.-J. Ch. Meyer. A formal embed-
ding of AgentSpeak(L) in 3APL. In G. Antoniou and J. Slaney, editors, Advanced
Topics in Artificial Intelligence, number 1502 in LNAI, pages 155–166. Springer,
1998.
[44] K. V. Hindriks, F. S. de Boer, W. van der Hoek, and J.-J. Ch. Meyer. Agent pro-
gramming in 3APL. Autonomous Agents and Multi-Agent Systems, 2(4):357–402,
1999.
[45] K. V. Hindriks, F. S. de Boer, W. van der Hoek, and J.-J. Ch. Meyer. A formal se-
mantics for the core of AGENT-0. In E. Postma and M. Gyssens, editors, Proceed-
ings of the Eleventh Belgium-Netherlands Conference on Artificial Intelligence,
pages 27–34. 1999.
[46] J. Hintikka. Knowledge and Belief. Cornell University Press: Ithaca, NY, 1962.
[48] W. van der Hoek and M. Wooldridge. Model checking knowledge and time. In
D. Bos̆nac̆ki and S. Leue, editors, Model Checking Software, Proceedings of SPIN
2002 (LNCS Volume 2318), pages 95–111. Springer-Verlag: Berlin, Germany,
2002.
[49] W. van der Hoek and M. Wooldridge. Tractable multiagent planning for epistemic
goals. In Proceedings of the First International Joint Conference on Autonomous
Agents and Multiagent Systems (AAMAS-2002), pages 1167–1174, Bologna, Italy,
2002.
[50] W. van der Hoek and M. Wooldridge. Model checking cooperation, knowledge, and
time — a case study. Research in Economics, 57(3):235–265, September 2003.
[51] W. van der Hoek and M. Wooldridge. Time, knowledge, and cooperation:
Alternating-time temporal epistemic logic and its applications. Studia Logica,
75(1):125–157, 2003.
[52] W. van der Hoek and M. Wooldridge. On the dynamics of delegation, cooperation,
and control: A logical account. In Proceedings of the Fourth International Joint
Conference on Autonomous Agents and Multiagent Systems (AAMAS-2005), pages
701–708, Utrecht, The Netherlands, 2005.
[53] W. van der Hoek and M. Wooldridge. On the logic of cooperation and propositional
control. Artificial Intelligence, 164(1-2):81–119, May 2005.
[54] G. Holzmann. The Spin model checker. IEEE Transactions on Software Engineer-
ing, 23(5):279–295, May 1997.
[56] U. Hustadt, C. Dixon, R. A. Schmidt, M. Fisher, J.-J. Ch. Meyer, and W. van der
Hoek. Verification with the KARO agent theory (extended abstract). In J. L. Rash,
C. A. Rouff, W. Truszkowski, D. Gordon, and M. G. Hinchey, editors, Procs For-
mal Approaches to Agent-Based Systems, FAABS 2000, number 1871 in LNAI,
pages 33–47, 2001.
[57] Thomas Icard, Eric Pacuit, and Yoav Shoham. Joint revision of beliefs and inten-
tion. In KR’10, pages 572–574, 2010.
[58] W. Jamroga and W. van der Hoek. Agents that know how to play. Fundamenta
Informaticae, 63(2-3):185–219, 2004.
[59] Wojciech Jamroga. A temporal logic for Markov chains. In Padgham, Parkes,
Müller, and Parsons, editors, Proc. of 7th Int. Conf. on Autonomous Agents and
Multi-Agent Systems (AAMAS 2008), pages 697–704, 2008.
[60] Wojciech Jamroga and Thomas Ågotnes. Constructive knowledge: What agents
can achieve under incomplete information. Journal of Applied Non-Classical Log-
ics, 17(4):423–475, 2007.
[64] H. Levesque, R. Reiter, Y. Lespérance, F. Lin, and R. Scherl. Golog: A logic pro-
gramming language for dynamic domains. Journal of Logic Programming, 31:59–
84, 1996.
[67] A. Lomuscio and F. Raimondi. MCMAS: a tool for verifying multi-agent sys-
tems. In Proceedings of The Twelfth International Conference on Tools and Al-
gorithms for the Construction and Analysis of Systems (TACAS-2006). Springer-
Verlag: Berlin, Germany, 2006.
[68] A. Lomuscio and M. Sergot. Deontic interpreted systems. Studia Logica, 75(1):63–
92, 2003.
[70] J. McCarthy and P. J. Hayes. Some philosophical problems from the standpoint of
artificial intelligence. In B. Meltzer and D. Michie, editors, Machine Intelligence
4, pages 463–502. Edinburgh University Press, 1969.
[71] J.-J. Ch. Meyer, F. S. de Boer, R. M. van Eijk, K. V. Hindriks, and W. van der Hoek.
On programming KARO agents. Logic Journal of the IGPL, 9(2):245–256, 2001.
[72] J.-J. Ch. Meyer and W. van der Hoek. Epistemic Logic for AI and Computer Sci-
ence. Cambridge University Press: Cambridge, England, 1995.
[73] J.-J. Ch. Meyer and R. J. Wieringa, editors. Deontic Logic in Computer Science —
Normative System Specification. John Wiley & Sons, 1993.
[75] R. C. Moore. Reasoning about knowledge and action. In Proceedings of the Fifth
International Joint Conference on Artificial Intelligence (IJCAI-77), Cambridge,
MA, 1977.
[79] M. J. Osborne and A. Rubinstein. A Course in Game Theory. The MIT Press:
Cambridge, MA, 1994.
[80] M. Pauly. Logic for Social Software. PhD thesis, University of Amsterdam, 2001.
ILLC Dissertation Series 2001-10.
[82] M. Pauly. A modal logic for coalitional power in games. Journal of Logic and
Computation, 12(1):149–166, 2002.
[84] A. Pnueli. The temporal logic of programs. In Proceedings of the Eighteenth IEEE
Symposium on the Foundations of Computer Science, pages 46–57, 1977.
[87] A. S. Rao. AgentSpeak(L): BDI agents speak out in a logical computable lan-
guage. In W. Van de Velde and J. W. Perram, editors, Agents Breaking Away: Pro-
ceedings of the Seventh European Workshop on Modelling Autonomous Agents in
a Multi-Agent World, (LNAI Volume 1038), pages 42–55. Springer-Verlag: Berlin,
Germany, 1996.
[89] A. S. Rao and M. Georgeff. BDI Agents: from theory to practice. In Proceedings
of the First International Conference on Multi-Agent Systems (ICMAS-95), pages
312–319, San Francisco, CA, June 1995.
[90] A. S. Rao and M. Georgeff. Decision procedures for BDI logics. Journal of Logic
and Computation, 8(3):293–344, 1998.
[91] A. S. Rao and M. P. Georgeff. Asymmetry thesis and side-effect problems in linear-
time and branching-time intention logics. In Proceedings of the Twelfth Interna-
tional Joint Conference on Artificial Intelligence (IJCAI-91), pages 498–504, Syd-
ney, Australia, 1991.
[96] R. Reiter. Knowledge in Action. The MIT Press: Cambridge, MA, 2001.
[97] S. Rosenschein and L. P. Kaelbling. The synthesis of digital machines with prov-
able epistemic properties. In J. Y. Halpern, editor, Proceedings of the 1986 Confer-
ence on Theoretical Aspects of Reasoning About Knowledge, pages 83–98. Morgan
Kaufmann Publishers: San Mateo, CA, 1986.
[99] M. Ryan and P.-Y. Schobbens. Agents and roles: Refinement in alternating-time
temporal logic. In J.-J. Ch. Meyer and M. Tambe, editors, Intelligent Agents VIII:
Proceedings of the Eighth International Workshop on Agent Theories, Architec-
tures, and Languages, ATAL-2001 (LNAI Volume 2333), pages 100–114, 2002.
[102] Y. Shoham and M. Tennenholtz. On the synthesis of useful social laws for artifi-
cial agent societies. In Proceedings of the Tenth National Conference on Artificial
Intelligence (AAAI-92), San Diego, CA, 1992.
[106] J. van Benthem, Patrick Girard, and Olivier Roy. Everything else being equal:
A modal logic approach to ceteris paribus preferences. Journal of Philosophical
Logic, 38:83–125, 2009.
[107] W. van der Hoek, K. V. Hindriks, F. S. de Boer, and J.-J. Ch. Meyer. Agent pro-
gramming with declarative goals. In C. Castelfranchi and Y. Lespérance, editors,
Intelligent Agents VII, Proceedings of the 6th Workshop on Agent Theories, Archi-
tectures, and Languages (ATAL), number 1986 in LNAI, pages 228–243, 2001.
[108] W. van der Hoek, W. Jamroga, and M. Wooldridge. A logic for strategic reasoning.
In Proceedings of the Fourth International Joint Conference on Autonomous Agents
and Multiagent Systems (AAMAS-2005), pages 157–164, Utrecht, The Nether-
lands, 2005.
[109] W. van der Hoek, W. Jamroga, and M. Wooldridge. Towards a theory of intention
revision. Synthese, 155(2):265–290, 2007.
[110] W. van der Hoek and M. Pauly. Modal logic for games and information. In P. Black-
burn, J. van Benthem, and F. Wolter, editors, Handbook of Modal Logic, pages
1077–1148. Elsevier, Amsterdam, 2006.
[111] W. van der Hoek and M. Wooldridge. Multi-agent systems. In F. van Harmelen,
V. Lifschitz, and B. Porter, editors, Handbook of Knowledge Representation, pages
887–928. Elsevier, 2008.
[112] Leendert van der Torre. Contextual deontic logic: Normative agents, violations and
independence. Annals of Mathematics and Artificial Intelligence, 37:33–63, 2001.
[113] H. van Ditmarsch, W. van der Hoek, and B. Kooi. Dynamic Epistemic Logic.
Springer, Berlin, 2007.
[114] H. P. van Ditmarsch and B. Kooi. The secret of my success. Synthese, 151(2):201–
232, 2006.
[115] B. van Linder, W. van der Hoek, and J. J. Ch. Meyer. Formalizing abilities and
opportunities of agents. Fundamenta Informaticae, 34(1,2):53–101, 1998.
[116] M. Y. Vardi. Branching vs. linear time: Final showdown. In T. Margaria and
W. Yi, editors, Proceedings of the 2001 Conference on Tools and Algorithms for the
Construction and Analysis of Systems, TACAS 2001 (LNCS Volume 2031), pages
1–22. Springer-Verlag: Berlin, Germany, April 2001.
[119] A. Herzig and N. Troquard. Knowing how to play: Uniform choices in logics of agency. In G. Weiss and P. Stone, editors, Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2006), Hakodate, Japan, 2006. ACM Press.
[121] M. Wooldridge. Reasoning about Rational Agents. The MIT Press: Cambridge,
MA, 2000.
[122] M. Wooldridge, M.-P. Huget, M. Fisher, and S. Parsons. Model checking multi-
agent systems: The MABLE language and its applications. International Journal
on Artificial Intelligence Tools, 15(2):195–225, April 2006.
[123] M. Wooldridge and N. R. Jennings. Intelligent agents: Theory and practice. The
Knowledge Engineering Review, 10(2):115–152, 1995.
[125] M. Wooldridge and W. van der Hoek. On obligations and normative ability. Journal
of Applied Logic, 3:396–420, 2005.
Game-Theoretic Foundations of
Multiagent Systems
1 Introduction
Multiagent systems can be roughly grouped into two categories: the cooperative
systems, where all the agents share a common goal and fully cooperate in order
to achieve it, and the non-cooperative systems, where each agent has its own de-
sires and preferences, which may conflict with those of other agents. The former
situation typically occurs when all agents are controlled by a single owner, which
might be the case in multirobot exploration or search-and-rescue missions. In con-
trast, the latter situation is more likely to occur when agents have different owners.
This is the case, for instance, in e-commerce settings, where agents represent dif-
ferent participants in an electronic marketplace, and all participants are trying to
maximize their own utility.
Even with cooperative agents, ensuring smooth functioning of a multiagent
system is not easy due to a variety of factors, ranging from unreliable communi-
cation channels to computational constraints. Going one step further and allowing
for non-cooperative agents adds a new dimension to the complexity of this prob-
lem, as the agents need to be incentivized to choose a desirable plan of action. This
difficulty is usually addressed by using the toolbox of game theory. Game theory
is a branch of mathematical economics that models and analyzes the behavior of
entities that have preferences over possible outcomes, and have to choose actions
in order to implement these outcomes. Thus, it is perfectly suited to provide a
theoretical foundation for the analysis of multiagent systems that are composed of
self-interested agents.
In this chapter, we give a brief overview of the foundations of game theory.
We provide formal definitions of the basic concepts of game theory and illustrate
their use by intuitive examples; for a more detailed motivation of the underlying
theory and more elaborate examples and applications the reader is referred to [13],
or, for a multiagent perspective, to [18]. Further reading suggestions are listed in
Section 5.
We start by discussing normal-form games, i.e., games where all players have
complete information about each others’ preferences, and choose their actions si-
multaneously (Section 2). Then, we consider games where agents act sequentially
(Section 3). Finally, we explore the effects of randomness and incomplete infor-
mation (Section 4).
2 Normal-Form Games
In game theory, a game is an interaction among multiple self-interested agents.
To begin our exposition, we focus first on games where all participants know each
others’ likes and dislikes, and select their actions simultaneously. To describe
such a setting, we need to specify the following components:
• The set of agents, or players, that are involved in the game.
• For each agent, the set of actions, or strategies, available to this agent. We
refer to a vector of chosen actions (one for each agent) as an action profile.
• The set of possible outcomes, i.e., the results of collective actions: for now,
we assume that the outcomes are deterministic, i.e., are uniquely determined
by the actions selected by all agents.
• For each agent, a payoff function, which assigns a numeric value (this
agent’s “happiness”) to each outcome.
We limit ourselves to games with a finite number of players, though games with
a continuum of players have been considered in the literature as well, e.g., in the
context of traffic routing [16]. However, we do not assume that the players’ sets
of actions are finite.
It is assumed that all agents simultaneously choose their actions from their
respective sets of available actions; these actions determine the outcome, which,
in turn, determines the agents’ payoffs. Since the outcomes are uniquely deter-
mined by action profiles, to simplify notation, we can omit the outcomes from
the description of the game, and define payoff functions as mappings from action
profiles to real numbers. Clearly, each agent would like to select an action that
maximizes its payoff; however, the optimal choice of action may depend on other
agents’ choices.
More formally, a normal-form game is defined as follows:
Definition 17.1 A normal-form game is a tuple G = ⟨N, (Ai)i∈N, (ui)i∈N⟩, where N = {1, . . . , n} is the set of agents and, for each i ∈ N, Ai is the set of actions, or strategies, available to agent i and ui : ×j∈N Aj → R is the payoff function of agent i, which assigns a numeric payoff to every action profile a = (a1, . . . , an) ∈ A1 × . . . × An.
We will often have to reason about the actions of one player while keeping the
actions of all other players fixed. For this reason it will be convenient to have a
special notation for the vector of actions of all players other than player i. Thus,
given an action profile a = (a1 , . . . , an ) and a player i ∈ N, we will denote by a−i
the vector (a1, . . . , ai−1, ai+1, . . . , an) ∈ ×j∈N, j≠i Aj. We will also write (a−i, b) to
denote the action profile (a1 , . . . , ai−1 , b, ai+1 , . . . , an ), where b is some action in
Ai . Finally, we set A = A1 × . . . × An , and A−i = A1 × . . . × Ai−1 × Ai+1 × . . . × An .
We will now illustrate the notions we have just introduced with a simple, but
instructive example; the game described in Example 17.1 is one of the most well-
known normal-form games.
        S           C
S   (−1, −1)    (−4, 0)
C   (0, −4)     (−3, −3)

Table 17.1: The payoff matrix for the Prisoner’s Dilemma.

A two-player game with finitely many actions can be represented compactly by a payoff matrix, in which the rows correspond to the actions of the first player (the row player) and the columns to the actions of the second player (the column player); and the cell at the intersection of the i-th row and the j-th column contains a pair (x, y), where x is the first player’s payoff at the action profile (ai, aj), and y is the second player’s payoff at this action profile. (You can also think of this matrix as a combination of two matrices, one for each player.) For instance, for the Prisoner’s Dilemma example considered above, the payoff matrix is given by Table 17.1.
Now, the main question studied by game theory is how to predict the outcome
of a game, i.e., how to determine what strategies the players are going to choose.
Such predictions are known as solution concepts. We will now discuss several
classic solution concepts.
Definition 17.2 Given a normal-form game G = ⟨N, (Ai)i∈N, (ui)i∈N⟩ and a pair of actions ai, ai′ ∈ Ai, the action ai is said to weakly dominate ai′ if

ui(a−i, ai) ≥ ui(a−i, ai′)                    (17.1)

for every a−i ∈ A−i, and the inequality in (17.1) is strict for at least one a−i ∈ A−i. An action ai is said to strictly dominate ai′ if the inequality in (17.1) is strict for every a−i ∈ A−i.
A strategy ai of agent i is said to be weakly/strictly dominant if it weakly/strictly dominates all other strategies of that agent. Similarly, a strategy ai of agent i is said to be weakly/strictly dominated if some other strategy of agent i weakly/strictly dominates it.
        F          M
F   (1, 3)     (0, 0)
M   (0, 0)     (3, 1)

Table 17.2: The payoff matrix for the Battle of the Sexes.
Example 17.2 (Battle of the Sexes) Alice and Bob would like to spend the
evening together. Each of them has to choose between a football game (F) and a
movie (M). Bob prefers football to the movies, Alice prefers movies to the football
game, but both of them have a strong preference for spending the evening together.
Thus, if they end up choosing different activities, the evening is ruined for both of
them, so each of them gets a payoff of 0. If they both choose football, Bob’s payoff
is 3, and Alice’s payoff is 1. If they both choose the movie, Alice’s payoff is 3,
and Bob’s payoff is 1. This game can be represented by the payoff matrix given in
Table 17.2, where Alice is the row player, and Bob is the column player.
In Example 17.2, neither player has a (weakly) dominant strategy: Alice
prefers M to F when Bob chooses M, but she prefers F to M when Bob chooses F.
Indeed, both outcomes (F, F) and (M, M) appear reasonable in this situation. On
the other hand, common sense suggests that the outcome (M, F) is less plausible
than either (F, F) or (M, M): indeed, if the outcome is (M, F), Alice’s payoff is 0,
but she can unilaterally change her action to F, thereby increasing her payoff to
1. Similarly, Bob can unilaterally increase his payoff from 0 to 1 by changing his
action from F to M.
We will now try to formalize the intuition that allows us to say that in this
game, (M, M) is more plausible than (F, M), so as to apply it to a wider range of
settings.
Definition 17.3 Given a normal-form game G = ⟨N, (Ai)i∈N, (ui)i∈N⟩, a strategy profile a = (a1, . . . , an) is said to be a Nash equilibrium if for every agent i ∈ N and for every action ai′ ∈ Ai we have

ui(a) ≥ ui(a−i, ai′),                    (17.2)

i.e., no agent can unilaterally increase its payoff by changing its action.
We say that an action a is a best response of player i to a strategy profile a−i if ui(a−i, a) ≥ ui(a−i, a′) for every a′ ∈ Ai. Thus, a Nash equilibrium is a strategy profile in which each player’s action is a best response to the other players’ actions.
It is immediate that in the Battle of the Sexes game (M, M) and (F, F) are Nash
equilibria, but (M, F) and (F, M) are not.
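These claims are easy to verify by hand, but a brute-force check is equally straightforward. The following Python sketch (ours; the encoding of the game as a payoff dictionary is illustrative) enumerates all action profiles of a two-player game and reports the pure Nash equilibria, using the Battle of the Sexes payoffs of Table 17.2:

from itertools import product

def pure_nash_equilibria(actions, payoff):
    # All profiles in which no single player gains by a unilateral deviation.
    equilibria = []
    for profile in product(*actions):
        stable = True
        for i, _ in enumerate(actions):
            for alt in actions[i]:
                deviation = profile[:i] + (alt,) + profile[i + 1:]
                if payoff[deviation][i] > payoff[profile][i]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria

# Battle of the Sexes: Alice is player 0 (rows), Bob is player 1 (columns).
actions = (("F", "M"), ("F", "M"))
payoff = {("F", "F"): (1, 3), ("F", "M"): (0, 0),
          ("M", "F"): (0, 0), ("M", "M"): (3, 1)}

print(pure_nash_equilibria(actions, payoff))   # [('F', 'F'), ('M', 'M')]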
We will now analyze another normal-form game. Unlike the examples that we
have considered so far, it has more than two players.
        H           T
H   (1, −1)     (−1, 1)
T   (−1, 1)     (1, −1)

Table 17.3: The payoff matrix for the Matching Pennies game.
contribute to building the bridge is a Nash equilibrium, and so is the profile where
none of the players contributes. We leave it as an exercise for the reader to show
that this game does not have any other Nash equilibria.
Nash equilibrium is perhaps the most prevalent solution concept in game the-
ory. However, it has a few undesirable properties. First, playing a Nash equilib-
rium strategy is only rational if all other agents are playing according to the same
Nash equilibrium; while this assumption is reasonable if other agents are known
to be rational, in many real-life scenarios the rationality of other agents cannot be
assumed as given. In contrast, if an agent has a dominant strategy, it does not need
to assume anything about other agents. Assuming that all other agents act accord-
ing to a fixed Nash equilibrium is especially problematic if the Nash equilibrium
is not unique. This is the case, for instance, in the Battle of the Sexes game: if Al-
ice and Bob have to choose their actions simultaneously and independently, there
is no obvious way for them to choose between (M, M) and (F, F). Further, there
are games that do not have a Nash equilibrium, as illustrated in the next example.
Example 17.4 (Matching Pennies) Consider a 2-player game where each of the
players has a 1p coin, and has to place it on the table so that the upper side is
either heads (H) or tails (T ). If both coins face the same way (H, H or T, T ), the
first player (“matcher”) takes both coins, and hence wins 1p. Otherwise, (i.e., if
the outcome is H, T or T, H), the second player (“mismatcher”) takes both coins.
This game corresponds to the payoff matrix given in Table 17.3.
It is not hard to check that in this game none of the action profiles is a Nash
equilibrium: if the two players choose the same action, the second player can
deviate and change its payoff from −1 to 1, and if the two players choose different
actions, the first player can profitably deviate.
However, the game in Example 17.4 does have a stable state if we allow the
players to randomize. In the next section, we will discuss the effects of random-
ization in normal-form games.
Definition 17.4 Given a normal-form game G = ⟨N, (Ai)i∈N, (ui)i∈N⟩ in which each agent’s set of actions is finite (without loss of generality, we can assume that each agent has exactly m strategies available to it), a mixed strategy of an agent i with a set of actions Ai = {ai^1, . . . , ai^m} is a probability distribution over Ai, i.e., a vector si = (si^1, . . . , si^m) that satisfies si^j ≥ 0 for all j = 1, . . . , m, and si^1 + · · · + si^m = 1. A mixed strategy profile is a vector (s1, . . . , sn) of mixed strategies (one for each agent).
The support supp(si) of a mixed strategy si is the set of all actions that are assigned non-zero probability under si: supp(si) = {ai^j | si^j > 0}.
Given a mixed strategy profile (s1, . . . , sn), the expected payoff of player i is computed as

Ui(s1, . . . , sn) = ∑_{(a1^{i1}, . . . , an^{in}) ∈ A} s1^{i1} · · · sn^{in} · ui(a1^{i1}, . . . , an^{in}).                    (17.3)
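Equation (17.3) is simply an expectation over independently drawn pure actions, and can be computed by enumerating all action profiles. The following Python sketch (ours; the uniform mixture used in the example is hypothetical) does this for the Matching Pennies game of Table 17.3:

from itertools import product

def expected_payoff(actions, payoff, strategies, i):
    # U_i(s_1, ..., s_n): sum over all profiles of the product of the
    # players' probabilities times player i's payoff, as in equation (17.3).
    total = 0.0
    for profile in product(*actions):
        prob = 1.0
        for player, action in enumerate(profile):
            prob *= strategies[player][action]
        total += prob * payoff[profile][i]
    return total

actions = (("H", "T"), ("H", "T"))
payoff = {("H", "H"): (1, -1), ("H", "T"): (-1, 1),
          ("T", "H"): (-1, 1), ("T", "T"): (1, -1)}
uniform = ({"H": 0.5, "T": 0.5}, {"H": 0.5, "T": 0.5})

print(expected_payoff(actions, payoff, uniform, 0))   # 0.0
print(expected_payoff(actions, payoff, uniform, 1))   # 0.0

As one would expect for this zero-sum game, uniform randomization by both players yields an expected payoff of zero to each.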
We are now ready to define our next solution concept, namely, the mixed Nash
equilibrium.
Definition 17.5 Given a normal-form game G = ⟨N, (Ai)i∈N, (ui)i∈N⟩, a mixed strategy profile (s1, . . . , sn) is a mixed Nash equilibrium if no agent can improve its expected payoff by changing its strategy, i.e.,

Ui(s1, . . . , sn) ≥ Ui(s1, . . . , si′, . . . , sn)

for every agent i ∈ N and every mixed strategy si′ of player i.
Observe that an action ai^j corresponds to a mixed strategy si given by si^j = 1, si^ℓ = 0 for ℓ ≠ j; we will refer to strategies of this form as pure strategies of player i, and to the notion of Nash equilibrium defined in Section 2.2 as pure Nash equilibrium.
While the notion of mixed Nash equilibrium suffers from many of the same
conceptual problems as the pure Nash equilibrium, as well as some additional
ones, it has the following attractive property:
Theorem 17.1 Any normal-form game with a finite number of players and a finite
number of strategies for each player admits a Nash equilibrium in mixed strate-
gies.
This result was proved by John Nash in his PhD thesis [11], and is one of the
cornerstones of modern game theory.
Going back to Example 17.4, we can verify that the mixed strategy profile
(s1 , s2 ), where s1 = s2 = (1/2, 1/2), is a mixed Nash equilibrium for the Matching
Pennies game; it can be seen that it is the only mixed Nash equilibrium of this
game.
We will now list several important properties of mixed Nash equilibria that can
be used to simplify the computation of the equilibrium strategies; for the proofs,
the reader is referred to [13].
In particular, in a mixed Nash equilibrium (s1, . . . , sn), the strategy si of every
player i is a best response to the strategies of the other players, and to verify this it
suffices to check deviations to pure strategies. Moreover, every action in supp(si)
must itself be a best response to the other players' strategies; hence, any two actions
ai^j, ai^k ∈ supp(si) yield the same expected payoff to player i when played against
the other players' equilibrium strategies.
Applying this indifference condition to the Battle of the Sexes game, we can derive
that s2^1 = 3·s2^2; together with s2^1 + s2^2 = 1, this implies that s2^1 = 3/4, s2^2 = 1/4.
Similarly, provided that s2^1 ≠ 0 and s2^2 ≠ 0, the indifference condition for player 2
yields s1^1 = 1/4, s1^2 = 3/4. Thus, the Battle of the Sexes game admits a mixed Nash
equilibrium in which each player chooses its preferred action with probability 3/4.
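The derivation above can be reproduced mechanically for any 2 × 2 game with a full-support equilibrium. The sketch below is ours, not the chapter's; in particular, the Battle of the Sexes payoffs used (3 for a player's preferred coordinated outcome, 1 for the other coordinated outcome, 0 otherwise) are an assumption chosen to be consistent with the 3/4–1/4 equilibrium derived in the text.

```python
# Indifference conditions for a 2 x 2 game: the opponent's mixing
# probabilities make each player indifferent between its two actions.
# The Battle of the Sexes payoffs below are assumed for illustration.
def full_support_equilibrium(u1, u2):
    """Mixed equilibrium (p, q) with both actions in each support, where
    p = P(row plays action 0) and q = P(column plays action 0)."""
    # Row player indifferent: q*u1[0][0] + (1-q)*u1[0][1] = q*u1[1][0] + (1-q)*u1[1][1]
    q = (u1[1][1] - u1[0][1]) / (u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1])
    # Column player indifferent: the symmetric condition determines p.
    p = (u2[1][1] - u2[1][0]) / (u2[0][0] - u2[1][0] - u2[0][1] + u2[1][1])
    return p, q

u1 = [[3, 0], [0, 1]]   # row player prefers the (0, 0) outcome
u2 = [[1, 0], [0, 3]]   # column player prefers the (1, 1) outcome
print(full_support_equilibrium(u1, u2))   # (0.75, 0.25): each player puts 3/4 on its preferred action
```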
Example 17.5 Consider two players X and Y who are engaged in the following
game: each player needs to name an integer between 1 and 10, and a player wins
(and gets a payoff of 1) if its number is closer to half of the average of the two
numbers; if both players name the same number, none of them wins. For instance,
if player X names nX = 9 and player Y names nY = 5, then Y wins, since 5 is
closer to 3.5 than 9 is.
Player X can reason as follows. Half of the average of the two numbers never
exceeds 5, so for player Y setting nY = 10 is strictly dominated by setting nY = 1.
Thus, it can assume that player Y never chooses nY = 10. The same argument
works for player X itself. Thus, both players can assume that the action space
is, in fact, limited to all integers between 1 and 9. Hence, half of the average of
the two numbers does not exceed 4.5, and therefore for both players choosing 9
is strictly dominated by choosing 1. By repeating this argument, we can conclude
that the only pair of strategies that cannot be eliminated in this way is (1, 1). Note
that under this strategy profile neither of the players wins.

Table 17.4: (a) Nash equilibrium in weakly dominated strategies; (b) domination by a mixed strategy.

(a)       L       R
    T  (2, 2)  (3, 1)
    B  (1, 3)  (3, 3)

(b)       L       R
    T  (4, 1)  (0, 1)
    C  (0, 2)  (4, 0)
    B  (1, 4)  (1, 5)
It can be shown that any pure Nash equilibrium will always survive iterated
elimination of strictly dominated strategies. For instance, in Example 17.5 the
strategy profile (1, 1) is a Nash equilibrium. Moreover, if an action belongs to
the support of a mixed Nash equilibrium, it will not be eliminated either. Thus,
iterated elimination of strictly dominated strategies can be viewed as a useful pre-
processing step for computing mixed Nash equilibria. Further, the set of strategies
surviving this procedure is independent of the elimination order. However, none of
these statements remains true if we eliminate weakly dominated strategies rather
than strictly dominated strategies: indeed, in the game given in Table 17.4a, R
is weakly dominated by L and B is weakly dominated by T , yet (B, R) is a pure
Nash equilibrium. Nevertheless, every game admits a (mixed) Nash equilibrium
that does not have any weakly dominated strategies in its support.
We remark that the notion of dominance introduced in Definition 17.2 extends
naturally to mixed strategies. Further, it can be shown that if an action a of player
i ∈ N dominates its action b in the sense of Definition 17.2, this remains to be
the case when the other players are allowed to use mixed strategies. On the other
hand, an action that is not dominated by any other action may nevertheless be
dominated by a mixed strategy. For instance, in the game given in Table 17.4b,
action B is not dominated (strictly or weakly) by either T or C, but is dominated
by their even mixture. Thus, when eliminating strictly dominated strategies, we
need to check for dominance by mixed strategies.
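As an illustration of this last point, the following sketch (ours; SciPy's linear-programming routine is our choice of tool, not the chapter's) tests whether a row of a payoff matrix is strictly dominated by some mixed strategy over the remaining rows, and confirms that in Table 17.4b the action B is dominated once mixtures are allowed.

```python
# Testing strict domination by a mixture of the other rows via a small LP.
# The matrix U holds the row player's payoffs from Table 17.4b.
import numpy as np
from scipy.optimize import linprog

def dominated_by_mixture(U, r):
    """True iff row r of payoff matrix U is strictly dominated by some
    mixed strategy over the remaining rows."""
    U = np.asarray(U, dtype=float)
    others = [i for i in range(U.shape[0]) if i != r]
    m, n = len(others), U.shape[1]
    # Variables: mixture weights x (length m) and a slack eps; maximize eps.
    c = np.zeros(m + 1); c[-1] = -1.0
    # For every column j:  U[r, j] + eps <= sum_i x_i * U[others[i], j]
    A_ub = np.hstack([-U[others].T, np.ones((n, 1))])
    b_ub = -U[r]
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)   # weights sum to 1
    b_eq = [1.0]
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.status == 0 and -res.fun > 1e-9   # eps > 0  =>  strictly dominated

U = [[4, 0], [0, 4], [1, 1]]     # rows T, C, B of Table 17.4b (row player's payoffs)
print([dominated_by_mixture(U, r) for r in range(3)])   # [False, False, True]
```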
We conclude this section by discussing games in which players possess infinite action spaces. Such games may appear at first glance to be
esoteric, but they are in fact quite common: a player’s strategy may be the amount
of time it spends on a certain activity, the amount of an infinitely divisible com-
modity it produces, or the amount of money it spends (money is usually assumed
to be infinitely divisible in game-theoretic literature). The reader can easily verify
that the notions of pure Nash equilibria, best response, and weakly/strictly dom-
inated strategies make perfect sense in these settings. The notion of mixed Nash
equilibrium can be extended to infinite action spaces as well, although a some-
what heavier machinery is required, which we do not discuss here. However, for
this more general setting Nash’s celebrated result [11] no longer applies and the
existence of a mixed Nash equilibrium is, in general, not guaranteed.
We will now present a simple example that highlights some of the issues that
arise in the analysis of games with infinite action spaces.
Example 17.6 Alice and Bob are bidding for a painting that is being sold by
means of a first-price sealed-bid auction (see Chapter 7): first, each player sub-
mits a bid in a sealed envelope, and then the envelopes are opened and the player
who submits the higher bid wins and pays his or her bid; ties are broken in
Alice’s favor.
Suppose Alice values the painting at $300, Bob values the painting at $200,
and they both know each other’s values. Then if Alice wins the painting and
pays $x, her payoff is $300 − x, whereas if Bob wins and pays $y, his payoff is
$200 − y; note that both players’ payoffs may be negative. Each player can bid
any non-negative real number, i.e., each player’s strategy space is R+ .
We can immediately observe that (200, 200) is a Nash equilibrium of the game
described in Example 17.6. Indeed, under this bid vector Alice wins because of
the tie-breaking rule, and her payoff is 100. If Alice bids more, she will have to
pay more, and if she bids less, she will lose the auction, so her payoff will go
down from 100 to 0. On the other hand, if Bob bids less, the outcome will remain
the same, and if he bids more, he wins, but his payoff will be negative. The same
argument shows that any action profile of the form (x, x), where 200 ≤ x ≤ 300, is
a Nash equilibrium. In contrast, no action profile of the form (x, y), where x ≠ y,
can be a Nash equilibrium of this game: the player who submits the higher bid has
an incentive to lower his or her bid a little (so that he or she remains the winning
bidder, but pays less).
Now, suppose that Alice values the painting at $200, whereas Bob values it at
$300. We claim that in this case our game has no pure Nash equilibrium. Indeed,
the same argument as above shows that in any Nash equilibrium, Alice and Bob
submit the same bid (and hence Alice wins). Further, this bid is at least $300,
since otherwise Bob can profitably deviate by increasing his bid. However, this
means that Alice is losing money and can profitably deviate by lowering her bid
so as to lose the auction.
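A quick way to double-check this kind of reasoning is to evaluate the payoff functions directly. The sketch below (ours, not the chapter's) restricts bids to whole dollars, which is only an approximation of the continuous strategy space discussed above, and verifies the two claims made for the case where Alice's value is $300 and Bob's is $200.

```python
# Numerical illustration of the equilibrium argument for Example 17.6.
# Bids are restricted to a grid of whole dollars (an approximation of R+).
def payoffs(bid_alice, bid_bob, v_alice=300, v_bob=200):
    """First-price sealed-bid auction; ties are broken in Alice's favor."""
    if bid_alice >= bid_bob:
        return v_alice - bid_alice, 0.0
    return 0.0, v_bob - bid_bob

def is_pure_nash(bid_alice, bid_bob, grid=range(0, 401)):
    ua, ub = payoffs(bid_alice, bid_bob)
    no_alice_dev = all(payoffs(b, bid_bob)[0] <= ua for b in grid)
    no_bob_dev = all(payoffs(bid_alice, b)[1] <= ub for b in grid)
    return no_alice_dev and no_bob_dev

print(is_pure_nash(200, 200))   # True: matches the analysis in the text
print(is_pure_nash(250, 200))   # False: Alice would rather lower her bid
```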
Example 17.7 Consider a task allocation scenario with two agents. Each agent
is assigned one individual task, and there is also a joint task that they need to
work on. Each agent derives utility from both tasks and the utility is a function
of the total effort spent. Let ai denote the percentage of time that agent i devotes
to the joint task, with ai ∈ [0, 1]. The remaining percentage, 1 − ai , is then spent
on the individual task. For given choices (a1 , a2 ) of the two agents, the payoff to
agent i is given by
ui (a1 , a2 ) = βi ln (a1 + a2 ) + 1 − ai , i = 1, 2,
where βi ∈ (0, 1] shows the relative importance that each agent assigns to the joint
task.
To analyze this example, note first that the tuple (0, 0) is not a Nash equilib-
rium: at this strategy profile each player has an incentive to work on the joint
task so as to avoid a payoff of −∞. Hence we restrict ourselves to the region
a1 + a2 > 0, so that the derivative of the logarithm is defined. Now, to find the
best response of each agent, we should solve the equation ∂ui/∂ai = βi/(a1 + a2) − 1 = 0,
while taking into account the fact that ai must lie in [0, 1]. We obtain the
following best-response equations within the feasible region:

a1 = β1 − a2,    a2 = β2 − a1.
It will be convenient to split the rest of the analysis into three cases:
Case 1: β1 = β2 = β. We can conclude that there are many Nash equilibria, which
are given by the set

{(a1, β − a1) | a1 ∈ [0, β]}.

For example, if β = 0.7, then the Nash equilibria are given by the tuples of the
form {(a1, 0.7 − a1) | a1 ∈ [0, 0.7]}.
Case 2: β1 > β2 . Combining the best-response equations above with the con-
straint ai ∈ [0, 1] for i = 1, 2, we see that the best response of player 1 when player
2 chooses a2 is
a1 = β1 − a2   if a2 ≤ β1,
a1 = 0         if β1 ≤ a2 ≤ 1.
We have a similar best-response description for player 2. As β1 > β2 , we cannot
have a1 = β1 − a2 and a2 = β2 − a1 simultaneously, and hence either a1 = 0 or
a2 = 0. In fact, we can see that the first of these options does not lead to a Nash
equilibrium, and hence the only Nash equilibrium is (β1 , 0), i.e., the player who
assigns more value to the joint task is the only person to put any effort into it.
Case 3: β2 > β1 . By a similar argument, (0, β2 ) is the only Nash equilibrium.
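For intuition, the analysis above can also be replayed numerically. The sketch below (ours, not the chapter's) iterates the clamped best responses ai = βi − aj; with β1 > β2 it settles at (β1, 0), matching Case 2, and with equal βs it stops at one of the equilibria of Case 1.

```python
# Best-response dynamics for the task-allocation game of Example 17.7.
def best_response(beta_i, a_j):
    return min(max(beta_i - a_j, 0.0), 1.0)   # clamp beta_i - a_j to [0, 1]

def best_response_dynamics(beta1, beta2, a1=0.5, a2=0.5, rounds=50):
    for _ in range(rounds):
        a1 = best_response(beta1, a2)
        a2 = best_response(beta2, a1)
    return a1, a2

print(best_response_dynamics(0.9, 0.4))   # -> (0.9, 0.0), i.e., (beta1, 0) as in Case 2
print(best_response_dynamics(0.7, 0.7))   # -> one of the equilibria (a1, 0.7 - a1) of Case 1
```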
We now turn to an important special class of normal-form games: two-player
zero-sum games, in which the two players' payoffs sum to zero under every action
profile (Definition 17.6). For games with more than two players this definition is
adjusted in a natural way: the sum of the utilities of all players is required to be 0.
We can replace the
constant 0 in Definition 17.6 with any other constant: any game in the resulting
class can be transformed into a zero-sum game that has the same set of Nash
equilibria by subtracting this constant from the payoff matrix of one of the players.
Obviously, a two-player zero-sum game is completely specified by the payoffs of
the row player, since the payoffs of the column player are simply the negative of
those of the row player. We will adopt this convention within this subsection, and
we will use the following running example to illustrate our exposition.
Example 17.8 Consider the 2 × 2 game represented by the following payoff matrix
for the row player:

A = | 4  2 |
    | 1  3 |
One way to start thinking about how to play in such a game is to be pessimistic
and assume the worst-case scenario for any chosen strategy. For the row player,
this means to assume that, no matter what it chooses, the other player will make
the choice that minimizes the payment to the row player, i.e., that for every row
i that it might choose, the value min j Ai j will be realized. Hence the row player
would play so as to maximize this minimum value. In the same spirit, if the
column player also thinks pessimistically, it will think that, no matter what it
chooses, the other player will pick the action that maximizes the payment, i.e., for
a column j the column player will end up paying maxi Ai j . The column player
will then want to ensure that it minimizes this worst-case payment. Therefore, the
two players are interested in achieving, respectively, the values

v1 := max_i min_j Aij    and    v2 := min_j max_i Aij.
Let us now look at Example 17.8. We see that if the row player chooses the
first row, the worst-case scenario is that it receives 2 units of payoff. If it picks
the second row, it may do even worse and receive only 1 unit of payoff. Hence, it
would select the first row and v1 = 2. Similarly, we can argue that v2 = 3 and that
the column player would choose the second column. We can see, however, that
this type of pessimistic play does not lead to a Nash equilibrium, since the row
player would have an incentive to deviate. In fact, this game does not have a Nash
equilibrium in pure strategies.
We can now carry out the same reasoning over the space of mixed strategies.
For this we need to define the values
v̄1 := max_s min_t u1(s, t),    and    v̄2 := min_t max_s u1(s, t),
where the maximization and minimization above are over the set of mixed strate-
gies. Since we are now optimizing over a larger set, it follows that
v1 ≤ v̄1 ≤ v̄2 ≤ v2 .
To compute v̄1 in Example 17.8 (and generally for any 2 × 2 zero-sum game),
we observe that for any mixed strategy s = (s1 , 1 − s1 ) of the row player, the
payoff of each pure strategy of the column player will be a linear function of
s1. Further, when computing min_t u1(s, t) for a given s, it suffices to consider the
pure strategies of the column player only. Hence, in the absence of dominated
strategies, the solution will be found at the intersection of two linear functions. In
particular, for our example, we have

min_t u1(s, t) = min{3s1 + 1, 3 − s1},

which is maximized at s1 = 1/2, yielding v̄1 = 5/2; a symmetric computation over
the column player's mixed strategies gives v̄2 = 5/2 as well. This is no coincidence:
by von Neumann's minimax theorem [22], v̄1 = v̄2 holds in every two-player
zero-sum game, and this common value, denoted v̄, is called the value of the game.
Moreover, a mixed strategy profile (s, t) is a Nash equilibrium if and only if s attains
the maximum in the definition of v̄1 and t attains the minimum in the definition of
v̄2. In particular, all Nash equilibria yield the same payoffs to the players, v̄ and −v̄,
respectively; furthermore, if (s, t) and (s′, t′) are Nash equilibria, then (s, t′) and
(s′, t) are also Nash equilibria with the same payoff.
Therefore, in zero-sum games the notion of Nash equilibrium does not face the
criticism regarding the existence of multiple equilibria and the problem of choos-
ing among them, since all equilibria provide the same payoffs and coordination
is never an issue. Furthermore, from the algorithmic perspective, even though
finding v̄ for an arbitrary zero-sum game is not as easy as for the 2 × 2 case of
Example 17.8, it is still a tractable problem (for more details, see Section 2.7).
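The tractability claim can be made concrete with the standard linear-programming formulation of the maximin problem. The sketch below is ours (SciPy is our choice of solver, not the chapter's) and is applied to the matrix of Example 17.8.

```python
# Computing the value of a two-player zero-sum game by linear programming.
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(A):
    """Return (v, s): the value of the game and a maximin mixed strategy
    for the row player, given the row player's payoff matrix A."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables: s_1, ..., s_m (row player's mixed strategy) and v (the value).
    # Maximize v  <=>  minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j:  v - sum_i s_i * A[i, j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

v, s = zero_sum_value([[4, 2], [1, 3]])
print(v, s)   # v = 2.5 with s = (0.5, 0.5), as derived for Example 17.8
```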
[Figure 17.1 (referenced in Example 17.9 below): a game tree in which player 1 chooses between A and B at the root, and player 2 then chooses between C and D (after A) or between E and F (after B); the leaf payoffs are not reproduced here.]
3 Extensive-Form Games
The games studied in Section 2 clearly cannot capture situations where players
move sequentially. There are, however, many examples, such as board games
(chess, go, etc.), negotiation protocols, and open-cry auctions, which may evolve
in several rounds, so that players take turns and make a decision after they are
informed of the decisions of their opponents. In this section, we develop the tools
for analyzing such games.
To start, we will describe a simple game with sequential moves that will be
used as our running example throughout this section.
Example 17.9 Consider the game with two players depicted in Figure 17.1. In
this game, player 1 moves first and has to decide between the alternatives A and
B. After that, player 2 moves; depending on what player 1 chose, it will have
to decide either between C and D or between E and F. Finally, after the move
of player 2, the game ends and each player receives a payoff depending on the
terminal state that the game reached. For instance, if player 1 chooses A and player 2 then chooses C, each player receives the payoff written at the corresponding leaf of the tree in Figure 17.1.
• T = (V, E) is a rooted tree, where the set of nodes is partitioned into disjoint
sets V = V1 ∪V2 ∪ · · · ∪Vn ∪Vc ∪VL . Here Vi , i = 1, ..., n, denotes the set of
nodes where it is agent i’s turn to make a decision, Vc denotes the set of
nodes where a move is selected by chance, and VL denotes the set of leaves.
For every node v ∈ Vc , we are also given a probability distribution on the
set of edges leaving v.
In the game of Example 17.9, VL is the set of the four leaves, and there are no
chance moves, so Vc = ∅.
We will now present a less abstract example that illustrates the applicability of
extensive-form games in the analysis of realistic scenarios.
Example 17.10 Alice and Bob have to share 8 identical cupcakes; the cupcakes
cannot be cut into pieces, so each player has to be allocated an integer number of
cupcakes. The cupcakes become stale quickly, so after each round of negotiation
half of them will have to be thrown out. The negotiation procedure consists of two
rounds of offers. In the first round, Alice proposes the number of cupcakes nA that
she should receive, where 0 ≤ nA ≤ 8. Bob can either accept this offer (in which
case Alice gets nA cupcakes, Bob gets 8 − nA cupcakes, and the game terminates),
or reject it. If Bob rejects, in the second round he can decide on the number of
cupcakes nB that he should receive; however, by this time half of the cupcakes will
have perished, so we have 0 ≤ nB ≤ 4. This game can be described by the tree
given in Figure 17.2.
[Game tree: each of Alice's offers nA = 0, 1, . . . , 8 leads to a decision node for Bob; for the branch nA = 4 shown in the figure, Bob can accept (acc), giving (4, 4), or reject and keep nB ∈ {0, . . . , 4} cupcakes, giving (4, 0), (3, 1), (2, 2), (1, 3), (0, 4).]
Figure 17.2: Cupcake division game. A branch labeled with i corresponds to the
number of cupcakes the player decides to keep, and “acc” corresponds to Bob’s
decision to accept Alice’s offer. To save space, we only show Bob’s responses to
one of Alice’s actions.
Revisiting Example 17.9, we can check that the profiles that form Nash
equilibria are (B, (C, E)), (B, (D, E)), and (A, (C, F)). Let us inspect, for instance,
the first of these profiles. Player 1 has decided to play B, and player 2 has decided
to play C if it sees that player 1 played A, and E if player 1 played B. This results
in payoffs of 3 and 4, respectively. If player 1 were to change to strategy A, it
would reach the outcome induced by player 2's choice of C which, as can be checked
in Figure 17.1, does not give player 1 a higher payoff; a similar check shows that
player 2 cannot profitably deviate either.
2. the nodes of the tree Tp follow the partitioning of the original tree into the
disjoint sets V1 ∩ Tp ,V2 ∩ Tp , . . . ,Vn ∩ Tp , Vc ∩ Tp , VL ∩ Tp .
3. the set of (terminal) histories is simply the set of all sequences h′ for which
(h, h′) is a (terminal) history in G.
4. the payoffs for all agents at all leaves of Tp are the same as their payoffs at
the respective leaves of T .
In Example 17.9, there are three subgames, namely the game G itself, which
corresponds to G(∅); the game G(A), in which player 1 has no choice, but player
2 has to choose between C and D; and the subgame G(B), where player 1 has no
choice and player 2 has to choose between E and F. In other words, the number
of subgames is equal to the number of nonterminal histories.
Given the notion of a subgame, what we would like to capture now is a stability
concept that is robust against possible mistakes or changes in the plan of action of
the other players. Consider a strategy profile s = (s1 , ..., sn ). Suppose now that the
game has started some time ago, and a (non-terminal) history h has occurred. This
history may or may not be consistent with the profile s: due to possible mistakes
or unexpected changes some of the players may have deviated from s. Intuitively,
the profile s is robust if for every player i, the strategy si projected on the subgame
G(h) is the best that i can do if, from the beginning of G(h) and onward, the rest
of the players adhere to s. This approach can be formalized as follows.
That is, at a subgame-perfect equilibrium, each strategy is not just a best re-
sponse at the start of the game, but also remains a best response to the other play-
ers’ strategies at any possible point that the game may reach. It is worth looking at
Example 17.9 again to clarify the definition. We can see that out of the three Nash
equilibria that we identified, (B, (C, E)) is indeed a subgame-perfect equilibrium,
since at both nodes where player 2 has to move, it is playing its optimal strat-
egy. On the other hand, (B, (D, E)) is not a subgame-perfect equilibrium (SPE). Indeed, B is a best response of
player 1 to (D, E); however, in the left subtree where it is player 2’s turn to move,
it is not playing an optimal strategy and should have chosen C instead. Similarly,
(A, (C, F)) is not an SPE because player 2 is not playing optimally in the right
subtree. This shows that the concept of subgame-perfect equilibrium is stronger
than that of a Nash equilibrium.
Example 17.11 Consider the game with two players given in Figure 17.3.
Player 1 has to move first and decide between the alternatives A and B. After that,
there is a random event represented by the two chance nodes, and then player 2
has to move. Note that among the terminal states, there are states where player
2 is indifferent, e.g., when choosing an action in the leftmost and the rightmost
subtree.

Figure 17.3: An extensive-form game with two players, chance moves, and non-unique optimal strategy for player 2. [The tree: player 1 chooses A or B at the root; a chance node follows, with probabilities 1/3 and 2/3 after A and 3/4 and 1/4 after B; player 2 then chooses between actions 0 and 1 at each of the four resulting nodes. The leaf payoffs are not reproduced here.]
For games with finite strategy spaces, or, more generally, for games where an
optimal action exists at every node (given any strategy profile for the subgame
downward from this node), the extension of backward induction to handle all the
above-mentioned issues is as follows: process the subgames in order of increasing length, starting with those closest to the leaves; at every chance node, compute the players' expected payoffs with respect to the given probability distribution, and at every decision node, record every action that is optimal for the player moving there, given the choices already fixed further down the tree. Every strategy profile assembled from such optimal choices is returned by the procedure.
Let us consider again Example 17.11. To make the notation more compact, we
can use a 4-bit string to encode the strategy of player 2 for the four nodes where
it has to move, with the intended meaning that 0 stands for choosing the left leaf
and 1 stands for choosing the right leaf. Given this notation, the analysis proceeds
as follows:
1. There are four subgames of length 1, in all of which player 2 has to move.
We can see that player 2 is indifferent in the leftmost subgame and in the
rightmost subgame. In the other two subgames there is a unique optimal
action. Hence, in total we have four optimal strategies for player 2, namely,
0100, 0101, 1100, and 1101.
2. There are two subgames of length 2, which both start with a chance move.
For all four choices of player 2 identified in the previous round, we find the
expected payoffs for the two players.
• In the left chance node, these are, respectively, (5/3, 1), (5/3, 1),
(7/3, 1), and (7/3, 1).
• In the right chance node, these are, respectively, (6/4, 2), (7/4, 2),
(6/4, 2), and (7/4, 2).
3. There is only one subgame of length 3, namely the game itself, and player
1 has to decide at the root whether to play A or B. For each of the opti-
mal actions of player 2, it can compare its expected payoff and see that it
should play B only when player 2 chooses the strategy 0101. Hence we have
identified four subgame-perfect equilibria, namely, (A, 0100), (B, 0101),
(A, 1100), and (A, 1101).
One can prove that the above procedure indeed captures all the subgame-
perfect equilibria of a game. For games where an optimal action does not exist for
some node in the tree, the procedure cannot terminate with a solution, and we can
conclude that the game does not possess a subgame-perfect equilibrium. Hence
the main conclusion of this subsection can be summarized in the theorem below.
Theorem 17.3 The strategy profiles returned by the backward induction proce-
dure are precisely the subgame-perfect equilibria of the game.
When the action spaces of all players are finite, we always have an optimal
action at every node of the game. Hence, the procedure of backward induction
will always terminate. This leads to the following corollary.
Corollary 17.1 Every extensive-form game where each node has a finite fan-out
has at least one subgame-perfect equilibrium.
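As a concrete illustration of backward induction, the following sketch (ours, not the chapter's) solves the cupcake game of Example 17.10; we assume that Bob accepts when he is indifferent, which selects one of the game's subgame-perfect equilibria.

```python
# Backward induction for the cupcake game of Example 17.10.
# Assumption: Bob accepts an offer when he is indifferent between accepting
# and rejecting; other tie-breaking choices give other subgame-perfect equilibria.
def bob_reply(n_alice):
    """Bob's optimal reply to Alice asking for n_alice cupcakes.
    Returns (action, Alice's payoff, Bob's payoff)."""
    accept_payoff = 8 - n_alice     # Bob's share if he accepts now
    reject_payoff = 4               # in the second round Bob keeps nB = 4
    if accept_payoff >= reject_payoff:
        return "accept", n_alice, accept_payoff
    return "reject", 0, reject_payoff   # Alice is left with 4 - nB = 0

def solve():
    # Alice anticipates Bob's reply and picks the offer maximizing her own share.
    outcomes = [(n,) + bob_reply(n) for n in range(9)]
    return max(outcomes, key=lambda o: o[2])

print(solve())   # (4, 'accept', 4, 4): Alice asks for 4 cupcakes, Bob accepts
```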
4 Bayesian Games
So far we have assumed that players have full knowledge regarding the game they
are playing, i.e., they know all the possible plays of the game, they are always
informed about the current configuration of the game, and they know the pref-
erences of the other players. However, in many settings in multiagent systems,
players may not be aware of what the other players think, and will have to choose
their strategy under uncertainty. Hence, there is a need for a framework that can
capture selfish behavior in such circumstances.
We will now present a model that addresses this issue. To keep the presentation
simple, we will focus on games that are played simultaneously, and we will first
illustrate some of the main concepts with two examples.
Table 17.5: (a) the game when Bob is of type m2; (b) the game when Bob is of type a2.

(a)       F       M
    F  (3, 3)  (0, 0)
    M  (0, 0)  (3, 3)

(b)       F       M
    F  (3, 0)  (0, 3)
    M  (0, 3)  (3, 0)
We can view this situation as a game with two possible states that are com-
pletely specified by the type of player 2 (Bob). We can imagine that the game
is played as follows. Just before the game starts, the actual state is not known.
Then a random coin flip occurs that determines Bob’s type, i.e., his type is drawn
according to some probability distribution, which is public, and hence known to
Alice. For example, Bob may represent an entity chosen from some population
that follows this distribution (e.g., the population of people Alice has socialized with). Alter-
natively, the type may depend on certain parameters of the environment, which
follow this publicly known distribution. Therefore, when the game starts, Bob
is fully informed of his own type and of Alice’s preferences, whereas Alice only
knows the probability distribution over Bob’s possible types.
To continue, we first need to determine what constitutes a strategy for Bob in
this game. Bob receives a signal (in our example, learns his type) in the beginning
of the game. Therefore, a strategy here is a plan of action for every possible value
of the signal (for every possible realization of the state of the game). For instance,
a possible strategy for Bob is to play M if his actual type is m2 and F if his actual
type is a2 . When the game starts, Bob receives his signal (and hence learns his
type) and performs the action that corresponds to this type.
To start analyzing the game, let us consider Alice first. In order for Alice to
choose an action, she needs to estimate her expected payoff for all possible strate-
gies of Bob. Since there are two possible actions, and two possible types, there
are exactly four different strategies for Bob. In Table 17.6, we see the expected
payoff of each action of Alice, against Bob’s possible strategies. We denote each
strategy of Bob by a tuple (x, y), where x is Bob’s action when he is of type m2
and y is Bob’s action when he is of type a2 .
For instance, in the strategy profile (F, (F, M)), where Alice chooses F, her
payoff is 3 with probability 2/3 and 0 with probability 1/3, yielding an expected
payoff of 2. It is now easy to verify that the pair of strategies (F, (F, M)) is stable,
i.e., no player wants to change his or her actions. Indeed, given Bob’s strategy
(F, M), Alice cannot improve her expected payoff by switching to M. As for
Bob, when he is of type m2 , the final outcome will be that both players select F,
hence Bob receives the maximum possible payoff. Similarly, when he is of type
a2 , he will play M, which means that the outcome of the game is (F, M); again,
this yields the maximum possible payoff for Bob, according to the right-hand side
of Table 17.5. As we see, for Bob we need to check that he does not have an
incentive to deviate for every possible type. This notion of stability is referred to
as the Bayes–Nash equilibrium, which we will define formally later in this section.
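The stability check described above can be written out mechanically. In the sketch below (ours), the payoff matrices come from Table 17.5, Bob is of type m2 with probability 2/3, and actions are indexed by 0 = F and 1 = M; the function names are our own.

```python
# Checking the Bayes-Nash stability of (F, (F, M)) in the game of Table 17.5.
F, M = 0, 1
prior = {"m2": 2 / 3, "a2": 1 / 3}                            # probability of each type of Bob
u_alice = {"m2": [[3, 0], [0, 3]], "a2": [[3, 0], [0, 3]]}    # indexed [Alice's action][Bob's action]
u_bob   = {"m2": [[3, 0], [0, 3]], "a2": [[0, 3], [3, 0]]}

def alice_expected(a, bob_strategy):
    """Alice's expected payoff for action a against Bob's type-contingent strategy."""
    return sum(prior[t] * u_alice[t][a][bob_strategy[t]] for t in prior)

def is_bayes_nash(a, bob_strategy):
    # Alice must have no profitable deviation ...
    if any(alice_expected(d, bob_strategy) > alice_expected(a, bob_strategy) for d in (F, M)):
        return False
    # ... and every type of Bob must be best-responding to Alice's action.
    return all(u_bob[t][a][bob_strategy[t]] == max(u_bob[t][a]) for t in prior)

print(is_bayes_nash(F, {"m2": F, "a2": M}))   # True: the stable profile (F, (F, M)) from the text
print(is_bayes_nash(F, {"m2": F, "a2": F}))   # False: Bob of type a2 would rather mismatch
```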
We now move to a slightly more complicated example. Building on the first
example, suppose now that Alice’s type is uncertain as well. Let m1 and a1 be the
two possible types for Alice: Alice wants to meet Bob if she is of type m1 , and
avoid him if she is of type a1 . Suppose that these types occur with probability
1/2 each. Hence, when the game begins, the actual type of each player is deter-
mined and communicated to each player separately, and each player retains the
uncertainty regarding the type of the other player.
We can see that in this example there are 4 possible states of the game, namely,
the states (m1 , m2 ), (m1 , a2 ), (a1 , m2 ), and (a1 , a2 ). Alice cannot distinguish be-
tween states (m1 , m2 ) and (m1 , a2 ) since she receives the signal m1 in both of them,
and neither can she distinguish between (a1 , m2 ) and (a1 , a2 ). Similarly, Bob can-
not distinguish between the states (m1 , m2 ) and (a1 , m2 ). For each player, and for
each possible type of player, the beliefs induce a probability distribution on the
state space, e.g., for Alice’s type m1 , the distribution assigns probability 2/3 to
the state (m1 , m2 ), 1/3 to the state (m1 , a2 ), and 0 to the other two states, whereas
for Alice’s type a1 , the distribution is 2/3 on state (a1 , m2 ), 1/3 on state (a1 , a2 ),
and 0 on the remaining states.
To state the stability conditions, as in the first example, we assume that the
players select their plan of action before they receive their signal. Hence a strat-
egy for either player is to choose an action for every possible type that may be
realized. Informally, a strategy profile is in equilibrium if for each type of player,
the expected payoff cannot be improved, given (1) the belief of the player about
the state that the game is in and (2) the actions chosen for each type of other player.
Essentially this is as if we treat each type of a player as a separate player and ask
for a Nash equilibrium in this modified game.
Before moving to the formal definitions, we claim that in the second example
the tuple of strategies ((F, M), (F, F)) is stable under this Bayesian model. To see
this, consider first Alice’s type m1 . Under the given strategy profile, when Alice
is of type m1 , she chooses F. We can easily see that playing F is optimal, given
Alice’s belief about Bob’s type and given Bob’s strategy (F, F). This is because
when Alice is of type m1 , she wants to coordinate with Bob. Since Bob’s strategy
is (F, F), then obviously the best choice for Alice is to choose F. In a similar
way we can verify that the same holds when Alice is of type a1 , in which case she
chooses M. Then, given that she does not want to meet Bob, this is the best she
can do against (F, F). Regarding Bob, when he is of type m2 , his expected payoff
by playing F is 3/2, as he coordinates with Alice only half the time (according to
Bob’s belief). This is the best that he can achieve given his belief, hence there is
no incentive to deviate. The same holds when he is of type a2 .
Definition 17.12 A Bayesian game consists of a set of agents N = {1, . . . , n}, and
for each agent i:
• a set of actions Ai ;
• a set of types Ti ;
• a utility function ui defined on pairs (a, t), where a is an action profile and t
is a state. Abusing notation, we write ui (s, t) to denote the payoff of agent i
in state t when all players choose their actions according to s. The expected
payoff for agent i under a strategy profile s, given that it is of type ti, is then
obtained by taking the expectation of ui(s, t) over the states t consistent with ti,
weighted according to i's beliefs about the other agents' types.
In the Bayesian games discussed in the beginning of this section, both players
had finitely many possible types. While Bayesian games with infinite type spaces
may appear esoteric, they model a very important class of multiagent interactions,
namely, auctions. Auctions and their applications to multiagent system design are
discussed in detail in Chapter 7; here, we will just explain how they fit into the
framework of Bayesian games.
For concreteness, consider the first-price auction with one object for sale and
n players (bidders). Each player’s action space is the set of all bids it can submit,
i.e., the set R+ of all non-negative real numbers. Given the bids, the mechanism
selects the highest bidder (say, breaking ties according to a fixed player ordering),
allocates to it the item, and charges it the bid it has submitted. Each player’s type
is the value it assigns to the object, which can be any real number in a certain
interval; by normalizing, we can assume that each player’s value for the object
is a real number between 0 and 10. This value determines the utility it derives
from different auction outcomes: if it receives the object after bidding $5, its
utility is 1 when its value is $6, and −1 if its value is $4. The players’ types
are assumed to be drawn from a distribution D over [0, 10]n ; at the beginning of
the auction, each player learns its value, but, in general, does not observe other
players’ values. Then each player’s belief about the state of the game when its
value is vi is the probability density function of D, conditioned on i’s value being
vi . For instance, in the setting of independent private values, where D is a product
distribution, player i remains ignorant about other players’ types, and therefore
p(v−i | vi) = p(v−i | vi′) for any vi, vi′ ∈ [0, 10] and any v−i ∈ [0, 10]^{n−1}. In contrast,
in the case of common values, where D only assigns non-zero weight to vectors
of the form (v, . . . , v), once player i learns that its value is vi , it assigns probability
1 to the state (vi , vi , . . . , vi ) and probability 0 to all other states.
We can now formalize our intuition of what it means for an outcome of a
Bayesian game to be stable.
A strategy profile s = (s1, . . . , sn) is a Bayes–Nash equilibrium if for each agent
i ∈ N and each type ti ∈ Ti, the expected payoff of agent i with type ti at s is at
least as high as its expected payoff under any profile obtained from s by changing
only the action that si prescribes for type ti.
One can see from this definition that an alternative way to define a Bayes–Nash
equilibrium is to consider each type of player as a separate player and consider the
Nash equilibria of this expanded game. The reader can easily verify that for the
second variant of the Battle of the Sexes game considered in this section (where
both Alice and Bob can be of type “meet” or “avoid”), the argument given in the
end of Section 4.1 shows that the strategy profile ((F, M), (F, F)) is a Bayes–Nash
equilibrium. In contrast, ((F, M), (M, F)) is not a Bayes–Nash equilibrium of this
game. Indeed, under this strategy profile, if Alice is of type m1 , her expected
payoff is 1: she coordinates with Bob only if he is of type a2 , i.e., with probability
1/3. On the other hand, by playing M when her type is m1 , she would increase her
expected payoff for this type to 2, i.e., she can profitably deviate from this profile.
Also, for the first-price auction example considered above, it can be verified
that if n = 2 and each player draws its value independently at random from the
uniform distribution on [0, 10] (i.e., D = U[0, 10]×U[0, 10]), then the game admits
a Bayes–Nash equilibrium in which each player bids half of its value, i.e., si (vi ) =
vi /2 for i = 1, 2.
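A quick numerical check of this claim is sketched below (ours, not the chapter's): against an opponent who bids half of a value drawn uniformly from [0, 10], a bid b ≤ 5 wins with probability b/5, so a bidder with value v maximizes (v − b) · b/5, which peaks at b = v/2; the grid search simply confirms this.

```python
# Checking that bidding half of one's value is a best response in the
# first-price auction with two bidders and i.i.d. uniform values on [0, 10].
def expected_utility(value, bid):
    win_prob = min(bid / 5.0, 1.0)      # P(opponent's bid = v2/2 < bid), v2 ~ U[0, 10]
    return (value - bid) * win_prob

def best_bid(value, grid_step=0.01):
    bids = [i * grid_step for i in range(int(10 / grid_step) + 1)]
    return max(bids, key=lambda b: expected_utility(value, b))

for v in (2.0, 5.0, 8.0):
    print(v, best_bid(v))   # the best bid is (approximately) v / 2 in each case
```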
5 Conclusions
We have provided a brief overview of three major classes of non-cooperative
games, namely, normal-form games, extensive-form games, and Bayesian games,
and presented the classic solution concepts for such games. The textbook by
Shoham and Leyton-Brown [18] discusses the game-theoretic foundations of multi-
agent systems in considerably more detail. Undergraduate-level textbooks on
game theory include, among others, [5] and [12]; finally, there are many more
advanced, graduate-level books, ranging from classics such as [9, 14] to more
modern ones such as [10, 13, 15].
6 Exercises
1. Level 1 Two players decide to play the following game. They start driving
toward each other and a collision is unavoidable unless one (or both) of
the drivers decides to change its driving course (chickens out). For each
player, the best outcome is that it keeps driving straight while the other
player chickens out. The next best outcome is that they both chicken out.
The third-best option is that the player itself chickens out while the other
player drives straight, and finally the worst outcome is that they both keep
driving straight till the collision occurs. Write down a normal-form game
that represents this situation, and find its pure Nash equilibria.
2. Level 1 Consider a two-player game defined by the following payoff ma-
trix:
W X Y Z
A (15, 42) (13, 23) (9, 43) (0, 23)
B (2, 19) (2, 14) (2, 23) (1, 0)
C (20, 2) (20, 21) (19, 4) (3, 1)
D (70, 45) (3, 11) (0, 45) (1, 2)
Decide whether the following statements are true or false. Explain your
answer.
(a) A strictly dominates B.
(b) Z strictly dominates W .
(c) C weakly dominates D.
(d) X weakly dominates W .
(e) C is a best response to X.
(f) Z is a best response to A.
3. Level 1 Show that (1/3 L + 2/3 M, 1/2 T + 1/2 B) is a mixed Nash equilibrium of the
following game:
L M R
T (6, 22) (3, 26) (47, 22)
C (4, 4) (4, 2) (99, 42)
B (3, 22) (4.5, 18) (19, 19)
L M R
T (1, 5) (3, 16) (10, 10)
C (7, 8) (9, 3) (0, 5)
B (5, 0) (7, 6) (2, 3)
5. Level 2 Two business partners are working on a joint project. In order for
the project to be successfully implemented, it is necessary that both partners
engage in the project and exert the same amount of effort. The payoff from
the project is 1 unit to each partner, whereas the cost of the effort required
is given by some constant c, with 0 < c < 1. This can be modeled by the
following game (where W stands for Work and S for Slack):
S W
S (0, 0) (0, −c)
W (−c, 0) (1 − c, 1 − c)
Find all pure and mixed Nash equilibria of this game. How do the mixed
equilibria change as a function of the effort cost c?
6. Level 2 Show that every 2 × 2 normal-form game that has more than two
pure strategy Nash equilibria possesses infinitely many mixed Nash equi-
libria.
7. Level 2 Find all pure and mixed Nash equilibria in the following game.
A B C D
X (0, 0) (5, 2) (3, 4) (6, 5)
Y (2, 6) (3, 5) (5, 3) (1, 0)
Hint: One way to reason about 2 × n normal-form games is to consider first
a candidate mixed strategy for player 1, say (π, 1 − π). Then one should
consider the payoff provided by the pure strategies of player 2, given (π, 1 −
π), and find all the possible values of π for which a mixed Nash equilibrium
is possible, i.e., one should find the values of π for which at least two of the
column player’s strategies can belong to the support of an equilibrium. For
this, you may exploit the properties listed after Theorem 17.1.
8. Level 2 Two students have to complete a joint assignment for a course. The
final grade depends on the amount of effort exerted by the students. Each
student wants to have the assignment completed, but at the same time each
does not want to work much more than the other. This is captured by the
following utility function: let a1 and a2 denote the amount of effort exerted
by each of the students, a1 , a2 ∈ R. Then
ui (ai , a j ) = ai (c + a j − ai ), i = 1, 2, j = 3 − i,
where c is a given constant.
This function captures the fact that if ai exceeds a j by at least c, then the
utility of player i becomes negative. Find the pure Nash equilibria of this
game.
9. Level 2 Find the value and the Nash equilibria in the following zero-sum
games:
A B
(i) X 2 7
Y 4 3
A B C D
(ii) X 5 2 3 4
Y 4 6 5 8
Hint: For (ii), try to generalize the technique described in Section 2.6 to
2 × n zero-sum games.
10. Level 2 Show that the existence of a polynomial-time algorithm for solving
3-player zero-sum games with finitely many actions per player would imply
the existence of a polynomial-time algorithm for finding a Nash equilibrium
in any 2-player normal-form game with finitely many actions per player.
11. Level 2 Consider the following game G with two players P1 and P2:
[Game tree: P1 chooses between A and B at the root; P2 then chooses between C and D (after A) or between E and F (after B); leaf payoffs omitted.]
12. Level 2 Two candidates A and B compete in an election. There are n voters;
k of them support A, and m = n − k of them support B. Each voter can
either vote (V) or abstain (A), and incurs a cost c, 0 < c < 1, when he or she
votes. Each voter obtains a payoff of 2 when its preferred candidate wins
(gets a strict majority of votes), a payoff of 1 if both candidates get the same
number of votes, and a payoff of 0 if its preferred candidate loses. Thus, if a
voter abstains, its payoffs for win, tie, and loss are 2, 1, and 0, respectively,
and if he or she votes, the payoffs are 2 − c, 1 − c, and −c, respectively.
Find the pure Nash equilibria of this game.
13. Level 2 Consider the voting game with abstentions described in the previ-
ous exercise, but suppose that players vote one by one in a fixed order, and
each player observes the actions of all players that vote before him or her.
For any fixed ordering of the players, the resulting game is an extensive-
form game with n players, in which each player moves exactly once and
chooses between voting (V) and abstaining (A). We will say that a voter is
an A-voter if he or she prefers candidate A over candidate B, and a B-voter
otherwise. Compute the subgame-perfect equilibrium for the following se-
quences of 3 voters:
(a) A, B, A.
(b) B, A, A.
(c) A, A, B.
15. Level 2 Consider a variant of the previous exercise that intends to capture
the fact that players in such games do care about the amount of money
received by the other players. Suppose that the protocol is the same as
before, but now the utility of each player is given by ui = xi − βx3−i , i = 1, 2,
where xi is the amount received by player i, x3−i is the amount received by
the opponent, and β is a positive constant capturing how envious the players
are, i.e., how much each player cares about the amount received by the other
player. Find the subgame-perfect equilibria of this new game.
players has to decide whether to remove one or two disks from the axis.
The player who removes the last disk wins 1 unit of money, paid by the
other player. Suppose that player 1 moves first.
17. Level 2 Two agents are involved in a dispute. Each of them can either fight
or yield. The first agent is publicly known to be of medium strength; the
second agent is either strong (i.e., stronger than the first agent) or weak
(i.e., weaker than the first agent). The second agent knows its strength; the
first agent thinks that agent 2 is strong with probability α and weak with
probability 1 − α. The payoffs are as follows: if an agent decides to yield,
its payoff is 0 irrespective of what the other agent chooses. If an agent
fights and its opponent yields, its payoff is 1, irrespective of their strength.
Finally, if they both decide to fight, the stronger agent gets a payoff of 1,
and the weaker agent gets a payoff of −1. Find all Bayes–Nash equilibria
of this game if
References
[1] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing
two-player Nash equilibria. Journal of the ACM, 56(3), 2009.
[2] Xi Chen, Shang-Hua Teng, and Paul Valiant. The approximation complexity of
win-lose games. In SODA’07: 18th Annual ACM-SIAM Symposium on Discrete
Algorithms, pages 159–168, 2007.
[3] Vincent Conitzer and Tuomas Sandholm. New complexity results about Nash equi-
libria. Games and Economic Behavior, 63(2):621–641, 2008.
[6] Itzhak Gilboa and Eitan Zemel. Nash and correlated equilibria: Some complexity
considerations. Games and Economic Behavior, 1(1):80–93, 1989.
[7] Michael Kearns. Graphical games. In Noam Nisan, Tim Roughgarden, Eva Tardos,
and Vijay Vazirani, editors, Algorithmic Game Theory, pages 159–178. Cambridge
University Press, 2007.
[8] Richard J. Lipton, Evangelos Markakis, and Aranyak Mehta. Playing large games
using simple strategies. In EC’03: ACM Conference on Electronic Commerce, pages
36–41, 2003.
[9] Duncan Luce and Howard Raiffa. Games and Decisions. Wiley, New York, 1957.
[10] Roger Myerson. Game Theory: Analysis of Conflict. Harvard University Press,
Cambridge, Massachusetts, 1991.
[11] John F. Nash. Non-Cooperative Games. PhD thesis, Princeton University, 1950.
[12] Martin Osborne. An Introduction to Game Theory. Oxford University Press, 2004.
[13] Martin Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press, 1994.
[14] Guillermo Owen. Game Theory. Academic Press, New York, 2nd edition, 1982.
[16] Tim Roughgarden. Selfish Routing and the Price of Anarchy. MIT Press, 2005.
[17] Reinhard Selten. Reexamination of the perfectness concept for equilibrium points
in extensive games. International Journal of Game Theory, 4:25–55, 1975.
[18] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game
Theoretic and Logical Foundations. Cambridge University Press, 2009.
[19] Haralampos Tsaknakis and Paul Spirakis. An optimization approach for approxi-
mate Nash equilibria. Internet Mathematics, 5(4):365–382, 2008.
[20] Haralampos Tsaknakis and Paul Spirakis. Practical and efficient approximations
of Nash equilibria for win-lose games based on graph spectra. In WINE’10: 6th
Workshop on Internet and Network Economics, pages 378–390, 2010.
[22] John von Neumann. Zur theorie der gesellschaftsspiele. Mathematische Annalen,
100:295–320, 1928.
[23] John von Neumann and Oskar Morgenstern. Theory of Games and Economic Be-
havior. Princeton University Press, 1944.