Multi-Robot Decision Making Using Coordination Graphs
Jelle R. Kok
Matthijs T. J. Spaan
Nikos Vlassis
Intelligent Autonomous Systems Group, Informatics Institute
Faculty of Science, University of Amsterdam, The Netherlands
{jellekok,mtjspaan,vlassis}@science.uva.nl
Abstract
Within a group of cooperating agents the decision
making of an individual agent depends on the actions
of the other agents. In dynamic environments, these
dependencies will change rapidly as a result of the
continuously changing state. Via a context-specific
decomposition of the problem into smaller subproblems, coordination graphs offer scalable solutions to
the problem of multiagent decision making. We will
apply coordination graphs to the continuous domain
by assigning roles to the agents and then coordinating
the different roles. Finally, we will demonstrate this
method in the RoboCup soccer simulation domain.
Introduction
In this paper we will describe a framework to coordinate multiple robots using coordination graphs (CGs). We
assume a group of robotic agents that are embedded
in a continuous and dynamic domain and are able to
perceive their surroundings with sensors. The continuous nature of the state space makes the direct application of context-specific CGs difficult. To alleviate
the problem, we propose a discretization of the state
by assigning roles to the agents, and subsequently apply the CG-based method to the derived set of roles.
It turns out that such an approach offers additional
benefits: the set of roles allows for the definition of
natural coordination rules that exploit prior knowledge about the domain. This greatly simplifies the
modeling and the solution of the problem at hand.
The setup of the paper is as follows. In Section 2
we review the coordination problem from a game-theoretic point of view, and in Section 3 we explain
the concept of a CG. In Section 4 we will describe
our framework to coordinate agents in a continuous
dynamic environment using roles, followed by an extensive example using the RoboCup soccer simulation
domain in Section 5. Finally, we give our conclusions
and discuss possible further extensions in Section 6.
Figure 1: An example coordination graph for a problem with four agents G1, G2, G3, and G4.
Coordination graphs
A coordination graph (CG) represents the coordination requirements of a system [4]. A node in the
graph represents an agent, while an edge in the graph
defines a (possibly directed) dependency between two
agents. Only interconnected agents have to coordinate their actions at any particular instance. Figure 1
shows a possible CG for a 4-agent problem. In this
example, G2 has to coordinate with G1 , G4 has to coordinate with G3 , G3 has to coordinate with both G4
and G1 , and G1 has to coordinate with both G2 and
G3 . When the global payoff function can be decomposed as a sum of local payoff functions, the global
coordination problem can be replaced by a number of
easier local coordination problems. The agents can
then find the joint optimal action by using an efficient
variable elimination algorithm in combination with a
message passing scheme [4].
The algorithm assumes that each agent knows its
neighbors in the graph (but not necessarily their payoff functions, which might depend on other agents).
Each agent is eliminated from the graph by solving a local optimization problem that involves only its own action and the actions of its neighbors: the agent collects all payoff functions in which it is involved, maximizes over its own action for every possible action combination of its neighbors, and communicates the resulting conditional payoff function to one of its neighbors. When the last agent has computed its optimal action, a second pass in the reverse order fixes the action of every agent.
In a context-specific CG, the dependencies are specified compactly using value rules: propositions over state and action variables together with a payoff that is added to the global payoff when the proposition holds [5]. Consider for example the rule
⟨ in-front-of-same-door(G1, G2) ∧ a1 = enterDoor ∧ a2 = enterDoor : −50 ⟩
This rule indicates that when the two agents are located in front of the same door and both select the
same action (entering the door), the global payoff
value will be reduced by 50. When the state is not
consistent with the above rule (because the agents are
not located in front of the same door), the rule does
not apply and the agents do not have to coordinate
their actions. By conditioning on the current state
the agents can discard all irrelevant rules, and as a
consequence the CG is dynamically updated and simplified. Each agent thus only needs to observe that
part of the state mentioned in its value rules.
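To make this concrete, the following minimal Python sketch shows one possible representation of the rule above and of the conditioning step; the data layout and the function names are our own illustration, not part of the framework in [5]:

```python
# One possible encoding of a value rule: a context predicate over the
# state, the action conditions, and the payoff added when it fires.
door_rule = (
    lambda s: s["in_front_of_same_door"],      # context on the state
    {"G1": "enterDoor", "G2": "enterDoor"},    # coordinated actions
    -50,                                       # payoff contribution
)

def relevant_rules(rules, state):
    """Discard all rules whose context does not hold in the current
    state; only the remaining rules induce coordination dependencies."""
    return [r for r in rules if r[0](state)]

# No shared door nearby: the rule is discarded, so G1 and G2 do not
# need to coordinate and the graph loses the corresponding edge.
print(relevant_rules([door_rule], {"in_front_of_same_door": False}))  # []
```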
For a more extensive example, see Figure 2. Beneath the left graph all value rules, defined over binary
action and context variables, are depicted together
with the agent each rule applies to. The coordination
dependencies between the agents are represented by
directed edges, where each (child) agent has an incoming edge from the (parent) agent that affects its
decision. After the agents observe the current state,
x = true, the last rule does not apply anymore and
can be removed. As a consequence, the dependency between G3 and G4 disappears, the edge between them can be deleted, and the optimal joint action can be computed on the simplified graph.
Figure 2: The initial coordination graph for agents G1–G4 (left), with the value rules ⟨a1 ∧ a3 ∧ x : 4⟩, ⟨a1 ∧ a2 ∧ x : 5⟩, ⟨a2 ∧ x : 2⟩, ⟨a3 ∧ a2 ∧ x : 5⟩, and ⟨a3 ∧ a4 ∧ ¬x : 10⟩, and the simplified graph after conditioning on x = true (right), with the remaining rules ⟨a1 ∧ a3 : 4⟩, ⟨a1 ∧ a2 : 5⟩, ⟨a2 : 2⟩, and ⟨a3 ∧ a2 : 5⟩.
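As an illustration, the following Python sketch runs the variable elimination algorithm on these four conditioned rules; the table-based representation, the elimination order, and the tie-breaking are our own choices:

```python
from itertools import product

# A payoff function is (scope, table): scope is a tuple of agent ids,
# table maps a tuple of their boolean actions to a payoff.
def rule(scope, value):
    # A value rule pays `value` when all actions in its scope are True.
    return (scope, {acts: value if all(acts) else 0
                    for acts in product([False, True], repeat=len(scope))})

# The four rules of Figure 2 after conditioning on x = true.
funcs = [rule((1, 3), 4), rule((1, 2), 5), rule((2,), 2), rule((2, 3), 5)]

def eliminate(funcs, agent):
    # Max out `agent`: merge the payoff functions it appears in into a
    # new function over its neighbours and record its best response.
    involved = [f for f in funcs if agent in f[0]]
    rest = [f for f in funcs if agent not in f[0]]
    nbrs = tuple(sorted({a for s, _ in involved for a in s} - {agent}))
    new_table, best = {}, {}
    for ctx in product([False, True], repeat=len(nbrs)):
        def local(act):
            joint = dict(zip(nbrs, ctx))
            joint[agent] = act
            return sum(t[tuple(joint[a] for a in s)] for s, t in involved)
        best[ctx] = max([False, True], key=local)
        new_table[ctx] = local(best[ctx])
    return rest + [(nbrs, new_table)], (nbrs, best)

# Forward pass: eliminate the agents one by one.
strategies, remaining = [], funcs
for agent in (3, 2, 1):
    remaining, strat = eliminate(remaining, agent)
    strategies.append((agent, strat))

# Backward pass: fix the actions in reverse elimination order.
joint = {}
for agent, (nbrs, best) in reversed(strategies):
    joint[agent] = best[tuple(joint[a] for a in nbrs)]
print(joint)  # {1: True, 2: True, 3: True}: global payoff 4+5+2+5 = 16
```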
Coordination in continuous domains
We are interested in problems that involve multiple robots that are embedded in a continuous domain, have sensors with which they can observe their
surroundings, and need to coordinate their actions.
As a main example we will use the RoboCup simulation soccer domain (see [3] and references therein), in
which a team of eleven agents has to fulfill a common goal.
Experiments
We have implemented this framework in our simulation robot soccer team UvA Trilearn [3] to improve the passing of the ball between teammates. The
RoboCup soccer server [2] provides a fully distributed
dynamic multi-robot domain with both teammates
and adversaries. It models many real-world complexities such as noise in object movement, noisy sensors
and actuators, limited physical ability and restricted
communication.
The RoboCup soccer simulation does not allow
agents to communicate with more than one agent at
the same time, which makes it impossible to apply the
original variable elimination algorithm. Therefore, we
have decided to make the state fully observable to all
agents. This makes communication superfluous, since
each agent can model the complete variable elimination algorithm by itself (see [6] for details). This has
no effect on the outcome of the algorithm.
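Schematically, the communication-free setup amounts to every agent running the identical, deterministic computation and executing only its own part of the result. The wrapper below is a hypothetical sketch; `solve` stands for the conditioning and elimination steps sketched earlier:

```python
def act(agent_id, state, value_rules, solve):
    # With a fully observable state every agent computes the SAME
    # joint action (solve must be deterministic, including its
    # tie-breaking), so no communication is needed.
    joint_action = solve(value_rules, state)
    return joint_action[agent_id]  # execute only our own component
```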
In the non-coordinating case a teammate moves to
the interception point only after he has observed a
change in the ball velocity (after someone has passed
the ball) and concludes that he is the fastest teammate
to the ball. Before the ball changes velocity, he has
no notion of the fact that he will soon receive the ball
and does not coordinate with the passing player.
To accomplish coordination, all agents are first dynamically assigned a role based on the current state.
Next, these roles are coordinated by performing the
variable elimination algorithm using predefined value
rules that make use of the available actions and context variables. Hereafter, we will describe in more
detail how we use coordination graphs to coordinate the passer with the receiver, and also the receiver with the second receiver, that is, the player to whom the first receiver
will pass the ball next.
First, we have implemented a role assignment
function that assigns the roles interceptor, passer,
receiver, and passive among the agents using the
continuous state information. The assignment of roles
can be computed directly from the current state information. For instance, the fastest player to the ball will be assigned the interceptor role. A toy sketch of such a state-based assignment follows.
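This sketch is purely illustrative: the criteria, the threshold, and the helper names are our own assumptions, not the actual UvA Trilearn assignment function:

```python
import math

def assign_roles(positions, ball):
    # Toy assignment: the player at the ball becomes the passer,
    # otherwise the closest player becomes the interceptor; the most
    # advanced remaining teammate becomes the receiver, the rest are
    # passive. (In this toy, "fastest to the ball" is just "closest".)
    def dist(i):
        return math.hypot(positions[i][0] - ball[0],
                          positions[i][1] - ball[1])

    roles = {i: "passive" for i in positions}
    closest = min(positions, key=dist)
    roles[closest] = "passer" if dist(closest) < 0.7 else "interceptor"
    others = sorted((i for i in positions if i != closest),
                    key=lambda i: -positions[i][0])  # x points at the goal
    if others:
        roles[others[0]] = "receiver"
    return roles

positions = {1: (0.0, 0.0), 2: (10.0, 5.0), 3: (20.0, -3.0)}
print(assign_roles(positions, ball=(0.3, 0.0)))
# {1: 'passer', 2: 'passive', 3: 'receiver'}
```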
The value rules are then defined over the set of roles rather than over the individual agents.² For the passer (agent i) we use the following rules:¹

⟨ p1^passer ; has-role-receiver(j) ∧ ¬isPassBlocked(i, j, dir) ∧ ai = passTo(j, dir) ∧ aj = moveTo(dir) : u(j, dir) ⟩, j ≠ i
⟨ p2^passer ; is-empty-space(i, n) ∧ ai = dribble(n) : 30 ⟩
⟨ p3^passer ; ai = clearBall : 10 ⟩
⟨ p4^passer ; is-in-front-of-goal(i) ∧ ai = score : 100 ⟩

For a receiver (agent i), the rules coordinate it both with the player that controls (or will control) the ball and with the other receiver:

⟨ p5^receiver ; has-role-interceptor(j) ∧ ¬isPassBlocked(j, i, dir) ∧ ai = moveTo(dir) : u(i, dir) ⟩, j ≠ i
⟨ p6^receiver ; has-role-passer(j) ∧ has-role-receiver(k) ∧ ¬isPassBlocked(k, i, dir) ∧ aj = passTo(k, dir2) ∧ ak = moveTo(dir2) ∧ ai = moveTo(dir) : u(i, dir) ⟩, j, k ≠ i
⟨ p7^receiver ; ai = moveToStratPos : 10 ⟩

The rules for the interceptor and the passive players do not depend on the actions of the other agents:

⟨ p8^interceptor ; ai = intercept : 100 ⟩
⟨ p9^passive ; ai = moveToStratPos : 10 ⟩

The payoff u(i, dir) is a heuristic measure of how promising a pass in direction dir to agent i is; by changing these payoffs one can change the complete strategy of the team when playing different kinds of opponents.

¹ North is directed towards the opponent goal and center corresponds to a pass directly to the current agent position.
² Note that we enumerate all rules using variables. The complete list of value rules is the combination of all possible instantiations of these variables. In all rules, dir ∈ D.
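Following footnote 2, each rule template is expanded into one concrete value rule per instantiation of its variables. A minimal sketch; the direction set D and the encoding are assumed by us:

```python
from itertools import product

# Assumed direction set D (the text only fixes north = toward the
# opponent goal and center = a pass to the agent's current position).
D = ["n", "ne", "e", "se", "s", "sw", "w", "nw", "center"]

def instantiate_p1(i, agents):
    # All concrete instances of the passer template p1 for agent i:
    # one rule per teammate j != i and per direction dir in D.
    for j, d in product(agents, D):
        if j != i:
            yield {"context": [("has-role-receiver", j),
                               ("not-isPassBlocked", i, j, d)],
                   "actions": {i: ("passTo", j, d), j: ("moveTo", d)},
                   "value": ("u", j, d)}

print(sum(1 for _ in instantiate_p1(1, [1, 2, 3])))  # 2 teammates x 9 = 18
```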
The above rules contain many context-dependencies, represented in the state variables. In
Figure 3 we have simplified the coordination graph by
conditioning on the roles. If we now condition further on the specific context variables, we obtain the
graph depicted in Figure 4, corresponding to the
following value rules (for simplicity we assume that
only the context variables ¬isPassBlocked(1, 2, s),
¬isPassBlocked(2, 3, nw), and is-empty-space(1, n) are true):
G1: ⟨ p1^passer ; a1 = passTo(2, s) ∧ a2 = moveTo(s) : 50 ⟩
    ⟨ p2^passer ; a1 = dribble(n) : 30 ⟩
    ⟨ p3^passer ; a1 = clearBall : 10 ⟩
G2: ⟨ p7^receiver ; a2 = moveToStratPos : 10 ⟩
G3: ⟨ p6^receiver ; a1 = passTo(2, dir) ∧ a2 = moveTo(dir) ∧ a3 = moveTo(nw) : 30 ⟩
    ⟨ p7^receiver ; a3 = moveToStratPos : 10 ⟩
Now the variable elimination algorithm can be performed. Each agent is eliminated from the graph by
maximizing its local payoff. If agent 1 is
eliminated first, it gathers all value rules that contain
a1 and distributes the resulting conditional strategy to its neighbors:
⟨ a2 = moveTo(s) ∧ a3 = moveTo(nw) : 80 ⟩
⟨ a2 = moveTo(s) ∧ ¬(a3 = moveTo(nw)) : 50 ⟩
⟨ ¬(a2 = moveTo(s)) : 30 ⟩

After agents 2 and 3 have been eliminated in the same way and the conditional strategies are propagated back, the resulting joint action is a1 = passTo(2, s), a2 = moveTo(s), and a3 = moveTo(nw), with a global payoff of 80.
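These strategy rules can be verified mechanically. The toy enumeration below (our own encoding, with the p6 rule simplified to the pass direction s that is actually chosen) reproduces the payoffs 80, 50, and 30:

```python
# Candidate actions of the passer after conditioning (Figure 4).
A1 = ["passTo(2,s)", "dribble(n)", "clearBall"]

def payoff(a1, a2, a3):
    total = 0
    if a1 == "passTo(2,s)" and a2 == "moveTo(s)":
        total += 50                       # p1: pass to the first receiver
    if a1 == "dribble(n)":
        total += 30                       # p2: dribble north
    if a1 == "clearBall":
        total += 10                       # p3: clear the ball
    if a1 == "passTo(2,s)" and a2 == "moveTo(s)" and a3 == "moveTo(nw)":
        total += 30                       # p6: second receiver is set up
    return total

# Maximize over a1 for every joint action of the neighbours 2 and 3.
for a2 in ("moveTo(s)", "other"):
    for a3 in ("moveTo(nw)", "other"):
        best = max(A1, key=lambda a1: payoff(a1, a2, a3))
        print(a2, a3, "->", best, payoff(best, a2, a3))
# moveTo(s) moveTo(nw) -> passTo(2,s) 80
# moveTo(s) other      -> passTo(2,s) 50
# other     moveTo(nw) -> dribble(n)  30
# other     other      -> dribble(n)  30
```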
Figure 4: The coordination graph of Figure 3 after conditioning on the context variables. The passer (agent 1)
decides to pass the ball to the first receiver (agent 2),
while the second receiver (agent 3) moves to a good
position for the first receiver to pass the ball to.
Table 1: Results of 10 games against ourselves, with
and without coordination in passing.

                  With             Without
  Wins            5                2
  Draws           3                3
  Losses          2                5
  Avg. score      0.9 (±1.19)      0.2 (±0.42)
  Passing %       82.72 (±2.06)    64.62 (±2.17)
Conclusions
We showed how coordination graphs can be successfully applied to cases where a group of robotic
agents are embedded in a dynamic and continuous domain. We assigned roles in order to abstract from the
continuous state to a discrete context, allowing the application of existing techniques for discrete-state CGs.
Currently, we assume that each agent observes that
part of the state that affects its local decisions and
its role assignment. As future work, we would like
to apply the same framework to domains where the
agents do not observe all required state information.
Possible solutions would be to make the action of the
References
[1] C. Boutilier. Planning, learning and coordination in multiagent decision processes. In Proc. Conf. on Theoretical Aspects of Rationality and Knowledge, 1996.
[2] M. Chen, E. Foroughi, F. Heintz, Z. Huang, S. Kapetanakis, K. Kostiadis, J. Kummeneje, I. Noda, O. Obst, P. Riley, T. Steffens, Y. Wang, and X. Yin. RoboCup Soccer Server for Soccer Server Version 7.07 and later, 2002. At http://sserver.sourceforge.net/.
[3] R. de Boer and J. R. Kok. The incremental development of a synthetic multi-agent system: The UvA Trilearn 2001 robotic soccer simulation team. Master's thesis, University of Amsterdam, The Netherlands, Feb. 2002.
[4] C. Guestrin, D. Koller, and R. Parr. Multiagent planning with factored MDPs. In Advances in Neural Information Processing Systems 14. The MIT Press, 2002.
[5] C. Guestrin, S. Venkataraman, and D. Koller. Context-specific multiagent coordination and planning with factored MDPs. In Proc. 18th National Conf. on Artificial Intelligence (AAAI), Edmonton, Canada, July 2002.
[6] J. R. Kok, M. T. J. Spaan, and N. Vlassis. An approach to noncommunicative multiagent coordination in continuous domains. In M. Wiering, editor, Benelearn 2002: Proceedings of the Twelfth Belgian-Dutch Conference on Machine Learning, pages 46–52, Utrecht, The Netherlands, Dec. 2002.
[7] M. J. Osborne and A. Rubinstein. A Course in Game Theory. MIT Press, 1994.
[8] M. T. J. Spaan, N. Vlassis, and F. C. A. Groen. High level coordination of agents based on multiagent Markov decision processes with roles. In A. Saffiotti, editor, IROS'02 Workshop on Cooperative Robotics, Lausanne, Switzerland, Oct. 2002.