0% found this document useful (0 votes)
30 views

Multi-Agent Algorithms For Solving Graphical Games

Uploaded by

katezq
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views

Multi-Agent Algorithms For Solving Graphical Games

Uploaded by

katezq
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

Multi-Agent Algorithms for Solving Graphical Games

David Vickrey Daphne Koller


Computer Science Department Computer Science Department
Stanford University Stanford University
Stanford, CA 94305-9010 Stanford, CA 94305-9010
[email protected] [email protected]

Abstract centralized computation paradigm making them unsuitable


for our distributed setting.
Consider the problem of a group of agents trying to find a
stable strategy profile for a joint interaction. A standard ap- We propose an approach that modifies both the represen-
proach is to describe the situation as a single multi-player tation of the game and the notion of a solution. Following
game and find an equilibrium strategy profile of that game. the work of LaMura (2000), Koller and Milch (2001), and
However, most algorithms for finding equilibria are computa- Kearns, Littman, and Singh (2001a), we use a structured
tionally expensive; they are also centralized, requiring that all representations of games, that exploits the locality of inter-
relevant payoff information be available to a single agent (or action that almost always exists in complex multi-agent in-
computer) who must determine the entire equilibrium profile. teractions, and allows games with large numbers of agents
In this paper, we exploit two ideas to address these problems. to be described compactly. Our representation is based
We consider structured game representations, where the in- on the graphical game framework of Kearns, Littman, and
teraction between the agents is sparse, an assumption that
Singh (KLS hereafter), which applies to simultaneous-move
holds in many real-world situations. We also consider the
slightly relaxed task of finding an approximate equilibrium. games. We wish to find algorithms that can take advantage
We present two algorithms for finding approximate equilib- of this structure to find good strategy profiles effectively, and
ria in these games, one based on a hill-climbing approach in a decentralized way.
and one on constraint satisfaction. We show that these al- It turns out that this goal is much easier to achieve when
gorithms exploit the game structure to achieve faster compu- solving a relaxed problem. While philosophically satisfy-
tation. They are also inherently local, requiring only limited ing, the Nash equilibrium requirement is often overly strin-
communication between directly interacting agents. They can gent. Although agents arguably strive to maximize their ex-
thus be scaled to games involving large numbers of agents, pected utility, in practice inertia or a sense of commitment
provided the interaction between the agents is not too dense. will cause an agent to abide by an agreed equilibrium even
if it is slightly suboptimal for him. Thus, it often suffices to
1 Introduction require that the strategy profile form an approximate equi-
Consider a system consisting of multiple interacting agents, librium, one where each agent’s incentive to deviate is no
collaborating to perform a task. The agents have to inter- more than some small . 

act with each other to make sure that the task is completed, We present two techniques for finding approximate equi-
but each might still have slightly different preferences, e.g., libria in structured games. The first uses a greedy hill-
relating to the amount of resources each expends in complet- climbing approach to optimize a global score function,
ing its part of the task. whose global optima are precisely equilibria. The sec-
The framework of game theory (von Neumann & Mor- ond uses a constraint satisfaction approach over a discretized
genstern 1944; Fudenberg & Tirole 1991) tells us that we space of agent strategies; somewhat surprisingly, the algo-
should represent a multi-agent interaction as a game, and rithm of KLS turns out to be a special case of this algorithm.
find a strategy profile that forms a Nash equilibrium (Nash We show that these algorithms allow the agents to determine
1950). We can do so using one of several algorithms for a joint strategy profile using local communication between
finding equilibria in games. (See (McKelvey & McLennan agents. We present some preliminary experimental results
1996) for a survey.) Unfortunately, this approach is severely over randomly generated single-stage games, where we vary
limited in its ability to handle complex multi-agent interac- the number of agents and the density of the interaction. Our
tions. First, in most cases, the size of the standard game rep- results show that our algorithms can find high-quality ap-
resentations grows exponentially in . Second, for games in- proximate equilibria in much larger games than have been
volving more than two players, existing solution algorithms previously solved.
scale extremely poorly even in the size of the game represen-
tation. Finally, all of the standard algorithms are based on a 2 Graphical games
In this section, we introduce some basic notation and ter-


Copyright c 2002, American Association for Artificial Intelli-


gence (www.aaai.org). All rights reserved. minology for game theory, and describe the framework of
graphical games. to build on his land — a factory, a shopping mall, or a resi-
The conceptually simplest and perhaps best-studied rep- dential complex. His utility depends on what he builds and
resentation of game is the normal form. In a normal form on what is built north, south,K and across the road from his
game, each player (agent)  chooses an action  from its land. All of the decisions are made simultaneously. In this
action set 
  
  . For simplicity of notation, we case, agent hk ’s parents are  , hZU2l and hZ  . Note that
assume that     . The players are also
_
the normal form representation consists of g matrices each
allowed to play mixed strategies !"#$%!" &!' (  !' ) where
<

of size m , whereas in the graphical game, each matrix has
! +* is the probability that  plays  +* . If the player assigns size at most m nipo=, (agents at the beginning and end of the
probability 1 to one action — ! +* -, — and zero to the oth- road have smaller matrices).
ers, it is said to be playing a pure strategy, which we denote If we modify the problem slightly and assume that the pre-
as . +* . We use / to denote a strategy profile for the set of vailing wind is from east to west, so that agents on the east
players, and define 01/(23& &! 64 5 to be the same as / except that side are not concerned
K with what is built acrossK the street,
7 plays ! 4 instead of !" . thenK we haveK an asymmetric graphical game, where agent
Each player also has an associated payoff matrix 89 that hj ’s parents are &
hZU2l'
hj _  , whereas agent  ’s parents
specifies the payoff, or utility, for player : under each of the are U2l   .
  possible combinations of strategies: 8;<0%= < " (   <> 5 is _
the reward for 3 when for all ? , =* plays  * . Given a profile 3 Function Minimization
/ , we define the expected utility (or payoff) for
 as Our first algorithm uses a hill-climbing approach to find an
@
<01/ 5  A ! & B ((
!'" H 8I
01& B
 < E ((( <" H 5  approximate equilibrium. We define a score function that
CBD  EFD+G+G+G D  H measures the distance of a given strategy profile away from
an equilibrium. We then use a greedy local search algorithm
Given a set of mixed strategies / , one strategy per player, that starts from a random initial strategy profile and gradu-
we define the regret of  with respect to / to be the most ally improves the profile until a local maximum of the score
 can gain (on expectation) by diverging from the strategy function is reached.
profile / :JLKM More precisely, for a strategy profile / , we define qZ06/ 5 to
@ @ JrKM
be the sum of the regrets of the players:

01/ 5 ONQS<P"T R 0
 0&0U!>2&
! 4 5&5WV &01/ 5
5 

qZ06/ 5 pA &01/ 5 
A Nash equilibrium is a set of mixed strategies / where each 
player’s regret is 0. The Nash equilibrium condition means
that no player can increase his expected reward by unilater- This function is nonnegative and is equal to 0 exactly when /
ally changing his strategy. The seminal result of game the- is a Nash equilibrium. It is continuous in each of the separate
ory is that any game has at least one Nash equilibrium (Nash probabilities !'s* but nondifferentiable.
1950) in mixed strategies. An -approximate Nash equilib- We can minimize qk01/ 5 using a variety of function min-
rium is a strategy profile / such that each player’s regret is


imization techniques that apply to continuous but non-


at most . 
differentiable functions. In the context of unstructured
A graphical game (Kearns, Littman, & Singh 2001a) as- games, this approach has been explored by (McKelvey
sumes that each player’s reward function depends on the ac- 1992). More recently, LaMura and Pearson (2001) have ap-
tions of a subset of the players rather than on all other play- plied simulated annealing to this task. We chose to explore
ers’ actions. Specifically,  ’s utility depends on the actions greedy hill climbing, as it lends itself particularly well to
of some subset X  of the other players, as well as on its own exploiting the special structure of the graphical game.
action. Thus, each player’s payoff matrix 8  depends only Our algorithm repeatedly chooses a player and changes
on Y XZ<Y[, different decision variables, and therefore has that player’s strategy so as to maximally improve the global
]\ ^  \ _  entries instead of   . We can describe this type score. More precisely, we define the gain for a player  as
the amount that global score function would decrease if 
K game using a directed graph 06`a <b 5 . The nodes in `
of
changed its strategy so as to minimize the score function:
correspond to the players, and we have a directed edge
c0d  % * 5fe b from  to * if  e X * , i.e., if ? ’s util- t

01/ 5 uNQS<P'T R qk01/ 5xV qZ0&06/l2&
! 4 &5 5zy 
ity depends on  ’s strategy. Thus, the parents of  in the wv
graph are the players on whose action  ’s value depends.
We note that our definition is a slight extension of the defini- Note that this is very different from having the player change
tion of KLS, as they assumed that the dependency relation- its strategy to the one that most improves its own utility.
ship between players was symmetric, so that their graph was Here, the player takes into consideration the effects of its
undirected. strategy change on the other players.
Our algorithm t first chooses an initial random strategy /
Example 1: Consider the following example, based on a and calculates <01/ 5 for each : . It then iterates over the
similar example in (Koller & Milch 2001). Suppose a road following steps:
is being built from north to south through undeveloped land, t
and g Kagents have purchased plots of land along the road t
1. Choose the player  for which <01/ 5 is largest.
K 2. If <01/ 5 is positive, update !"{s argmaxS<T qk01/ 5|V
— the agents hi" ((F
hj on the west side and the agents v
 (((  on the east side. Each agent needs to choose what qZ0
01/}23
&! 4 5&5~y ; otherwise, stop.
t
t
3. For each player * such that *>01/ 5 may have changed, of mixed strategies, so that the tables represent a discrete
recalculate Jr*>K01M / 5 . grid of the players’ strategy profiles. Since this variant does
Notice that  06/ 5 depends only on the strategies of  not explore the entire strategy space, it is limited to finding
and its parents in / . Thus changing a player’s strategy only approximate equilibria. (Two other variants (KLS 2001a;
affects the terms of the score function corresponding to that 2001b) compute exact equilibria, but only apply in the very
player and its children. We can use this to implement steps limited case of two actions per player.)
(2) and (3) efficiently. A somewhat laborious yet straight- It turns out that the KLS algorithm can be viewed as ap-
forward algebraic analysis shows that: plying nonserial dynamic programming or variable elimi-
nation (Bertele & Brioschi 1972) to a constraint satisfaction
t following optimization problem is equiv-
Proposition 2: The
problem (CSP) generated by the graphical game. In this sec-
alent to finding &06/ 5 and the maximizing !  4 :
@  @ tion, we present a CSP formulation of the problem of find-
Maximize:  0&06/ 23 &! 64 5
5xV *  ^ 0 * V
* 0
01/ 23 &! ~4 5&5&5 ing Nash equilibria in a general graphical game, and show
Subject to: ! 
4   how variable elimination can be applied to solve it. Unlike
! @ 4 -  , the KLS algorithm, our algorithm also applies to asymmet-
 * * 0&
0 01/ 23 &! 4 5 2=*
. * 5&5  ?  ric and non-tree-structured games. We can also solve the
@ problem as a constrained optimization rather than a con-
As the expected utility functions * are linear in the ! 
4 ,
straint satisfaction problem, potentially improving the com-
this optimization problem is simply a linear program whose
putational performance of the KLS approach.
parameters are the strategy probabilities of player ] , and
whose coefficients involve the utilities only of
 and its chil- Constraint Satisfaction There are many ways of formulat-
dren. Thus, the player 3 can optimize its strategy efficiently, ing the -equilibrium problem as a CSP. Most simply, each
variable `7 corresponds to the player  and takes values in


based only on its own utility function and that of its children
in the graph. We can therefore execute the optimization in the strategy space of  . The constraints   ensure that each
step (2) efficiently. In our asymmetric Road player has regret at most in response to the strategies of its
K example, an 

agent h  could optimize K its


K strategy based only on its chil- parents. (Recall that the each player’s regret depends only
dren — h %2] and h   ; similarly, an agent  needs to con- its strategy and those of its parents.) Specifically, the “legal”
_
sider its children — %2] ,   and h  . set for   is JrKM
_
To execute step (3), we note that when  changes its strat- >$%!  0%! * 5 * ^
 )kY
 0U!  & ! ^  5 
egy, the regrets of 3 and its children change; and when the


regret of =* changes, the gains of * and its parents change. This constraint is over all of the variables in X    :< .
More formally, when we change the strategy of  , the lin- The variables in this CSP have continuous domains,
ear program for some other player * changes only if one of which means that standard techniques for solving CSPs do
the expected utility terms changes. Since we only have such not directly apply. We adopt the gridding technique pro-
terms over =* and its children, and the payoff of a player is posed by KLS, which defines a discrete value space for each
t only if the strategy at one of its parents changes,
affected variable. Thus, the size of these constraints is exponential in
then *06/ 5 will change only if the strategy of * , or one of the maximum family size (number of neighbors of a node),
its parents, its children, or its spouses (other parents of its with the base of the exponent growing with the discretization
children) is changed. (Note the intriguing similarity to the density.
definition of a Markov blanket in Bayesian networks (Pearl Variable elimination is a general-purpose nonserial dy-
1988).) Thus, in step (3), we only need to update the gain namic programming algorithm that has been applied to sev-
of a limited number of players. In our Road example, if eral frameworks, including CSPs. Roughly speaking, we
K
we change the strategy for K k
h  , we K need to update the gain eliminate variables one at a time, combining the constraints
of: hZ%2] and hZ  (both parents and children);  (only a relating to that variable into a single constraint, that de-
_
parent); and hkU2 , hj , %2] , and   (spouses). scribes the constraints induced over its neighboring vari-
_ _
We note that our hill climbing algorithm is not guaranteed ables. We briefly review the algorithm in the context of the
to find a global minimum of qk01/ 5 . However, we can use constraints described above.
a variety of techniques such as random restarts in order to Example 3 : Consider the three-player graphical game
have a better chance of finding a good local minimum. Also, shown in Fig. 1(a), where we have discretized @ the strategy
local minima that we find are often fairly good approximate space of ` into three strategies and those of and  into
equilibria (since the score function corresponds quite closely two strategies. Suppose we have chosen an such that the
constraints@ for ` and  are given by Fig. 1(b),(c) (the con-


to the quality of an approximate equilibrium).


@
straint for is not shown). The constraint for ` , for exam-
4 CSP algorithms ple, is indexed by the strategies of and ` ; a ‘Y’ in the
Our second approach to solving graphical games uses a @
table denotes that ` ’s strategy has at most regret with re-
spect@ to ’s strategy. Eliminating ` produces a constraint


very different approach, motivated by the recent work of


Kearns, Littman, and Singh (2001a; 2001b). They propose over and  as shown in Fig. 1(d). Consider the 0] &hi 5
a dynamic programming style algorithm for the special case entry of the resulting constraint. We check each possible
when the graphical game is a symmetric undirected tree. strategy for ` . If ` were playing > , then ` would not have
Their algorithm has several variants. For our purposes, acceptable regret with respect to ] , and  ’s strategy, hi ,
the most relevant (KLS 2001a) discretizes each player’s set would not have acceptable regret with respect to  . If `
]  ] 3 action Road game (which is discussed in Section 6), to guar-
  .6 .4 antee a +g -approximate equilibrium, the KLS discretization
 Y Y  .1 .2 would need to be approximately  o , which means we
U  Y  .5 0 would need about 1250 grid points per strategy.
(b) (e) Cost Minimization
 "   " 
An alternative to viewing the regret bounds as hard con-
V h  Y h  .4 .1 .5 straints is to try to directly reduce the worst-case regret over
hj Y hZ .7 .4 .2 the players. This approach, which is a variant of a cost-
(c) (f) minimization problem (CMP), allows us to choose an arbi-
W l  ]  trary grid density and find the best equilibrium for that den-
hi Y Y hi .1 .2 sity. In our CMP algorithm, we replace the constraints with
(a) hZ Y hZ .4 .2 tables which have the same structure but instead of contain-
(d) (g) ing ‘Y’ or being blank, they simply contain the regret of the
player under that set of strategies. More precisely, we have
Figure 1: (a) A simple 3-player graphical game. (b) Con- one initial factor for each player  , which contains
JLKM one en-
straint table for ` . (c) Constraint table for  . (d) Con- try for each possible strategy profile 0%!C&
!  5 for 3 and its
parents XZ . The value of this entry is simply
^ &0U!'
&! ^  5 .
straint table after elimination of ` . (e) Regret table for ` .
(f) Regret table for  . (g) Regret table after elimination of (As we discussed, regret only depends on the strategies of
` . the player and his parents.)
Example 4: Consider again the three-player graphical game
 , ` ’s strategy would be acceptable with re- of Fig. 1(a). The regret tables for `a  @
are shown in
@
were playing
Fig. 1(e),(f). Eliminating ` produces a table over and
spect to ’s, but  ’s would not be acceptable with respect
to ` ’s. However, if ` were playing  , then both ` and   , shown in Fig. 1(g). Consider the 0  &h  5 entry of the
would be playing acceptable strategies. As there is a value resulting table. We check each possible strategy for ` . If
of ` which will produce an acceptable completion, the en- ` plays  , then ` would have regret .4 with respect to ] ,
try in the corresponding table is ‘Y’. The 0  &h  5 entry is and  ’s strategy, h  , would have regret .4 with respect to
not ‘Y’ since there is no strategy of ` which will ensure that  ; thus, we can only obtain a minimal regret of .4 when `
both ` and  are playing acceptably. plays  . If ` plays  , ` would have regret 0, but  would
have regret .5, so the minimum regret over all players is .5.
In general, we can eliminate variables one by one, until Finally, if ` plays  , then ` would have regret .2 and 
we are left with a constraint over a single variable. If the would have regret .1, for a minimum regret of .2. Thus, the
domain of this variable is empty, the CSP is unsatisfiable. minimum value over all strategies of ` of the lowest achiev-
Otherwise, we can pick one of its legal values, and execute able regret is .2.
this process in reverse to gradually extend each partial as-
signment to a partial assignment involving one additional More generally, our elimination step in the CMP algo-
variable. Note that we can also use this algorithm to find all rithm is similar to the CSP algorithm, except that now
solutions to the CSP: at every place where we have several the entry in the table is the minimum achievable value
legal assignments to a variable, we pursue all of them rather (over strategies of the eliminated player) of the maximum
than picking one. over all tables  involving  the eliminated player. More pre-
For undirected trees, using an “outside-in” elimination or- cisely, let  " (F  be a set of factors each contain- 
der, variable elimination ends up being very similar to the ing 3 , and let  * be the set of nodes contained  in * .
When we eliminate  , we generate a new factor over
  *    * V 
  as follows: For
KLS algorithm. We omit details for lack of space. How-
ever, the variable elimination algorithm also applies as is to the variables  
a
graphical games that are not trees, and to asymmetric games. given set of  policies ! , the corresponding entry  in is
Furthermore, the realization that our algorithms are simply N
S NQP"R * * 0U! &!' 5   y . Each entry in a factor * cor-
solving a CSP opens the door to the application of alterna- responds to some v strategy profile for the players in  * . In-
tive CSP algorithms, some of which might perform better in tuitively, it represents an upper bound on the regret of some
certain types of games. of these players, assuming this strategy profile is played. To
Note that the value of is used in the CSP algorithm

eliminate  , we consider all of his strategies, and choose the
to define the constraints; if we run the algorithm with too one that guarantees us the lowest regret.
coarse a grid, it might return an answer that says that no After eliminating all of the players, the result is the best
such equilibrium exists. Thus to be sure of obtaining an -  achievable worst-case regret — the one that achieves the
optimal equilibrium, we must choose the grid according to minimal regret for the player whose regret is largest. The
the bound provided by KLS. Fortunately, the proof given is associated completion is precisely the approximate equilib-
not specific to undirected trees, and thus we are provided rium that achieves the best possible . We note that the CSP


with a gridding density (which is exponential only in the algorithm essentially corresponds to first rounding the en-
maximum family size) which will guarantee we find a so- tries in the CMP tables to either 0 or 1, using as the round-


lution. Unfortunately, the bound is usually very pessimistic ing cutoff, and then running CMP; an assignment is a solu-
and leads to unreasonably fine grids. For example, in a 2- tion to the CSP iff it has value 0 in the CMP.
Finally, note that all of the variable elimination algorithms all of the players’ strategies must be a best response to the
naturally use local message passing between players in the strategy profiles of their parents. Our decomposition guar-
game. In the tree-structured games, the communication di- antees this property for all the players besides ] . To satisfy
rectly follows the structure of the graphical game. In more the best-response requirement for  we must address two
complex games, the variable elimination process might lead issues. First, it may be the case that for a particular strategy
to interactions between players that are not a priori directly choice of 3 , there is no total equilibrium, and thus we may
related to each other. In general, the communication will have to try several (or all) of his strategies in order to find
be along edges in the triangulated graph of the graphical an equilibrium. Second, if  has parents in both subgames,
game (Lauritzen & Spiegelhalter 1988). However, the com- we must consider both subgames when reasoning about  ,
munication tends to stay localized to “regions” in the graph, eliminating our ability to decouple them. Our algorithm be-
except for graphs with many direct interactions between “re- low addresses both of these difficulties.
mote” players.

ters  (( , where each 
 
We decompose the graph into a set of overlapping clus-

] ((( 173 . These
clusters are organized into a tree . If  and are two
5 Hybrid algorithms
We now present two algorithms that combine ideas from the
two techniques presented above, and which have some of the


neighboring clusters, we define  to be the intersection
 . If 3 e is such that XZ 
, then we say
that 7 is associated with . If all of a node’s parents are
advantages of both. contained in two clusters (and are therefore in the separator
Approximate equilibrium refinement between them), we associate it arbitrarily with one cluster or
the other.

One problem with the CSP algorithm is the rapid growth
of the tables as the grid resolution increases. One solution Definition 5: We say that is a cluster tree for a graphical


is to find an approximate equilibrium using some method, game if the following conditions hold:
construct a fine grid around the region of the approximate
Running intersection: If  e  and  e
equilibrium strategy profile, and use the CMP or CSP al-
3 is also in every  that is on the (unique) path in
then


gorithms to find a better equilibrium over that grid. If we
between  and .
find a better equilibrium in this finer grid, we recenter our
No interaction: All 3 are associated with a cluster.
grid around this point, shifting our search to a slightly dif-
ferent part of the space. If we do not find a better equilibrium The no interaction condition implies that the best response
with the specified grid granularity, we restrict our search to criterion for players in the separator involves at most one of
a smaller part of the space but use a finer grid. This process the two neighboring clusters, thereby eliminating the inter-
is repeated until some threshold is reached. action with both subgames.
Note that this strategy does not guarantee that we will We now use a CSP to find an assignment to the separators
eventually get to an exact equilibrium. In some cases, our
first equilibrium might be at a region where there is a local
 
that is consistent with some global equilibrium. We have
one CSP variable for each separator  , whose value space
are joint strategies ! for the players in the separator. We
minimum of the cost function, but no equilibrium. In this
case, the more refined search may improve the quality of
the approximate equilibrium, but will not lead to finding an 
have a binary constraint for every pair of neighboring sepa-
rators  and that is satisfied iff there exists a strategy
profile / for for which the following conditions hold:
  
exact equilibrium.
Subgame decomposition 1. / is consistent with the separators ! and ! .
A second approach is based on the idea that we can de- 2. For each 3 associated with , the strategy !' is an -
best response to !  ; note that all of 3 ’s parents are in


compose a single large game into several subgames, solve ^


, so their strategies are specified.
  
each separately, and then combine the results to get an equi-
librium for the entire game. We can implement this general It is not hard to show that an assignment ! for the sep-
scheme using an approach that is motivated by the clique arators that satisfies all these constraints is consistent with
tree algorithm for Bayesian network inference (Lauritzen & an approximate global equilibrium. First, the constraints as-
Spiegelhalter 1988). sert that there is a way of completing the partial strategy
To understand the intuition, consider a game that is com- profile with a strategy profile for the players in the clusters.
posed of two almost independent subgames. Specifically, we Second, the running intersection property implies that if a
can divide the players into two groups  and whose player appears in two clusters, it appears in every separa-
only overlap is the single player  . We assume that the tor along the way; condition (1) then implies that the same
games are independent given  , in other words, for any

?  : , if * e 
 , then X *  . If we fix a strategy
strategy is assigned to that player in all the clusters where it
appears. Finally, according to the no interaction condition,
!' of 7 , then the two halves of the game no longer interact. each player is associated with some cluster, and that cluster
Specifically, we can find an equilibrium for the players in specifies the strategies of its parents. Condition (2) then tells
 , ensuring that the players’ strategies are a best response us that this player’s strategy is an -best response to its par-
both to each other’s strategies and to !" , without consider-


ents. As all players are playing -best responses, the overall




ing the strategies of players in the other cluster. However, strategy profile is an equilibrium.
we must make sure that these strategy profiles will combine There remains the question of how we determine the ex-
to form an equilibrium for the entire game. In particular, istence of an approximate equilibrium within a cluster given
strategy profiles for the separators. If we use the CSP algo- We also tested the algorithms on symmetric 3-action
rithm, we have gained nothing: using variable elimination games structured as a ring of rings, with payoffs chosen at
within each cluster is equivalent to using variable elimina- random from (, y . The results are shown in Fig. 2(c),(d).
tion (using some particular ordering) over the entire CSP. v
For the graph shown, we varied the number of nodes on the
However, we can solve each subgame using our hill climb- internal ring; each node on the inner ring is also part of an
ing approach, giving us yet another hybrid algorithm — one outer ring of size 20. Thus, the games contain as many as
where a CSP approach is used to combine the answers ob- 400 nodes. For this set of results, we set the gridding den-
tained by the hill-climbing algorithm in different clusters. sity for cost minimization to  , so there were 6 strategies
per node. The reduced strategy space explains why the algo-
6 Experimental Results rithm is so much faster than the refinement hybrid: each step
of the hybrid is similar to an entire run of cost minimization
We tested hill climbing, cost minimization, and the approx- (for these graphs, the hybrid is run approximately 40 times).
imate equilibrium refinement hybrid on two types of games. The errors obtained by the different algorithms are some-
The first was the Road game described earlier. We tested what different in the case of rings of rings. Here, refinement
two different types of payoffs. One set of payoffs corre- only improves accuracy by about a factor of 2, while cost
sponded to a situation where each developer can choose to minimization is quite accurate. In order to explain this, we
build a park, a store, or a housing complex; stores want to tested simple rings, using cost minimization over only pure
be next to houses but next to few other stores; parks want strategies. Based on 1000 trial runs, for 20 player rings, the
to be next to houses; and houses want to be next to exactly best pure strategy equilibria has  23.9% of the time;
between e (  y 45.8% of the time; e   , y 25.7%;


one store and as many parks as possible. This game has pure
and - v
  , 4.6%. We also tested (but did notv include results
 

strategy equilibria for all road lengths; thus, it is quite easy 

to solve using cost minimization where only the pure strate- for) undirected trees with random payoffs. Again, using a
gies of each developer are considered. A 200 player game low gridding density for variable elimination, we obtained
can be solved in about 1 second. For the same 200 player results similar to those for rings of rings. Thus, it appears
game, hill climbing took between 10 and 15 seconds to find that, with random payoffs, fairly good equilibria often exist
an approximate equilibrium with between .01 and .04 (the


in pure strategies.
payoffs range from 0 to 2). Clearly the discretization density of cost minimization has
In the other payoff structure, each land developer plays a huge effect on the speed of the algorithm. Fig. 2(e)&(f)
a game of paper, rock, scissors against each of his neigh- shows the results for CMP using different discretization lev-
bors; his total payoff is the sum of the payoffs in these sep- els as well as for hill climbing, over simple rings of various
arate games, so that the maximum payoff per player is 3. sizes with random payoffs in [0,1]. The level of discretiza-
This game has no pure strategy equilibria; thus, we need to tion impacts performance a great deal, and also noticeably
choose a finer discretization in order to achieve reasonable affects solution quality. Somewhat surprisingly, even the
results. Fig. 2(a),(b) shows the running times and equilibria lowest level of discretization performs better than hill climb-
quality for each of the three algorithms. Cost minimization ing. This is not in general the case, as variable elimination
was run with a grid density of +g (i.e., the allowable strate- may be intractable for games with high graph width.
gies all have components that are multiples of +g ). Since In order to get an idea of the extent of the improvement
each player has three possible actions, the resulting grid has relative to standard, unstructured approaches, we converted
21 strategies per player. The hybrid algorithm was run start- each graphical game into a corresponding strategic form
ing from the strategy computed by hill-climbing. The nearby game (by duplicating entries), which expands the size of the
area was then discretized so as to have 6 strategies per player game exponentially. We then attempted to find equilibria us-
within a region of size roughly   around the current equi- ing the available game solving package Gambit1 specifically
librium. We ran the hybrid as described above until the total using the QRE algorithm with default settings. (QRE seems
size is less than  , . to be the fastest among the algorithms implemented in Gam-
Each algorithm appears to scale approximately linearly bit). For a road length of 1 (a 2-player game) QRE finds an
with the number of nodes, as expected. Given that the num- equilibrium in 20 seconds; for a road of length 2, QRE takes
ber of strategies used for the hybrid is less than that used 7min56sec; and for a road of length 3, about 2h30min.
for the actual variable elimination, it is not surprising that Overall, the results indicate that these algorithms can find
cost minimization takes considerably longer than the hybrid. good approximate equilibria in a reasonable amount of time.
The equilibrium error is uniformly low for cost minimiza- Cost minimization has a much lower variance in running
tion; this is not surprising as, in this game, the uniform strat- time, but can get expensive when the grid size is large. The
egy 0&," m= (,' m (," m 5 is always an equilibrium. The quality quality of the answers obtained even with coarse grids are
of the equilibria produced by all three algorithms is fairly often surprisingly good, particularly when random payoffs
good, with a worst value of about 10% of the maximum


are used so that there are pure strategy profiles that are al-
payoffs in the game. The error of the equilibria produced most equilibria. Our algorithms provide us with a criterion
by hill climbing grows with the game size, a consequence for evaluating the error of a candidate solution, allowing us
of the fact that the hill-climbing search is over a higher- to refine our answer when the error is too large. In such
dimensional space. Somewhat surprising is the extent to cases, the hybrid algorithm is often a good approach.
which the hybrid approach improves the quality of the equi-
1
libria, at least for this type of game. http://www.hss.caltech.edu/gambit/Gambit.html.
40
1800 1800

1600 1600 35

Execution Time (s)


1400 1400 30
Execution Time (s)

Execution Time (s)


1200 1200 25
1000 1000
20
800 800
15
600 600
10
400 400
5
200 200

0 0 0
0 5 10 15 20 25 30 35 40 45 50 0 2 4 6 8 10 12 14 16 18 20 0 20 40 60 80 100
Road Length Internal Nodes Ring Size

(a) (c) (e)


0.4 0.12 0.1

0.09
0.35
0.1
0.08

Equilibrium Value
0.3
0.07
Equilibrium Error
Equilibrium Error

0.08
0.25 0.06

0.2 0.06 0.05

0.04
0.15
0.04 0.03
0.1
0.02
0.02
0.05 0.01

0 0
0
0 5 10 15 20 25 30 35 40 45 50 0 2 4 6 8 10 12 14 16 18 20 0 20 40 60 80 100
Road Length Internal Nodes Ring Size

(b) (d) (f)

Figure 2: Comparison of Algorithms as number of players varies: Dashed for hill climbing, solid for cost minimization, dotted
for refinement. Road games: (a) running time; (b) equilibrium error. Ring of rings: (c) running time; (d) equilibrium error.
CMP on single ring with different grid density and hill climbing in simple ring. Dashed line indicates hill climbing, solid lines
with squares, diamonds, triangles correspond to grid densities of ,  (3 strategies),  (6 strategies), and  m m m (10 strategies)
respectively. (e) running time; (f) equilibrium error.
7 Conclusions References
Bertele, U., and Brioschi, F. 1972. Nonserial Dynamic Program-
ming. New York: Academic Press.
In this paper, we considered the problem of collaboratively Fudenberg, D., and Tirole, J. 1991. Game Theory. MIT Press.
finding approximate equilibria in a situation involving multi- Kearns, M.; Littman, M.; and Singh, S. 2001a. Graphical models
ple interacting agents. We focused on the idea of exploiting for game theory. In Proc. UAI.
the locality of interaction between agents, using graphical Kearns, M.; Littman, M.; and Singh, S. 2001b. An efficient exact
games as an explicit representation of this structure. We algorithm for singly connected graphical games. In Proc. 14th
provided two algorithms that exploit this structure to sup- NIPS.
port solution algorithms that are both computationally effi- Koller, D., and Milch, B. 2001. Multi-agent influence diagrams
cient and utilize distributed collaborative computation that for representing and solving games. In Proc. IJCAI.
respects the “lines of communication” between the agents. LaMura, P. 2000. Game networks. In Proc. UAI, 335–342.
Lauritzen, S. L., and Spiegelhalter, D. J. 1988. Local computa-
Both strongly use the locality of regret: hill climbing in
tions with probabilities on graphical structures and their applica-
the score function, and CSP in the formulation of the con- tion to expert systems. J. Royal Stat. Soc. B 50(2):157–224.
straints. We showed that our techniques provide good solu- McKelvey, R., and McLennan, A. 1996. Computation of equi-
tions for games with a very large number of agents. libria in finite games. In Handbook of Computational Economics,
We believe that our techniques can be applied much more volume 1. Elsevier Science. 87–142.
broadly; in particular, we plan to apply them in the much McKelvey, R. 1992. A Liapunov function for Nash equilibria.
unpublished.
richer multi-agent influence diagram framework of (Koller Nash, J. 1950. Equilibrium points in n-person games. PNAS
& Milch 2001), which provides a structured representation, 36:48–49.
similar to graphical games, but for substantially more com- Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems.
plex situations involving time and information. Morgan Kaufmann.
Acknowledgments. We are very grateful to Ronald Parr for many Pearson, M., and La Mura, P. 2001. Simulated annealing of game
useful discussions. This work was supported by the DoD MURI equilibria: A simple adaptive procedure leading to nash equilib-
program administered by the Office of Naval Research under Grant rium. Unpublished manuscript.
N00014-00-1-0637, and by Air Force contract F30602-00-2-0598 von Neumann, J., and Morgenstern, O. 1944. Theory of games
under DARPA’s TASK program. and economic behavior. Princeton Univ. Press.

You might also like