Multi-Agent Algorithms For Solving Graphical Games
act with each other to make sure that the task is completed, but each might still have slightly different preferences, e.g., relating to the amount of resources each expends in completing its part of the task.

The framework of game theory (von Neumann & Morgenstern 1944; Fudenberg & Tirole 1991) tells us that we should represent a multi-agent interaction as a game, and find a strategy profile that forms a Nash equilibrium (Nash 1950). We can do so using one of several algorithms for finding equilibria in games. (See (McKelvey & McLennan 1996) for a survey.) Unfortunately, this approach is severely limited in its ability to handle complex multi-agent interactions. First, in most cases, the size of the standard game representations grows exponentially in the number of players. Second, for games involving more than two players, existing solution algorithms scale extremely poorly even in the size of the game representation. Finally, all of the standard algorithms are based on a …

We present two techniques for finding approximate equilibria in structured games. The first uses a greedy hill-climbing approach to optimize a global score function, whose global optima are precisely equilibria. The second uses a constraint satisfaction approach over a discretized space of agent strategies; somewhat surprisingly, the algorithm of KLS turns out to be a special case of this algorithm. We show that these algorithms allow the agents to determine a joint strategy profile using local communication between agents. We present some preliminary experimental results over randomly generated single-stage games, where we vary the number of agents and the density of the interaction. Our results show that our algorithms can find high-quality approximate equilibria in much larger games than have been previously solved.

2 Graphical games
In this section, we introduce some basic notation and terminology …

… based only on its own utility function and that of its children in the graph. We can therefore execute the optimization in step (2) efficiently. In our asymmetric Road example, … when the regret of a player p changes, the gains of p and its parents change. More formally, when we change the strategy of p, the linear program for some other player q changes only if one of the expected utility terms changes. Since we only have such terms over q and its children, and the payoff of a player is affected only if the strategy at one of its parents changes, the gain of q will change only if the strategy of q, or of one of its parents, its children, or its spouses (other parents of its children), is changed. (Note the intriguing similarity to the definition of a Markov blanket in Bayesian networks (Pearl 1988).) Thus, in step (3), we only need to update the gain of a limited number of players. In our Road example, if we change the strategy for one player, we need to update the gains of: its neighbors (both parents and children); the player of which it is only a parent; and its spouses.

We note that our hill climbing algorithm is not guaranteed to find a global minimum of the score function. However, we can use a variety of techniques such as random restarts in order to have a better chance of finding a good local minimum. Also, local minima that we find are often fairly good approximate equilibria (since the score function corresponds quite closely …

… the strategy space of the game. The constraints ensure that each player has regret at most ε in response to the strategies of its parents. This constraint is over all of the variables in the player's family. The variables in this CSP have continuous domains, which means that standard techniques for solving CSPs do not directly apply. We adopt the gridding technique proposed by KLS, which defines a discrete value space for each variable. Thus, the size of these constraints is exponential in the maximum family size (number of neighbors of a node), with the base of the exponent growing with the discretization density. … with a gridding density (which is exponential only in the maximum family size) which will guarantee we find a solution. Unfortunately, the bound is usually very pessimistic and leads to unreasonably fine grids. For example, in a 2-…

Variable elimination is a general-purpose nonserial dynamic programming algorithm that has been applied to several frameworks, including CSPs. Roughly speaking, we eliminate variables one at a time, combining the constraints relating to that variable into a single constraint that describes the constraints induced over its neighboring variables. We briefly review the algorithm in the context of the constraints described above.

Example 3: Consider the three-player graphical game shown in Fig. 1(a), where we have discretized the strategy space of one player into three strategies and those of the other two into two strategies. Suppose we have chosen an ε such that the constraints for two of the players are given by Fig. 1(b),(c) … the algorithm essentially corresponds to first rounding the entries in the CMP tables to either 0 or 1, using ε as the rounding cutoff, and then running CMP; an assignment is a solution to the CSP iff it has value 0 in the CMP.
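The elimination step over rounded 0/1 tables can be sketched concretely. The following is a minimal illustration, not the paper's implementation: a constraint is a list of allowed partial assignments (dicts) over the discretized strategy grid, and the tables below for a hypothetical chain A - B - C stand in for the rounded CMP tables of Example 3.

```python
def consistent(a, b):
    """Two partial assignments agree on their shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(constraints):
    """Combine constraints: all merged assignments satisfying every one."""
    result = [{}]
    for allowed in constraints:
        result = [dict(r, **a) for r in result for a in allowed if consistent(r, a)]
    return result

def eliminate(var, constraints):
    """One variable-elimination step: join every constraint whose scope
    mentions `var`, then project `var` out of the combined constraint."""
    touching = [c for c in constraints if c and var in c[0]]
    rest = [c for c in constraints if not (c and var in c[0])]
    projected = []
    for a in join(touching):
        a = {k: v for k, v in a.items() if k != var}
        if a not in projected:  # deduplicate after projection
            projected.append(a)
    return rest + [projected]

# Hypothetical rounded best-response tables for a chain A - B - C, where A has
# three grid strategies and B, C have two. Each listed combination is one whose
# rounded regret entry was 0 (i.e., an allowed eps-best-response combination).
C_AB = [{'A': a, 'B': b} for (a, b) in [(0, 0), (1, 0), (1, 1), (2, 1)]]
C_BC = [{'B': b, 'C': c} for (b, c) in [(0, 1), (1, 0), (1, 1)]]

cs = eliminate('C', [C_AB, C_BC])   # message from C's constraint onto B
cs = eliminate('B', cs)             # induced constraint over A alone
print("satisfiable:", len(cs[-1]) > 0)
print("one full solution:", join([C_AB, C_BC])[0])
```

Eliminating C and then B leaves a unary constraint over A; the CSP has a solution iff that constraint is nonempty, and a full assignment can then be recovered by joining the original tables.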
Finally, note that all of the variable elimination algorithms naturally use local message passing between players in the game. In tree-structured games, the communication directly follows the structure of the graphical game. In more complex games, the variable elimination process might lead to interactions between players that are not a priori directly related to each other. In general, the communication will be along edges in the triangulated graph of the graphical game (Lauritzen & Spiegelhalter 1988). However, the communication tends to stay localized to “regions” in the graph, except for graphs with many direct interactions between “remote” players.

5 Hybrid algorithms

We now present two algorithms that combine ideas from the two techniques presented above, and which have some of the advantages of both.

Approximate equilibrium refinement

One problem with the CSP algorithm is the rapid growth of the tables as the grid resolution increases. One solution is to find an approximate equilibrium using some method, construct a fine grid around the region of the approximate equilibrium strategy profile, and use the CMP or CSP algorithms to find a better equilibrium over that grid. If we find a better equilibrium in this finer grid, we recenter our grid around this point, shifting our search to a slightly different part of the space. If we do not find a better equilibrium with the specified grid granularity, we restrict our search to a smaller part of the space but use a finer grid. This process is repeated until some threshold is reached.

Note that this strategy does not guarantee that we will eventually get to an exact equilibrium. In some cases, our first equilibrium might be at a region where there is a local minimum of the cost function, but no equilibrium. In this case, the more refined search may improve the quality of the approximate equilibrium, but will not lead to finding an exact equilibrium.

Subgame decomposition

A second approach is based on the idea that we can decompose the game into subgames, solving each subgame while ignoring the strategies of players in the other cluster. However, we must make sure that these strategy profiles will combine to form an equilibrium for the entire game. In particular, all of the players’ strategies must be a best response to the strategy profiles of their parents. Our decomposition guarantees this property for all the players besides the shared player p. To satisfy the best-response requirement for p we must address two issues. First, it may be the case that for a particular strategy choice of p, there is no total equilibrium, and thus we may have to try several (or all) of his strategies in order to find an equilibrium. Second, if p has parents in both subgames, we must consider both subgames when reasoning about p, eliminating our ability to decouple them. Our algorithm below addresses both of these difficulties.

We decompose the graph into a set of overlapping clusters C_1, …, C_k, where each cluster is a subset of the players. These clusters are organized into a tree T. If C_i and C_j are two neighboring clusters, we define the separator S_ij to be the intersection C_i ∩ C_j. If a player p is such that its whole family is contained in C_i, then we say that p is associated with C_i. If all of a node’s parents are contained in two clusters (and are therefore in the separator between them), we associate it arbitrarily with one cluster or the other.

Definition 5: We say that T is a cluster tree for a graphical game if the following conditions hold:
Running intersection: If p ∈ C_i and p ∈ C_j, then p is also in every cluster that is on the (unique) path in T between C_i and C_j.
No interaction: All players are associated with a cluster.

The no interaction condition implies that the best response criterion for players in the separator involves at most one of the two neighboring clusters, thereby eliminating the interaction with both subgames.

We now use a CSP to find an assignment to the separators that is consistent with some global equilibrium. We have one CSP variable for each separator S_ij, whose value space is the set of joint strategies for the players in the separator. We have a binary constraint for every pair of neighboring separators S_ij and S_jk that is satisfied iff there exists a strategy profile σ for C_j for which the following conditions hold:
1. σ is consistent with the strategies assigned to the separators S_ij and S_jk.
2. For each p associated with C_j, the strategy assigned to p is an ε-best response; note that all of p’s parents are in C_j … strategy profile is an equilibrium.

There remains the question of how we determine the existence of an approximate equilibrium within a cluster given
strategy profiles for the separators. If we use the CSP algorithm, we have gained nothing: using variable elimination within each cluster is equivalent to using variable elimination (using some particular ordering) over the entire CSP. However, we can solve each subgame using our hill climbing approach, giving us yet another hybrid algorithm: one where a CSP approach is used to combine the answers obtained by the hill-climbing algorithm in different clusters.

6 Experimental Results

We tested hill climbing, cost minimization, and the approximate equilibrium refinement hybrid on two types of games. The first was the Road game described earlier. We tested two different types of payoffs. One set of payoffs corresponded to a situation where each developer can choose to build a park, a store, or a housing complex; stores want to be next to houses but next to few other stores; parks want to be next to houses; and houses want to be next to exactly one store and as many parks as possible. This game has pure strategy equilibria, and is easy to solve using cost minimization where only the pure strategies of each developer are considered. A 200 player game can be solved in about 1 second. For the same 200 player game, hill climbing took between 10 and 15 seconds to find an approximate equilibrium with ε between .01 and .04 (the payoffs range from 0 to 2).

In the other payoff structure, each land developer plays a game of paper, rock, scissors against each of his neighbors; his total payoff is the sum of the payoffs in these separate games, so that the maximum payoff per player is 3. This game has no pure strategy equilibria; thus, we need to choose a finer discretization in order to achieve reasonable results. Fig. 2(a),(b) shows the running times and equilibria quality for each of the three algorithms. Cost minimization was run with a grid density of 1/5 (i.e., the allowable strategies all have components that are multiples of 1/5). Since each player has three possible actions, the resulting grid has 21 strategies per player. The hybrid algorithm was run starting from the strategy computed by hill-climbing. The nearby area was then discretized so as to have 6 strategies per player within a region of size roughly … around the current equilibrium. We ran the hybrid as described above until the total size is less than …

Each algorithm appears to scale approximately linearly with the number of nodes, as expected. Given that the number of strategies used for the hybrid is less than that used for the actual variable elimination, it is not surprising that cost minimization takes considerably longer than the hybrid. The equilibrium error is uniformly low for cost minimization; this is not surprising as, in this game, the uniform strategy (1/3, 1/3, 1/3) is always an equilibrium. The quality of the equilibria produced by all three algorithms is fairly good, with a worst value of about 10% of the maximum payoffs in the game. The error of the equilibria produced by hill climbing grows with the game size, a consequence of the fact that the hill-climbing search is over a higher-dimensional space. Somewhat surprising is the extent to which the hybrid approach improves the quality of the equilibria, at least for this type of game.

We also tested the algorithms on symmetric 3-action games structured as a ring of rings, with payoffs chosen at random from [0,1]. The results are shown in Fig. 2(c),(d). For the graph shown, we varied the number of nodes on the internal ring; each node on the inner ring is also part of an outer ring of size 20. Thus, the games contain as many as 400 nodes. For this set of results, we set the gridding density for cost minimization to 1/2, so there were 6 strategies per node. The reduced strategy space explains why the algorithm is so much faster than the refinement hybrid: each step of the hybrid is similar to an entire run of cost minimization (for these graphs, the hybrid is run approximately 40 times).

The errors obtained by the different algorithms are somewhat different in the case of rings of rings. Here, refinement only improves accuracy by about a factor of 2, while cost minimization is quite accurate. In order to explain this, we tested simple rings, using cost minimization over only pure strategies. Based on 1000 trial runs, for 20 player rings, the best pure strategy equilibrium has an error in the lowest of four ranges 23.9% of the time; in the next, 45.8%; in the next, 25.7%; and in the highest, 4.6%. We also tested (but did not include results for) undirected trees with random payoffs. Again, using a low gridding density for variable elimination, we obtained results similar to those for rings of rings. Thus, it appears that, with random payoffs, fairly good equilibria often exist in pure strategies.

Clearly the discretization density of cost minimization has a huge effect on the speed of the algorithm. Fig. 2(e)&(f) shows the results for CMP using different discretization levels as well as for hill climbing, over simple rings of various sizes with random payoffs in [0,1]. The level of discretization impacts performance a great deal, and also noticeably affects solution quality. Somewhat surprisingly, even the lowest level of discretization performs better than hill climbing. This is not in general the case, as variable elimination may be intractable for games with high graph width.

In order to get an idea of the extent of the improvement relative to standard, unstructured approaches, we converted each graphical game into a corresponding strategic form game (by duplicating entries), which expands the size of the game exponentially. We then attempted to find equilibria using the available game solving package Gambit¹, specifically using the QRE algorithm with default settings. (QRE seems to be the fastest among the algorithms implemented in Gambit.) For a road length of 1 (a 2-player game) QRE finds an equilibrium in 20 seconds; for a road of length 2, QRE takes 7min56sec; and for a road of length 3, about 2h30min.

Overall, the results indicate that these algorithms can find good approximate equilibria in a reasonable amount of time. Cost minimization has a much lower variance in running time, but can get expensive when the grid size is large. The quality of the answers obtained even with coarse grids is often surprisingly good, particularly when random payoffs are used so that there are pure strategy profiles that are almost equilibria. Our algorithms provide us with a criterion for evaluating the error of a candidate solution, allowing us to refine our answer when the error is too large. In such cases, the hybrid algorithm is often a good approach.

¹ http://www.hss.caltech.edu/gambit/Gambit.html
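The error criterion used throughout these experiments, the maximum regret ε of a candidate profile, is easy to state concretely. The sketch below is our own illustration, not the paper's code: a game is a list of (player, neighborhood, payoff table) triples, and ε is the most any single player could gain by a pure deviation. The paper-rock-scissors payoffs (win 1, tie 0.5, loss 0) are an assumed scoring; with them, the uniform (1/3, 1/3, 1/3) profile has regret 0, matching the observation above that the uniform strategy is an equilibrium.

```python
from itertools import product

def expected_payoff(payoff, strategies, players, fixed=None):
    """Expected value of payoff[joint_action]; `fixed` pins a player's action."""
    fixed = fixed or {}
    total = 0.0
    for joint in product(*(range(len(strategies[p])) for p in players)):
        prob = 1.0
        for p, a in zip(players, joint):
            prob *= (1.0 if a == fixed[p] else 0.0) if p in fixed else strategies[p][a]
        total += prob * payoff[joint]
    return total

def regret(player, neighborhood, payoff, strategies):
    """How much `player` could gain by switching to a best pure response."""
    current = expected_payoff(payoff, strategies, neighborhood)
    best = max(expected_payoff(payoff, strategies, neighborhood, {player: a})
               for a in range(len(strategies[player])))
    return best - current

def profile_error(game, strategies):
    """The epsilon of a candidate profile: its maximum regret over all players."""
    return max(regret(p, nbhd, pay, strategies) for p, nbhd, pay in game)

# Two-player paper-rock-scissors: action a1 beats a2 iff (a1 - a2) % 3 == 1.
rps = {(a1, a2): 0.5 if a1 == a2 else (1.0 if (a1 - a2) % 3 == 1 else 0.0)
       for a1 in range(3) for a2 in range(3)}
game = [('x', ('x', 'y'), rps), ('y', ('y', 'x'), rps)]

uniform = {'x': [1/3] * 3, 'y': [1/3] * 3}
print("epsilon of uniform profile:", profile_error(game, uniform))

both_rock = {'x': [1, 0, 0], 'y': [1, 0, 0]}
print("epsilon when both play rock:", profile_error(game, both_rock))
```

Because each payoff table is local to a player's neighborhood, each regret term can be computed by that player alone, which is what makes both the hill-climbing gain updates and the distributed error check cheap.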
[Figure 2: six panels; x-axes are Road Length, Internal Nodes, and Ring Size; y-axes are running time (top row) and equilibrium error (bottom row).]
Figure 2: Comparison of algorithms as the number of players varies: dashed for hill climbing, solid for cost minimization, dotted for refinement. Road games: (a) running time; (b) equilibrium error. Ring of rings: (c) running time; (d) equilibrium error. CMP on a single ring with different grid densities, and hill climbing on the same ring: the dashed line indicates hill climbing; solid lines with squares, diamonds, and triangles correspond to grid densities of 1 (3 strategies), 1/2 (6 strategies), and 1/3 (10 strategies), respectively. (e) running time; (f) equilibrium error.
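The strategy counts quoted above (21 grid strategies per three-action player at density 1/5, and 3, 6, and 10 strategies in the ring experiments) are instances of a standard stars-and-bars count: a player with k actions whose mixture probabilities must all be multiples of 1/m has C(m + k - 1, k - 1) grid strategies. Assuming the ring players also have three actions, the 3, 6, and 10 strategy grids correspond to densities 1, 1/2, and 1/3; a quick check:

```python
from math import comb

def grid_strategies(num_actions, m):
    """Mixed strategies whose probabilities are all multiples of 1/m:
    ways to split m probability units among num_actions actions."""
    return comb(m + num_actions - 1, num_actions - 1)

# Road game: 3 actions, components multiples of 1/5 -> 21 strategies per player.
print(grid_strategies(3, 5))                                        # -> 21
# Three-action rings: densities 1, 1/2, 1/3 give 3, 6, and 10 strategies.
print(grid_strategies(3, 1), grid_strategies(3, 2), grid_strategies(3, 3))
```

This also shows why the CSP tables blow up with resolution: a family of f players each with g grid strategies yields constraint tables of size g^f.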
7 Conclusions

In this paper, we considered the problem of collaboratively finding approximate equilibria in a situation involving multiple interacting agents. We focused on the idea of exploiting the locality of interaction between agents, using graphical games as an explicit representation of this structure. We provided two algorithms that exploit this structure to support solution algorithms that are both computationally efficient and utilize distributed collaborative computation that respects the “lines of communication” between the agents. Both strongly use the locality of regret: hill climbing in the score function, and CSP in the formulation of the constraints. We showed that our techniques provide good solutions for games with a very large number of agents.

We believe that our techniques can be applied much more broadly; in particular, we plan to apply them in the much richer multi-agent influence diagram framework of (Koller & Milch 2001), which provides a structured representation, similar to graphical games, but for substantially more complex situations involving time and information.

Acknowledgments. We are very grateful to Ronald Parr for many useful discussions. This work was supported by the DoD MURI program administered by the Office of Naval Research under Grant N00014-00-1-0637, and by Air Force contract F30602-00-2-0598 under DARPA’s TASK program.

References

Bertele, U., and Brioschi, F. 1972. Nonserial Dynamic Programming. New York: Academic Press.
Fudenberg, D., and Tirole, J. 1991. Game Theory. MIT Press.
Kearns, M.; Littman, M.; and Singh, S. 2001a. Graphical models for game theory. In Proc. UAI.
Kearns, M.; Littman, M.; and Singh, S. 2001b. An efficient exact algorithm for singly connected graphical games. In Proc. 14th NIPS.
Koller, D., and Milch, B. 2001. Multi-agent influence diagrams for representing and solving games. In Proc. IJCAI.
LaMura, P. 2000. Game networks. In Proc. UAI, 335–342.
Lauritzen, S. L., and Spiegelhalter, D. J. 1988. Local computations with probabilities on graphical structures and their application to expert systems. J. Royal Stat. Soc. B 50(2):157–224.
McKelvey, R., and McLennan, A. 1996. Computation of equilibria in finite games. In Handbook of Computational Economics, volume 1. Elsevier Science. 87–142.
McKelvey, R. 1992. A Liapunov function for Nash equilibria. Unpublished.
Nash, J. 1950. Equilibrium points in n-person games. PNAS 36:48–49.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
Pearson, M., and La Mura, P. 2001. Simulated annealing of game equilibria: A simple adaptive procedure leading to Nash equilibrium. Unpublished manuscript.
von Neumann, J., and Morgenstern, O. 1944. Theory of Games and Economic Behavior. Princeton Univ. Press.