
Multi-Agent Algorithms for Solving Graphical Games

David Vickrey and Daphne Koller
Computer Science Department, Stanford University
Stanford, CA 94305-9010
[email protected], [email protected]

Abstract

Consider the problem of a group of agents trying to find a stable strategy profile for a joint interaction. A standard approach is to describe the situation as a single multi-player game and find an equilibrium strategy profile of that game. However, most algorithms for finding equilibria are computationally expensive; they are also centralized, requiring that all relevant payoff information be available to a single agent (or computer) who must determine the entire equilibrium profile. In this paper, we exploit two ideas to address these problems. We consider structured game representations, where the interaction between the agents is sparse, an assumption that holds in many real-world situations. We also consider the slightly relaxed task of finding an approximate equilibrium. We present two algorithms for finding approximate equilibria in these games, one based on a hill-climbing approach and one on constraint satisfaction. We show that these algorithms exploit the game structure to achieve faster computation. They are also inherently local, requiring only limited communication between directly interacting agents. They can thus be scaled to games involving large numbers of agents, provided the interaction between the agents is not too dense.

1 Introduction
Consider a system consisting of multiple interacting agents collaborating to perform a task. The agents have to interact with each other to make sure that the task is completed, but each might still have slightly different preferences, e.g., relating to the amount of resources each expends in completing its part of the task.

The framework of game theory (von Neumann & Morgenstern 1944; Fudenberg & Tirole 1991) tells us that we should represent a multi-agent interaction as a game, and find a strategy profile that forms a Nash equilibrium (Nash 1950). We can do so using one of several algorithms for finding equilibria in games. (See (McKelvey & McLennan 1996) for a survey.) Unfortunately, this approach is severely limited in its ability to handle complex multi-agent interactions. First, in most cases, the size of the standard game representations grows exponentially in the number of players n. Second, for games involving more than two players, existing solution algorithms scale extremely poorly even in the size of the game representation. Finally, all of the standard algorithms are based on a centralized computation paradigm, making them unsuitable for our distributed setting.

We propose an approach that modifies both the representation of the game and the notion of a solution. Following the work of LaMura (2000), Koller and Milch (2001), and Kearns, Littman, and Singh (2001a), we use a structured representation of games that exploits the locality of interaction that almost always exists in complex multi-agent interactions, and allows games with large numbers of agents to be described compactly. Our representation is based on the graphical game framework of Kearns, Littman, and Singh (KLS hereafter), which applies to simultaneous-move games. We wish to find algorithms that can take advantage of this structure to find good strategy profiles effectively, and in a decentralized way.

It turns out that this goal is much easier to achieve when solving a relaxed problem. While philosophically satisfying, the Nash equilibrium requirement is often overly stringent. Although agents arguably strive to maximize their expected utility, in practice inertia or a sense of commitment will cause an agent to abide by an agreed equilibrium even if it is slightly suboptimal for him. Thus, it often suffices to require that the strategy profile form an approximate equilibrium, one where each agent's incentive to deviate is no more than some small ε.

We present two techniques for finding approximate equilibria in structured games. The first uses a greedy hill-climbing approach to optimize a global score function, whose global optima are precisely equilibria. The second uses a constraint satisfaction approach over a discretized space of agent strategies; somewhat surprisingly, the algorithm of KLS turns out to be a special case of this algorithm. We show that these algorithms allow the agents to determine a joint strategy profile using local communication between agents. We present some preliminary experimental results over randomly generated single-stage games, where we vary the number of agents and the density of the interaction. Our results show that our algorithms can find high-quality approximate equilibria in much larger games than have been previously solved.

2 Graphical games
In this section, we introduce some basic notation and terminology for game theory, and describe the framework of graphical games.
The conceptually simplest and perhaps best-studied representation of a game is the normal form. In a normal form game, each player (agent) i chooses an action a_i from its action set A_i = {a_i^1, ..., a_i^m}. For simplicity of notation, we assume that |A_i| = m for all i. The players are also allowed to play mixed strategies σ_i = (p_i^1, ..., p_i^m), where p_i^j is the probability that i plays a_i^j. If the player assigns probability 1 to one action — p_i^j = 1 — and zero to the others, it is said to be playing a pure strategy, which we denote as π_i^j. We use σ to denote a strategy profile for the set of players, and define (σ_{-i}, σ_i') to be the same as σ except that i plays σ_i' instead of σ_i.

Each player also has an associated payoff matrix M_i that specifies the payoff, or utility, for player i under each of the m^n possible combinations of strategies: M_i(a_1^{j_1}, ..., a_n^{j_n}) is the reward for i when, for all k, player k plays a_k^{j_k}. Given a profile σ, we define the expected utility (or payoff) for i as

    EU_i(σ) = Σ_{j_1, ..., j_n} p_1^{j_1} ··· p_n^{j_n} · M_i(a_1^{j_1}, ..., a_n^{j_n}).

Given a set of mixed strategies σ, one strategy per player, we define the regret of i with respect to σ to be the most i can gain (in expectation) by diverging from the strategy profile σ:

    Reg_i(σ) = max_{σ_i'} [EU_i((σ_{-i}, σ_i')) − EU_i(σ)].

A Nash equilibrium is a set of mixed strategies σ where each player's regret is 0. The Nash equilibrium condition means that no player can increase his expected reward by unilaterally changing his strategy. The seminal result of game theory is that any game has at least one Nash equilibrium (Nash 1950) in mixed strategies. An ε-approximate Nash equilibrium is a strategy profile σ such that each player's regret is at most ε.
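These definitions map directly onto code. The following is a minimal sketch (ours, not the paper's) of expected utility, regret, and the ε-equilibrium test for a small normal-form game; payoffs[i] is assumed to be an n-dimensional array holding M_i, and a profile is a list of mixed strategies, one per player. It exploits the fact that some best response is always a pure strategy, since EU_i is linear in σ_i.

```python
# Minimal sketch (not from the paper) of the definitions above.
import itertools
import numpy as np

def expected_utility(i, payoffs, profile):
    """EU_i(sigma): sum over joint actions of prod_k p_k[a_k] * M_i(a)."""
    eu = 0.0
    for joint in itertools.product(*(range(len(p)) for p in profile)):
        prob = np.prod([profile[k][a] for k, a in enumerate(joint)])
        eu += prob * payoffs[i][joint]
    return eu

def regret(i, payoffs, profile):
    """Reg_i(sigma): the most i can gain by unilaterally deviating.
    Some best response is always pure, since EU_i is linear in sigma_i."""
    base = expected_utility(i, payoffs, profile)
    best = base
    for a in range(len(profile[i])):
        pure = np.zeros(len(profile[i])); pure[a] = 1.0
        best = max(best, expected_utility(
            i, payoffs, profile[:i] + [pure] + profile[i + 1:]))
    return best - base

def is_eps_equilibrium(payoffs, profile, eps):
    return all(regret(i, payoffs, profile) <= eps
               for i in range(len(profile)))
```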
A graphical game (Kearns, Littman, & Singh 2001a) assumes that each player's reward function depends on the actions of a subset of the players rather than on all other players' actions. Specifically, i's utility depends on the actions of some subset Pa_i of the other players, as well as on its own action. Thus, each player's payoff matrix M_i depends only on |Pa_i| + 1 different decision variables, and therefore has m^{|Pa_i|+1} entries instead of m^n. We can describe this type of game using a directed graph (V, E). The nodes in V correspond to the players, and we have a directed edge (i, j) ∈ E from i to j if i ∈ Pa_j, i.e., if j's utility depends on i's strategy. Thus, the parents of i in the graph are the players on whose actions i's payoff depends. We note that our definition is a slight extension of the definition of KLS, as they assumed that the dependency relationship between players was symmetric, so that their graph was undirected.

Example 1: Consider the following example, based on a similar example in (Koller & Milch 2001). Suppose a road is being built from north to south through undeveloped land, and 2n agents have purchased plots of land along the road — the agents W_1, ..., W_n on the west side and the agents E_1, ..., E_n on the east side. Each agent needs to choose what to build on his land — a factory, a shopping mall, or a residential complex. His utility depends on what he builds and on what is built north, south, and across the road from his land. All of the decisions are made simultaneously. In this case, agent W_k's parents are E_k, W_{k−1} and W_{k+1}. Note that the normal form representation consists of 2n matrices, each of size 3^{2n}, whereas in the graphical game, each matrix has size at most 3^4 = 81 (agents at the beginning and end of the road have smaller matrices).

If we modify the problem slightly and assume that the prevailing wind is from east to west, so that agents on the east side are not concerned with what is built across the street, then we have an asymmetric graphical game, where agent W_k's parents are E_k, W_{k−1} and W_{k+1}, whereas agent E_k's parents are E_{k−1} and E_{k+1}.
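To make the compactness concrete, here is a sketch of the dependency structure of the asymmetric Road game from Example 1; the player labels ('W', k) and ('E', k) are our own encoding, not the paper's notation.

```python
# Sketch: parent sets of the asymmetric Road game of Example 1.
# W_k depends on E_k and its west-side neighbors; E_k only on its
# east-side neighbors (the prevailing-wind variant).
def road_game_parents(n):
    parents = {}
    for k in range(n):
        w_neighbors = [('W', j) for j in (k - 1, k + 1) if 0 <= j < n]
        e_neighbors = [('E', j) for j in (k - 1, k + 1) if 0 <= j < n]
        parents[('W', k)] = [('E', k)] + w_neighbors
        parents[('E', k)] = e_neighbors
    return parents

def payoff_table_sizes(parents, num_actions=3):
    # Each player's table covers itself plus its parents: 3^(|Pa_i|+1)
    # entries, versus 3^(2n) for the full normal form.
    return {i: num_actions ** (len(pa) + 1) for i, pa in parents.items()}
```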
3 Function Minimization
Our first algorithm uses a hill-climbing approach to find an approximate equilibrium. We define a score function that measures the distance of a given strategy profile from an equilibrium. We then use a greedy local search algorithm that starts from a random initial strategy profile and gradually improves the profile until a local optimum of the score function is reached.

More precisely, for a strategy profile σ, we define S(σ) to be the sum of the regrets of the players:

    S(σ) = Σ_i Reg_i(σ).

This function is nonnegative and is equal to 0 exactly when σ is a Nash equilibrium. It is continuous in each of the separate probabilities p_i^j but nondifferentiable. We can minimize S(σ) using a variety of function minimization techniques that apply to continuous but non-differentiable functions. In the context of unstructured games, this approach has been explored by (McKelvey 1992). More recently, Pearson and La Mura (2001) have applied simulated annealing to this task. We chose to explore greedy hill climbing, as it lends itself particularly well to exploiting the special structure of the graphical game.

Our algorithm repeatedly chooses a player and changes that player's strategy so as to maximally improve the global score. More precisely, we define the gain for a player i as the amount that the global score function would decrease if i changed its strategy so as to minimize the score function:

    G_i(σ) = max_{σ_i'} [S(σ) − S((σ_{-i}, σ_i'))].

Note that this is very different from having the player change its strategy to the one that most improves its own utility. Here, the player takes into consideration the effects of its strategy change on the other players.

Our algorithm first chooses an initial random strategy profile σ and calculates G_i(σ) for each i. It then iterates over the following steps:
1. Choose the player i for which G_i(σ) is largest.
2. If G_i(σ) is positive, update σ_i ← argmax_{σ_i'} [S(σ) − S((σ_{-i}, σ_i'))]; otherwise, stop.
3. For each player j such that G_j(σ) may have changed, recalculate G_j(σ).
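A sketch of the resulting loop follows (ours, not the authors' implementation). The helpers gain(i, profile), returning the pair (G_i(σ), maximizing σ_i'), and markov_blanket(i), returning the players whose gains must be refreshed in step (3), are hypothetical; the discussion below explains both.

```python
# Sketch of the greedy descent on S(sigma); helpers are hypothetical.
def hill_climb(players, profile, gain, markov_blanket, tol=1e-9):
    gains = {i: gain(i, profile) for i in players}  # i -> (G_i, best sigma_i')
    while True:
        i = max(players, key=lambda j: gains[j][0])  # step (1)
        g_i, new_strategy = gains[i]
        if g_i <= tol:                               # step (2): no gain left
            return profile
        profile[i] = new_strategy
        for j in markov_blanket(i) | {i}:            # step (3): local update
            gains[j] = gain(j, profile)
```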
Notice that Reg_i(σ) depends only on the strategies of i and its parents in σ. Thus, changing a player's strategy only affects the terms of the score function corresponding to that player and its children. We can use this to implement steps (2) and (3) efficiently. A somewhat laborious yet straightforward algebraic analysis shows that:

Proposition 2: The following optimization problem is equivalent to finding G_i(σ) and the maximizing σ_i':

    Maximize:   EU_i((σ_{-i}, σ_i')) − Σ_{j ∈ Ch_i} (r_j − EU_j((σ_{-i}, σ_i')))
    Subject to: p_i'^k ≥ 0 for all k;  Σ_k p_i'^k = 1;
                r_j ≥ EU_j(((σ_{-i}, σ_i')_{-j}, π_j^k)) for all j ∈ Ch_i and all k,

where Ch_i denotes the children of i in the graph.

As the expected utility functions EU_j are linear in the probabilities p_i'^k, this optimization problem is simply a linear program whose parameters are the strategy probabilities of player i, and whose coefficients involve the utilities only of i and its children. Thus, the player i can optimize its strategy efficiently, based only on its own utility function and those of its children in the graph. We can therefore execute the optimization in step (2) efficiently. In our asymmetric Road example, an agent W_k could optimize its strategy based only on its children — W_{k−1} and W_{k+1}; similarly, an agent E_k needs to consider its children — E_{k−1}, E_{k+1} and W_k.
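Rather than solving this linear program exactly, a simpler stand-in restricts the candidate strategies σ_i' to a finite set and searches it directly; the sketch below takes that shortcut. The helper score_terms(i, profile), returning Reg_i plus the regrets of i's children (the only terms of S that i's strategy touches), is hypothetical.

```python
# Sketch: gain G_i over a finite candidate set, a simplification that
# stands in for the exact linear program of Proposition 2.
def gain(i, profile, candidates, score_terms):
    current = score_terms(i, profile)         # Reg_i + children's regrets
    best_drop, best_strategy = 0.0, profile[i]
    for s in candidates:
        trial = dict(profile)
        trial[i] = s
        drop = current - score_terms(i, trial)
        if drop > best_drop:
            best_drop, best_strategy = drop, s
    return best_drop, best_strategy
```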
To execute step (3), we note that when i changes its strategy, the regrets of i and its children change; and when the regret of j changes, the gains of j and its parents change. More formally, when we change the strategy of i, the linear program for some other player j changes only if one of the expected utility terms changes. Since we only have such terms over j and its children, and the payoff of a player is affected only if the strategy of one of its parents changes, G_j(σ) will change only if the strategy of j, or one of its parents, its children, or its spouses (other parents of its children) is changed. (Note the intriguing similarity to the definition of a Markov blanket in Bayesian networks (Pearl 1988).) Thus, in step (3), we only need to update the gains of a limited number of players. In our Road example, if we change the strategy of W_k, we need to update the gains of: W_{k−1} and W_{k+1} (both parents and children); E_k (only a parent); and W_{k−2}, W_{k+2}, E_{k−1}, and E_{k+1} (spouses).

We note that our hill-climbing algorithm is not guaranteed to find a global minimum of S(σ). However, we can use a variety of techniques, such as random restarts, in order to have a better chance of finding a good local minimum. Also, the local minima that we find are often fairly good approximate equilibria, since the score function corresponds quite closely to the quality of an approximate equilibrium.


@
straint for is not shown). The constraint for ` , for exam-
4 CSP algorithms ple, is indexed by the strategies of and ` ; a ‘Y’ in the
Our second approach to solving graphical games uses a @
table denotes that ` ’s strategy has at most regret with re-
spect@ to ’s strategy. Eliminating ` produces a constraint


very different approach, motivated by the recent work of


Kearns, Littman, and Singh (2001a; 2001b). They propose over and  as shown in Fig. 1(d). Consider the 0] &hi 5
a dynamic programming style algorithm for the special case entry of the resulting constraint. We check each possible
when the graphical game is a symmetric undirected tree. strategy for ` . If ` were playing > , then ` would not have
Their algorithm has several variants. For our purposes, acceptable regret with respect to ] , and  ’s strategy, hi ,
the most relevant (KLS 2001a) discretizes each player’s set would not have acceptable regret with respect to  . If `
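Building one gridded constraint table is then straightforward; below is a sketch in which grid[j] is the finite strategy set chosen for player j and regret_given_parents(i, s_i, pa) is a hypothetical helper computing Reg_i from the strategies of i and Pa_i alone.

```python
# Sketch: the epsilon-constraint table for player i over a finite grid.
import itertools

def constraint_table(i, parents, grid, regret_given_parents, eps):
    table = {}  # key: (index of i's strategy, indices of parent strategies)
    ranges = [range(len(grid[i]))] + [range(len(grid[j])) for j in parents]
    for idx in itertools.product(*ranges):
        s_i = grid[i][idx[0]]
        pa = {j: grid[j][k] for j, k in zip(parents, idx[1:])}
        table[idx] = regret_given_parents(i, s_i, pa) <= eps  # 'Y' or blank
    return table
```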
Variable elimination is a general-purpose nonserial dynamic programming algorithm that has been applied to several frameworks, including CSPs. Roughly speaking, we eliminate variables one at a time, combining the constraints relating to that variable into a single constraint that describes the constraints induced over its neighboring variables. We briefly review the algorithm in the context of the constraints described above.

Example 3: Consider the three-player graphical game shown in Fig. 1(a), where we have discretized the strategy space of X into three strategies and those of Y and Z into two strategies. Suppose we have chosen an ε such that the constraints for X and Z are given by Fig. 1(b),(c) (the constraint for Y is not shown). The constraint for X, for example, is indexed by the strategies of Y and X; a 'Y' in the table denotes that X's strategy has at most ε regret with respect to Y's strategy. Eliminating X produces a constraint over Y and Z, as shown in Fig. 1(d). Consider the (y_1, z_1) entry of the resulting constraint. We check each possible strategy for X. If X were playing x_1, then X would not have acceptable regret with respect to y_1, and Z's strategy z_1 would not have acceptable regret with respect to x_1. If X were playing x_2, X's strategy would be acceptable with respect to Y's, but Z's would not be acceptable with respect to X's. However, if X were playing x_3, then both X and Z would be playing acceptable strategies. As there is a value of X which will produce an acceptable completion, the entry in the corresponding table is 'Y'. The (y_2, z_1) entry is not 'Y', since there is no strategy of X which will ensure that both X and Z are playing acceptably.

[Figure 1 appears here.] Figure 1: (a) A simple 3-player graphical game. (b) Constraint table for X. (c) Constraint table for Z. (d) Constraint table after elimination of X. (e) Regret table for X. (f) Regret table for Z. (g) Regret table after elimination of X.

In general, we can eliminate variables one by one, until we are left with a constraint over a single variable. If the domain of this variable is empty, the CSP is unsatisfiable. Otherwise, we can pick one of its legal values and execute this process in reverse to gradually extend each partial assignment to a partial assignment involving one additional variable. Note that we can also use this algorithm to find all solutions to the CSP: at every place where we have several legal assignments to a variable, we pursue all of them rather than picking one.
than picking one. over all tables  involving  the eliminated player. More pre-
For undirected trees, using an “outside-in” elimination or- cisely, let  " (F  be a set of factors each contain- 
der, variable elimination ends up being very similar to the ing 3 , and let  * be the set of nodes contained  in * .
When we eliminate  , we generate a new factor over
  *    * V 
  as follows: For
KLS algorithm. We omit details for lack of space. How-
ever, the variable elimination algorithm also applies as is to the variables  
a
graphical games that are not trees, and to asymmetric games. given set of  policies ! , the corresponding entry  in is
Furthermore, the realization that our algorithms are simply N
S NQP"R * * 0U! &!' 5   y . Each entry in a factor * cor-
solving a CSP opens the door to the application of alterna- responds to some v strategy profile for the players in  * . In-
tive CSP algorithms, some of which might perform better in tuitively, it represents an upper bound on the regret of some
certain types of games. of these players, assuming this strategy profile is played. To
Note that the value of is used in the CSP algorithm

eliminate  , we consider all of his strategies, and choose the
to define the constraints; if we run the algorithm with too one that guarantees us the lowest regret.
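Mirroring the CSP step, here is a sketch of this min-max elimination, in the same factor format as before but with regrets in place of booleans.

```python
# Sketch of the CMP elimination step: min over the eliminated player's
# strategies of the max regret recorded in any table that mentions it.
import itertools

def eliminate_cmp(x, factors, domain_sizes):
    touching = [f for f in factors if x in f[0]]
    rest = [f for f in factors if x not in f[0]]
    new_vars = sorted({v for vs, _ in touching for v in vs if v != x})
    new_table = {}
    for assign in itertools.product(
            *(range(domain_sizes[v]) for v in new_vars)):
        ctx = dict(zip(new_vars, assign))
        best = float('inf')
        for xv in range(domain_sizes[x]):
            ctx[x] = xv
            worst = max(table[tuple(ctx[v] for v in vs)]
                        for vs, table in touching)
            best = min(best, worst)
        new_table[assign] = best
    return rest + [(tuple(new_vars), new_table)]
```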
After eliminating all of the players, the result is the best achievable worst-case regret — the value that minimizes the regret of the player whose regret is largest. The associated completion is precisely the approximate equilibrium that achieves the best possible ε. We note that the CSP algorithm essentially corresponds to first rounding the entries in the CMP tables to either 0 or 1, using ε as the rounding cutoff, and then running CMP; an assignment is a solution to the CSP iff it has value 0 in the CMP.
Finally, note that all of the variable elimination algorithms naturally use local message passing between players in the game. In tree-structured games, the communication directly follows the structure of the graphical game. In more complex games, the variable elimination process might lead to interactions between players that are not a priori directly related to each other. In general, the communication will be along edges in the triangulated graph of the graphical game (Lauritzen & Spiegelhalter 1988). However, the communication tends to stay localized to "regions" in the graph, except for graphs with many direct interactions between "remote" players.

5 Hybrid algorithms
We now present two algorithms that combine ideas from the two techniques presented above, and which have some of the advantages of both.

Approximate equilibrium refinement
One problem with the CSP algorithm is the rapid growth of the tables as the grid resolution increases. One solution is to find an approximate equilibrium using some method, construct a fine grid around the region of the approximate equilibrium strategy profile, and use the CMP or CSP algorithms to find a better equilibrium over that grid. If we find a better equilibrium in this finer grid, we recenter our grid around this point, shifting our search to a slightly different part of the space. If we do not find a better equilibrium with the specified grid granularity, we restrict our search to a smaller part of the space but use a finer grid. This process is repeated until some threshold is reached.
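A sketch of this refinement loop, under our own simplifying assumptions: solve_cmp_on_grid is a hypothetical wrapper that grids a box of the given radius around the current profile (with a fixed number of strategies per player) and runs the CMP elimination inside it, returning the best profile found and its worst-case regret.

```python
# Sketch of the approximate equilibrium refinement hybrid.
def refine(profile, err, solve_cmp_on_grid, radius=0.5, min_radius=1e-3):
    while radius > min_radius:
        candidate, cand_err = solve_cmp_on_grid(center=profile, radius=radius)
        if cand_err < err:
            profile, err = candidate, cand_err  # recenter, shift the search
        else:
            radius /= 2       # same region of space, effectively finer grid
    return profile, err
```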
Note that this strategy does not guarantee that we will eventually get to an exact equilibrium. In some cases, our first equilibrium might be in a region where there is a local minimum of the cost function, but no equilibrium. In this case, the more refined search may improve the quality of the approximate equilibrium, but will not lead to finding an exact equilibrium.

Subgame decomposition
A second approach is based on the idea that we can decompose a single large game into several subgames, solve each separately, and then combine the results to get an equilibrium for the entire game. We can implement this general scheme using an approach that is motivated by the clique tree algorithm for Bayesian network inference (Lauritzen & Spiegelhalter 1988).

To understand the intuition, consider a game that is composed of two almost independent subgames. Specifically, suppose we can divide the players into two groups V_1 and V_2 whose only overlap is the single player i. We assume that the games are independent given i; in other words, for any j ≠ i, if j ∈ V_k, then Pa_j ⊆ V_k. If we fix a strategy σ_i of i, then the two halves of the game no longer interact. Specifically, we can find an equilibrium for the players in V_1 other than i, ensuring that their strategies are a best response both to each other's strategies and to σ_i, without considering the strategies of players in the other cluster. However, we must make sure that these strategy profiles will combine to form an equilibrium for the entire game. In particular, all of the players' strategies must be a best response to the strategy profiles of their parents. Our decomposition guarantees this property for all the players besides i. To satisfy the best-response requirement for i, we must address two issues. First, it may be the case that for a particular strategy choice of i there is no total equilibrium, and thus we may have to try several (or all) of his strategies in order to find an equilibrium. Second, if i has parents in both subgames, we must consider both subgames when reasoning about i, eliminating our ability to decouple them. Our algorithm below addresses both of these difficulties.

We decompose the graph into a set of overlapping clusters C_1, ..., C_K, where each C_l ⊆ {1, ..., n}. These clusters are organized into a tree T. If C_l and C_m are two neighboring clusters, we define the separator S_{l,m} to be the intersection C_l ∩ C_m. If i ∈ C_l is such that Pa_i ⊆ C_l, then we say that i is associated with C_l. If all of a node's parents are contained in two clusters (and are therefore in the separator between them), we associate it arbitrarily with one cluster or the other.

Definition 5: We say that T is a cluster tree for a graphical game if the following conditions hold:
Running intersection: If i ∈ C_l and i ∈ C_m, then i is also in every C_o that is on the (unique) path in T between C_l and C_m.
No interaction: Every player i is associated with some cluster.

The no interaction condition implies that the best-response criterion for players in a separator involves at most one of the two neighboring clusters, thereby eliminating the interaction with both subgames.

We now use a CSP to find an assignment to the separators that is consistent with some global equilibrium. We have one CSP variable for each separator S_{l,m}, whose values are joint strategies σ_{S_{l,m}} for the players in the separator. We have a binary constraint for every pair of neighboring separators S_{l,m} and S_{m,o} that is satisfied iff there exists a strategy profile σ for C_m for which the following conditions hold (a check of condition (2) is sketched below):
1. σ is consistent with the separator strategies σ_{S_{l,m}} and σ_{S_{m,o}}.
2. For each i associated with C_m, the strategy σ_i is an ε-best response to σ_{Pa_i}; note that all of i's parents are in C_m, so their strategies are specified.
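Condition (2) is a purely local test; the sketch below checks it for one cluster, reusing the hypothetical regret_given_parents helper from Section 4.

```python
# Sketch: check that every player associated with a cluster is playing
# an eps-best response to its parents, all of whom lie in the cluster.
def cluster_consistent(associated, cluster_profile, parents,
                       regret_given_parents, eps):
    for i in associated:
        pa = {j: cluster_profile[j] for j in parents[i]}
        if regret_given_parents(i, cluster_profile[i], pa) > eps:
            return False
    return True
```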
  
each separately, and then combine the results to get an equi-
librium for the entire game. We can implement this general It is not hard to show that an assignment ! for the sep-
scheme using an approach that is motivated by the clique arators that satisfies all these constraints is consistent with
tree algorithm for Bayesian network inference (Lauritzen & an approximate global equilibrium. First, the constraints as-
Spiegelhalter 1988). sert that there is a way of completing the partial strategy
To understand the intuition, consider a game that is com- profile with a strategy profile for the players in the clusters.
posed of two almost independent subgames. Specifically, we Second, the running intersection property implies that if a
can divide the players into two groups  and whose player appears in two clusters, it appears in every separa-
only overlap is the single player  . We assume that the tor along the way; condition (1) then implies that the same
games are independent given  , in other words, for any

?  : , if * e 
 , then X *  . If we fix a strategy
strategy is assigned to that player in all the clusters where it
appears. Finally, according to the no interaction condition,
!' of 7 , then the two halves of the game no longer interact. each player is associated with some cluster, and that cluster
Specifically, we can find an equilibrium for the players in specifies the strategies of its parents. Condition (2) then tells
 , ensuring that the players’ strategies are a best response us that this player’s strategy is an -best response to its par-
both to each other’s strategies and to !" , without consider-


ents. As all players are playing -best responses, the overall




ing the strategies of players in the other cluster. However, strategy profile is an equilibrium.
we must make sure that these strategy profiles will combine There remains the question of how we determine the ex-
to form an equilibrium for the entire game. In particular, istence of an approximate equilibrium within a cluster given
If we use the CSP algorithm, we have gained nothing: using variable elimination within each cluster is equivalent to using variable elimination (using some particular ordering) over the entire CSP. However, we can solve each subgame using our hill-climbing approach, giving us yet another hybrid algorithm — one where a CSP approach is used to combine the answers obtained by the hill-climbing algorithm in the different clusters.

6 Experimental Results
We tested hill climbing, cost minimization, and the approximate equilibrium refinement hybrid on two types of games. The first was the Road game described earlier. We tested two different types of payoffs. One set of payoffs corresponded to a situation where each developer can choose to build a park, a store, or a housing complex; stores want to be next to houses but next to few other stores; parks want to be next to houses; and houses want to be next to exactly one store and as many parks as possible. This game has pure strategy equilibria for all road lengths; thus, it is quite easy to solve using cost minimization where only the pure strategies of each developer are considered. A 200-player game can be solved in about 1 second. For the same 200-player game, hill climbing took between 10 and 15 seconds to find an approximate equilibrium with ε between .01 and .04 (the payoffs range from 0 to 2).

In the other payoff structure, each land developer plays a game of paper, rock, scissors against each of his neighbors; his total payoff is the sum of the payoffs in these separate games, so that the maximum payoff per player is 3. This game has no pure strategy equilibria; thus, we need to choose a finer discretization in order to achieve reasonable results. Fig. 2(a),(b) shows the running times and equilibrium quality for each of the three algorithms. Cost minimization was run with a grid density of 1/5 (i.e., the allowable strategies all have components that are multiples of 1/5). Since each player has three possible actions, the resulting grid has 21 strategies per player. The hybrid algorithm was run starting from the strategy computed by hill climbing. The nearby area was then discretized so as to have 6 strategies per player within a small region around the current equilibrium. We ran the hybrid as described above until the size of the search region fell below a fixed threshold.

Each algorithm appears to scale approximately linearly with the number of nodes, as expected. Given that the number of strategies used for the hybrid is less than that used for the actual variable elimination, it is not surprising that cost minimization takes considerably longer than the hybrid. The equilibrium error is uniformly low for cost minimization; this is not surprising as, in this game, the uniform strategy (1/3, 1/3, 1/3) is always an equilibrium. The quality of the equilibria produced by all three algorithms is fairly good, with a worst value of about 10% of the maximum payoffs in the game. The error of the equilibria produced by hill climbing grows with the game size, a consequence of the fact that the hill-climbing search is over a higher-dimensional space. Somewhat surprising is the extent to which the hybrid approach improves the quality of the equilibria, at least for this type of game.

We also tested the algorithms on symmetric 3-action games structured as a ring of rings, with payoffs chosen at random from [0, 1]. The results are shown in Fig. 2(c),(d). For the graph shown, we varied the number of nodes on the internal ring; each node on the inner ring is also part of an outer ring of size 20. Thus, the games contain as many as 400 nodes. For this set of results, we set the gridding density for cost minimization to 1/2, so there were 6 strategies per node. The reduced strategy space explains why the algorithm is so much faster than the refinement hybrid: each step of the hybrid is similar to an entire run of cost minimization (for these graphs, the hybrid is run approximately 40 times).

The errors obtained by the different algorithms are somewhat different in the case of rings of rings. Here, refinement only improves accuracy by about a factor of 2, while cost minimization is quite accurate. In order to explain this, we tested simple rings, using cost minimization over only pure strategies. Based on 1000 trial runs, for 20-player rings, the best pure strategy equilibrium has ε = 0 23.9% of the time; ε in (0, .1] 45.8% of the time; ε in (.1, .2] 25.7%; and ε > .2, 4.6%. We also tested (but did not include results for) undirected trees with random payoffs. Again, using a low gridding density for variable elimination, we obtained results similar to those for rings of rings. Thus, it appears that, with random payoffs, fairly good equilibria often exist in pure strategies.

Clearly the discretization density of cost minimization has a huge effect on the speed of the algorithm. Fig. 2(e),(f) shows the results for CMP using different discretization levels as well as for hill climbing, over simple rings of various sizes with random payoffs in [0, 1]. The level of discretization impacts performance a great deal, and also noticeably affects solution quality. Somewhat surprisingly, even the lowest level of discretization performs better than hill climbing. This is not in general the case, as variable elimination may be intractable for games with high graph width.

In order to get an idea of the extent of the improvement relative to standard, unstructured approaches, we converted each graphical game into a corresponding strategic-form game (by duplicating entries), which expands the size of the game exponentially. We then attempted to find equilibria using the available game-solving package Gambit (http://www.hss.caltech.edu/gambit/Gambit.html), specifically using the QRE algorithm with default settings. (QRE seems to be the fastest among the algorithms implemented in Gambit.) For a road length of 1 (a 2-player game), QRE finds an equilibrium in 20 seconds; for a road of length 2, QRE takes 7min56sec; and for a road of length 3, about 2h30min.

Overall, the results indicate that these algorithms can find good approximate equilibria in a reasonable amount of time. Cost minimization has a much lower variance in running time, but can get expensive when the grid size is large. The quality of the answers obtained even with coarse grids is often surprisingly good, particularly when random payoffs are used, so that there are pure strategy profiles that are almost equilibria. Our algorithms provide us with a criterion for evaluating the error of a candidate solution, allowing us to refine our answer when the error is too large. In such cases, the hybrid algorithm is often a good approach.
[Figure 2 appears here: six plots of execution time (s) and equilibrium error against road length, internal nodes, and ring size.]

Figure 2: Comparison of algorithms as the number of players varies: dashed for hill climbing, solid for cost minimization, dotted for refinement. Road games: (a) running time; (b) equilibrium error. Ring of rings: (c) running time; (d) equilibrium error. CMP on a single ring with different grid densities, and hill climbing, on simple rings: the dashed line indicates hill climbing; solid lines with squares, diamonds, and triangles correspond to grid densities of 1 (3 strategies), 1/2 (6 strategies), and 1/3 (10 strategies) respectively. (e) running time; (f) equilibrium error.
7 Conclusions
In this paper, we considered the problem of collaboratively finding approximate equilibria in a situation involving multiple interacting agents. We focused on the idea of exploiting the locality of interaction between agents, using graphical games as an explicit representation of this structure. We provided two algorithms that exploit this structure to support solution methods that are both computationally efficient and that utilize distributed, collaborative computation respecting the "lines of communication" between the agents. Both strongly use the locality of regret: hill climbing in the score function, and the CSP approach in the formulation of the constraints. We showed that our techniques provide good solutions for games with a very large number of agents.

We believe that our techniques can be applied much more broadly; in particular, we plan to apply them in the much richer multi-agent influence diagram framework of (Koller & Milch 2001), which provides a structured representation, similar to graphical games, but for substantially more complex situations involving time and information.

Acknowledgments. We are very grateful to Ronald Parr for many useful discussions. This work was supported by the DoD MURI program administered by the Office of Naval Research under Grant N00014-00-1-0637, and by Air Force contract F30602-00-2-0598 under DARPA's TASK program.

References
Bertele, U., and Brioschi, F. 1972. Nonserial Dynamic Programming. New York: Academic Press.
Fudenberg, D., and Tirole, J. 1991. Game Theory. MIT Press.
Kearns, M.; Littman, M.; and Singh, S. 2001a. Graphical models for game theory. In Proc. UAI.
Kearns, M.; Littman, M.; and Singh, S. 2001b. An efficient exact algorithm for singly connected graphical games. In Proc. 14th NIPS.
Koller, D., and Milch, B. 2001. Multi-agent influence diagrams for representing and solving games. In Proc. IJCAI.
LaMura, P. 2000. Game networks. In Proc. UAI, 335-342.
Lauritzen, S. L., and Spiegelhalter, D. J. 1988. Local computations with probabilities on graphical structures and their application to expert systems. J. Royal Stat. Soc. B 50(2):157-224.
McKelvey, R. 1992. A Liapunov function for Nash equilibria. Unpublished.
McKelvey, R., and McLennan, A. 1996. Computation of equilibria in finite games. In Handbook of Computational Economics, volume 1. Elsevier Science. 87-142.
Nash, J. 1950. Equilibrium points in n-person games. PNAS 36:48-49.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
Pearson, M., and La Mura, P. 2001. Simulated annealing of game equilibria: A simple adaptive procedure leading to Nash equilibrium. Unpublished manuscript.
von Neumann, J., and Morgenstern, O. 1944. Theory of Games and Economic Behavior. Princeton Univ. Press.
