
Swim till You Sink:

Computing the Limit of a Game

Rashida Hakim¹, Jason Milionis¹, Christos Papadimitriou¹, and Georgios Piliouras²

¹ Columbia University ([email protected], [email protected], [email protected])
² Google DeepMind ([email protected])

arXiv:2408.11146v1 [cs.GT] 20 Aug 2024

Abstract. During 2023, two interesting results were proven about the limit behavior of game dynamics: First, it was shown that there is a game for which no dynamics converges to the Nash equilibria. Second, it was shown that the sink equilibria of a game adequately capture the limit behavior of natural game dynamics. These two results have created a need and opportunity to articulate a principled computational theory of the meaning of the game that is based on game dynamics. Given any game in normal form, and any prior distribution of play, we study the problem of computing the asymptotic behavior of a class of natural dynamics called the noisy replicator dynamics as a limit distribution over the sink equilibria of the game. When the prior distribution has pure strategy support, we prove this distribution can be computed efficiently, in time near-linear in the size of the best-response graph. When the distribution can be sampled — for example, if it is the uniform distribution over all mixed strategy profiles — we show through experiments that the limit distribution of reasonably large games can be estimated quite accurately through sampling and simulation.

Keywords: Replicator Dynamics · Sink Equilibria.

Fig. 1. The better-response graph of the 3 × 3 × 3 game depicting the hitting probabilities of the pure
profiles as pie charts.

1 Introduction

The Nash equilibrium has been the quintessence of Game Theory. The field started its modern
existence in 1950 with Nash’s definition and existence theorem, and the Nash equilibrium remained
for three quarters of a century its paramount solution concept — the others exist as refinements,
generalizations, or contradistinctions. During the past three decades, during which Game Theory
came under intense computational scrutiny, the Nash equilibrium has lost some of its appeal, as its
fundamental incompatibility with computation became apparent. The Nash equilibrium has been
shown intractable to compute or approximate in normal-form games [17,11,18], while its other well-
known deficiency of computational nature — the ambiguity of non-uniqueness and the quagmire
of equilibrium selection [23] — had already been known.
A long list of game dynamics — that is, dynamical systems, continuous- or discrete-time,
defined on mixed strategy profiles — proposed by economists over the decades are all known to fail
to converge consistently to the Nash equilibrium. This included Nash’s own discrete-time dynamics
used in his proof, the well-known replicator dynamics treated in this paper, and many others. Given
this, the following question acquired some importance:

Question 1: We know that every game has a Nash equilibrium. But does every game have
a dynamics that converges to the Nash equilibria of the game?

A negative answer would be another serious setback for the Nash equilibrium, and impetus
would be added to efforts (see for example [39,34]) to elevate the limit behavior of natural game
dynamics as a proposed “meaning of the game,” an alternative to the Nash equilibrium. An im-
portant obstacle to these efforts was that the nature of the limit behavior of natural dynamics in
general games had been lacking the required clarity. It had been known for 40 years, since Conley's seminal work [15], that the right concept of limit behavior in a general dynamical system is
a system of topological objects known as its chain recurrent components. However, this concept is
mathematically intractable for general dynamical systems: there can be infinitely many such sets,
of unbounded complexity.

Question 2: Is there a concrete characterization, in terms of familiar game-theoretic concepts, of the chain recurrent sets in the special case of natural game dynamics in normal-form games?

During this past year, there was important progress on both questions.

1. It was proven in [31] that the answer to Question 1 above is negative: there is a game for which no game dynamics can converge to the Nash equilibria — that is, there is no dynamics under which the fate of all initial strategy profiles is the Nash equilibria, while the Nash equilibria are themselves fixed points of the dynamical system. Thus, the Nash equilibrium is fundamentally incapable of capturing asymptotic player behavior.
2. Biggar and Shames establish a useful characterization of the chain recurrent sets of the replicator dynamics [8]: it was shown that each chain recurrent component of a game under
the replicator dynamics contains the union of one or more sink equilibria of the game. Sink
equilibria, first defined by Goemans et al. [22] in the context of the price of anarchy, are the
sink strongly connected components of the better response graph of the game.

We believe that these two results open an important opportunity to articulate a new approach
to understanding a normal-form game. Instead of considering it, as game theorists have been doing
so far, as the specification of an intractable equilibrium selection problem, we propose to see it as
a specification of the limit behavior of the players. According to this point of view [39], a game
is a mapping from a prior distribution over mixed strategy profiles (MSPs) to the resulting limit
distribution when the players engage in an intuitive and well-accepted natural behavior called noisy
replicator dynamics, which is related to multiplicative weight updates and will be defined soon.
This is the quest we are pursuing and advancing in this paper.

Our contributions

– We propose a concrete, unambiguous, and computationally tractable conception of a game as a mapping from any prior distribution over MSPs to the sink equilibria of the game, namely the limit distribution of the noisy replicator dynamics when initialized at the prior.
– We initiate the study of the efficiency of its computation. As a baby step, in the next section
we show that the sink equilibria can be computed in time near-linear in the description of the game.
We also point out that they are intractable for various families of implicit games.
– We prove that the mapping from a prior to a distribution over sink equilibria can be calculated
explicitly and efficiently (near-linear in the size of the game description) when the prior has
pure strategy support. This is highly nontrivial because the better response graph of the game
may contain many directed cycles of length two with infinitesimal transition probability ϵ,
corresponding to tie edges; the analysis must be carried out at the ϵ → 0 limit. The algorithm
involves a number of novel graph-theoretic concepts and techniques relating to Markov chains,
and the deployment of near-linear algorithms for directed Laplacian system solving as well as a
dynamic algorithm for incrementally maintaining the strongly connected components (SCCs)
of a graph.
– We also show through extensive experimentation that the general case (arbitrary prior) can be
solved efficiently for quite large games.

Related work

Non-convergence of learning dynamics in games. The failure of learning dynamics to converge to Nash equilibria in games is punctuated by a plethora of diverse negative results spanning numerous disciplines such as game theory, economics, computer science and control theory [5,16,27,21,39,48,1,2,24,4,26,30,12,49]. Recently, [31] capped off this stream of negative results with a general impossibility result showing that there is no game dynamics that achieves global
convergence to Nash for all games, a result that is independent of any complexity theoretic or
uncoupledness assumptions on the dynamics. Besides such worst case theoretical results, detailed
experimental studies suggest that chaos is commonplace in game dynamics, and emerges even in
low dimensional systems across a variety of game theoretic applications [44,45,36,41,13,6,29,37].
Dynamical systems for learning in games. This extensive list of non-equilibrating results
has inspired a program for linking game theory to dynamical systems [38,39] and Conley’s fun-
damental theorem of dynamical systems [15]. These tools have since been applied in multi-agent
ML settings such as developing novel rankings as well as training methodologies for agents playing
games [34,43,33,35]. Finally, Peyton Young’s paper on conventions [40] is an important precursor
of our point of view in the Economics literature, focusing on the special case of games in which the
sink SCCs are pure strategy equilibria.
Sink equilibria. The notion of sink equilibrium, a strongly connected component with no
outgoing arcs in the strategy profile graph associated with a game, was introduced in [22]. They also
defined an analogue to the notion of Price of Anarchy [28], the Price of Sinking, the ratio between
the worst case value of a sink equilibrium and the value of the socially optimal solution. The value
of a sink equilibrium is defined as the expected social value of the steady state distribution induced
by a random walk on that sink. Later work established further connections between Price of Sinking
and Price of Anarchy via the (λ, µ)-robustness framework [42]. A number of negative, PSPACE-
hard complexity results for analyzing and approximating sink equilibria have been established in
different families of succinct games [20,32].
Finally, [25] compute the limiting stationary distribution of an irreducible MC with vanishing
edges; their technique can be used in our framework to solve for the time-averaged long-run behavior
within a sink SCC.

2 Preliminaries

We assume the standard definitions of a normal-form game G with p players with pure strategy
sets {Si }, and their utilities Ui . We denote by |G| the size of the description of G. The better-or-
equal response graph B(G) has the pure strategy profiles as nodes, and an edge from u to v if u
and v differ in the strategy of only one player i, and Ui (v) ≥ Ui (u). Let E be the set of edges of
B(G). Notice that, because of the tie edges and the transitive edges, the number of edges in the
response graph can be much larger than the size of the description of the game, i.e., |E| = Ω(|G|).
The sink equilibria of G are the sink strongly connected components (sink SCCs) of B(G), that is,
maximal sets of nodes with paths between all pairs, such that there is no edge leaving this set. We
shall define many other novel graph-theoretic concepts for this graph in the next section. Our first
theorem delineates the complexity of finding the sink equilibria of a game:

Theorem 1. The sink equilibria can be computed in time near-linear in the description of the game
presented in normal form, whereas computing them in a graphical game is PSPACE-complete.

Proof. The first claim follows from the fact that, even though (as pointed out in the preceding paragraph) B(G) has more edges than the size of the description of the game, there is an equivalent graph of linear size with the same transitive closure (and therefore the same strongly connected components), obtained as follows: For each player and each pure strategy profile of the other players, consider the subgraph induced by the nodes that combine that profile of the other players with some action of the player in question. Sort these nodes by the player's utility. For each node, in order from lowest to highest utility, create an edge to the next node in the sorted order; in addition, when a node is the last of its tie class (that is, the next node has a strictly higher utility), create an edge from it back to the first node of its class, unless this would be a self-loop. This preserves the transitive closure of B(G), and each node has at most 2 outgoing edges (at most 3/2 edges per node on average in the worst case). Aside from sorting, this can be done in linear time. A minimal sketch of this construction follows.
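Here is a small Python sketch of this sparsification for a single player, holding the other players' profile fixed (our own illustration; the list-of-payoffs input format is an assumption):

def sparse_line_edges(payoffs):
    # payoffs[a] is the player's utility for action a; nodes are identified with
    # actions. Returns edges with the same transitive closure as the
    # better-or-equal response subgraph: a forward chain in sorted order, plus
    # one back edge closing each tie class (so ties stay strongly connected).
    order = sorted(range(len(payoffs)), key=lambda a: payoffs[a])
    edges = [(order[k], order[k + 1]) for k in range(len(order) - 1)]
    first = 0  # index in `order` where the current tie class starts
    for k in range(len(order)):
        last_of_class = k + 1 == len(order) or payoffs[order[k + 1]] > payoffs[order[k]]
        if last_of_class:
            if k > first:  # tie class of size > 1; avoid self-loops
                edges.append((order[k], order[first]))
            first = k + 1
    return edges

# Example: payoffs [1, 1, 2] give the chain (0, 1), (1, 2) plus the tie back edge (1, 0).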
The second claim follows from known results [19]; the result holds for other forms of succinct
descriptions of games, such as Bayesian or extensive form games.

Next we define the noisy replicator dynamics on G [39], a noisy generalization of the classical replicator dynamics [46]. It is a function mapping the set of MSPs of G to itself as follows:

– ϕ(x) = ∂G(x + η · BRx + Nx(0, δ)), where
– BRx is the unit best-response vector at x projected to the subspace of x — that is, containing zeros at all coordinates at which x is zero; this ensures that the support of x never increases;
– Nx(0, δ) is Gaussian noise, also projected;
– the function ∂G maps x + η · BRx + Nx(0, δ) either to itself, if it is inside the domain of the game's MSPs, or to the closest point on the support's boundary, otherwise;
– and δ, η > 0 are important small parameters.

Justification. The replicator dynamics [47,46] has been for four decades the standard model for
the evolution of strategic behavior. In connection with Economics and Game Theory, it has the
important advantage of invariance under positive affine changes in the players’ utilities. For our
purposes, it is approximated via the noisy version of Multiplicative Weights Update (MWU) [3].
Projecting the noise to the support of the current MSP x is motivated by evolution and extinc-
tion, and is instrumental for fast convergence. This precise dynamics has been used extensively in
reinforcement learning for game play, see for example [34,35].
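To make the definition concrete, here is a minimal Python sketch of one step of the dynamics (our own illustration: the utility-tensor input format, and the clip-and-renormalize projection standing in for ∂G, are simplifying assumptions, not the authors' implementation):

import itertools
import numpy as np

def expected_utils(i, x, U):
    # Expected utility to player i of each pure strategy, against the mixed
    # strategies x[j] of the other players; U[i] is player i's utility tensor.
    out = np.zeros(len(x[i]))
    others = [range(len(x[j])) for j in range(len(x)) if j != i]
    for si in range(len(x[i])):
        for prof in itertools.product(*others):
            full = prof[:i] + (si,) + prof[i:]
            weight = np.prod([x[j][full[j]] for j in range(len(x)) if j != i])
            out[si] += weight * U[i][full]
    return out

def noisy_replicator_step(x, U, eta=0.05, delta=0.01, rng=None):
    # phi(x) = proj(x + eta * BR_x + N_x(0, delta)), per the definition above.
    rng = rng or np.random.default_rng()
    new_x = []
    for i, xi in enumerate(x):
        support = xi > 0
        u = expected_utils(i, x, U)
        br = np.zeros_like(xi)  # unit best-response vector, restricted to the support
        br[np.argmax(np.where(support, u, -np.inf))] = 1.0
        noise = rng.normal(0.0, delta, size=xi.shape) * support  # projected noise
        y = np.clip((xi + eta * br + noise) * support, 0.0, None)  # crude stand-in for the boundary map
        new_x.append(y / y.sum())
    return new_x

# Example: a 2 x 2 coordination game started at the uniform mixed profile.
U = [np.array([[2.0, 0.0], [0.0, 1.0]]), np.array([[2.0, 0.0], [0.0, 1.0]])]
x = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
for _ in range(200):
    x = noisy_replicator_step(x, U)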
Finally, we define a dynamics on the pure strategy profiles (that is, a Markov chain), called the
Conley-Markov Chain of G, or CMC(G) [39,22]. If (u, v) is an edge of B(G) corresponding to a
defection of player i, its probability in CMC(G) is proportional to Ui (v)−Ui (u), with the edges out
of each node u normalized to one. It is not hard to see that this is the limit of the noisy replicator
dynamics as the noise goes to zero and the MSP goes to u. Importantly, however, CMC(G) also
has an infinitesimal probability ϵ for each tie edge. (Note that this probability is used symbolically
as it descends to zero, and, in the interest of clarity, it does not affect the normalization at u.) This
treatment of tie edges reflects two things: First, it was shown in [7] that tie edges must be included
in the calculation of the sink equilibria for their theorem to hold; and second, to incorporate tie
edges in a way compatible with Conley's theorem [15] is to think of them as conduits of a balanced random walk on the undirected edge between the two nodes, in which the MSP changes via tiny steps of σ at a time, so that it takes Θ(1/σ²) steps for the transition to be completed, justifying its infinitesimal transition probability.
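As an illustration, the following sketch (assuming the same utility-tensor format as in the previous sketch) builds the regular edges of CMC(G), and collects the tie edges separately since their symbolic probability ϵ does not enter the normalization at a node:

import itertools

def conley_markov_chain(U):
    # regular[u] lists (v, probability) for strict better responses out of u,
    # with probabilities proportional to U_i(v) - U_i(u), normalized per node;
    # ties lists the (u, v) pairs whose probability is the symbolic epsilon.
    p, shape = len(U), U[0].shape
    regular, ties = {}, []
    for u in itertools.product(*(range(s) for s in shape)):
        out = []
        for i in range(p):
            for si in range(shape[i]):
                if si == u[i]:
                    continue
                v = u[:i] + (si,) + u[i + 1:]
                diff = U[i][v] - U[i][u]
                if diff > 0:
                    out.append((v, diff))
                elif diff == 0:
                    ties.append((u, v))
        total = sum(w for _, w in out)
        regular[u] = [(v, w / total) for v, w in out] if total > 0 else []
    return regular, ties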

3 A Combinatorial Algorithm for the Hitting Probabilities


We start by collapsing all sink SCCs of CMC(G) to single absorbing nodes. Our main goal is to compute the hitting probabilities from each node i of CMC(G) to each of the absorbing nodes — that is, the probability that a path starting from i will end up in that node — albeit in the limit as ϵ → 0. The hitting probabilities can be defined in two equivalent ways, both of which we will use
in our proofs. Define $h_{iS}$ to be the hitting probability of node i to sink S, that is, the probability that a path from i will eventually be absorbed by S. Let $p_{ij}$ be the transition probability from node i to node j (the weight of the edge (i, j), or 0 if there is no edge). Then the hitting probabilities are the smallest non-negative numbers that satisfy the following system of equations:

$$h_{iS} = \begin{cases} \sum_{j} p_{ij}\, h_{jS}, & \text{if } i \notin S \\ 1, & \text{if } i \in S \end{cases} \qquad (1)$$

If we define $\Psi_{iS}$ as the (potentially infinite) set of paths that start at i and end at some node in S, then we equivalently have:

$$h_{iS} = \sum_{p \in \Psi_{iS}} \Pr[p] \qquad (2)$$

In this section we prove the following:

Theorem 2. The limit hitting probabilities of CMC(G) can be computed in time $O(|E|^{4/3})$, where E is the set of edges of CMC(G).
Significant progress has been made in solving linear systems associated with weighted directed graphs, such as Equation (1), faster than the time required to solve arbitrary linear systems. Two problems can now be solved in almost-linear time in the size of the graph: computing the stationary distribution of an irreducible Markov chain (henceforth abbreviated MC), and computing escape probabilities [14]. The computation of escape probabilities in a random walk maps directly to the problem of computing hitting probabilities in an MC. So we have fast algorithms for our problem in the case of no tie edges. However, the introduction of tie edges creates an ill-conditioned problem, and we are interested in its solution as ϵ → 0. A possible approach would be to solve the system of equations given in Eq. (1) symbolically and then take limits as ϵ → 0; however, solving large systems of equations symbolically is intractable. Instead, we take a combinatorial approach that transforms any given CMC(G) into a simpler MC which preserves the limit hitting probabilities of the original CMC(G) but eventually has no tie edges. The hitting probabilities of this simplified MC can be computed in almost linear time as mentioned above.
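For concreteness, here is a dense linear-algebra rendering of Eq. (1) for the tie-free case (our own illustration; a stand-in for the almost-linear-time solvers of [14], which would replace the dense solve at scale):

import numpy as np

def hitting_probabilities(P, absorbing, target):
    # P: full row-stochastic matrix with self-loops on absorbing nodes;
    # absorbing: indices of all collapsed sink SCCs; target: the sink of interest.
    # Solves (I - Q) h = b on the transient nodes, with h = 1 on the target
    # and h = 0 on the other absorbing nodes.
    n = P.shape[0]
    transient = [i for i in range(n) if i not in absorbing]
    Q = P[np.ix_(transient, transient)]
    b = P[np.ix_(transient, sorted(target))].sum(axis=1)
    h = np.zeros(n)
    h[sorted(target)] = 1.0
    h[transient] = np.linalg.solve(np.eye(len(transient)) - Q, b)
    return h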

3.1 Outline of the Algorithm


1. The input to the algorithm is CMC(G) — in fact, it could be any ϵ-MC M with absorbing nodes. The output is the list of hitting probabilities {h_iS}: the probabilities with which each sink SCC of the graph is reached from each of the other nodes.
2. We start by collapsing the sink SCCs of M.
3. We calculate the SCCs of M without the ϵ-edges, called rSCCs. This makes sense since ϵ-edges are traversed at a far slower rate than the rest.
4. Next we must handle a phenomenon called a pseudosink: an rSCC whose only outgoing edges are ϵ-edges. Within a pseudosink, the MC converges to a steady state before exiting, and therefore all of its nodes have the same hitting probabilities. The pseudosinks are identified one by one and collapsed, with their outgoing ϵ-edges replaced by regular edges in accordance with Def. 6. A simple disjoint-set data structure can track the original vertices through the collapses.
5. A complication is that the collapsed pseudosinks acquire new regular edges to the rest of the graph, and as a result the rSCCs of the graph must be recalculated. This procedure may also create new pseudosinks, so steps 3 and 4 are repeated until no more pseudosinks exist.
6. Once all pseudosinks have been removed this way, any remaining ϵ-edges do not affect the hitting probabilities and can therefore be deleted. At this point, the hitting probabilities can be computed in almost linear time. (A control-flow sketch of the whole loop follows.)
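The following Python-like sketch records just the control flow of the outline; the subroutine names are assumptions standing in for the steps described above:

def limit_hitting_probabilities(M):
    M = collapse_sink_sccs(M)                     # step 2
    while True:
        comps = regular_sccs(M)                   # step 3: SCCs ignoring eps-edges
        pseudosinks = find_pseudosinks(M, comps)  # step 4: rSCCs with only eps-edges out
        if not pseudosinks:
            break
        for P in pseudosinks:
            M = collapse_pseudosink(M, P)         # replace eps-edges by Def. 6 weights
        # step 5: collapsed nodes gained regular edges; rSCCs recomputed next pass
    delete_remaining_eps_edges(M)                 # step 6
    return hitting_probabilities(M)               # now an ordinary absorbing MC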

3.2 Definitions
Definition 1. ϵ-Markov chain: An ϵ-Markov chain (ϵ-MC) is a Markov chain that has two types of edges: regular edges, which have weights $c_r - c_{re}\,\epsilon$ for constants $c_r > 0$ and $c_{re} \ge 0$, and ϵ-edges, which have weights $c_e\,\epsilon$ for a constant $c_e > 0$. Thus, CMC(G) is an ϵ-MC. As the values of the coefficients $c_{re}$ of the regular edges do not affect the limiting hitting probabilities, we shall ignore them.
Definition 2. Sink SCC: A sink SCC S is a maximal set of nodes that is strongly connected (including connectivity via ϵ-edges) and has no outgoing edges.
Definition 3. rSCC: An rSCC is a maximal set of nodes that is strongly connected via regular edges. An rSCC may contain ϵ-edges between its nodes, but every node is reachable from every other node without using ϵ-edges.
Definition 4. Pseudosink: A pseudosink P is an rSCC that has at least one outgoing ϵ-edge and no outgoing regular edges.
Definition 5. Order: The order of a node i, Order(i), is the minimum number of tie edges on any path from i to a sink SCC. The maximum order of the current MC, MaxOrder(M), is our gauge of progress in the algorithm.
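Order(i) can be computed for all nodes at once by a 0-1 breadth-first search on the reversed graph, with regular edges costing 0 and ϵ-edges costing 1 — a standard technique, sketched here under an assumed edge-list input:

from collections import deque

def node_orders(n, regular_edges, eps_edges, sinks):
    # dist[i] = Order(i): fewest eps-edges on any path from i to a sink SCC.
    radj = [[] for _ in range(n)]  # reversed adjacency with 0/1 costs
    for u, v in regular_edges:
        radj[v].append((u, 0))
    for u, v in eps_edges:
        radj[v].append((u, 1))
    dist = [float("inf")] * n
    dq = deque()
    for s in sinks:
        dist[s] = 0
        dq.append(s)
    while dq:
        v = dq.popleft()
        for u, c in radj[v]:
            if dist[v] + c < dist[u]:
                dist[u] = dist[v] + c
                (dq.appendleft if c == 0 else dq.append)(u)
    return dist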
Definition 6. The weight of a new regular edge from a collapsed pseudosink P to a node y is as follows (here O is the set of outgoing ϵ-edges from P):

$$W(P, y) = \frac{\sum_{e=(x,y)\in O} c_e\,\pi_P[x]}{\sum_{e'=(x',y')\in O} c_{e'}\,\pi_P[x']},$$

where $\pi_P[x]$ is the steady-state probability of x within P, computed using only regular edges.
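A small sketch of Definition 6 (our own illustration, assuming the pseudosink's internal regular edges are given as a row-stochastic matrix, so that π_P can be read off the left eigenvector for eigenvalue 1):

import numpy as np

def collapse_weights(P_reg, eps_out, nodes):
    # P_reg: row-stochastic matrix of the regular edges inside the pseudosink;
    # eps_out: list of (x, y, c_e) outgoing eps-edges; nodes: row index -> node id.
    idx = {v: k for k, v in enumerate(nodes)}
    w, vecs = np.linalg.eig(P_reg.T)
    pi = np.real(vecs[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()  # stationary distribution pi_P within the pseudosink
    mass, denom = {}, 0.0
    for x, y, c in eps_out:
        m = c * pi[idx[x]]
        mass[y] = mass.get(y, 0.0) + m
        denom += m
    return {y: m / denom for y, m in mass.items()}  # W(P, y) per Def. 6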

3.3 Algorithm Correctness


Throughout the algorithm, we maintain a MC that we denote M , initially the MCM(G) with all
sink SCCs collapsed. There are two aspects to validate. The first is that the algorithm progresses
until M has no remaining ϵ-edges. The second is that M maintains the property that at all stages
the limit hitting probabilities of the original nodes are maintained through collapsing pseudosinks
(step 2) and deleting ϵ-edges.

Algorithm Progression
Lemma 1. If MaxOrder(M ) ≥ 1 then M contains a pseudosink.
Proof. Consider the set of all nodes i that achieve Order(i) = MaxOrder(M). By definition, from this set of nodes the MC cannot reach any other nodes without using ϵ-edges. Consider the rSCC decomposition of this set of nodes. The regular edges between rSCCs induce a directed acyclic graph on this set. Every finite DAG has at least one leaf, defined as a vertex with no outgoing edges. Each leaf is an rSCC with no outgoing regular edges and therefore is a pseudosink.
Lemma 2. Collapsing all pseudosinks in M reduces the maximum order of M by at least 1.
Proof. Again, consider the set of all nodes i that have Order(i) equal to MaxOrder(M) and the DAG representing the rSCC structure of this set. Each leaf of the DAG is a pseudosink. From each pseudosink, there exists a path to some sink SCC that achieves the order MaxOrder(M). Collapsing the pseudosink replaces all outgoing ϵ-edges with regular edges, so this same path now achieves order MaxOrder(M) − 1. Since every rSCC in the DAG is either a leaf or has a regular path to a leaf, all nodes i that previously achieved the maximum order now have Order(i) ≤ MaxOrder(M) − 1. So the maximum order of M is reduced by at least 1.
Combining these two lemmas means that at each stage of our algorithm we will find one or more
pseudosinks and collapse them, decreasing the maximum order by at least 1. So the algorithm will
progress until the maximum order reaches 0, at which point we delete all remaining ϵ-edges and
are left with a Markov Chain with only regular edges.

Pseudosink Collapse
Lemma 3. Let P be a pseudosink, and let O be the set of outgoing ϵ-edges from P. For e = (x, y) ∈ O, let $L_e$ be the event that a Markov chain started at any i ∈ $N_P$ first leaves P via e, where $W_M(e) = c_e\,\epsilon$ is the weight in M of edge e. Then

$$\lim_{\epsilon \to 0} \Pr[L_e] = \frac{c_e\,\pi_P[x]}{\sum_{e'=(x',y')\in O} c_{e'}\,\pi_P[x']}$$

This immediately implies that W(P, y) from Definition 6 is the probability that y is the first node outside of P that a chain started at any i ∈ $N_P$ travels to.
Proof. Recall that $L_e$ is the event that e ∈ O is the first outgoing ϵ-edge taken. Also, define $T_L$ as the random variable representing the timestep at which the chain leaves P. We start from the fact that the chain must leave P via some first outgoing edge:

$$1 = \sum_{e\in O} \Pr[L_e] = \sum_{e\in O}\sum_{t=0}^{t_1} \Pr[L_e \cap T_L = t] + \sum_{e\in O}\sum_{t=t_1+1}^{\infty} \Pr[L_e \cap T_L = t]$$

Let d be the periodicity of the pseudosink ignoring ϵ-edges. Set $t_1 = \frac{1}{\sqrt{\epsilon}} + k$, where k is an integer random variable chosen uniformly at random from 0 to d − 1. Define the constant $L_{\max} = \max_{i\in N_M} \sum_{e=(i,j)\in E_\epsilon} c_e$, where $E_\epsilon$ is the set of ϵ-edges in M; thus $L_{\max}\,\epsilon$ upper-bounds the probability of taking an ϵ-edge at any particular timestep, and

$$\lim_{\epsilon\to 0} \sum_{e'\in O}\sum_{t=0}^{t_1} \Pr[L_{e'} \cap T_L = t] \le \lim_{\epsilon\to 0} \sum_{t=0}^{t_1} L_{\max}\,\epsilon = \lim_{\epsilon\to 0}\Big(\frac{L_{\max}\,\epsilon}{\sqrt{\epsilon}} + L_{\max}\,\epsilon\,k\Big) = 0$$

Let $\pi_P$ be the stationary distribution on the pseudosink P using only regular edges. We can ignore the internal ϵ-edges because the pseudosink is an rSCC, and therefore the regular edges uniquely define the limiting stationary distribution [25].
Define $X_t$ as the state of the MC at time t. For fixed t′ ≥ 0 and x ∈ $N_P$, at time t = $t_1$ + t′ we have $\Pr[X_t = x \mid T_L \ge t] = \pi_P[x] + n(t_1)$, where the randomness is over both the MC process and the choice of k. This is because, while the MC has not left P, it is approaching the steady state of P (or its subsequences are, if P is periodic). We have that $|n(t_1)| \le a_P\, b_P^{t_1}$ for finite constants $a_P$ and $0 \le b_P < 1$ independent of ϵ; this is due to the rate of convergence of subsequences of periodic Markov chains [9].

Noting that the left-hand side of our first equation is independent of ϵ, we can take the limit as ϵ → 0:

$$1 = \lim_{\epsilon\to 0} \sum_{e\in O}\sum_{t=t_1+1}^{\infty} \Pr[L_e \cap T_L = t] = \lim_{\epsilon\to 0} \sum_{e=(x,y)\in O}\sum_{t=t_1+1}^{\infty} c_e\,\epsilon\,\Pr[X_t = x \cap T_L \ge t]$$

$$= \lim_{\epsilon\to 0} \sum_{e=(x,y)\in O}\sum_{t=t_1+1}^{\infty} c_e\,\epsilon\,\Pr[X_t = x \mid T_L \ge t]\,\Pr[T_L \ge t]$$

$$= \lim_{\epsilon\to 0} \sum_{e'=(x',y')\in O} c_{e'}\big(\pi_P[x'] + n(t_1)\big)\;\epsilon \sum_{t=t_1+1}^{\infty}\Pr[T_L \ge t]$$

Rearranging, and using that $n(t_1) \to 0$ as $\epsilon \to 0$,

$$\lim_{\epsilon\to 0}\,\epsilon \sum_{t=t_1+1}^{\infty}\Pr[T_L \ge t] = \frac{1}{\lim_{\epsilon\to 0}\sum_{e'=(x',y')\in O} c_{e'}\big(\pi_P[x'] + n(t_1)\big)} = \frac{1}{\sum_{e'=(x',y')\in O} c_{e'}\,\pi_P[x']}$$

Now we compute $\lim_{\epsilon\to 0} \Pr[L_e]$:

$$\Pr[L_e] = \sum_{t=0}^{t_1} \Pr[L_e \cap T_L = t] + \sum_{t=t_1+1}^{\infty} \Pr[L_e \cap T_L = t]$$

We can show that the first term vanishes in the limit as ϵ → 0:

$$\sum_{t=0}^{t_1} \Pr[L_e \cap T_L = t] \le \sum_{e'\in O}\sum_{t=0}^{t_1} \Pr[L_{e'} \cap T_L = t]$$

$$\lim_{\epsilon\to 0}\sum_{t=0}^{t_1} \Pr[L_e \cap T_L = t] \le \lim_{\epsilon\to 0}\sum_{e'\in O}\sum_{t=0}^{t_1} \Pr[L_{e'} \cap T_L = t] = 0$$

Substituting back into our expression for $\Pr[L_e]$ and proceeding as above:

$$\lim_{\epsilon\to 0} \Pr[L_e] = \lim_{\epsilon\to 0}\sum_{t=t_1+1}^{\infty} c_e\,\epsilon\,\big(\pi_P[x] + n(t_1)\big)\Pr[T_L \ge t]$$

$$= \lim_{\epsilon\to 0} c_e\big(\pi_P[x] + n(t_1)\big)\cdot \lim_{\epsilon\to 0}\,\epsilon \sum_{t=t_1+1}^{\infty}\Pr[T_L \ge t] = \frac{c_e\,\pi_P[x]}{\sum_{e'=(x',y')\in O} c_{e'}\,\pi_P[x']}$$

Lemma 4. Let P be a pseudosink, and let the stationary distribution on P without ϵ-edges be denoted by $\pi_P$. The hitting probabilities to the sink SCCs of the overall graph are not affected by collapsing P to a single node $A_P$ with outgoing edge weights $W(A_P, y)$.

Proof. Let M be the ϵ-MC immediately before collapsing P , and let M ′ be the transformed ϵ-MC.
We will show that for an arbitrary i and arbitrary sink SCC S, limϵ→0 hiS (ϵ) = limϵ→0 h′iS (ϵ)
where hiS (ϵ) is the hitting probability from i to S in M and h′iS (ϵ) is the analogous quantity in
M ′.
Let $N_P$ be the set of nodes of pseudosink P. Define $\Psi_k$ to be the set of paths in M that start at i ∉ $N_P$ and enter and exit P exactly k times. Define $\Psi'_k$ similarly to be the set of paths in M′ that visit the collapsed node $A_P$ exactly k times. Then we can write the hitting probabilities as in Eq. (2):

$$h_{iS}(\epsilon) = \sum_{k=0}^{\infty}\sum_{p\in\Psi_{iS}\cap\Psi_k}\Pr[p], \qquad h'_{iS}(\epsilon) = \sum_{k=0}^{\infty}\sum_{p\in\Psi'_{iS}\cap\Psi'_k}\Pr[p]$$

Observe that any path p ∈ $\Psi_k$ can be parameterized by two length-k vectors. The first, denoted ĝ ∈ $(N_P)^k$, represents the sequence of entry locations into P: $\hat g_j$ is the first node in P that p visits on its j-th entry into P. The second, denoted ŷ ∈ $(N_M \setminus N_P)^k$, represents the sequence of exit locations from P: $\hat y_j$ is the first node not in P that p visits on its j-th exit from P. Let $\Psi_{\hat g,\hat y} \subseteq \Psi_k$ be the set of paths parameterized by ĝ, ŷ.
For a set of nodes B and nodes i ∈ B and j, define $T_B[i, j] = \sum_{p\in\Psi_{i,j,B}}\Pr[p]$, where $\Psi_{i,j,B}$ is the potentially infinite set of paths from i to j in M with the property that all nodes on the path are in B, with the exception of the last node if and only if j ∉ B. Define $T'_B[i, j]$ as the analogous quantity for paths in M′.
Let $N_M$ be the set of nodes of the ϵ-MC M, and define $N_H = N_M \setminus N_P$, the set of nodes in M that are not in P. For k = 0, $\Psi_{iS} \cap \Psi_0$ only includes paths that travel through nodes in $N_H$. These nodes and the edges between them are unchanged by the collapse procedure, implying that $\sum_{p\in\Psi_{iS}\cap\Psi_0}\Pr[p] = \sum_{p\in\Psi'_{iS}\cap\Psi'_0}\Pr[p]$.

For k ≥ 1, we can write the sum of probabilities of all paths that start at i, end at S, and visit P exactly k times as follows:

$$\sum_{p\in\Psi_{iS}\cap\Psi_k}\Pr[p] = \sum_{\hat y}\sum_{\hat g}\sum_{p\in\Psi_{iS}\cap\Psi_{\hat g,\hat y}}\Pr[p]$$

$$= \sum_{\hat y}\sum_{g_1\in N_P}\cdots\sum_{g_k\in N_P} T_{N_H}[i, g_1]\prod_{j=1}^{k-1}\big(T_{N_P}[g_j, y_j]\,T_{N_H}[y_j, g_{j+1}]\big)\cdot T_{N_P}[g_k, y_k]\,T_{N_H}[y_k, S]$$

$$= \sum_{\hat y}\Big(\sum_{g_1\in N_P} T_{N_H}[i, g_1]\,T_{N_P}[g_1, y_1]\Big)\cdot\prod_{j=2}^{k}\Big(\sum_{g_j\in N_P} T_{N_H}[y_{j-1}, g_j]\,T_{N_P}[g_j, y_j]\Big)\,T_{N_H}[y_k, S]$$

Observe that $T_{N_P}[g, y]$ is the probability that the first state after leaving P is y (after entering P at state g), and therefore, by Lemma 3, for all g ∈ $N_P$, $\lim_{\epsilon\to 0} T_{N_P}[g, y] = W(A_P, y)$.
Also, observe that $\sum_{g\in N_P} T_{N_H}[i, g] = T'_{N_H}[i, A_P]$, because every path ending at some node g ∈ $N_P$ in M ends at $A_P$ in M′, and none of the edges used in these paths are affected by the collapse procedure. We also know that paths in $\Psi'_k$ can be parameterized using a single vector ŷ ∈ $(N_{M'} \setminus A_P)^k$, where $\hat y_j$ is the node visited immediately after visiting $A_P$ for the j-th time.

$$\lim_{\epsilon\to 0}\sum_{p\in\Psi_{iS}\cap\Psi_k}\Pr[p] = \sum_{\hat y} T'_{N_H}[i, A_P]\prod_{j=1}^{k-1}\big(W(A_P, y_j)\,T'_{N_H}[y_j, A_P]\big)\,W(A_P, y_k)\,T'_{N_H}[y_k, S] = \lim_{\epsilon\to 0}\sum_{p\in\Psi'_{iS}\cap\Psi'_k}\Pr[p]$$

Since the summations are equal for all k, the hitting probabilities from nodes i ∉ $N_P$ are preserved in the limit. All that is left to prove is that for g ∈ $N_P$ we have $\lim_{\epsilon\to 0} h_{gS}(\epsilon) = \lim_{\epsilon\to 0} h'_{A_P S}(\epsilon)$. Let $H_{gS}$ be the event that we hit absorbing state S given that M started at g (so $h_{gS}(\epsilon) = \Pr[H_{gS}]$). Recall that $L_e$ is the event that e is the first outgoing edge from P that the chain takes:

$$\Pr[H_{gS}] = \sum_{e\in O}\Pr[H_{gS}\mid L_e]\,\Pr[L_e] = \sum_{e=(x,y)\in O}\Pr[H_{yS}]\,\Pr[L_e]$$

$$\lim_{\epsilon\to 0} h_{gS} = \sum_{y\notin P} W(A_P, y)\,\lim_{\epsilon\to 0} h_{yS}$$

Eq. (1) for the hitting probability of $A_P$ in M′ reads exactly $h'_{A_P S} = \sum_{y\notin P} W(A_P, y)\,h'_{yS}$. Since the limiting hitting probabilities are maintained for y ∉ $N_P$, they are also maintained for g ∈ $N_P$.

Deletion of ϵ-Edges at the Final Step

Lemma 5. If all nodes i ∈ $N_M$ have a regular path (using only regular edges) to an absorbing state, then for all i, $\lim_{\epsilon\to 0}\Pr[V_i] = 1$, where $V_i$ is the event that the MC started at i is absorbed before taking any ϵ-edge.

Proof. Consider the set of regular edges of M, which have weights of the form $c_r - c_{re}\,\epsilon$. Set $\epsilon \le \min(c_r)/(2\max(c_{re}))$, so that every regular edge has weight at least $w_{\min} = \min(c_r)/2$. Now set $C_{\min} = w_{\min}^{|N_M|}$; note that $C_{\min}$ is a constant independent of ϵ. For every node i there exists a regular path $p_{iS}$ of length at most $|N_M|$ to an absorbing state, which must have probability at least $C_{\min}$. In addition, recall that $L_{\max}$ is the constant defined in Lemma 3 such that $L_{\max}\,\epsilon$ upper-bounds the probability of taking an ϵ-edge at any timestep.
We will calculate $\Pr[V_i]$ by analyzing a new MC M′ that depends on M. This new MC begins in the "start" state, with the original MC at some node i′ which is not an absorbing state (initially this node is i). One step of M′ is as follows: it evolves M from i′ for len(p) steps, where p is the
path from i′ to an absorbing state that has probability ≥ $C_{\min}$. If M ends up at some absorbing state (one way for this to happen is to take path p), then M′ moves to the absorbing "success" state. If M takes any ϵ-transition during this evolution, then M′ moves to the absorbing "failure" state. If neither of these happens, M will be at some non-absorbing state i′ and M′ will stay in the start state.
Observe that the probability that M′ reaches the success state is exactly $\Pr[V_i]$, since M′ reaches the success state if and only if M reaches some absorbing state before taking any ϵ-edges. Denote the event that M′ reaches the success state by $V'_S$. The edge from the start state to the success state has probability at least $C_{\min}$. The edge from the start state to the failure state has probability at most $F(\epsilon) = 1 - (1 - L_{\max}\,\epsilon)^{|N_M|}$, because a single step of M′ consists of len(p) ≤ $|N_M|$ steps of M, and the chance of taking an ϵ-transition at each of those steps is upper-bounded by $L_{\max}\,\epsilon$.
We can use these bounds on the probabilities of the edges of M′ to compute:

$$\Pr[V'_S] \ge (1\cdot C_{\min}) + (0\cdot F(\epsilon)) + \Pr[V'_S]\cdot(1 - C_{\min} - F(\epsilon)) \;\Longrightarrow\; \Pr[V'_S] \ge \frac{C_{\min}}{C_{\min} + F(\epsilon)}$$

Since $\lim_{\epsilon\to 0} F(\epsilon) = 0$, $\lim_{\epsilon\to 0}\Pr[V'_S] = 1$ and therefore $\lim_{\epsilon\to 0}\Pr[V_i] = 1$.

Lemma 6. If MaxOrder(M ) = 0, deleting all ϵ-edges (that are not within a sink SCC) does not
affect the hitting probabilities.

Proof. By definition of order, if MaxOrder(M) = 0, every node in M has a regular path to an absorbing state, so we can apply Lemma 5 to get that for all states i, $\lim_{\epsilon\to 0}\Pr[V_i] = 1$, where $V_i$ is the event that M transitions from i to some absorbing state without taking any ϵ-edges.
Let M′ be the graph with all ϵ-edges removed and all edge weights of the form $c_r - c_{re}\,\epsilon$ set to $c_r$. Let $h'_{iS}$ be the hitting probability from i to absorbing state S in M′. Define $\Psi_r$ to be the set of paths in M that only use regular edges, and $\Psi'_r$ to be the analogous set for M′; then $\Psi_{iS} \cap \Psi_r$ is the set of paths in M that go from i to S using only regular edges. Defining Ψ as the set of all paths in M, $\Psi_{iS} \cap (\Psi\setminus\Psi_r)$ is the set of paths in M that go from i to S using one or more ϵ-edges.

$$h_{iS}(\epsilon) = \sum_{p\in\Psi_{iS}\cap\Psi_r}\Pr[p] + \sum_{p\in\Psi_{iS}\cap(\Psi\setminus\Psi_r)}\Pr[p]$$

The probability of taking a path from i to S that uses one or more ϵ-edges is at most the probability of taking an ϵ-edge before absorption (since the first event is a subset of the second):

$$\sum_{p\in\Psi_{iS}\cap(\Psi\setminus\Psi_r)}\Pr[p] \le 1 - \Pr[V_i], \qquad \lim_{\epsilon\to 0}\sum_{p\in\Psi_{iS}\cap(\Psi\setminus\Psi_r)}\Pr[p] \le \lim_{\epsilon\to 0}(1 - \Pr[V_i]) = 0$$

We can plug this limit into the hitting probability equation and use the property that, since the only structural difference between M and M′ is the ϵ-edges, $\Psi_{iS} \cap \Psi_r = \Psi'_{iS} \cap \Psi'_r$. In addition, for all regular edges e ∈ $E_M$ we have $\lim_{\epsilon\to 0} W_M(e) = W_{M'}(e)$, due to our renormalization. Note that we can interchange limits because each weight on a regular edge converges to a positive constant:

$$\lim_{\epsilon\to 0} h_{iS}(\epsilon) = \lim_{\epsilon\to 0}\sum_{p\in\Psi_{iS}\cap\Psi_r}\Pr[p] = \sum_{p\in\Psi_{iS}\cap\Psi_r}\prod_{e\in p}\lim_{\epsilon\to 0} W_M(e) = \sum_{p\in\Psi'_{iS}\cap\Psi'_r}\prod_{e\in p} W_{M'}(e) = h'_{iS}$$

Deleting the ϵ-edges and renormalizing the regular edges therefore has no effect on the limiting hitting probabilities.

3.4 Running Time of the Algorithm


Only steps 4, 5, and 6 of the algorithm have superlinear complexity. For step 4 we need to calculate the steady-state probabilities of the nodes of the pseudosink, because they are needed in the calculation of the weights of the edges leaving the collapsed pseudosink. For step 6, we need to compute the hitting probabilities in an ordinary graph (no ϵ-edges). Both of these problems can be solved in time $O(|E|^{1+\delta})$ for all δ > 0 [14]. In step 5, we do incremental maintenance of rSCCs; the fastest known algorithms for incremental SCC maintenance take amortized time $O(|E|^{1+\delta})$ [10].

4 Experiments
We have implemented our algorithm and experimented with random games with various values of the parameters p (players) and s (strategies per player), both ranging from 2 to 12. In the next subsection, we present certain examples that exhibit interesting behavior vis-à-vis our algorithm. Since our main message is a new way to view a game as an algorithmic map from a prior to a posterior distribution, in the second subsection we demonstrate how this works for various reasonably large games. Given a prior distribution (typically the uniform distribution over all MSPs), we sample from this distribution and then simulate the noisy replicator. We repeat until our convergence criteria are satisfied, and output the posterior distribution. This accomplishes our overarching goal, the empirical computation of the meaning of the game. We repeat this experiment for larger and larger games, taking this simulation to its practical laptop limits. The code used to generate this entire section is available at https://jasonmili.github.io/files/gd_hittingprobabilities_code.zip.
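A stripped-down version of this sampling-and-simulation loop, reusing noisy_replicator_step from Section 2, might look as follows (our own illustration: the per-player Dirichlet(1, ..., 1) draw matches the uniform prior over MSPs, while classifying the endpoint by its nearest pure profile is a crude stand-in for identifying the sink SCC reached):

import numpy as np

def estimate_limit_distribution(U, n_samples=1000, T=2000, rng=None):
    rng = rng or np.random.default_rng()
    counts = {}
    for _ in range(n_samples):
        x = [rng.dirichlet(np.ones(s)) for s in U[0].shape]  # sample the prior
        for _ in range(T):
            x = noisy_replicator_step(x, U, rng=rng)         # simulate
        key = tuple(int(np.argmax(xi)) for xi in x)          # nearest pure profile
        counts[key] = counts.get(key, 0) + 1
    return {k: v / n_samples for k, v in counts.items()}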

4.1 Some interesting games and their better-response graphs


We use the following plotting conventions: in each better-response graph, every node of every sink
SCC will be colored with a unique color. Other pure profile nodes of the graph will be depicted
as a “pie” graph with colored areas that indicate the hitting probabilities towards each of the sink
SCCs it reaches, as identified by the former colors (of the sink SCCs). Tie edges (ϵ edges) appear
in the graph as bidirectional “0.00” edges; this is only for plotting convenience.

3 × 3 Game We start with a modified version of a game presented by [39] that exhibits two sink
SCCs: a directed cycle of length four (corresponding to a periodic orbit in the replicator space)
and another that is a single pure profile (corresponding to a strict pure NE); see Figure 2.

 
2, 1 1, 2 0, 0
1, 2 2, 1 0, 0
0, 0 0, 0 1, 1

Fig. 2. 3 × 3 game. Left: the better-response graph. Right: the game utilities.

Game with Order 1 Profile We construct a game, depicted in Figure 3, with two sink SCCs and a pure profile of order 1 (which is also a pseudosink) that needs exactly one tie edge to reach any sink SCC. Notice that the presence of this pure profile affects the hitting probabilities towards the sink SCCs, as described in Section 3. This example shows that there are cases where a pure NE may not be a sink SCC, or, as a matter of fact, not even inside any sink equilibrium. That is, this NE is not stochastically stable in the terminology of [40].

 
4, 4 1, 1 0, 0
0, 0 3, 3 1, 1
2, 2 1, 1 2, 2

Fig. 3. Tie game. The profile (3, 3) is order 1. Left: the better-response graph. Right: the game utilities.

3 × 3 × 3 Game The utilities for this game can be found in our code. See Figures 1 and 4.

Fig. 4. The 3 × 3 × 3 game. (a) The better-response graph of the game; nodes of sink SCCs are depicted in red. (b) For added clarity, a subgraph of (a) showing all pure profiles of order ≥ 1, along with the sink SCCs. (c) The color coding depicting the orders of the various pure-profile nodes.



4.2 Convergence Statistics

Methodology. We generate games of various sizes with random utilities (see the figures below), and we carry out a number of independently-randomized experiments running the noisy replicator dynamics (RD) on each. For each sampled point of the prior distribution (typically uniform), we run multiple independently-randomized instances of the noisy RD to obtain an empirical distribution. We take the outcome of the game to be the empirical last-iterate distribution, i.e., the average of all obtained distributions after T steps. We keep track of the total variation (TV) distance between the running average distribution (e.g., at time t < T) and the ex-post empirical last iterate (the average at time T). We consider that a distribution has achieved good enough convergence when the TV distance is less than 1% — we found that this is roughly the accuracy that is feasible in a laptop-like experimental setup. All calculations in this section were performed on an Apple M2 processor using multi-threading with 8 parallel threads.

Fig. 5. Convergence example. 40 independent runs of noisy RD were used for each sample.

(a) 2-player, s-strategy games. (b) 3-player, s-strategy games.

5 Discussion and Open Questions

We have proposed that a useful way of understanding a game in normal form is as a map from a prior distribution over mixed strategy profiles to a distribution over sink equilibria, namely the distribution induced by the noisy replicator dynamics when started at the prior. We showed that this distribution can be computed quite efficiently starting from any pure strategy profile, through a novel algorithm that handles the infinitesimal transitions associated with tie edges. By implementing this algorithm and dynamical system, we conducted experiments which we believe demonstrate the feasibility of this approach to understanding the meaning of a game.

Fig. 6. Convergence of our algorithm with the total size of the game. (a) n-player, 2-strategy games. (b) n-player, 3-strategy games.

There are many problems left open by this work.

– In our simulations we approximated the meaning of the game for quite large games. We be-
lieve that more sophisticated statistical methods can yield more informative results for larger
games. Another possible front of improvement in our simulations would be a better theoretical
understanding of the trade-off between the parameters δ and η of the dynamics — the length
of the jump and the intensity of the noise.
– Under which assumptions do the sink equilibria coincide with the chain recurrent components
of the replicator dynamics (the solution concept suggested by the topological theory of dynam-
ical systems)? Sharpening the result of Biggar and Shames in this way is an important open
problem. On the other hand, a counterexample showing that it cannot be sharpened would
also be an important advance; we note that experiments such as the ones in this paper are
a fine way of generating examples of systems of sink equilibria which could eventually point
the way to a counterexample. Another question at the interface with the topological theory is: does the time-averaged behavior within a sink SCC correspond to the behavior within a chain component of the replicator?
– It would be very interesting to try — defying PSPACE-completeness — to compute sink
equilibria and simulate the noisy replicator on succinct games such as extensive form, Bayesian,
or graphical.

References
1. Andrade, G.P., Frongillo, R., Piliouras, G.: Learning in matrix games can be arbitrarily complex. In:
Conference on Learning Theory. pp. 159–185. PMLR (2021)
2. Andrade, G.P., Frongillo, R., Piliouras, G.: No-regret learning in games is turing complete. In: ACM
Conference on Economics and Computation (EC) (2023)
3. Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a meta-algorithm and
applications. Theory of Computing 8(1), 121–164 (2012)
4. Babichenko, Y.: Completely uncoupled dynamics and nash equilibria. Games and Economic Behavior
76(1), 1–14 (2012)
5. Bailey, J.P., Piliouras, G.: Multiplicative weights update in zero-sum games. In: ACM Conference on
Economics and Computation (2018)
6. Bielawski, J., Chotibut, T., Falniowski, F., Kosiorowski, G., Misiurewicz, M., Piliouras, G.: Follow-the-
regularized-leader routes to chaos in routing games. In: International Conference on Machine Learning.
pp. 925–935. PMLR (2021)
7. Biggar, O., Shames, I.: The attractor of the replicator dynamic in zero-sum games. arXiv preprint
arXiv:2302.00253 (2023)
8. Biggar, O., Shames, I.: The replicator dynamic, chain components and the response graph. In: Agrawal,
S., Orabona, F. (eds.) Proceedings of The 34th International Conference on Algorithmic Learning
Theory. Proceedings of Machine Learning Research, vol. 201, pp. 237–258. PMLR (20 Feb–23 Feb
2023), https://proceedings.mlr.press/v201/biggar23a.html
9. Bowerman, B., David, H., Isaacson, D.: The convergence of Cesàro averages for certain nonstationary Markov chains. Stochastic Processes and their Applications 5(3), 221–230 (1977)
10. Chen, L., Kyng, R., Liu, Y.P., Meierhans, S., Gutenberg, M.P.: Almost-linear time algorithms for
incremental graphs: Cycle detection, sccs, s-t shortest path, and minimum-cost flow. arXiv preprint
arXiv:2311.18295 (2023)
11. Chen, X., Deng, X., Teng, S.h.: Computing Nash equilibria: Approximation and smoothed complexity.
In: FOCS’06. pp. 603–612. IEEE Computer Society (2006)
12. Cheung, Y.K., Piliouras, G.: Online optimization in games via control theory: Connecting regret,
passivity and poincaré recurrence. In: International Conference on Machine Learning. pp. 1855–1865.
PMLR (2021)
13. Chotibut, T., Falniowski, F., Misiurewicz, M., Piliouras, G.: The route to chaos in routing games:
When is price of anarchy too optimistic? Advances in Neural Information Processing Systems 33,
766–777 (2020)
14. Cohen, M.B., Kelner, J., Peebles, J., Peng, R., Rao, A.B., Sidford, A., Vladu, A.: Almost-linear-time algorithms for Markov chains and new spectral primitives for directed graphs. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. pp. 410–419 (2017)
15. Conley, C.: Isolated invariant sets and the Morse index. No. 38 in Regional conference series in math-
ematics, American Mathematical Society, Providence, RI (1978)
16. Daskalakis, C., Frongillo, R., Papadimitriou, C.H., Pierrakos, G., Valiant, G.: On learning algorithms
for nash equilibria. In: Kontogiannis, S., Koutsoupias, E., Spirakis, P.G. (eds.) Algorithmic Game
Theory. pp. 114–125. Springer Berlin Heidelberg, Berlin, Heidelberg (2010)
17. Daskalakis, C., Goldberg, P.W., Papadimitriou, C.H.: The complexity of computing a Nash equilibrium.
In: STOC ’06. p. 71–78. ACM (2006)
18. Etessami, K., Yannakakis, M.: On the complexity of Nash equilibria and other fixed points. In: FOCS
’07. p. 113–123. IEEE Computer Society, USA (2007)
19. Fabrikant, A., Papadimitriou, C.: The complexity of game dynamics: BGP oscillations, sink equilibria, and beyond. In: SODA (2008), http://www.cs.berkeley.edu/~alexf/papers/fp08.pdf
20. Fabrikant, A., Papadimitriou, C.H.: The complexity of game dynamics: Bgp oscillations, sink equilibria,
and beyond. In: SODA. vol. 8, pp. 844–853. Citeseer (2008)
21. Galla, T., Farmer, J.D.: Complex dynamics in learning complicated games. Proceedings of the National
Academy of Sciences 110(4), 1232–1236 (2013)
22. Goemans, M., Mirrokni, V., Vetta, A.: Sink equilibria and convergence. In: 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05). pp. 142–151 (2005). https://doi.org/10.1109/SFCS.2005.68
23. Harsanyi, J.C., Selten, R.: A General Theory of Equilibrium Selection in Games. MIT Press Classics, MIT Press, Cambridge, Mass., 2nd printing (1992)
24. Hart, S., Mas-Colell, A.: Uncoupled dynamics do not lead to nash equilibrium. American Economic
Review 93(5), 1830–1836 (2003)
25. Hassin, R., Haviv, M.: Mean passage times and nearly uncoupled markov chains. SIAM Journal on
Discrete Mathematics 5(3), 386–397 (1992)
26. Hsieh, Y.P., Mertikopoulos, P., Cevher, V.: The limits of min-max optimization algorithms: Conver-
gence to spurious non-critical sets. arXiv preprint arXiv:2006.09065 (2020)
27. Kleinberg, R., Ligett, K., Piliouras, G., Tardos, É.: Beyond the Nash equilibrium barrier. In: Sympo-
sium on Innovations in Computer Science (ICS) (2011)
28. Koutsoupias, E., Papadimitriou, C.: Worst-case equilibria. In: (STACS). pp. 404–413. Springer-Verlag
(1999)
29. Leonardos, S., Reijsbergen, D., Monnot, B., Piliouras, G.: Optimality despite chaos in fee markets. In:
International Conference on Financial Cryptography and Data Security. pp. 346–362. Springer (2023)
30. Mertikopoulos, P., Papadimitriou, C., Piliouras, G.: Cycles in adversarial regularized learning. In:
Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 2703–
2717. SIAM (2018)
31. Milionis, J., Papadimitriou, C., Piliouras, G., Spendlove, K.: An impossibility theorem in game dynamics. Proceedings of the National Academy of Sciences 120(41), e2305349120 (2023). https://doi.org/10.1073/pnas.2305349120
32. Mirrokni, V.S., Skopalik, A.: On the complexity of nash dynamics and sink equilibria. In: Proceedings
of the 10th ACM conference on Electronic commerce. pp. 1–10 (2009)
33. Muller, P., Omidshafiei, S., Rowland, M., Tuyls, K., Perolat, J., Liu, S., Hennes, D., Marris, L.,
Lanctot, M., Hughes, E., et al.: A generalized training approach for multiagent learning. arXiv preprint
arXiv:1909.12823 (2019)
34. Omidshafiei, S., Papadimitriou, C., Piliouras, G., Tuyls, K., Rowland, M., Lespiau, J.B., Czarnecki,
W.M., Lanctot, M., Perolat, J., Munos, R.: α-rank: Multi-agent evaluation by evolution. Scientific
reports 9(1), 9937 (2019)
35. Omidshafiei, S., Tuyls, K., Czarnecki, W.M., Santos, F.C., Rowland, M., Connor, J., Hennes, D.,
Muller, P., Pérolat, J., Vylder, B.D., et al.: Navigating the landscape of multiplayer games. Nature
communications 11(1), 5603 (2020)
36. Palaiopanos, G., Panageas, I., Piliouras, G.: Multiplicative weights update with constant step-size in
congestion games: Convergence, limit cycles and chaos. In: Advances in Neural Information Processing
Systems. pp. 5872–5882 (2017)
37. Pangallo, M., Sanders, J., Galla, T., Farmer, D.: A taxonomy of learning dynamics in 2 x 2 games.
arXiv e-prints arXiv:1701.09043 (Jan 2017)
38. Papadimitriou, C., Piliouras, G.: From nash equilibria to chain recurrent sets: An algorithmic solution
concept for game theory. Entropy 20(10) (2018)
39. Papadimitriou, C., Piliouras, G.: Game dynamics as the meaning of a game. ACM SIGecom Exchanges
16(2), 53–63 (2019)
40. Peyton Young, H.: The evolution of conventions. Econometrica 61(1), 57–84 (1993), http://www.
jstor.org/stable/2951778
41. Piliouras, G., Yu, F.Y.: Multi-agent performative prediction: From global stability and optimality to
chaos. In: Proceedings of the 24th ACM Conference on Economics and Computation. pp. 1047–1074
(2023)
42. Roughgarden, T.: Intrinsic robustness of the price of anarchy. In: ACM Symposium on Theory of
Computing (STOC). pp. 513–522. ACM (2009)
43. Rowland, M., Omidshafiei, S., Tuyls, K., Perolat, J., Valko, M., Piliouras, G., Munos, R.: Multiagent
evaluation under incomplete information. arXiv preprint arXiv:1909.09849 (2019)
44. Sanders, J.B., Farmer, J.D., Galla, T.: The prevalence of chaotic dynamics in games with many players.
Scientific reports 8(1), 1–13 (2018)
45. Sato, Y., Akiyama, E., Farmer, J.D.: Chaos in learning a simple two-person game. Proceedings of the
National Academy of Sciences 99(7), 4748–4751 (2002). https://doi.org/10.1073/pnas.032086299,
https://www.pnas.org/content/99/7/4748
46. Schuster, P., Sigmund, K.: Replicator dynamics. Journal of Theoretical Biology 100(3), 533–538 (1983). https://doi.org/10.1016/0022-5193(83)90445-9
47. Taylor, P.D., Jonker, L.B.: Evolutionary stable strategies and game dynamics. Mathematical Biosciences 40(1), 145–156 (1978). https://doi.org/10.1016/0025-5564(78)90077-9
48. Vlatakis-Gkaragkounis, E.V., Flokas, L., Lianeas, T., Mertikopoulos, P., Piliouras, G.: No-regret learn-
ing and mixed nash equilibria: They do not mix. Advances in Neural Information Processing Systems
33, 1380–1391 (2020)
49. Young, H.P.: The possible and the impossible in multi-agent learning. Artificial Intelligence 171(7),
429–433 (2007)
