Imitation in Large Games
Soumya Paul
R. Ramanujam
In games with a large number of players where players may have overlapping objectives, the analysis
of stable outcomes typically depends on player types. A special case is when a large part of the player
population consists of imitation types: players who imitate the choices of other (optimising) types.
Game theorists typically study the evolution of such games as dynamical systems with imitation rules.
In the setting of games of infinite duration on finite graphs, with preference orderings on outcomes for
player types, we explore the possibility of imitation as a viable strategy. In our setup, the optimising
players play bounded memory strategies and the imitators play according to specifications given by
automata. We present algorithmic results on the eventual survival of types.
Summary
Imitation is an important heuristic studied by game theorists in the analysis of large games, both in
extensive form games with considerable structure and in repeated normal form games with a large number
of players. One reason for this is that the notions of rationality underlying solution concepts are justified by
players' assumptions about how other players play, iteratively. In such situations, players' knowledge
of the types of other players alters game dynamics. Skilled players can then be imitated by less skilled
ones, and the former can then strategise about how the latter might play. In games with a large number
of players, both strategies and outcomes are studied using distributions of player types.
The dynamics of imitation, and the strategising of optimisers in the presence of imitators, can give rise to
interesting consequences. For instance, in the game of chess, if the player playing white somehow knows
that her opponent will copy her, move for move, then the following simple sequence of moves allows her
to checkmate her opponent:
1.e3 e6 2.Qf3 Qf6 3.Qg3 Qg6 4.Nf3 Nf6 5.Kd1 Kd8 6.Be2 Be7 7.Re1 Re8
8.Nc3 Nc6 9.Nb5 Nb4 10.Qxc7#
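This copycat line can be checked mechanically. The sketch below replays it using the third-party python-chess library (an assumption made here for illustration; the library is not mentioned in the paper) and confirms that the final position is checkmate.

```python
# Replaying the copycat mating line with the python-chess library
# (pip install python-chess). The moves are exactly those quoted above.
import chess

MOVES = ["e3", "e6", "Qf3", "Qf6", "Qg3", "Qg6", "Nf3", "Nf6",
         "Kd1", "Kd8", "Be2", "Be7", "Re1", "Re8", "Nc3", "Nc6",
         "Nb5", "Nb4", "Qxc7"]

board = chess.Board()
for san in MOVES:
    board.push_san(san)  # raises ValueError if a move were illegal

print(board.is_checkmate())  # True: Black, the imitator, is mated
```

Note that every one of Black's moves is the mirror image of White's previous move, so an imitator of this type has no choice in the matter.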
On the other hand, we can have the scenario where every player imitates someone or other,
and the equilibrium attained may be highly inefficient. This is usually referred to as herd behaviour and
has been studied, for instance, in [3].
In an ideal world, where players have unbounded resources and computational ability, each of them
can compute their optimal strategies and play accordingly, and thus we can predict optimal play. But
in reality, this is seldom the case. Players are limited in their resources, their computational ability and
their knowledge of the game. Hence, in large games it is not possible for such players to compute their
optimal strategies beforehand by considering all possible scenarios that may arise during play. Rather,
they observe the outcome of the game and then strategise dynamically. In such a setting, again, imitation
types make sense.
A resource bounded player may attach some cost to strategy selection. For such a player, imitating
another player who has been doing extensive research and computation may well be worthwhile, even if
her own outcomes are less than optimal. What is lost in sub-optimal outcomes may be gained in avoiding
expensive strategisation.
Thus, in a large population of players, where resources and computational abilities are asymmetrically distributed, it is natural to consider a population where the players are predominantly of two kinds:
optimisers and imitators.2 Asymmetry in resources and abilities can then lead to different types of imitation, and thus ensure that we do not end up with herd behaviour of the kind referred to above. The mutual
reasoning and strategising between optimisers and imitators leads to interesting questions about
game dynamics in these contexts.
Imitation is typically modelled in the dynamical systems framework in game theory. Schlag ([12])
studies a model of repeated games where, in every round, a player samples one other player according
to some sampling procedure and then either imitates this player or sticks to her own move. He shows
that the strategy where a player imitates the sampled player with a probability proportional to the
difference in their payoffs is the one that attains the maximum average payoff in the model. He also
gives a simple counterexample to show that the naïve strategy of "imitate if better" may not always be
improving. Banerjee ([3]) studies a sequential decision model where each decision maker may look at
the decisions made by the previous decision makers and imitate them. He shows that the decision rules
chosen by optimising individuals are characterised by herd behaviour, i.e., people do what others
are doing rather than using their own information. He also shows that such an equilibrium is inefficient.
Levine and Pesendorfer ([7]) study a model where existing strategies are more likely to be imitated than
new strategies are to be introduced.
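Schlag's proportional imitation rule can be illustrated with a toy simulation. The two-action payoffs, population size and normalisation below are invented for this illustration; they are not taken from [12].

```python
# Toy simulation of proportional imitation in a two-action setting:
# each round, every player samples another player and switches to the
# sampled action with probability proportional to the payoff difference
# (when positive). Payoffs and population are invented for illustration.
import random

PAYOFF = {"a": 1.0, "b": 3.0}          # action "b" is the better one
MAX_GAP = max(PAYOFF.values()) - min(PAYOFF.values())

def round_step(pop, rng):
    new = []
    for my in pop:
        other = rng.choice(pop)         # sample one (possibly oneself)
        gap = PAYOFF[other] - PAYOFF[my]
        if gap > 0 and rng.random() < gap / MAX_GAP:
            new.append(other)           # imitate the sampled player
        else:
            new.append(my)              # stick to one's own action
    return new

rng = random.Random(0)
pop = ["a"] * 8 + ["b"] * 2
for _ in range(50):
    pop = round_step(pop, rng)
print(pop.count("b"))                   # the better action spreads
```

Since players holding the better action never switch away from it, its share in the population is non-decreasing, which is the monotonicity that drives Schlag's result.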
The common framework in all of the above studies is repeated non-zero-sum normal form games,
where the questions asked of the model are somewhat different from standard ones on equilibria. Since
not all players are optimisers, we do not speak of equilibrium profiles as such, but of optimal strategies for
optimisers and possibly suboptimal outcomes for imitators. Since imitators keep
switching (imitate i for 2 moves, j for 3 moves, then again i for 1 move, etc.), such studies consider the stability
of imitation patterns, i.e., what types of imitation eventually survive, since these would in turn determine play
by optimisers and thus stable subgames, thereby determining stable outcomes. Note that, as in the example
of chess above, imitation, and hence the study of system dynamics of this kind, makes equal sense in
large turn based extensive form games among resource bounded players.
For finitely presented infinite games, the stability questions above can be easily posed and answered
in automata-theoretic terms, since typically bounded memory strategies suffice for optimal play, and
stable imitation patterns can be analysed algorithmically. Indeed, this also suggests a natural model of
resource bounded players as finite state automata.
With this motivation, we consider games of unbounded duration on finite graphs among players with
overlapping objectives, where the population is divided into players who optimise and others who imitate.
Unbounded play is natural in the study of imitation as a heuristic, since losses incurred per move may be
amortised away and need not affect eventual outcomes very much. Imitator types specify whom and how
to imitate, and are given using finite state transducers. Since plays eventually settle down to strongly connected
components, players' preferences are given using orderings on Muller sets [11]. In this work, we study
turn-based games so as to use the set of techniques already available for the analysis of such games.
In this setting we address the following questions and present algorithmic results:
If the optimisers and the imitators play according to certain specifications, is a global outcome
eventually attained?
2 There would also be a third kind of players, randomisers, who play any random strategy, but we do not consider such
players in this exposition.
The model of games we present is the standard model of turn based games of unbounded duration on
finite graphs. For any positive integer n, let [n] = {1, . . . , n}.
Definition 1 Let n ∈ ℕ, n > 1. An n-player game arena is a directed graph G = (V1, . . . , Vn, A, E), where
the Vi are finite sets of game positions with Vi ∩ Vj = ∅ for i ≠ j, V = ⋃i∈[n] Vi, A is a finite set of moves, and
E ⊆ (V × A × V) is the move relation, satisfying the following conditions:
1. For every v, v1, v2 ∈ V and a, b ∈ A, if (v, a, v1) ∈ E, (v, b, v2) ∈ E and v1 ≠ v2, then a ≠ b.
2. For every v ∈ V, there exist a ∈ A and v′ ∈ V such that (v, a, v′) ∈ E.
When an initial position v0 V is specified, we call (G , v0 ) an initialised arena or just an arena.
In this model, we assume for convenience that the moves of all players are the same. When v ∈ Vi,
we say that player i owns the vertex v. A game arena is thus a finite graph with nodes labelled by players
and edges labelled by moves, such that no two edges out of a vertex share a common label and there are
no dead ends. For a vertex v ∈ V, let vE denote its set of neighbours: vE = {v′ | (v, a, v′) ∈ E for some
a ∈ A}. For v ∈ V and a ∈ A, let v[a] = {v′ | (v, a, v′) ∈ E}; v[a] is either empty or a singleton {v′}. In
the latter case, we say a is enabled at v and write v[a] = v′. For u ∈ A*, we can similarly speak of u being
enabled at v and define v[u], so that when v[u] = {v′}, there is a path in the graph from v to v′ such that u
is the sequence of move labels of edges along that path. Given v ∈ V and u ∈ A*, if any u-labelled path
from v exists in the graph, it is unique. On the other hand, given any sequence of vertices that corresponds to a
path in the graph, there may be more than one sequence of moves that labels that path.
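A minimal sketch of this definition in code may help fix the notation. The class below stores the move relation as a deterministic map, which enforces condition 1, and asserts condition 2 (no dead ends); the derived operations compute the enabled moves at a vertex and v[u] for a word u of moves. All names here are illustrative, not from the paper.

```python
# A game arena as in Definition 1, with the derived notions
# enabled(v), v[a] and v[u]. Edges are stored as a deterministic map
# (vertex, move) -> vertex, so no vertex has two outgoing a-edges.

class Arena:
    def __init__(self, owner, edges, v0):
        self.owner = owner          # vertex -> player owning it
        self.edges = edges          # (vertex, move) -> target vertex
        self.v0 = v0
        # condition 2: every vertex has at least one outgoing edge
        for v in owner:
            assert any(u == v for (u, _) in edges), f"dead end at {v}"

    def enabled(self, v):
        """The set of moves a such that v[a] is defined."""
        return {a for (u, a) in self.edges if u == v}

    def step(self, v, a):
        """v[a], or None if a is not enabled at v."""
        return self.edges.get((v, a))

    def run(self, v, u):
        """v[u] for a word u of moves; None if u is not enabled at v."""
        for a in u:
            v = self.step(v, a)
            if v is None:
                return None
        return v

# Two-player arena: player 1 owns s, player 2 owns t.
G = Arena(owner={"s": 1, "t": 2},
          edges={("s", "a"): "t", ("s", "b"): "s",
                 ("t", "a"): "s", ("t", "b"): "t"},
          v0="s")
print(G.enabled("s"))        # {'a', 'b'}
print(G.run("s", "aab"))     # 's'
```

The uniqueness of u-labelled paths noted above is exactly why `run` can be a simple left-to-right fold over the word u.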
A play in (G, v0) is an infinite path v0 →a1 v1 →a2 v2 · · · such that (vi, ai+1, vi+1) ∈ E for all i ∈ ℕ. We often speak of
a1 a2 · · · ∈ Aω as the play denoting this path. The game starts by placing a token at v0 ∈ Vi. Player i
chooses an action a ∈ A enabled at v0 and the token moves along the edge labelled a to a neighbouring
vertex v1 ∈ Vj. Player j chooses an action a′ ∈ A enabled at v1, the token moves along the edge labelled
a′ to a neighbouring vertex, and so on. Note that since there are no dead ends, any player whose turn it is
to move has some available move.
2.1 Objectives
The game arena describes only legal plays; the game itself is defined by specifying outcomes and
players' preferences on outcomes. Since each play results in an outcome for each player, players' preferences are on plays. These can be specified finitely, as every infinite play on a finite graph eventually settles down
to a strongly connected component.
For a play u ∈ Aω, let inf(u) be the set of vertices that appear infinitely often in the play given by u.
With each player i, we associate a total pre-order ⪯i ⊆ (2^V × 2^V). This induces a total pre-order on plays
as follows: u ⪯i u′ iff inf(u) ⪯i inf(u′).
Thus an n-player game is given by a tuple (G, v0, ⪯1, . . . , ⪯n), consisting of an n-player game arena
and the players' preferences.
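For the ultimately periodic plays that arise from finite-state strategies, inf(u) and the induced pre-order on plays can be sketched directly. The explicit rank function below is an illustrative assumption; the paper only requires a total pre-order on subsets of V.

```python
# For an ultimately periodic play u = prefix . cycle^omega on a finite
# arena, inf(u) is simply the set of vertices visited along the cycle.
# A total pre-order on 2^V is represented here by a rank function.

def inf_vertices(prefix, cycle):
    """Vertices occurring infinitely often in prefix . cycle^omega."""
    return frozenset(cycle)

def prefers(rank, x, y):
    """x <=_i y under the total pre-order induced by rank."""
    return rank[x] <= rank[y]

# Illustrative preference of one player over the Muller sets of {s, t}.
rank = {frozenset({"s"}): 0, frozenset({"s", "t"}): 1, frozenset({"t"}): 2}

u_inf = inf_vertices(["s", "t"], ["t"])
print(u_inf)                                          # frozenset({'t'})
print(prefers(rank, frozenset({"s"}), frozenset({"t"})))  # True
```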
2.2 Strategies
Players strategise to achieve desired outcomes. Formally, a strategy σi for player i is a partial function
σi : V A* → A,
where σi(vu) is defined if v[u] is defined, v[u] ∈ Vi, and (v[u])[σi(vu)] is defined.
A strategy σi of player i is said to be bounded memory if there exists a finite state transducer (FST)
A = (M, δ, g, m0), where M (the memory of the strategy) is a finite set of states, m0 ∈ M is the initial
state of the memory, δ : A × M → M is the memory update function, and g : Vi × M → A is the move
function, such that for all v ∈ Vi and m ∈ M, g(v, m) is enabled at v and the following condition holds:
given v ∈ V, when u = a1 . . . ak ∈ A* is a partial play from v and σi(vu) is defined, σi(vu) = g(v[u], mk),
where mk is determined by mi+1 = δ(ai+1, mi) for 0 ≤ i < k.
A strategy is said to be memoryless or positional if M is a singleton; that is, the moves depend only
on the current position.
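A bounded memory strategy can be sketched as such a transducer: the memory is updated on every move of the play, and the move function is consulted at the current vertex. The alternating two-state memory below is an invented example, not one from the paper.

```python
# A bounded memory strategy as an FST (M, delta, g, m0): delta updates
# the memory on each move of the play; g picks the next move from the
# current vertex and memory state.

class BoundedMemoryStrategy:
    def __init__(self, m0, delta, g):
        self.m0 = m0
        self.delta = delta   # (move, memory) -> memory
        self.g = g           # (vertex, memory) -> move

    def next_move(self, current_vertex, u):
        """sigma_i(v u): replay the moves of u through delta from m0,
        then consult the move function at the current vertex v[u]."""
        m = self.m0
        for a in u:
            m = self.delta(a, m)
        return self.g(current_vertex, m)

# Two memory states tracking the parity of the number of moves so far:
# play "a" after an even number of moves, "b" after an odd number.
strat = BoundedMemoryStrategy(
    m0=0,
    delta=lambda a, m: 1 - m,
    g=lambda v, m: "a" if m == 0 else "b")

print(strat.next_move("s", "xy"))  # 'a': even number of moves seen
print(strat.next_move("s", "x"))   # 'b': odd number of moves seen
```

A positional strategy is the special case where `delta` is constant, so `g` depends on the vertex alone.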
Definition 2 Given a strategy profile σ̄ = (σ1, . . . , σn) for the n players, let ρ(σ̄) denote the unique play in
(G, v0) conforming to σ̄. A profile σ̄ is called a Nash equilibrium in (G, v0, ⪯1, . . . , ⪯n) if for every
player i and for every other strategy σ′i of player i, inf(ρ(σ′i, σ̄−i)) ⪯i inf(ρ(σ̄)).
Specification of Strategies
We now describe how the strategies of the imitator and optimiser types are specified.
An imitator type for player j is a tuple τj = (M, δ, σ, μ, m0), where M is a finite set of states with
initial state m0 ∈ M, δ : A × M → M is the memory update function,
σ : V → A is a positional strategy such that for any v ∈ V, σ(v) is enabled at v, and μ : M → [n] is the
imitation map.
Given τj as above, define a strategy ηj for player j as follows. Let v ∈ V and let u = a1 . . . ak ∈ A* be
a partial play from v such that v[u] is defined and v[u] ∈ Vj. Let mi+1 = δ(ai+1, mi) for 0 ≤ i < k. Then
ηj(vu) = a if a is the last move of player μ(mk) in the given play and a is enabled at v[u]; otherwise
ηj(vu) = σ(v[u]).
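The induced strategy ηj can be sketched as follows: replay the history to find the last move of the currently imitated player, and fall back to the positional strategy σ when that move is not enabled. The history representation and all names below are illustrative.

```python
# The strategy induced by an imitator type: copy the last move of the
# player mu(m_k) currently being imitated, if that move is enabled at
# the current vertex; otherwise fall back to the positional strategy.

def imitator_move(history, v_u, imitated_player, sigma, enabled):
    """history: list of (player, move) pairs along the partial play;
    v_u: the current vertex v[u];
    imitated_player: mu(m_k), the player imitated in memory state m_k."""
    last = None
    for player, move in history:
        if player == imitated_player:
            last = move               # last move of the imitated player
    if last is not None and last in enabled(v_u):
        return last
    return sigma(v_u)                 # default positional strategy

history = [(1, "a"), (2, "b"), (1, "b"), (3, "a")]
move = imitator_move(history, "t", imitated_player=1,
                     sigma=lambda v: "a",
                     enabled=lambda v: {"a", "b"})
print(move)  # 'b': player 1's last move, which is enabled at 't'
```

If player 1 had not moved yet, or if her last move were not enabled at the current vertex, the fallback σ would fire instead.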
Note that the type specification only specifies whom to imitate and how the imitator decides whom to imitate,
but is silent on the rationale for imitating a player or for switching from imitating x to imitating y. In general,
an imitator would have a set of observables and, based on observations of game states made during the
course of play, would decide whom to imitate when. Thus imitator specifications could be given by
past-time formulas in a simple propositional modal logic. With any such formula we can associate an
imitation type transducer as defined above, so we do not pursue that approach here. See, for instance,
[10] for more along that direction.
The following are some examples of imitating strategies that can be expressed using such automata:
1. Imitate player 1 for 3 moves and then keep imitating player 4 forever.
2. Imitate player 2 as long as she receives the highest payoff; otherwise switch to imitating player 3.
3. Nondeterministically imitate player 4 or player 5 forever.
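Example 1 above admits a very small memory structure. The sketch below gives only the control part (M, δ, μ, m0) of such a type, as a counter capped at 3; for simplicity the counter advances on the imitator's own rounds, which is an assumption made here for illustration.

```python
# Memory structure for "imitate player 1 for 3 moves, then player 4
# forever": a counter capped at 3, with the imitation map mu reading
# off whom to imitate in each memory state.

M = [0, 1, 2, 3]            # own moves made so far, capped at 3
m0 = 0

def delta(own_move, m):
    """Advance the counter each time the imitator herself moves."""
    return min(m + 1, 3) if own_move else m

def mu(m):
    """Whom to imitate in memory state m."""
    return 1 if m < 3 else 4

m = m0
imitated = []
for _ in range(5):          # five of the imitator's own rounds
    imitated.append(mu(m))
    m = delta(True, m)
print(imitated)  # [1, 1, 1, 4, 4]
```

State 3 is absorbing, which is what makes "forever" expressible with finite memory.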
For convenience of the subsequent technical analysis, we assume that an imitator type
τ = (M, δ, σ, μ, m0) is presented as a finite state transducer Rτ = (Mτ, δτ, gτ, mI), where
– Mτ = V × M × A^n;
– δτ : A × Mτ → Mτ is such that δτ(a, ⟨v, m, (a1, . . . , an)⟩) = ⟨v′, m′, (a1, . . . , ai−1, a, ai+1, . . . , an)⟩, where
v →a v′, δ(a, m) = m′ and v ∈ Vi;
– gτ : V × Mτ → A is such that gτ(v, ⟨v, m, (a1, . . . , an)⟩) = ai if μ(m) = i and ai is enabled at v; otherwise
gτ(v, ⟨v, m, (a1, . . . , an)⟩) = σ(v);
– mI = ⟨v0, m0, (a1, . . . , an)⟩ for some (a1, . . . , an) ∈ A^n.
Figure 1 below depicts an imitator strategy where a player imitates player 1 for two moves, then
player 2 for one move, then again player 1 for two moves, and so on. She simply plays the last move
of the player she is currently imitating. Suppose there are a total of p actions, that is, |A| = p. She
remembers the last move of the player she is imitating in the states m1 to mp, and when it is her turn to
move, plays the corresponding action.
Given an FST Rτ for an imitator type τ, we call a strongly connected component of Rτ a subtype of
Rτ. We will often refer to the strategy ηj induced by the imitator type Rτ for player j simply as Rτ when the
context is clear.
We define the notion of an imitation equilibrium: a tuple of strategies for the optimisers such
that none of the optimisers can do better by unilaterally deviating from it, given that the imitators stick to
their specifications.
Definition 3 In the game (G, v0, ⪯1, . . . , ⪯n), given that the imitators r + 1, . . . , n play strategies
ηr+1, . . . , ηn, a profile of strategies σ̄ = (σ1, . . . , σr) of the optimisers is called an imitation equilibrium if
for every optimiser i and for every other strategy σ′i of i, inf(ρ(σ′i, σ̄−i)) ⪯i inf(ρ(σ̄)).
Remark Note that an imitation equilibrium may be quite different from a Nash equilibrium of the
game (G, v0, ⪯1, . . . , ⪯n) restricted to the first r components: in a Nash equilibrium the imitators, too,
are assumed to have no profitable deviation, whereas here they simply follow their type specifications.
Results
In this section, we first show that it suffices to consider bounded memory strategies for the optimisers.
Then we go on to address the questions raised towards the end of Section 1.
First we define a product operation between an arena and a bounded memory strategy. Given an
arena G = (V1, . . . , Vn, A, E) and an FST A = (M, δ, g, m0), the product G × A has vertex set
V′ = V × M, initial position
v′0 = (v0, m0),
and an a-labelled edge from (v, m) to (v′, m′) whenever (v, a, v′) ∈ E and δ(a, m) = m′.
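The product operation can be sketched directly from this definition: states are pairs (v, m), with an a-edge from (v, m) to (v′, m′) exactly when v has an a-edge to v′ and δ(a, m) = m′. All names below are illustrative.

```python
# Product of an arena with a strategy/type FST: vertices are pairs
# (v, m), the initial position is (v0, m0), and edges track the arena
# edge together with the memory update.

def product(edges, delta, memories, v0, m0):
    """edges: (vertex, move) -> vertex; delta: (move, memory) -> memory."""
    new_edges = {}
    for (v, a), v2 in edges.items():
        for m in memories:
            new_edges[((v, m), a)] = (v2, delta(a, m))
    return new_edges, (v0, m0)

edges = {("s", "a"): "t", ("t", "b"): "s"}
delta = lambda a, m: 1 - m           # flip one bit of memory per move
pe, p0 = product(edges, delta, [0, 1], "s", 0)
print(p0)                            # ('s', 0)
print(pe[(("s", 0), "a")])           # ('t', 1)
```

Iterating this product over several transducers, as done below for the imitator types, just nests the memory components.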
4.2 Equilibrium
Of the n players, let the first r be optimisers and the remaining n − r be imitators. Let τr+1, . . . , τn be the
types of the imitators r + 1, . . . , n. We transform the game (G, v0, ⪯1, . . . , ⪯n) with n players into a game
(G′, v′0, ⪯′1, . . . , ⪯′r+1) with r + 1 players in the following steps:
1. Construct the graph (G′, v′0) = ((V′, E′), v′0) as G′ = G × Rτr+1 × · · · × Rτn.
2. Let V′ = V′1 ∪ . . . ∪ V′r ∪ V′r+1, where for 1 ≤ i ≤ r, (v, mr+1, . . . , mn) ∈ V′i iff v ∈ Vi, and
(v, mr+1, . . . , mn) ∈ V′r+1 iff v ∈ Vr+1 ∪ . . . ∪ Vn. Let there be r + 1 players such that the vertex set
V′i belongs to player i. Thus we introduce a dummy player, the (r + 1)th player, who owns all the
vertices (v, mr+1, . . . , mn) ∈ V′ such that v was originally an imitator vertex in V. By construction,
all the actions enabled at v in G are also enabled at (v, mr+1, . . . , mn) in G′.
The game (G′, v′0, ⪯′1, . . . , ⪯′r+1) is a turn based game with r + 1 players (the optimisers and the
dummy) such that each player i has a preference ordering ⪯′i over the Muller sets of V′. Such a game
was called a generalised Muller game in [11].
Let L be the set
We then have:
Theorem 2 σ̄ = (σ1, . . . , σr) is an imitation equilibrium in (G, v0, ⪯1, . . . , ⪯n).
Proof Suppose not, and suppose player i has an incentive to deviate to a strategy ν in (G, v0, ⪯1, . . . , ⪯n).
Let u ∈ Aω be the unique play consistent with the tuple σ̄ when the imitators stick to their strategy tuple
(ηr+1, . . . , ηn). Let u′ ∈ Aω be the unique play consistent with the tuple (ν, σ̄−i) (that is, when player i has
deviated to the strategy ν), where again the imitators stick to their strategy tuple (ηr+1, . . . , ηn). Let l be
the first index such that u(l) ≠ u′(l). Then v0[ul−1] ∈ Vi (where ul−1 is the length l − 1 prefix of u); that
is, the vertex v0[ul−1] belongs to optimiser i, since everyone else sticks to her strategy.
Now consider what happens in the game (G′, v′0, ⪯′1, . . . , ⪯′r+1) when all the optimisers except i play
the strategies σ1, . . . , σi−1, σi+1, . . . , σr and the imitators stick to their strategy tuple (ηr+1, . . . , ηn). If
the optimiser i mimics the strategy ν for l − 1 moves in this game, then the play is exactly ul−1 and reaches a
vertex (v, mr+1, . . . , mn) ∈ V′i where v = v0[ul−1]. By construction of the product, all the actions enabled
at v in the arena G are also enabled in the arena G′. Hence the optimiser i can play u′(l). By similar
arguments, optimiser i can mimic the strategy ν in the arena G′ forever.
Thus by mimicking ν in the game (G′, v′0, ⪯′1, . . . , ⪯′r+1), the optimiser i can force a more preferable
Muller set. But this contradicts the fact that σ̄ is an equilibrium tuple in the game (G′, v′0, ⪯′1, . . . , ⪯′r+1).
□
4.3 Stability
Finally, we address the questions asked in Section 1. Given a game (G, v0, ⪯1, . . . , ⪯n) with optimisers and
imitators, where the optimisers play bounded memory strategies and the imitators play imitative strategies
specified by k finite state transducers, we wish to find out:
whether the play eventually settles down to a certain strongly connected component W of G;
which subtypes eventually survive;
how much worse off imitator i is than in an equilibrium outcome.
We have the following theorem:
Theorem 3 Let (G, v0, ⪯1, . . . , ⪯n) be a game with n players, where the first r are optimisers playing bounded memory strategies σ1, . . . , σr and the remaining n − r are imitators playing imitative strategies
ηr+1, . . . , ηn, every such strategy being among k different types. Let W be a strongly connected component of G. The following questions are decidable:
(i) Does the play eventually settle down to W?
(ii) Which subtypes of the k types eventually survive?
(iii) How much worse off is imitator i than in an equilibrium outcome?
Proof Construct the arena (G′, v′0) = G × A1 × · · · × Ar × Rτr+1 × · · · × Rτn. Since every player follows
a fixed finite-state strategy, the resulting play is unique, and it eventually stays within a unique strongly
connected component of (G′, v′0) reachable from v′0.
(i) For the strongly connected component S′ in (G′, v′0) that is reachable from v′0, let S be the subgraph
of G induced by the set {v | (v, m1, . . . , mn) ∈ S′}. Collapse the vertices of S that have the same name
and call the resulting graph S̄. Check if S̄ is the same as W and output YES if so.
(ii) For the strongly connected component S′ in (G′, v′0) that is reachable from v′0, do the following.
For each i with r + 1 ≤ i ≤ n, take the restriction of S′ to the ith memory component of every
(v, m1, . . . , mn) ∈ S′; let Si denote this restriction.
Collapse vertices with the same name in Si; let S̄i be this new graph.
Check if S̄i is a subtype of τi. If so, output S̄i.
(iii) Compute a Nash equilibrium σ̄ of the game (G, v0, ⪯1, . . . , ⪯n) using the procedure described in
[11]. Let S′ be the reachable strongly connected component of the arena (G′, v′0). Restrict S′ to the
first component and call it S. Let F = occ(S). Compare F with inf(ρ(σ̄)) according to the preference
ordering ⪯i of imitator i.
□
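The core of the procedure behind part (i) can be sketched concretely: once every player follows a fixed finite-state strategy, the product has a unique successor from each reachable state, so the unique play is ultimately periodic; we follow it until a state repeats and compare the projection of the cycle with W. The successor map below is an invented example, and vertex-set equality stands in for the graph comparison.

```python
# Deciding whether the unique play settles down to a candidate
# component W: follow the deterministic successor function in the
# product until a state repeats, then project the cycle to arena
# vertices and compare with W.

def settles_in(succ, start, W, project):
    """succ: product state -> product state; project: state -> vertex."""
    seen = {}
    path = []
    s = start
    while s not in seen:
        seen[s] = len(path)
        path.append(s)
        s = succ(s)
    cycle = path[seen[s]:]              # the states visited forever
    return {project(st) for st in cycle} == set(W)

# Product states are (vertex, memory); play: (s,0) -> (t,1) -> (s,1) -> (t,1) ...
succ = {("s", 0): ("t", 1), ("t", 1): ("s", 1), ("s", 1): ("t", 1)}.get
print(settles_in(succ, ("s", 0), {"s", "t"}, lambda st: st[0]))  # True
```

Since the number of product states is finite, the loop terminates after at most |V′| steps, which is what makes the question decidable.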
4.4 An Example
Let us look at an example illustrating the concepts of the previous section. Consider 3 firms A, B and
C. Each firm has a choice of producing 2 products, product a or product b, repeatedly, i.e., potentially
infinitely often. In every batch, each of them can decide to produce either of the products.
Now firm A is a large firm with all the technical know-how and infrastructure, and it can switch its
choice of product between consecutive batches without much increase in cost. On the other
hand, the firms B and C are small. For either of them, if in any successive batch it decides to change
from producing a to b or vice versa, there is a high cost incurred in setting up the necessary infrastructure,
whereas if it sticks to the product of the previous batch, the infrastructure cost is negligible. Thus, in
the case where it switches between products in consecutive batches, it is forced to set the price of its
product high. This actually favours firm A, as it can always set its product at a reasonable price since it is
indifferent between producing either of the two products in any batch.
The demand in the market for a and b keeps changing. Firm A, being the bigger firm, has the resources and know-how to analyse the market, anticipate the current demand, and then produce a or b
accordingly.
Discussion
The model that we have presented here is far from definitive, but we see these results as early reports in a
larger programme of studying games with player types. The model requires modification and refinement
in many directions, which are being addressed in related ongoing work. In games with a large number of players,
outcomes are typically associated not with player profiles but with the distribution of types in the population.
Imitation crucially affects such dynamics. Our model can easily be modified to incorporate distributions,
but the analysis is considerably more complicated. Further, it is natural to consider this model in the
context of repeated normal form games, but in such contexts almost-sure winning randomised strategies
are more natural. A more critical notion required is that of type based reduction of games, so that the analysis
of large games can be reduced to that of the interaction between player types.
Acknowledgement
We thank the anonymous referees for their helpful comments and suggestions. The second author thanks
NIAS (http://nias.knaw.nl) for support when he was working on this paper.
References
[1] L. de Alfaro & T. A. Henzinger (2000): Concurrent omega-regular Games. In: LICS 2000: 15th International
IEEE Symposium on Logic in Computer Science, IEEE Press, pp. 141–154.
[2] L. de Alfaro, T. A. Henzinger & O. Kupferman (1998): Concurrent Reachability. In: FOCS '98, IEEE, pp.
564–575.
[3] Abhijit V. Banerjee (1992): A Simple Model of Herd Behaviour. The Quarterly Journal of Economics 107(3),
pp. 797–817.
[4] J. R. Büchi & L. H. Landweber (1969): Solving Sequential Conditions by Finite-State Strategies. Transactions
of the American Mathematical Society 138, pp. 295–311.
[5] K. Chatterjee, M. Jurdzinski & R. Majumdar (2004): On Nash equilibria in stochastic games. In: Proceedings of the 13th Annual Conference of the European Association for Computer Science Logic, LNCS 3210,
Springer-Verlag, pp. 26–40.
[6] E. Grädel & M. Ummels (2008): Solution Concepts and Algorithms for Infinite Multiplayer Games. In:
New Perspectives on Games and Interaction, Texts in Logic and Games 4, Amsterdam University Press, pp.
151–178.
[7] David K. Levine & Wolfgang Pesendorfer (2007): The Evolution of Cooperation Through Imitation. Games
and Economic Behavior 58(2), pp. 293–315.
[8] D. A. Martin (1975): Borel Determinacy. Annals of Mathematics 102, pp. 363–371.
[9] D. A. Martin (1998): The Determinacy of Blackwell Games. The Journal of Symbolic Logic 63(4), pp.
1565–1581.
[10] Soumya Paul, R. Ramanujam & Sunil Simon (2009): Stability under Strategy Switching. In: Klaus
Ambos-Spies, Benedikt Löwe & Wolfgang Merkle, editors: Proceedings of the 5th Conference on Computability in
Europe (CiE), LNCS 5635, pp. 389–398.
[11] Soumya Paul & Sunil Simon (2009): Nash equilibrium in generalised Muller games. In: Proceedings of the
Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS, Leibniz
International Proceedings in Informatics (LIPIcs) 4, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, pp.
335–346.
[12] Karl S. Schlag (1998): Why Imitate, and if so, How? A Boundedly Rational Approach to Multi-armed
Bandits. Journal of Economic Theory, pp. 130–156.
[13] W. Zielonka (1998): Infinite Games on Finitely Coloured Graphs with Applications to Automata on Infinite
Trees. Theoretical Computer Science 200(1-2), pp. 135–183.