
Unbending strategies shepherd cooperation and suppress extortion in spatial populations


arXiv:2405.19565v1 [physics.soc-ph] 29 May 2024

Zijie Chen¹, Yuxin Geng¹, Xingru Chen¹ and Feng Fu²,³

¹ School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
² Department of Mathematics, Dartmouth College, Hanover, NH 03755, USA
³ Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH 03756, USA
E-mail: [email protected]

Abstract. Evolutionary game dynamics on networks typically consider the competition among simple strategies such as cooperation and defection in the Prisoner’s Dilemma and summarize the effect of population structure as network reciprocity. However, little is known about the evolutionary dynamics involving multiple powerful strategies typically considered in repeated games, such as the zero-determinant (ZD) strategies, which are able to enforce a linear payoff relationship between themselves and their co-players. Here, we consider the evolutionary dynamics of always
cooperate (AllC), extortionate ZD (extortioners), and unbending players in lattice
populations based on the commonly used death-birth updating. Out of the class of
unbending strategies, we consider a particular candidate, PSO Gambler, a machine-
learning-optimized memory-one strategy, which can foster reciprocal cooperation and
fairness among extortionate players. We derive analytical results under weak selection
and rare mutations, including pairwise fixation probabilities and long-term frequencies
of strategies. In the absence of the third unbending type, extortioners can achieve
a half-half split in equilibrium with unconditional cooperators for sufficiently large
extortion factors. However, the presence of unbending players fundamentally changes
the dynamics and tilts the system to favor unbending cooperation. Most surprisingly,
extortioners cannot dominate at all regardless of how large their extortion factor is,
and the long-term frequency of unbending players remains almost constant.
Our analytical method is applicable to studying the evolutionary dynamics of multiple
strategies in structured populations. Our work provides insights into the interplay
between network reciprocity and direct reciprocity, revealing the role of unbending
strategies in enforcing fairness and suppressing extortion.

Keywords: spatial games, network reciprocity, cooperation, pair approximation

Submitted to: New J. Phys.



Introduction

A central aspect of human behavior is cooperation [1]. Cooperation is needed for collective action problems, ranging from climate change [2, 3] to pandemic control [4].
However, cooperation incurs a cost to oneself and benefits others; therefore it can be
vulnerable to exploitation by defection, which pays no cost and free rides on others’
effort [5]. The Prisoner’s Dilemma game is commonly used to illustrate this tension [6].
Two individuals play this game simultaneously. If both cooperate, they each receive
a reward, R, for mutual cooperation. If one cooperates while the other defects, the
cooperator receives the sucker’s payoff, S, while the defector gets the payoff of the
temptation to defect, T . If both defect, they both receive the punishment of mutual
defection, P . We have T > R > P > S in order for the game to be classified as a
Prisoner’s Dilemma [7].
To resolve the conundrum of cooperation, different mechanisms have been
extensively studied [8]. One important mechanism, called direct reciprocity, is to repeat
the game and investigate how individuals can build mutually beneficial cooperation
when engaged in multiple encounters. This extension leads to the famous framework
of the Iterated Prisoner’s Dilemma (IPD) with many discoveries worthy of mention.
Among others, fair-minded Tit-for-Tat (‘I will if you will’) [9] and adaptive ‘Win-Stay,
Lose-Shift’ (keep using the current strategy after payoff outcomes R and T but switch
otherwise) [10] are notable. In noisy IPD games, generous Tit-for-Tat prevails over
unforgiving ones [9, 11]. Moreover, equalizer strategies are able to unilaterally set
any co-player’s payoff to a given value within the interval [P, R], hence the name equalizer [12]. More general than equalizers, Press and Dyson discovered the
so-called zero-determinant (ZD) strategies, which are able to enforce a linear payoff
relation between themselves and their co-players [13]. A subclass of ZD strategies, called
extortionate ZD, is parameterized by the baseline payoff P and the extortion factor χ > 1
such that the two payoffs πX and πY of a ZD player X playing against an opponent Y
satisfy πX − P = χ(πY − P). Extortionate ZD seems formidable but has an Achilles’ heel [14]: its dominance and extortion ability depend on the underlying payoff structure, and under T + S < 2P it can actually earn a lower payoff than Win-Stay, Lose-Shift. Inspired by empirical results [15], a recent theoretical study uncovered unbending strategies, against which extortion does not pay off and yields lower payoffs than fairer play [14].
Another important mechanism, called network or spatial reciprocity, is to consider
structured populations where individuals’ interactions are not random but exquisitely
patterned as a graph or a network, such as the square lattice [16, 17, 18]. The seminal
work of spatial games has ushered in an era of studying games on networks [19, 20, 21,
22, 23, 24, 25, 26, 27, 28]. Broadly speaking, population structure leads to assortment, meaning like-with-like, such that clusters of cooperators can resist invasion by neighboring defectors and reach a dynamic balance [16, 28]. In the donation game (a
simplified Prisoner’s Dilemma) where a cooperator pays a cost c for another to receive
a benefit b while a defector does nothing, the payoff matrix becomes R = b − c, S = −c,
T = b, and P = 0. Prior studies have quantified how network structure, in particular
the average degree k, impacts the evolution of cooperation, requiring b/c > k [21].
This celebrated result for the evolution of cooperation on networks states that
cooperators are able to prevail and be favored over defectors because of the network
clustering effect in structured populations. More generally, the impact of population
structure on strategy competition can be written in a more general form: σR + S >
T + σP , which says that cooperators are more abundant than defectors if this condition
holds with the coefficient σ summarizing the effect of population structure [29, 30].
Extending this formula to matrix games with multiple strategies yields similar conditions
that can be obtained under the mutation-selection equilibrium [31, 32, 33]. Besides these
analytical insights, there has been significant interest in studying games on networks
from the perspective of statistical physics [5], with numerous contributions from the
field (for a recent review, for example, see Ref. [34]).
Beyond simple strategies, a surge of curiosity has arisen in exploring the
evolutionary dynamics of prescribed IPD strategies in well-mixed [35, 36] and structured
populations [37, 38, 39, 40, 41]. Part of these efforts is to study extortionate ZD
from an evolutionary perspective [35, 36, 42, 43]. The intuition is that the lack of mutual cooperation among extortionate ZD players makes them evolutionarily unfavorable.
However, prior work has shown the impact of population size on the evolutionary
advantage of extortionate ZD [35], including some other general classifications of IPD
strategies [44, 45, 46]. Interestingly, previous work found that extortionate ZD can be a catalyst for cooperation [35, 37], and simulations of a handful of IPD strategies show that this holds both in well-mixed populations and on networks [38, 39, 40, 41].
Despite much progress made in this direction, it remains largely unknown how the resilience of cooperation can be bolstered in the face of increased extortion factors used by extortionate ZD players aiming to secure even greater inequality. Understanding the potential
interplay between spatial reciprocity and direct reciprocity is especially important in
light of recently discovered unbending strategies.
To address this issue, we use the IPD game with conventional payoff values (namely,
R = 3, S = 0, T = 5, P = 1), the same as in the well-known Axelrod tournament.
Aside from always cooperate (AllC) and extortionate ZD typically considered in previous
studies, we also include a third memory-one strategy called PSO Gambler. This strategy
has been optimized using the machine learning algorithm particle swarm optimization
(PSO) [47]. Notably, the PSO Gambler is found to have the unbending property [14]:
the best response of extortionate ZD when playing against a fixed unbending player is
to offer a fair split by letting the extortion factor be one. In other words, the larger
the extortion factor, the greater the payoff reduction compared to what it would have
been otherwise. Although we focus on these three particular strategies, our method
works for any IPD strategy and can be extended to study more than three strategies
in structured populations. We find that the presence of unbending individuals can
greatly enhance the resilience of cooperation in a synergistic way that promotes direct
reciprocity together with spatial reciprocity. In particular, the long-term frequency of
unbending PSO Gamblers remains almost constant, being able to mitigate the negative
impact enforced by extortionate ZD with increased extortion factors.

Model and Methods

Figure 1. Spatiotemporal coevolutionary dynamics of AllC (always cooperate), extortionate zero-determinant strategy (ZD extortioner), and unbending strategy (PSO Gambler) in repeated games. The sequential series of spatial snapshots shows the invasion of a spatial population by mutations (indicated by solid arrows) and competition dynamics under the death-birth update rule (indicated by dashed arrows). The snapshots are taken from a stochastic simulation on a square lattice of size 10 × 10 with von Neumann neighborhood k = 4, selection strength β = 0.001, mutation rate µ = 0.005, conventional payoff parameters R = 3, S = 0, T = 5, P = 1, χ = 1 for the zero-determinant strategy, and [q1, q2, q3, q4] = [1, 0.52173487, 0, 0.12050939] for the memory-one PSO Gambler.

Here, we consider the evolutionary game dynamics of multiple strategies in spatial populations. To focus specifically on the interplay between network reciprocity and direct reciprocity, we consider strategies commonly used in the Iterated Prisoner’s Dilemma game, including always cooperate (AllC), extortionate zero-determinant (ZD)
strategy (which is able to enforce an unfair split of payoffs with co-players; extortioners), and a particular candidate out of the unbending strategies (which can cause extortion to backfire; PSO Gambler). The memory-one PSO Gambler used in our study is
qPSO = [1, 0.52173487, 0, 0.12050939] [47]. We consider a square lattice with the von
Neumann neighborhood (degree k = 4) and periodic boundary conditions. Each node
is occupied by an individual who adopts one of these three aforementioned strategies.
An individual i interacts with their immediate neighbors and accrues payoffs from their
pairwise interactions, πi . We use the exponential fitness function fi = exp(βπi ),
where β is the intensity of selection.
To account for the success of these three strategies, we use the following 3 × 3 expected payoff matrix to characterize their repeated interactions:

           AllC   ZD    PSO
   AllC  ( a11    a12   a13 )
   ZD    ( a21    a22   a23 )    (1)
   PSO   ( a31    a32   a33 )

Using the method originally introduced by Press and Dyson [13], these payoff entries
can be analytically calculated by quotients of two determinants (see Appendix).
Specifically, for AllC versus extortionate ZD with a given extortion factor χ, we have

   a11 = R,   a12 = P + (R − P)(T − S)/[(R − S)χ + (T − R)],
   a22 = P,   a21 = P + (R − P)(T − S)χ/[(R − S)χ + (T − R)].    (2)
Moreover, the average payoff of extortion ZD against PSO Gambler, a23 , can be written
as a23 = g(χ)/h(χ), where both g(·) and h(·) are quadratic functions of χ and g(χ)/h(χ)
is monotonically decreasing for χ > 1 with the maximum value of R at χ = 1 [14].
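As a computational illustration of how these entries arise, the expected payoff of one memory-one strategy against another can be obtained from the stationary distribution of the Markov chain over the four pairwise outcomes CC, CD, DC, and DD. The following Python sketch is illustrative only (it is not the code used for the reported simulations; the helper names and the particular choice of ϕ here are ours, and the least-squares step assumes a unique stationary distribution):

import numpy as np

R, S, T, P = 3, 0, 5, 1  # conventional IPD payoffs used throughout

def expected_payoff(p, q):
    """Long-run payoff of the p-player against q; both players use memory-one
    vectors [q_CC, q_CD, q_DC, q_DD]. Assumes a unique stationary state."""
    qs = [q[0], q[2], q[1], q[3]]  # the co-player sees CD and DC swapped
    M = np.array([[pi * qi, pi * (1 - qi), (1 - pi) * qi, (1 - pi) * (1 - qi)]
                  for pi, qi in zip(p, qs)])
    # Stationary distribution v solves v M = v with sum(v) = 1.
    A = np.vstack([(M - np.eye(4)).T, np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v = np.linalg.lstsq(A, b, rcond=None)[0]
    return v @ np.array([R, S, T, P])

q_PSO = [1, 0.52173487, 0, 0.12050939]

def make_zd(chi, phi=None):
    """Extortionate ZD vector for extortion factor chi (cf. Appendix A.2);
    phi defaults to half of its admissible upper bound (our choice here)."""
    if phi is None:
        phi = 0.5 * (P - S) / ((T - P) * chi + (P - S))
    return [1 - phi * (R - P) * (chi - 1),
            1 - phi * ((T - P) * chi + (P - S)),
            phi * ((P - S) * chi + (T - P)),
            0]

print(expected_payoff(make_zd(4.0), q_PSO))  # a23 ~ 1.88, well below R = 3
print(expected_payoff(q_PSO, make_zd(4.0)))  # a32 ~ 1.22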
As for evolutionary updating, we consider death-birth updating with mutation. At each time step, an individual is randomly chosen to die, and its neighbors compete for the vacant site with probability proportional to their fitness. The newly produced offspring is identical to its parent with probability 1 − µ; otherwise, with probability µ, it randomly adopts one of the three strategies (see Fig. 1). This process can also be interpreted in a cultural evolution setting with social imitation and exploration rates.
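A minimal sketch of one such update step may make the process concrete (again an illustration, not the authors’ simulation code; the grid representation and helper names are assumptions):

import numpy as np

rng = np.random.default_rng(0)

def neighbors(i, j, L):
    """Von Neumann neighborhood with periodic boundaries."""
    return [((i - 1) % L, j), ((i + 1) % L, j),
            (i, (j - 1) % L), (i, (j + 1) % L)]

def accumulated_payoff(grid, i, j, L, A):
    """Total payoff of the individual at (i, j) against its four neighbors;
    A is the 3 x 3 expected payoff matrix of Eq. (1)."""
    s = grid[i, j]
    return sum(A[s, grid[x, y]] for x, y in neighbors(i, j, L))

def death_birth_step(grid, A, beta=0.001, mu=0.005, n_strategies=3):
    L = grid.shape[0]
    i, j = rng.integers(L), rng.integers(L)      # a random individual dies
    nbrs = neighbors(i, j, L)
    fitness = np.array([np.exp(beta * accumulated_payoff(grid, x, y, L, A))
                        for x, y in nbrs])
    winner = nbrs[rng.choice(len(nbrs), p=fitness / fitness.sum())]
    if rng.random() < mu:                        # mutation: random strategy
        grid[i, j] = rng.integers(n_strategies)
    else:                                        # otherwise copy the winner
        grid[i, j] = grid[winner]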
We perform stochastic agent-based simulations with asynchronous updating and average the frequencies of strategies over 1 × 10⁷ time steps. On top of that, closed-form
predictions are feasible under the weak selection limit and with rare mutations. In
the limit of low mutations, the fate of a new mutant is determined, either reaching
fixation or going extinct, before the next mutant arises. Therefore, the population
spends most of the time in homogeneous states with transitions in between given by
the pairwise fixation probabilities, ρij , for 1 ≤ i, j ≤ 3 and i ̸= j: the probability that
a single mutant of strategy j invades and takes over the entire population of strategy
i [21]. The long-term frequencies of strategies can be approximated by the stationary
distribution of the corresponding embedded Markov chain with the following transition
Unbending strategies shepherd cooperation 6

matrix [48, 49, 50]:

        ( 1 − µ(ρ12 + ρ13)   µρ12               µρ13             )
   Λ =  ( µρ21               1 − µ(ρ21 + ρ23)   µρ23             ).    (3)
        ( µρ31               µρ32               1 − µ(ρ31 + ρ32) )

Results

We run agent-based simulations of the spatial system with the three aforementioned
IPD strategies and are interested in their long-term frequencies. As depicted in Fig. 1,
the spatial snapshots show the rise and fall of the respective three strategies and
spatial population structures promote clustering (‘like-with-like’) as foreseen. From
time to time, mutations arise in the population, following which the mutants can form
clusters and are likely to succeed in invading and taking over the system. Accordingly,
Fig. 1 shows possible transitions among homogeneous population states from AllC to
PSO Gambler to extortionate ZD, along with the evolutionary dynamics of multiple
strategies. Most of the time, evolutionary dynamics involve pairwise competition,
but occasionally, all three strategies are present due to mutations, in particular as a
consequence of neutral drift when χ = 1.
As a base case for comparison, we consider the special limiting case where the
extortion factor χ = 1 for the extortionate ZD strategy, which becomes the well-
known Tit-for-Tat strategy [13], with the memory-one strategy representation as qTFT =
[1, 0, 1, 0]. This, in fact, leads to neutral dynamics among the three strategies considered
since they each receive an equal payoff of mutual cooperation R = 3. As such, the long-
term abundance of the three strategies should be equal to 1/3. It can also be observed
from Fig. 2a that our stochastic simulations confirm neutral drift dynamics, with the
three strategies being roughly equally present in the population.
In contrast, for an extortion factor χ > 1, the extortionate ZD strategy is able to secure a payoff higher than (or at least equal to) that of any other strategy in a Prisoner’s Dilemma satisfying T + S > 2P [14], which is why such players are called extortioners in prior work [35]. The
resulting evolutionary dynamics is thereby no longer neutral (compared with Fig. 2a).
The extortion factor χ has a dual impact on extortioners: increasing χ makes them
more fiercely aggressive against AllC players, but on the other hand, it reduces their
absolute payoffs against PSO Gamblers. This stems from the PSO Gambler strategy
being demonstrated to have the unbending property: against a fixed unbending player,
any extortioner who intends to demand an unfairer payoff by increasing the extortion
factor will lead to a lower payoff. As a consequence, unbending individuals are able to
turn the tables against extortioners, suppress extortion, and shepherd cooperation. As
shown in Fig. 2b, the unbending strategy, PSO Gambler, emerges as the most abundant
in the long run, trailed by AllC, both are above 1/3, while extortioners are suppressed
and its abundance is below 1/3.
In order to understand the underlying evolutionary dynamics in Fig. 2, we now
consider pairwise competition dynamics that arise under rare mutations. In this limit,

Figure 2. Time evolution of spatial competition dynamics among three strategies in repeated games: AllC, extortionate ZD, and PSO Gambler under rare mutations. Two different extortion factors for the extortionate ZD strategy are considered: (a) χ = 1, which reduces the extortioner to a Tit-for-Tat player, and (b) χ = 4, which enables the extortioner X to unilaterally enforce an unfair payoff relation against its co-player Y as sX − P = χ(sY − P). As expected, the simulation in (a) indicates neutral drift with equal frequencies of the three strategies. In contrast, the simulation in (b) suggests that the unbending strategy PSO Gambler is most abundant, and its presence can suppress unfair extortion with χ > 1 and promote cooperation in spatial populations. Simulation parameters are as in Fig. 1, except that we also consider χ = 4 for the extortionate ZD strategy.

the dynamics can be studied by the stochastic transitions between homogeneous states.
The transitions are determined by the fixation probabilities ρij (see Model and Methods
and also the Appendix for technical details).
Under weak selection, the pairwise fixation probability ρij , the likelihood of a single
individual of strategy j taking over a spatial population of strategy i, has a closed-form
expression up to the first order of β:

   ρij = 1/N + β (∗)/[6(k − 1)] + O(β²),
   (∗) = (k + 1)²ajj + (2k² − 2k − 1)aji − (k² − k + 1)aij − (2k − 1)(k + 1)aii,    (4)
where N is the total population size, and k is the degree of the lattice. This formula
is derived using the discrete random walk approach of Ref. [24] (see Appendix), but it
can also be obtained by the diffusion approximation method [21].
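For concreteness, Eq. (4) is straightforward to evaluate numerically; the small helper below (an illustrative sketch in our notation, not the paper’s repository code) reproduces, for instance, ρ21 ≈ 0.01163 at χ = 4 from the payoff entries of Eq. (2):

def rho_weak_selection(a_ii, a_ij, a_ji, a_jj, N=100, k=4, beta=0.001):
    """Fixation probability of a single j-mutant in an i-population, to O(beta);
    a_xy denotes the payoff of strategy x against strategy y in Eq. (1)."""
    star = ((k + 1)**2 * a_jj + (2*k**2 - 2*k - 1) * a_ji
            - (k**2 - k + 1) * a_ij - (2*k - 1) * (k + 1) * a_ii)
    return 1 / N + beta * star / (6 * (k - 1))

# AllC (mutant, j = 1) invading extortionate ZD (resident, i = 2) at chi = 4,
# where a11 = 3, a12 = 12/7, a21 = 27/7, and a22 = 1 by Eq. (2):
print(rho_weak_selection(a_ii=1, a_ij=27/7, a_ji=12/7, a_jj=3))  # ~0.01163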
In the limit of low mutation (and weak selection), the long-term frequencies of
strategies, λi , can be approximated using the embedded Markov chain approach and
are related to the normalized left eigenvector of the transition matrix corresponding to
the largest eigenvalue one:
   [λ1, λ2, λ3] = [γ1, γ2, γ3]/(γ1 + γ2 + γ3),    (5)
where
γ1 = ρ21 ρ31 + ρ21 ρ32 + ρ31 ρ23 ,
γ2 = ρ31 ρ12 + ρ12 ρ32 + ρ32 ρ13 , (6)
γ3 = ρ21 ρ13 + ρ12 ρ23 + ρ13 ρ23 .
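Numerically, this stationary distribution is simply the normalized left eigenvector of Λ for eigenvalue one, and the closed forms γi above coincide with it. A small sketch (illustrative, seeded with the χ = 4 fixation probabilities of Fig. 3b) shows the computation; note that the mutation rate µ cancels out of the result:

import numpy as np

def long_term_frequencies(rho, mu=0.005):
    """Stationary distribution of the transition matrix in Eq. (3);
    rho[i][j] is the fixation probability of a j-mutant in an i-population."""
    n = len(rho)
    Lam = np.array([[mu * rho[i][j] if i != j else 0.0 for j in range(n)]
                    for i in range(n)])
    np.fill_diagonal(Lam, 1 - Lam.sum(axis=1))
    w, V = np.linalg.eig(Lam.T)                 # left eigenvectors of Lam
    v = np.real(V[:, np.argmax(np.real(w))])    # eigenvalue closest to one
    return v / v.sum()

# Pairwise fixation probabilities at chi = 4 (cf. Fig. 3b); diagonal unused.
rho = [[0, 0.00925, 0.01],
       [0.01163, 0, 0.01242],
       [0.01, 0.00707, 0]]
print(long_term_frequencies(rho))  # ~[0.356, 0.252, 0.392], cf. Fig. 4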
Under weak selection (β → 0), the condition for the stationary distribution of strategy
i, λi , to be greater than 1/3 can be expressed equivalently as
   ∑_{j=1}^{3} (ρji − ρij) > 0.    (7)

Intuitively, this inequality indicates that the influx into strategy i must exceed the outflux from it in the embedded Markov chain: ∑_j ρji > ∑_j ρij .
After substituting the fixation probabilities under weak selection in Eq. 4 and
simplifying the algebra, we obtain the following inequality for λi > 1/3, namely, natural
selection favors strategy i and its long-run abundance is greater than 1/3:
   [(k + 1)/(k − 1)] aii + āi∗ − ā∗i − [(k + 1)/(k − 1)] ā∗∗ > 0,    (8)

where ā∗∗ = (1/3) ∑_{i=1}^{3} aii is the average payoff for two players using the same strategy, āi∗ = (1/3) ∑_{j=1}^{3} aij is the average payoff of strategy i, and ā∗i = (1/3) ∑_{j=1}^{3} aji is the average payoff of players when playing against strategy i.


We note that this inequality is, in fact, a special case of the more general condition
for multiple strategies in structured populations under rare mutations (µ → 0) [29]. A
given strategy i is favored by natural selection if the inequality holds:
   σ1 aii + āi∗ − ā∗i − σ1 ā∗∗ > 0,    (9)

with the structural coefficient σ1 = (k + 1)/(k − 1) for lattice populations and the number of strategies n = 3.
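To make condition (9) concrete, one can substitute the payoff entries at a given extortion factor. The following check (an illustrative sketch using the approximate values a23 ≈ 1.877 and a32 ≈ 1.219 at χ = 4 implied by A.21) confirms that AllC and PSO Gambler are favored while extortionate ZD is not:

import numpy as np

k = 4
sigma1 = (k + 1) / (k - 1)
# Payoff matrix of Eq. (1) at chi = 4 (rows/columns: AllC, ZD, PSO); the ZD
# entries follow from Eq. (2) and the rational functions of Eq. (A.21).
A = np.array([[3.0, 12/7, 3.0],
              [27/7, 1.0, 1.877],
              [3.0, 1.219, 3.0]])
for i, name in enumerate(["AllC", "ZD", "PSO"]):
    lhs = (sigma1 * A[i, i] + A[i].mean() - A[:, i].mean()
           - sigma1 * np.trace(A) / 3)
    print(name, "favored" if lhs > 0 else "disfavored", round(lhs, 3))
# AllC and PSO come out favored (lhs > 0) while ZD does not, cf. Fig. 4.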
When the extortion factor χ = 1, the resulting game dynamics are neutral, with fixation probabilities ρij = 1/N for any pair of i, j (Fig. 3a). Therefore, each strategy
has an equal frequency in the long run in the limit of low mutation. However, the
neutrality no longer holds for χ > 1 between AllC and extortionate ZD, as well as
between PSO Gambler and extortionate ZD, except for AllC and PSO Gambler.
[Figure 3 displays two triangle diagrams of the pairwise fixation probabilities ρij among AllC, ZD, and PSO: (a) χ = 1, where all ρij = 0.01; (b) χ = 4, where ρ31 = ρ13 = 0.01, ρ12 = 0.00925, ρ21 = 0.01163, ρ32 = 0.00707, and ρ23 = 0.01242.]
Figure 3. Pairwise competition dynamics under rare mutations. We show fixation probabilities ρij between the three strategies considered in repeated games: AllC, extortionate ZD (ZD extortioner) with (a) χ = 1 and (b) χ = 4, and unbending strategy (PSO Gambler), labeled as i = 1, 2, 3, respectively. In panel (a), for χ = 1, the game dynamics are neutral among all three strategies, and therefore the fixation probability is ρij = 1/N, where N is the population size. In panel (b), for χ = 4, the game dynamics remain neutral between AllC and PSO but no longer for AllC vs ZD or ZD vs PSO. It is noteworthy that spatial structure can help AllC invade extortionate ZD (ρ21 > 1/N > ρ12), as opposed to well-mixed populations; furthermore, compared to AllC, PSO is less likely to be invaded by ZD and more likely to take over ZD. Together, extortion can be suppressed by spatial structure and the presence of unbending strategies such as the PSO Gambler. Simulation parameters are as in Fig. 1.

Even though extortionate ZD can now secure the highest payoffs when interacting
with AllC or PSO Gambler, it is unable to exploit the unbending PSO Gambler to the
same extent as AllC because unbending leads to a monotonic decrease in extortionate
ZD’s payoff with respect to χ. This makes PSO Gambler harder for extortionate ZD to invade and, meanwhile, better able to take over extortionate ZD, as compared to AllC (Fig. 3).
Altogether, the presence of spatial structure can help natural selection favor AllC over
extortion because of spatial assortment, and extortioners fail to do well against each other, since they neutralize one another and each receives the payoff P . Moreover,
unbending PSO Gambler, compared to AllC, can even further diminish the advantage
of extortion, if any, in spatial populations.
As demonstrated in Fig. 4, an increase in the extortion factor helps extortionate
ZD players increase in abundance as more benefits are squeezed from AllC, but their
extortion is greatly mitigated by the presence of an unbending strategy, the PSO
Gambler for example. These PSO Gamblers are able to make extortion backfire: the greater χ, the smaller the absolute payoff extortionate ZD can reap from unbending players.
Therefore, the presence of a third type of unbending greatly increases the system’s
resilience against extortion. The abundance of PSO Gamblers is only slightly impacted
by increases in χ, quickly reaching a plateau; in other words, it saturates much faster

Figure 4. Long-term frequencies of the three strategies – AllC, extortionate ZD (extortioner), and unbending strategy (PSO Gambler) – in lattice populations. We find good agreement between the simulation results (symbols) and analytical predictions (lines), shown as a function of the extortion factor χ. Notably, the frequency of PSO Gambler remains almost unchanged while the equilibrium frequency of extortioners increases at the expense of AllC. In reference to the competition of AllC vs ZD in spatial populations (dashed lines), ZD can achieve a half-half split in equilibrium with AllC for sufficiently large extortion factors, despite the role of spatial structure in favoring AllC. However, the presence of the unbending strategy, PSO Gambler, fundamentally affects the underlying pairwise competition dynamics (as depicted in Fig. 3), thereby suppressing extortion regardless of how large χ is. Simulation parameters are as in Fig. 1, except that we vary the extortion factor χ.

than the impacts on AllC and extortionate ZD. In the limit χ → ∞, the abundance of extortionate ZD reaches a limit that is strictly below 1/3. Our simulation results
agree well with the analytical predictions (Fig. 4). This is in sharp contrast with the
scenario where PSO Gambler is absent and extortionate ZD can achieve a half-half
split in abundance with AllC at the limit χ → ∞ (see Appendix). Taken together,
unbending strategies such as the PSO Gambler can help shepherd cooperation and
suppress extortion in spatial populations.
We also note the jump discontinuity of the long-term abundance at χ = 1 versus
χ > 1. This is due to the fact that at χ = 1 the stochastic memory-one ZD strategy reaches the boundary deterministic strategy Tit-for-Tat, thereby causing the pairwise outcome CC to become the only absorbing state, as opposed to the dynamics being ergodic among all four possible pairwise outcomes CC, CD, DC, and DD.

Discussions and Conclusion

As a powerful memory-one strategy, an extortionate zero-determinant player can dominate any other co-player (or tie at worst) in the conventional Prisoner’s Dilemma game. A recent study reveals that unbending players, when fixed, can render the best
response of an extortionate ZD player to be fair by letting their extortion factor approach
one [14]. Moreover, unbending players can dominate ZD players even if the underlying
game remains a Prisoner’s Dilemma game but is of a more adversarial nature featuring
T + S < 2P . In this work, we deepen our understanding of unbending strategies
by showing their capacity to enhance spatial cooperation while suppressing extortion.
The intuition is that unbending strategies are neutral with AllC while they reduce the
payoffs of extortionate ZD strategies due to their unbending properties. Therefore, the
presence of unbending individuals can drastically change the invasion dynamics among
them under the mutation-selection equilibrium.
Among the body of previous research, certain work has demonstrated that extortionate ZD players cannot be evolutionarily successful unless they become more generous [36]. These studies have typically been done in well-mixed populations with multiple strategies [35]. It is evident that population size has an effect on the
evolutionary advantage of extortion. On the one hand, extortion can dominate in small-
sized populations [35]. On the other hand, population structure promotes assortment (termed network or spatial reciprocity [16, 8]), which can further strengthen the
advantage of cooperation. Depending on the payoff structure parameters, such as the
benefit-to-cost ratio in donation games or other parameterizations of the Prisoner’s
Dilemma, increasing the extortion factor χ can help ZD players dominate other strategies
and subsequently provide an evolutionary pathway to cooperation, thereby acting as a
catalyst for the evolution of cooperation. These insights are obtained in both well-mixed
and structured populations [35, 37]. Here, the contribution of our current work lies in
demonstrating how unbending players can further foster direct reciprocity and suppress
extortion, thus increasing the resilience of cooperation against extortion.
In addition to fixation probabilities, another important quantity of interest is the
conditional fixation time [51]. Prior work has shown that fixation time strongly depends
on the type of game interactions (payoff structure in general) in finite, well-mixed
populations [51, 52]. For instance, the fixation time is exponential when the underlying
game is of the snowdrift type [51]. Moreover, spatial structure can promote exceedingly
long co-existence even if the underlying game is of the Prisoner’s Dilemma type [16].
In our current study, we have a snowdrift game both between AllC and extortionate
ZD, and between extortionate ZD and PSO Gambler, and a neutral game between
AllC and PSO Gambler. Thus, the fixation time can be prohibitively long for non-
weak selection strength in well-mixed populations, not to mention the role of spatial
structure in promoting coexistence [19]. That said, our analytical approach based on
pairwise invasion dynamics can no longer be employed because fixation takes exceedingly
long and, as a consequence, we can only observe the co-existence of AllC, extortionate ZD, and PSO Gambler, while fixation of any one strategy in the population becomes an extremely rare event. In this case, we are compelled to rely on agent-based simulations
and extended pair approximation methods for multiple strategies to understand the
dynamics. Previous studies utilizing simulations have focused on non-weak selection
and offered some insights into this regime [37, 38, 39, 53, 40, 41].
In conclusion, we have investigated, analytically and by means of agent-based simulations, how introducing unbending strategies can help suppress extortion
and shepherd cooperation in spatial populations. We have derived closed-form
approximations for the long-term frequencies of three strategies, AllC, extortionate
ZD, and the unbending strategy PSO Gambler, under weak selection and in the limit
of low mutation. We find that the presence of unbending strategies can restrain the
abundance of extortionate ZD no matter how large the extortion factor is, whereas the extortion factor has little effect on the long-run abundance of unbending
strategies. Therefore, unbending individuals can strengthen the resilience of spatial
cooperation. Although we demonstrate our general method through a particular
candidate of unbending strategy, the PSO Gambler, our approach can be applied to study broader contexts such as the evolutionary dynamics of multiple powerful strategies in repeated multiplayer games [54, 55, 56, 57, 58], multiplex networks [59], higher-order
networks [60], and enforcing fairness in human-AI systems [61, 62], providing insights
into understanding the interplay between network reciprocity and direct reciprocity.

Acknowledgments

We would like to express our heartfelt gratitude to Professor Long Wang on the occasion
of his 60th birthday. X.C. gratefully acknowledges the support by Beijing Natural
Science Foundation (grant no. 1244045). F.F. is grateful for support from the Bill &
Melinda Gates Foundation (award no. OPP1217336).

Author Contributions

Z.C., X.C., and F.F. conceived the model; Z.C. and Y.G. performed calculations and
analyses and plotted the figures; Z.C., X.C., and F.F. wrote the manuscript. All authors
give final approval of publication.

Competing Interests

The authors declare that they have no competing financial interests.

Appendix A.

In our study, we consider a population of size N under a network structure of degree k where individuals can be of strategy A or B. The corresponding payoff matrix between
these two strategies is (a, b, c, d). For the evolution process, the death-birth update rule
is used and the contribution of payoff to fitness is described by the selection strength β.
We first work on pairwise invasions with no mutations (µ = 0) and then consider the
evolutionary dynamics of a triple of strategies under rare mutations (µ → 0).

Appendix A.1. Fixation probabilities


The probability of randomly picking an individual of type X is denoted by pX . Moreover,
the probability of randomly picking an XY pair is denoted by pXY and the conditional
probability of finding a neighbor of type X given a focal individual of type Y is referred
to as qX|Y . It follows immediately that pXY = pY qX|Y . We also have pA + pB = 1,
pA = pAA + pBA , pB = pAB + pBB , and pAB = pBA . Therefore, the two probabilities pA
and pAA can be considered as the independent variables of the population. In the limit
of weak selection where β ≪ 1, it has been proven in previous work that we can further
obtain an elegant relation between the global frequency pX and the local density qX|X
via the degree k. To be more precise, the following two identities hold:
   qA|A = pA + (1 − pA)/(k − 1),   qB|B = pB + (1 − pB)/(k − 1).    (A.1)
We assume that there are i individuals of strategy A and N − i individuals of strategy B at the moment. If a focal A individual is selected, let the numbers of its neighbors of type A and type B be n^A_A(i) and n^B_A(i), respectively. On average, we have n^A_A(i) = kqA|A and n^B_A(i) = kqB|A. The payoffs of these two types of neighbors are

   πAA(i) = [(k − 1)qA|A + 1] · a + (k − 1)qB|A · b,
   πAB(i) = [(k − 1)qA|B + 1] · c + (k − 1)qB|B · d.    (A.2)

Similarly, if a focal B individual is selected, let the numbers of its neighbors of type A and type B be n^A_B(i) and n^B_B(i), respectively. On average, we have n^A_B(i) = kqA|B and n^B_B(i) = kqB|B. The payoffs of these two types of neighbors are

   πBA(i) = (k − 1)qA|A · a + [(k − 1)qB|A + 1] · b,
   πBB(i) = (k − 1)qA|B · c + [(k − 1)qB|B + 1] · d.    (A.3)

The corresponding fitness can be written as

   fAA(i) = exp[βπAA(i)],   fAB(i) = exp[βπAB(i)],
   fBA(i) = exp[βπBA(i)],   fBB(i) = exp[βπBB(i)].    (A.4)
Based on the update rule, the probability that the number of A individuals decreases by one is

   T_A^−(i) = pA · n^B_A(i) fAB(i) / [n^A_A(i) fAA(i) + n^B_A(i) fAB(i)],    (A.5)

whereas the probability that the number of A individuals increases by one is

   T_A^+(i) = pB · n^A_B(i) fBA(i) / [n^A_B(i) fBA(i) + n^B_B(i) fBB(i)].    (A.6)
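A compact numerical transcription of (A.1)-(A.6) reads as follows (an illustrative sketch; the function name and argument conventions are ours):

import numpy as np

def transition_probs(pA, a, b, c, d, k=4, beta=0.001):
    """T_A^-(i) and T_A^+(i) of (A.5)-(A.6) at global frequency pA = i/N,
    with local densities from the pair-approximation identity (A.1)."""
    pB = 1 - pA
    qAA = pA + (1 - pA) / (k - 1)                    # Eq. (A.1)
    qBB = pB + (1 - pB) / (k - 1)
    qBA, qAB = 1 - qAA, 1 - qBB                      # q_{B|A} and q_{A|B}
    # Payoffs of A- and B-neighbors of a focal A or B individual, (A.2)-(A.3).
    piAA = ((k - 1) * qAA + 1) * a + (k - 1) * qBA * b
    piAB = ((k - 1) * qAB + 1) * c + (k - 1) * qBB * d
    piBA = (k - 1) * qAA * a + ((k - 1) * qBA + 1) * b
    piBB = (k - 1) * qAB * c + ((k - 1) * qBB + 1) * d
    fAA, fAB = np.exp(beta * piAA), np.exp(beta * piAB)   # Eq. (A.4)
    fBA, fBB = np.exp(beta * piBA), np.exp(beta * piBB)
    nAA, nBA = k * qAA, k * qBA     # mean A- and B-neighbor counts, focal A
    nAB, nBB = k * qAB, k * qBB     # mean A- and B-neighbor counts, focal B
    T_minus = pA * nBA * fAB / (nAA * fAA + nBA * fAB)    # Eq. (A.5)
    T_plus = pB * nAB * fBA / (nAB * fBA + nBB * fBB)     # Eq. (A.6)
    return T_minus, T_plus

# Example with conventional PD payoffs; both values stay near p_AB = 1/6
# under weak selection, as expected.
print(transition_probs(0.5, a=3, b=0, c=5, d=1))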
The evolution process can be regarded as a finite-state Markov chain on the state space {0, 1, . . . , N }, where the endpoints 0 and N are the absorbing states. After some routine
calculations, we can obtain the fixation probability for strategy A:
   ρA = 1 / [1 + ∑_{i=1}^{N−1} ∏_{j=1}^{i} T_A^−(j)/T_A^+(j)].    (A.7)

We can further get the condition for natural selection to favor strategy A:
   ρA > 1/N  ⇔  1 + ∑_{i=1}^{N−1} ∏_{j=1}^{i} T_A^−(j)/T_A^+(j) < N.    (A.8)

We know that TA− (i) = TA+ (i) = pAB when the selection strength β = 0. In the
limit of weak selection where β ≪ 1, we can take the Taylor expansion of the ratio
TA− (i)/TA+ (i) to the first order and get
   T_A^−(i)/T_A^+(i) = 1 + β{qA|A[πAB(i) − πAA(i)] + qB|B[πBB(i) − πBA(i)]} + O(β²).    (A.9)
As such, the condition in A.8 is simplified as
   ∑_{i=1}^{N−1} ∑_{j=1}^{i} {qA|A[πAA(j) − πAB(j)] + qB|B[πBA(j) − πBB(j)]} > 0.    (A.10)

After substituting A.1 - A.4 into the above inequality, it becomes

   ∑_{i=1}^{N−1} ∑_{j=1}^{i} k(Fa + Fb − Fc − Fd) > 0,    (A.11)

where

   Fa = {[pA(k² − k − 2) + (k + 1)] / [k(k − 1)]} · a,
   Fb = {[pA(−k² + k + 2) + (k² − k − 1)] / [k(k − 1)]} · b,
   Fc = {[pA(k² − k − 2) + 1] / [k(k − 1)]} · c,
   Fd = {[pA(−k² + k + 2) + (k² − 1)] / [k(k − 1)]} · d.    (A.12)
If the population size is large enough, that is, N ≫ k, a discrete sum can be
approximated by a continuous integral. Therefore, we can replace discrete variables
i, j ∈ {0, 1, 2, · · · , N } by continuous variables u, v ∈ (0, 1). And the condition in A.11 is
simplified as
   ∫₀¹ ∫₀ᵘ (Ha + Hb − Hc − Hd) dv du > 0,    (A.13)

where

   Ha = [v(k² − k − 2) + (k + 1)] · a,
   Hb = [v(−k² + k + 2) + (k² − k − 1)] · b,
   Hc = [v(k² − k − 2) + 1] · c,
   Hd = [v(−k² + k + 2) + (k² − 1)] · d.    (A.14)
By integrating the four parts, we finally get the condition for natural selection to favor strategy A, determined by the payoff structure and the network structure:

   (k + 1)²a + (2k² − 2k − 1)b − (k² − k + 1)c − (2k − 1)(k + 1)d > 0.    (A.15)

In like manner, we obtain the fixation probabilities for both strategies as linear
functions of the selection strength:
   ρA = 1/N + β · [(k + 1)²a + (2k² − 2k − 1)b − (k² − k + 1)c − (2k − 1)(k + 1)d] / [6(k − 1)],
   ρB = 1/N + β · [(k + 1)²d + (2k² − 2k − 1)c − (k² − k + 1)b − (2k − 1)(k + 1)a] / [6(k − 1)].    (A.16)

Appendix A.2. Average payoffs of IPD strategies


We consider the conventional IPD game with payoff matrix (R, S, T, P ) = (3, 0, 5, 1).
Then, we focus on three strategies: AllC (cooperator), extortionate ZD (extortioner),
and unbending strategy (PSO Gambler). Since these strategies are memory-one, each
of them can be represented by a four-component vector q = [qCC , qCD , qDC , qDD ]. Here,
qCC is the probability of cooperating in the next round given the outcome CC in the
current round and the same applies to the other components. We denote the vectors of
AllC, extortionate ZD, and PSO Gambler by q1 , q2 , and q3 , respectively.
To be more precise, we have

   qAllC = [1, 1, 1, 1],
   qZD = [1 − ϕ(R − P)(χ − 1), 1 − ϕ[(T − P)χ + (P − S)], ϕ[(P − S)χ + (T − P)], 0],
   qPSO = [1, 0.52173487, 0, 0.12050939],    (A.17)
where χ ≥ 1 is the extortion factor and 0 ≤ ϕ ≤ (P − S)/[(T − P )χ + (P − S)] is
the normalization factor. Although ϕ has a non-trivial effect on the payoffs between
extortionate ZD and a co-player in general, the two co-player strategies discussed here actually render the influence of ϕ trivial.
We use the method (taking the quotient of the determinants of two matrices)
introduced by Press and Dyson in their work [13] to calculate the expected payoffs
sX and sY between a pair of players X and Y using strategies p and q, respectively:
   sX = D(p, q, SX)/D(p, q, I),   sY = D(p, q, SY)/D(p, q, I),    (A.18)
where SX = (R, S, T, P ), SY = (R, T, S, P ), and I = (1, 1, 1, 1). We refer to the 3 × 3
expected payoff matrix of AllC, extortionate ZD, and PSO Gambler as
           AllC   ZD    PSO
   AllC  ( a11    a12   a13 )
   ZD    ( a21    a22   a23 )    (A.19)
   PSO   ( a31    a32   a33 )
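A direct transcription of the determinant quotient in A.18 is sketched below (our notation; the 4 × 4 matrix is the standard Press-Dyson construction, in which the co-player’s cooperation probabilities for CD and DC are swapped because the roles of the two players are reversed from Y’s perspective):

import numpy as np

def D(p, q, f):
    """Press-Dyson determinant D(p, q, f) for memory-one strategies p and q."""
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q  # q is [q_CC, q_CD, q_DC, q_DD] from Y's own view
    return np.linalg.det(np.array([
        [-1 + p1 * q1, -1 + p1, -1 + q1, f[0]],
        [p2 * q3,      -1 + p2, q3,      f[1]],  # Y sees DC when X sees CD
        [p3 * q2,      p3,      -1 + q2, f[2]],  # Y sees CD when X sees DC
        [p4 * q4,      p4,      q4,      f[3]]]))

def payoffs(p, q, R=3, S=0, T=5, P=1):
    """Expected payoffs (s_X, s_Y) as in Eq. (A.18)."""
    SX, SY, I = [R, S, T, P], [R, T, S, P], [1, 1, 1, 1]
    d = D(p, q, I)
    return D(p, q, SX) / d, D(p, q, SY) / d

print(payoffs([1, 1, 1, 1], [1, 0, 1, 0]))  # AllC vs Tit-for-Tat: (3.0, 3.0)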

It always holds that a11 = a13 = a31 = a33 = R. When χ = 1, it follows that
q2 = [1, 0, 1, 0], namely, extortionate ZD degenerates to Tit-for-Tat. We further have
a11 = a12 = a21 = a22 = a23 = a32 = R. In contrast, when χ > 1, we have
   a11 = R,   a12 = P + (R − P)(T − S)/[(R − S)χ + (T − R)],
   a22 = P,   a21 = P + (R − P)(T − S)χ/[(R − S)χ + (T − R)].    (A.20)
And the remaining two payoffs a23 and a32 become quadratic rational functions of χ.
In the conventional IPD game, the four payoffs decided by χ can be (approximately)
written as
   a12 = 3(χ + 4)/(3χ + 2),   a21 = (13χ + 2)/(3χ + 2),
   a23 = 0.455742281814226(χ − 0.598718917613494)(4χ + 1)/(χ² − 0.422337214095301χ − 0.272861525678518),    (A.21)
   a32 = (χ² + 0.400631913161605χ − 0.48622813248306)/(χ² − 0.422337214095301χ − 0.272861525678518).

Moreover, we can get the extreme values of these payoffs by letting χ → +∞:

   lim_{χ→+∞} a12 = lim_{χ→+∞} a32 = 1,   lim_{χ→+∞} a21 = 13/3,   lim_{χ→+∞} a23 = 1.82296912725691.    (A.22)

Appendix A.3. Impact of extortion factor on long-term frequencies of IPD strategies


In a similar manner, we denote the pairwise fixation probabilities between AllC (strategy
1), extortionate ZD (strategy 2), and PSO Gambler (strategy 3) by ρij ’s and the long-
term frequencies of these strategies by λ̃i ’s when unbending Gamblers are absent from
the population (the null case) or λi ’s when all three strategies are present.
Based on the work of Fudenberg and Imhof [48], we use the embedded Markov chain
approach and obtain these frequencies as elements of the normalized left eigenvector of
the transition matrix corresponding to the largest eigenvalue one. For two strategies,
we have
   [λ̃1, λ̃2] = [ρ21, ρ12]/(ρ21 + ρ12).    (A.23)
And for three strategies, we have
   [λ1, λ2, λ3] = [γ1, γ2, γ3]/(γ1 + γ2 + γ3),    (A.24)
where
γ1 = ρ21 ρ31 + ρ21 ρ32 + ρ31 ρ23 ,
γ2 = ρ31 ρ12 + ρ12 ρ32 + ρ32 ρ13 , (A.25)
γ3 = ρ21 ρ13 + ρ12 ρ23 + ρ13 ρ23 .
To study the impact of the extortion factor χ on ρij and λi , we consider a specific
example where N = 100, k = 4, β = 0.001 to get concrete numerical results. The
same approach applies to any other combination of parameters as long as the conditions
N ≫ k and β ≪ 1 are satisfied. After some tedious calculations, we get
• when χ = 1,

   ρ21 = ρ12 = ρ32 = ρ23 = ρ31 = ρ13 = 1/100,    (A.26)

• when χ > 1,

   ρ21 = (28χ + 34.5)/[900(3χ + 2)],   ρ12 = (28χ + 4.5)/[900(3χ + 2)],
   ρ32 = (2.14880483211515χ² − 1.03438540268879χ − 0.454016698936298)/[300(χ² − 0.422337214095301χ − 0.272861525678518)],
   ρ23 = (3.655023355761χ² − 1.25725839044252χ − 1.12775971437606)/[300(χ² − 0.422337214095301χ − 0.272861525678518)],    (A.27)
   ρ31 = ρ13 = 1/100.
It is clear that these fixation probabilities can be seen as functions of the extortion
factor. The curves of the ρij ’s with respect to χ are given in Fig. A1.
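These expressions can be cross-checked numerically. The sketch below (our helper names) rebuilds ρij(χ) from Eq. (4) together with the payoff entries of A.20 and A.21, and then assembles the long-term frequencies via A.24 and A.25; the outputs match A.29 at χ = 4 and the χ → +∞ limits of A.30 given below:

import numpy as np

N, k, beta = 100, 4, 0.001

def denom(chi):
    return chi**2 - 0.422337214095301 * chi - 0.272861525678518

def payoff_matrix(chi):
    """Entries a_ij for chi > 1, from Eqs. (A.20)-(A.21)."""
    a12 = 3 * (chi + 4) / (3 * chi + 2)
    a21 = (13 * chi + 2) / (3 * chi + 2)
    a23 = (0.455742281814226 * (chi - 0.598718917613494) * (4 * chi + 1)
           / denom(chi))
    a32 = (chi**2 + 0.400631913161605 * chi - 0.48622813248306) / denom(chi)
    return np.array([[3, a12, 3], [a21, 1, a23], [3, a32, 3]])

def rho(a, i, j):
    """Weak-selection fixation probability of a j-mutant among i-residents."""
    star = ((k + 1)**2 * a[j, j] + (2*k**2 - 2*k - 1) * a[j, i]
            - (k**2 - k + 1) * a[i, j] - (2*k - 1) * (k + 1) * a[i, i])
    return 1 / N + beta * star / (6 * (k - 1))

def frequencies(chi):
    a = payoff_matrix(chi)
    r = {(i, j): rho(a, i, j) for i in range(3) for j in range(3) if i != j}
    g = np.array([r[1,0]*r[2,0] + r[1,0]*r[2,1] + r[2,0]*r[1,2],
                  r[2,0]*r[0,1] + r[0,1]*r[2,1] + r[2,1]*r[0,2],
                  r[1,0]*r[0,2] + r[0,1]*r[1,2] + r[0,2]*r[1,2]])
    return g / g.sum()

print(frequencies(4.0))  # matches Eq. (A.29) at chi = 4: ~[0.356, 0.252, 0.392]
print(frequencies(1e6))  # approaches the chi -> +infinity limits of Eq. (A.30)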

Figure A1. Fixation probabilities between the three strategies considered in repeated
games: AllC, extortionate zero-determinant strategy (ZD extortioner) with χ ≥ 1, and
unbending strategy (PSO Gambler). Simulation parameters are as in Fig. 1, except
that we vary the extortion factor χ for the extortionate ZD strategy.

Combined with A.23 - A.25, we further get

• when χ = 1, [λ̃1, λ̃2] = [1/2, 1/2] and [λ1, λ2, λ3] = [1/3, 1/3, 1/3],
• when χ > 1,

   [λ̃1, λ̃2] = [(28χ + 34.5)/(56χ + 39), (28χ + 4.5)/(56χ + 39)],    (A.28)

and

   λ1 = (1/3) · (4.49726233156984χ³ + 2.68590913537835χ² − 3.11317031657329χ − 1.18897071999626)
              / (4.50655997677753χ³ + 1.20760596881118χ² − 2.48101898478848χ − 0.813081399282176),
   λ2 = (1/3) · (3.74415306974691χ³ − 0.565174340912924χ² − 1.42345389906067χ − 0.25738900597642)
              / (4.50655997677753χ³ + 1.20760596881118χ² − 2.48101898478848χ − 0.813081399282176),    (A.29)
   λ3 = (1/3) · (5.27826452901584χ³ + 1.50208311196812χ² − 2.90643273873147χ − 0.9928844718738)
              / (4.50655997677753χ³ + 1.20760596881118χ² − 2.48101898478848χ − 0.813081399282176).
As before, these abundances can be seen as functions of the extortion factor. The curves of the λ̃i ’s and λi ’s with respect to χ are given in Fig. 4. In particular, we can get the extreme values of these abundances by letting χ → +∞:

   lim_{χ→+∞} λ̃1 = lim_{χ→+∞} λ̃2 = 1/2,
   lim_{χ→+∞} λ1 = 0.332645621401128,
   lim_{χ→+∞} λ2 = 0.276940954892473,    (A.30)
   lim_{χ→+∞} λ3 = 0.390413423706399.
It can be observed from Fig. 4 that the abundance of PSO Gambler is only slightly impacted by increases in the extortion factor, compared with those of AllC and extortionate ZD. To understand the trends in a more intuitive way, we differentiate the λi ’s with respect to χ and obtain their corresponding derivatives. The curves of the λ̇i ’s are shown in Fig. A2.

Figure A2. Derivatives of long-term frequencies of the three strategies – AllC, extortionate ZD, and PSO Gambler – in lattice populations. Simulation parameters are as in Fig. 1, except that we vary the extortion factor χ.

Appendix B.

Data Availability: All the data and analyses pertaining to this work have been included
in the main text.
Code Availability: The source code for reproducing the results is available at the GitHub
repository (https://github.com/fufeng/unbending3S).

References

[1] Perc, M. et al. Statistical physics of human cooperation. Physics Reports 687, 1–51 (2017).
[2] Wang, J., Fu, F., Wu, T. & Wang, L. Emergence of social cooperation in threshold public goods
games with collective risk. Physical Review E 80, 016101 (2009).
[3] Vasconcelos, V. V., Santos, F. C., Pacheco, J. M. & Levin, S. A. Climate policies under wealth
inequality. Proceedings of the National Academy of Sciences 111, 2212–2216 (2014).
[4] Glaubitz, A. & Fu, F. Social dilemma of non-pharmaceutical interventions. arXiv preprint
arXiv:2404.07829 (2024).
[5] Hauert, C. & Szabó, G. Game theory and physics. American Journal of Physics 73, 405–414
(2005).
[6] Axelrod, R. & Hamilton, W. D. The evolution of cooperation. Science 211, 1390–1396 (1981).
[7] Rapoport, A. Prisoner’s dilemma. In Game theory, 199–204 (Springer, 1989).
[8] Nowak, M. A. Five rules for the evolution of cooperation. Science 314, 1560–1563 (2006).
[9] Nowak, M. A. & Sigmund, K. Tit for tat in heterogeneous populations. Nature 355, 250–253
(1992).
[10] Nowak, M. & Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the
prisoner’s dilemma game. Nature 364, 56–58 (1993).
[11] Molander, P. The optimal level of generosity in a selfish, uncertain environment. Journal of
Conflict Resolution 29, 611–618 (1985).
[12] Boerlijst, M. C., Nowak, M. A. & Sigmund, K. Equal pay for all prisoners. The American
mathematical monthly 104, 303–305 (1997).
[13] Press, W. H. & Dyson, F. J. Iterated prisoner’s dilemma contains strategies that dominate any
evolutionary opponent. Proceedings of the National Academy of Sciences 109, 10409–10413
(2012).
[14] Chen, X. & Fu, F. Outlearning extortioners: unbending strategies can foster reciprocal fairness
and cooperation. PNAS Nexus 2, pgad176 (2023).
[15] Hilbe, C., Röhl, T. & Milinski, M. Extortion subdues human players but is finally punished in
the prisoner’s dilemma. Nature communications 5, 3976 (2014).
[16] Nowak, M. A. & May, R. M. Evolutionary games and spatial chaos. Nature 359, 826–829 (1992).
[17] Lindgren, K. & Nordahl, M. G. Evolutionary dynamics of spatial games. Physica D: Nonlinear
Phenomena 75, 292–309 (1994).
[18] Perc, M. Coherence resonance in a spatial prisoner’s dilemma game. New Journal of Physics 8,
22 (2006).
[19] Hauert, C. & Doebeli, M. Spatial structure often inhibits the evolution of cooperation in the
snowdrift game. Nature 428, 643–646 (2004).
[20] Santos, F. C. & Pacheco, J. M. Scale-free networks provide a unifying framework for the emergence
of cooperation. Physical review letters 95, 098104 (2005).
[21] Ohtsuki, H., Hauert, C., Lieberman, E. & Nowak, M. A. A simple rule for the evolution of
cooperation on graphs and social networks. Nature 441, 502–505 (2006).
[22] Poncela, J., Gómez-Gardenes, J., Florı́a, L. M. & Moreno, Y. Robustness of cooperation in the
evolutionary prisoner’s dilemma on complex networks. New Journal of Physics 9, 184 (2007).
[23] Szolnoki, A. & Perc, M. Coevolution of teaching activity promotes cooperation. New Journal of
Physics 10, 043036 (2008).
[24] Fu, F., Wang, L., Nowak, M. A. & Hauert, C. Evolutionary dynamics on graphs: Efficient method
for weak selection. Physical Review E 79, 046707 (2009).
[25] Jackson, M. O. & Zenou, Y. Games on networks. In Handbook of game theory with economic
applications, vol. 4, 95–163 (Elsevier, 2015).
[26] Su, Q., Li, A., Zhou, L. & Wang, L. Interactive diversity promotes the evolution of cooperation
in structured populations. New Journal of Physics 18, 103007 (2016).
[27] Perez-Martinez, H., Gracia-Lazaro, C., Dercole, F. & Moreno, Y. Cooperation in costly-access
environments. New Journal of Physics 24, 083005 (2022).
[28] Wu, T., Fu, F. & Wang, L. Evolutionary games and spatial periodicity. Journal of Automation
and Intelligence 2, 79–86 (2023).
[29] Tarnita, C. E., Ohtsuki, H., Antal, T., Fu, F. & Nowak, M. A. Strategy selection in structured
populations. Journal of theoretical biology 259, 570–581 (2009).
[30] Allen, B. et al. Evolutionary dynamics on any population structure. Nature 544, 227–230 (2017).
[31] Antal, T., Traulsen, A., Ohtsuki, H., Tarnita, C. E. & Nowak, M. A. Mutation-selection
equilibrium in games with multiple strategies. Journal of theoretical biology 258, 614–622 (2009).
[32] Tarnita, C. E., Wage, N. & Nowak, M. A. Multiple strategies in structured populations.
Proceedings of the National Academy of Sciences 108, 2334–2337 (2011).
[33] McAvoy, A. & Allen, B. Fixation probabilities in evolutionary dynamics under weak selection.
Journal of Mathematical Biology 82, 14 (2021).
[34] Jusup, M. et al. Social physics. Physics Reports 948, 1–148 (2022).
[35] Hilbe, C., Nowak, M. A. & Sigmund, K. Evolution of extortion in iterated prisoner’s dilemma
games. Proceedings of the National Academy of Sciences 110, 6913–6918 (2013).
[36] Stewart, A. J. & Plotkin, J. B. From extortion to generosity, evolution in the iterated prisoner’s
dilemma. Proceedings of the National Academy of Sciences 110, 15348–15353 (2013).
[37] Szolnoki, A. & Perc, M. Evolution of extortion in structured populations. Physical Review E 89,
022804 (2014).
[38] Szolnoki, A. & Perc, M. Defection and extortion as unexpected catalysts of unconditional
cooperation in structured populations. Scientific reports 4, 5496 (2014).
[39] Wu, Z.-X. & Rong, Z. Boosting cooperation by involving extortion in spatial prisoner’s dilemma
games. Physical Review E 90, 062102 (2014).
[40] Xu, X., Rong, Z., Wu, Z.-X., Zhou, T. & Tse, C. K. Extortion provides alternative routes to the
evolution of cooperation in structured populations. Physical Review E 95, 052302 (2017).
[41] Mao, Y., Xu, X., Rong, Z. & Wu, Z.-X. The emergence of cooperation-extortion alliance on
scale-free networks with normalized payoff. Europhysics Letters 122, 50005 (2018).
[42] Hilbe, C., Nowak, M. A. & Traulsen, A. Adaptive dynamics of extortion and compliance. PloS
one 8, e77886 (2013).
[43] Chen, X., Wang, L. & Fu, F. The intricate geometry of zero-determinant strategies underlying
evolutionary adaptation from extortion to generosity. New Journal of Physics 24, 103001 (2022).
[44] Akin, E. The iterated prisoner’s dilemma: good strategies and their dynamics. Ergodic Theory,
Advances in Dynamical Systems 77–107 (2016).
[45] Hilbe, C., Chatterjee, K. & Nowak, M. A. Partners and rivals in direct reciprocity. Nature human
behaviour 2, 469–477 (2018).
[46] Chen, X. & Fu, F. Identifying bridges and catalysts for persistent cooperation using network-based
approach. In 2023 42nd Chinese Control Conference (CCC), 8064–8069 (IEEE, 2023).
[47] Harper, M. et al. Reinforcement learning produces dominant strategies for the iterated prisoner’s
dilemma. PloS one 12, e0188046 (2017).
[48] Fudenberg, D. & Imhof, L. A. Imitation processes with small mutations. Journal of Economic
Theory 131, 251–262 (2006).
[49] Wu, B., Gokhale, C. S., Wang, L. & Traulsen, A. How small are small mutation rates? Journal
of mathematical biology 64, 803–827 (2012).
[50] McAvoy, A. Comment on “imitation processes with small mutations”[j. econ. theory 131 (2006)
251–262]. Journal of Economic Theory 159, 66–69 (2015).
[51] Antal, T. & Scheuring, I. Fixation of strategies for an evolutionary game in finite populations.
Bulletin of mathematical biology 68, 1923–1944 (2006).
[52] Altrock, P. M. & Traulsen, A. Fixation times in evolutionary games under weak selection. New
Journal of Physics 11, 013012 (2009).
[53] Hao, D., Rong, Z. & Zhou, T. Extortion under uncertainty: Zero-determinant strategies in noisy
games. Physical Review E 91, 052803 (2015).
[54] Hilbe, C., Wu, B., Traulsen, A. & Nowak, M. A. Cooperation and control in multiplayer social
dilemmas. Proceedings of the National Academy of Sciences 111, 16425–16430 (2014).
[55] Pan, L., Hao, D., Rong, Z. & Zhou, T. Zero-determinant strategies in iterated public goods game.
Scientific reports 5, 13096 (2015).
[56] Szolnoki, A. & Chen, X. Environmental feedback drives cooperation in spatial social dilemmas.
Europhysics Letters 120, 58001 (2018).
[57] Shao, Y., Wang, X. & Fu, F. Evolutionary dynamics of group cooperation with asymmetrical
environmental feedback. Europhysics Letters 126, 40005 (2019).
[58] Wang, X., Zheng, Z. & Fu, F. Steering eco-evolutionary game dynamics with manifold control.
Proceedings of the Royal Society A 476, 20190643 (2020).
[59] Battiston, F., Perc, M. & Latora, V. Determinants of public cooperation in multiplex networks.
New Journal of Physics 19, 073017 (2017).
[60] Alvarez-Rodriguez, U. et al. Evolutionary dynamics of higher-order interactions in social networks.
Nature Human Behaviour 5, 586–595 (2021).
[61] Santos, F. P., Pacheco, J. M., Paiva, A. & Santos, F. C. Evolution of collective fairness in
hybrid populations of humans and agents. In Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 33, 6146–6153 (2019).
[62] Wang, L., Fu, F. & Chen, X. Mathematics of multi-agent learning systems at the interface of
game theory and artificial intelligence. Science China Information Sciences 67, 166201 (2024).
