Annals of the International Society of Dynamic Games
Pierre Cardaliaguet, Ross Cressman (eds.): Advances in Dynamic Games: Theory, Applications, and Numerical Methods for Differential and Stochastic Games
Volume 12
Series Editor
Tamer Başar
Mathematics Subject Classification (2010): 91A15, 91A22, 91A23, 91A24, 91A25, 91A80
The Annals of the International Society of Dynamic Games have a strong track
record of reporting recent advances in dynamic games by selecting articles primarily
from papers based on presentations at an international symposium of the Society
of Dynamic Games. This edition, Volume 12, continues the tradition, with most
contributions connected to the 14th International Symposium on Dynamic Games
and Applications held in Banff, Alberta, Canada in June 2010. The symposium was
cosponsored by St. Francis Xavier University, Antigonish, Nova Scotia, Canada; by
the Group for Research in Decision Analysis (GERAD); and by the Chair in Game
Theory and Management, HEC Montréal, Canada.
The volume contains 20 chapters that have been peer-reviewed according to
the standards of the international journals in game theory and its applications.
The chapters are organized into four parts: evolutionary game theory (Part I),
theoretical developments in dynamic and differential games (Part II), pursuit-
evasion games and search games (Part III), and applications of dynamic games
(Part IV). Beginning with its first volume in 2011, the journal Dynamic Games and
Applications has provided another important venue for the dissemination of related
research. Combined with the Annals, this development points to a bright future for
the theory of dynamic games as it continues to evolve.
Part I is devoted to evolutionary game theory and applications. It contains four
chapters.
David Ramsey examines age-structured game-theoretic models of mating behav-
ior in biological species, a topic of early interest when evolutionary game theory
began in the 1970s. Ramsey extends recent progress in this area first by allowing
the individual mating rate to depend on the proportion searching for mates and then
by incorporating asymmetries in the newborn sex ratio or the time for which males
and females are fertile. An iterative best-response procedure is used to generate the
equilibrium age distribution of fertile individuals.
Mike Mesterton-Gibbons and Tom Sherratt consider the evolutionary conse-
quences of signaling and of dominance in conflicts between two individuals. In
particular, it is shown that, when dominance over the opponent is sufficiently
advantageous, the evolutionarily stable strategy (ESS) is for only winners of the
conflict to signal in long contests and for neither winners nor losers to signal in
short contests.
Quanyan Zhu, Hamidou Tembine, and Tamer Başar formulate a multiple-access
control game and show that there is a convex set of pure strategy Nash equilibria.
The paper also addresses how to select one equilibrium from this set through game-
theoretic solutions such as ESS as well as through the long-run behavior of standard
evolutionary dynamics applied to this game that has a continuum of pure strategies.
Andrei Akhmetzhanov, Frédéric Grognard, Ludovic Mailleret, and Pierre Bern-
hard study the evolution of a consumer–resource system assuming that the repro-
duction rate of the resource population is constant. The consumers’ optimal behavior
is found over one season when they all act for the common good. The authors then
show that selfish mutants can successfully invade this system but are eventually as
vulnerable to invasion as the initially cooperative residents.
Part II contains eight chapters on theoretical developments of dynamic and
differential games.
Sergey Chistyakov and Leon Petrosyan analyze coalition issues in m-person
differential games with prescribed duration and integral payoffs. They show that
components of the Shapley value are absolutely continuous and thus differentiable
functions along any admissible trajectory.
Yurii Averboukh studies two-player, non-zero-sum differential games and charac-
terizes the set of Nash equilibrium payoffs in terms of nonsmooth analysis. He also
obtains sufficient conditions for a pair of continuous payoff functions to generate a
Nash equilibrium.
Anne Souquière studies two-player, non-zero-sum differential games played in
mixed strategies and characterizes the set of Nash equilibrium payoffs in this
framework. She shows in particular that the set of publicly correlated equilibrium
payoffs is the same as the set of Nash equilibrium payoffs using mixed strategies.
Dean Carlson and George Leitmann explain how to solve non-zero-sum differ-
ential games with equality constraints by using a penalty method approach. Under
the assumption that the penalized problem has an open-loop Nash equilibrium,
they show that this open-loop Nash equilibrium converges to an open-loop Nash
equilibrium for the constrained problem.
Paul Frihauf, Miroslav Krstic, and Tamer Başar investigate how to approximate
the stable Nash equilibria of a game by solving a differential equation in which the
players only need to measure their own payoff values. The approximation method
is based on the so-called extremum-seeking approach.
Miquel Oliu-Barton and Guillaume Vigeral obtain Tauberian-type results in
(continuous-time) optimal control problems: they show an equivalence between the
long-time average and the convergence of the discounted problem as the discount
rate tends to 0.
Lucia Pusillo and Stef Tijs propose a new type of equilibrium for multicriteria
noncooperative games. This “E-equilibrium” is based on improvement sets and
captures the idea of approximate and exact solutions.
Olivier Guéant studies a particular class of mean field games, with linear-
quadratic payoffs (mean field games are obtained as the limit of stochastic
differential games when the number of interacting agents tends to infinity). The
author shows that the system of equations associated with these games can be
transformed into a simple system of coupled partial differential equations, for
which he provides a monotonic scheme to build solutions.
Part III is devoted to pursuit-evasion games and search games and contains six
contributions.
Sourabh Bhattacharya and Tamer Başar investigate the effect of an aerial
jamming attack on the communication network of a team of unmanned aerial
vehicles (UAVs) flying in a formation. They analyze the problem in the framework
of differential game theory and provide analytical and approximate techniques to
compute nonsingular motion strategies of UAVs.
Serguei A. Ganebny, Serguei S. Kumkov, Stéphane Le Menec, and Valerii S.
Patsko study a pursuit-evasion game with two pursuers and one evader having linear
dynamics. They perform a numerical construction of the level sets of the value
function and explain how to produce feedback-optimal control.
Stéphane Le Menec presents a centralized algorithm to design cooperative
allocation strategies and guidance laws for air defense applications. One of its main
features is a capability to generate and counter alternative target assumptions based
on concurrent beliefs of future target behaviors, i.e., a Salvo Enhanced No Escape
Zone (SENEZ) algorithm.
Alexander Belousov, Alexander Chentsov, and Arkadii Chikrii study pursuit-
evasion games with integral constraints on the controls. They derive sufficient
conditions for the game to terminate in finite time.
Anna Karpowicz and Krzysztof Szajowski study the angler’s fishing problem, in
which an angler has at most two fishing rods. Using dynamic programming methods,
the authors explain how to find the optimal times to start fishing with only one rod
and then to stop fishing altogether to maximize the angler’s satisfaction.
Ryusuke Hohzaki deals with a non-zero-sum three-person noncooperative search
game, where two searchers compete for the detection of a target and the target tries
to evade the searchers. He shows that, in some cases, there is cooperation between
two searchers against the target and that the game can then be reduced to a zero-sum
one.
Part IV contains two papers dedicated to the applications of dynamic games to
economics and management science.
Alessandra Buratto formalizes a fashion licensing agreement where the licensee
produces and sells a product in a complementary business. Solving a Stackelberg
differential game, she analyzes the different strategies the licensor can adopt to
sustain his brand.
Pietro De Giovanni and Georges Zaccour consider a closed-loop supply chain
with a single manufacturer and a single retailer. They characterize and compare the
feedback equilibrium results in two scenarios. In the first scenario, the manufacturer
invests in green activities to increase the product-return rate while the retailer
controls the price. In the second scenario, the players implement a cost revenue
sharing contract in which the manufacturer transfers part of its sales revenues and
the retailer pays part of the cost of the manufacturer’s green activities program that
aims at increasing the return rate of used products.
Acknowledgements
The selection of contributions to this volume started during the 14th International
Symposium on Dynamic Games and Applications held in Banff. Our warmest
thanks go to all the referees of the papers. Without their invaluable efforts this
volume would not have been possible. Finally, our thanks go to the editorial staff at
Birkhäuser, and especially Tom Grasso, for their assistance throughout the editing
process. It has been an honor to serve as editors.
Chapter 1
Mate Choice with Age Preferences

David M. Ramsey
Abstract This paper considers some generalizations of the large population game
theoretic model of mate choice based on age preferences introduced by Alpern et
al. [Alpern et al., Partnership formation with age-dependent preferences. Eur. J.
Oper. Res. (2012)]. They presented a symmetric (with respect to sex) model with
continuous time in which the only difference between members of the same sex is
their age. The rate at which young males enter the adult population (at age 0) is
equal to the rate at which young females enter the population. All adults are fertile
for one period of time and mate only once. Mutual acceptance is required for mating
to occur. On mating or becoming infertile, individuals leave the pool of searchers.
It follows that the proportion of fertile adults searching and the distribution of
their ages (age profile) depend on the strategies that are used in the population
as a whole (called the strategy profile). They look for a symmetric equilibrium
strategy profile and corresponding age profile satisfying the following condition:
any individual accepts a prospective mate if and only if the reward obtained from
such a pairing is greater than the individual’s expected reward from future search.
It is assumed that individuals find prospective mates at a fixed rate. The following
three generalizations of this model are considered: (1) the introduction of a uniform
mortality rate, (2) allowing the rate at which prospective mates are found to depend
on the proportion of individuals who are searching, (3) asymmetric models in which
the rate at which males and females enter the population and/or the time for which
they are fertile differ.
1.1 Introduction
Many models of mate choice have been based on common preferences. According
to such preferences, individuals prefer attractive partners and each individual of a
given sex agrees on the attractiveness of a member of the opposite sex. Some work
has been carried out on models in which preferences are homotypic, i.e. individuals
prefer partners who are similar (e.g. in character) to themselves in some way. In
such models the attractiveness and character of an individual are assumed to be
fixed. One obvious characteristic upon which mate choice might be based is the age
of a prospective partner (and the searcher himself/herself). By definition, the age of
an individual must change over time. Very little theoretical work has been carried
out on such problems. This article extends a model considered by Alpern et al. [4].
Janetos [8] was the first to present a model of mate choice with common
preferences. He assumed that only females are choosy and the value of a male to
a female comes from a distribution known to the females. There is a fixed cost for
observing each prospective mate, but there is no limit on the number of males a
female can observe. Real [19] developed these ideas.
In many species both sexes are choosy and such problems are game theoretic.
Parker [17] presents a model in which both sexes prefer mates of high value.
He concludes that assortative mating should occur with individuals being divided
into classes. Class i males are paired with class i females and there may be one
class of males or females who do not mate. Unlike the models of Janetos [8] and
Real [19], Parker’s model did not assume that individuals observe a sequence of
prospective mates.
In the mathematics and economics literature such problems are often formulated
as marriage problems or job search problems. McNamara and Collins [12] consider
a job search game in which job seekers observe a sequence of job offers and,
correspondingly, employers observe a sequence of candidates. Both groups have
a fixed cost of observing a candidate or employer, as appropriate. Their conclusions
are similar to those of Parker [17]. Real [20] developed these ideas within the
framework of mate choice problems. For similar problems in the economics
literature see e.g. Shimer and Smith [21] and Smith [22].
In the above models it is assumed that the distribution of the value of prospective
partners has reached a steady state. There may be a mating season and as it
progresses the distribution of the value of available partners changes. Collins and
McNamara [6] were the first to formulate such a model as a one-sided job search
problem with continuous time. Ramsey [18] considers a similar problem with
discrete time. Johnstone [9] presents numerical results for a discrete time, two-
sided mate choice problem with a finite horizon. Alpern and Reyniers [3] use a
more analytic approach to similar mate choice problems. These models are further
developed and analyzed in Alpern and Katrantzi [1] and Mazalov and Falko [11].
Burdett and Coles [5] consider a dynamic model in which the outflow resulting
from partnership formation is balanced by job seekers and employers coming into
the employment market. Alpern and Reyniers [2] consider a similar model in which
individuals have homotypic preferences.
This paper is an extension of the work by Alpern et al. [4]. They consider a
problem of mutual mate choice in which all individuals of a sex are identical except
for their age. They first consider a problem with discrete time in which males are
fertile for m periods and females are fertile for n periods. Without loss of generality,
we may assume that m ≥ n. At each moment of time a number a1 (b1 ) of young
males (females) of age 1 enter the adult population. All the other adult individuals
age by 1 unit. If a male (female) reaches the age m + 1 (n + 1) without having
mated, then he (she) is removed from the population. The ratio R = a1 /b1 is called
the incoming sex ratio (ISR). The ratio of the number of adult males searching for
a mate to the number of females searching for a mate is called the operational sex
ratio (OSR) and is denoted by r.
At each moment an individual of the least common sex in the mating pool is
matched with a member of the opposite sex with probability ε . The age of the
prospective partner is chosen at random from the distribution of the age of members
of the appropriate sex. Suppose males are at least as common as females (i.e. r ≥ 1).
It follows that a male is matched with a female with probability ε /r. Given a male
is matched with a female, her age is chosen at random from the age of females.
Similarly, if females are at least as common as males (i.e. r ≤ 1), then in each period
a searching female is matched with a male with probability ε r. The age of such a
male is chosen at random from the distribution of male age. When two individuals
are matched, they must decide whether to accept or reject their prospective partner.
Mating only occurs by mutual consent. On mating two individuals are removed from
the population of searchers. It follows that the steady state distributions of the ages
of males and females in the population of searchers depend on the strategies used
within the population as a whole (the strategy profile).
The reward obtained by a pair on mating is taken to be the expected number of
offspring produced over the period of time for which both individuals are fertile. It is
assumed that offspring are produced at a rate depending on the ages of the partners
in such a way that the reward obtained by an individual on mating is non-increasing
in the age of the prospective partner. In this case, the equilibrium strategies of
males and females are threshold strategies in which each individual defines the
maximum acceptable age of a prospective partner as a function of the individual’s
age. This maximum acceptable age is non-decreasing in the age of the individual.
One example of such a reward function is the simple fertility model, according to
which the payoff of a pair on mating is simply the number of periods for which
both partners remain fertile. Equilibrium strategy profiles and the corresponding
age profiles are derived for a selection of problems of this form.
In addition, they define a continuous time model of a symmetric mate choice
problem in which both males and females enter the adult population at the same rate
and are fertile for one unit of time. It is assumed that when both males and females
use the same strategy (and thus r = 1), individuals meet prospective partners as a
Poisson process of rate λ (called the interaction rate). Hence, an individual expects
to meet λ prospective partners during their fertile period. The payoff obtained by a
pair on mating is equal to the length of time for which both remain fertile. A policy
iteration algorithm is defined to approximate a symmetric equilibrium of the game.
It should be noted that when the strategy used by females differs from the strategy
used by males, then the OSR may well differ from one. Since a matching of a
male with a female must correspond exactly to one matching of a female with a
male, it follows that the interaction rate depends on the OSR. Given the assumption
regarding the interaction rate at a symmetric equilibrium, females should meet males at rate 2λr/(r + 1), while males should meet females at rate 2λ/(r + 1). On the other hand,
the policy iteration algorithm assumes that this interaction rate is always λ . This
assumption affects the dynamics of the evolution of the threshold rule. However,
since at a symmetric equilibrium the OSR will be equal to 1, a fixed point of such
a regime will also be a fixed point of a suitably adapted algorithm in which the
interaction rate varies according to the OSR.
The paper presented here considers three extensions of this continuous time
model. Section 1.2 outlines the original model. Section 1.3 adapts this model to
include a fixed mortality rate for fertile individuals. Section 1.4 considers a model
in which the interaction rate depends on the proportion of fertile members of the
opposite sex who are searching for a mate. Section 1.5 considers a model of an
asymmetric game in which males are fertile for longer than females and/or the
ISR differs from 1. For convenience and ease of exposition, these adaptations are
considered separately, but they can be combined relatively easily. Section 1.6 gives
some numerical results, while Sect. 1.7 gives a brief conclusion and some directions
for future research.
We consider a symmetric (with respect to sex) model in which the rate at which new
males enter the adult population equals the rate at which females enter, i.e. R = 1.
Suppose individuals are fertile for one unit of time, there is no mortality over this
period and the rate at which they meet prospective partners is λ .
Each prospective partner is chosen at random from the set of members of the
opposite sex that are searching for a mate. When two prospective partners meet, they
decide whether to accept or reject the other on the basis of his/her age. Acceptance
must be mutual, in order to form a breeding pair. If acceptance is not mutual, both
individuals continue searching. No recall of previously encountered prospective
partners is possible.
The strategy of an individual defines the set of ages of acceptable prospective
mates at each age. We look for a symmetric equilibrium of such a game in which
males and females use the same strategy. It is clear that at such an equilibrium the
OSR is also equal to 1.
Suppose the rate at which a male of age x and a female of age y produce offspring is γ(x, y), where γ(x, y) ≥ 0. The reward of both partners when a male of age x pairs with a female of age y is the expected number of offspring produced while both remain fertile, i.e.

u(x, y) = ∫_0^{1−max{x,y}} γ(x + s, y + s) ds.

Suppose the rate at which fertile partners produce offspring is independent of their ages. We may assume γ(x, y) = 1. In this case u(x, y) = 1 − max{x, y}. This is simply the period of time for which both of the partners remain fertile. In the following analysis, we assume the reward is of this form.
The equilibrium condition is as follows: each individual should accept a prospec-
tive mate if and only if the reward gained from such a mating is greater than the
expected reward from future search. An equilibrium can be described by a strategy
pair. It is assumed that all males follow the first strategy in this pair and females
follow the second. At a symmetric equilibrium males and females use the same
strategy.
Note that the reward of an individual of age x from mating with a prospective
partner of age y, 1 − max{x, y}, is non-increasing in y. Hence, if an individual of
age x should accept a prospective partner of age y, then he/she should accept a
prospective partner of age ≤ y. Thus at a symmetric equilibrium each individual
uses a threshold rule such that an individual of age x accepts any prospective partner
of age ≤ f (x). The function f will be referred to as the threshold profile.
The future expected reward at age 1 is 0. Hence, an individual of age 1 will
accept any prospective mate, i.e. f (1) = 1. Suppose an individual of age x meets a
prospective mate of age ≤ x. By mating with such a prospective mate, the individual
obtains a payoff of 1 − x, which for x < 1 is greater than the payoff obtained from
continued search. Hence, f(x) ≥ x with equality if and only if x = 1. In addition, f′(x) ≥ 0, i.e. f is non-decreasing, since at equilibrium an individual of age x can ensure himself/herself the same reward as an individual of age x + δ by rejecting all prospective partners until age x + δ and then following the threshold profile f. It should be noted that an
individual of age ≤ f (0) will be acceptable to any member of the opposite sex.
Define a(x) to be the steady state proportion of individuals of age x that are
still searching for a mate. It should be noted that this proportion depends on the
acceptance rule being used in the population [i.e. on f (x)]. The proportion of fertile
individuals that have not mated is a, where a = ∫_0^1 a(x) dx. It follows that the density
function of the age of available, fertile individuals is given by â(x) = a(x)/a. The
function a will be referred to as the age profile.
We now derive a differential equation which the equilibrium threshold profile
must satisfy. Consider a male of age x. The probability of encountering an unmated female in a small interval of time of length δ is λδ. We consider two cases:

1. x < f(0). In this case the male is acceptable to females of any age y. The female is acceptable if y ≤ f(x). The probability that the female is acceptable is given by

(1/a) ∫_0^{f(x)} a(u) du.
Given a male is still searching at age x, the probability he mates between age x and age x + δ is given by

(λδ/a) ∫_0^{f(x)} a(u) du + O(δ²).

Hence,

a(x + δ) = a(x) [1 − (λδ/a) ∫_0^{f(x)} a(u) du] + O(δ²),

so that

[a(x + δ) − a(x)]/δ = −(λ a(x)/a) ∫_0^{f(x)} a(u) du + O(δ).

Letting δ → 0, we obtain

a′(x) = −(λ a(x)/a) ∫_0^{f(x)} a(u) du.  (1.1)
2. x ≥ f(0). In this case, the male must also be acceptable to the female, i.e. x ≤ f(y). Since f is an increasing function, it follows that f^{-1}(x) ≤ y. Hence, acceptance is mutual if f^{-1}(x) ≤ y ≤ f(x). Given a male is still searching at age x, the probability he mates between age x and age x + δ is given by

(λδ/a) ∫_{f^{-1}(x)}^{f(x)} a(u) du + O(δ²).
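To make the fixed-point character of these equations concrete, here is a minimal numerical sketch in Python (the computations reported in Sect. 1.6 used MATLAB; the grid size, tolerance and the threshold profile below are illustrative assumptions, not choices made in the paper). It integrates the age profile for the non-equilibrium threshold f(x) = min{1, x + 0.5}, iterating on the whole profile because the right-hand sides of the two cases above involve a(·) itself.

```python
import numpy as np

# Age profile for an illustrative threshold f(x) = min(1, x + 0.5), whose
# inverse is f^{-1}(x) = max(0, x - 0.5).  The acceptance integral uses the
# previous iterate of a(.), so we sweep forwards and repeat until it settles.
lam = 5.0
x = np.linspace(0.0, 1.0, 1001)
h = x[1] - x[0]
f = np.minimum(1.0, x + 0.5)
finv = np.maximum(0.0, x - 0.5)

a_old = np.ones_like(x)                   # initial guess for the age profile
for _ in range(200):
    w = (a_old[1:] + a_old[:-1]) * h / 2  # trapezium-rule panel areas
    abar = float(np.sum(w))               # a = int_0^1 a(u) du
    A = np.concatenate(([0.0], np.cumsum(w)))          # A(x) = int_0^x a(u) du
    acc = np.interp(f, x, A) - np.interp(finv, x, A)   # int_{f^{-1}(x)}^{f(x)} a(u) du
    a = np.empty_like(x)
    a[0] = 1.0                            # everyone is single on entering the pool
    for k in range(len(x) - 1):           # Euler step for a'(x) = -(lam*a(x)/a)*acc(x)
        a[k + 1] = a[k] * (1.0 - h * lam * acc[k] / abar)
    if np.max(np.abs(a - a_old)) < 1e-10:
        break
    a_old = a

# Sanity check: replacing f by f(x) = 1 (accept everyone) gives acc = abar, so
# the sweep reproduces a(x) = exp(-lam*x) up to the O(h) Euler error.
```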
At first glance, it appears that the following procedure might work: choose an
arbitrary male strategy f1 ; determine an optimal female response strategy f2 ;
determine an optimal male response f3 to f2 , and so on, hoping that the sequence
converges to a limit. If there is a limit, this defines a symmetric equilibrium of the
game considered. In a true two person game, such a procedure is at least feasible in
principle. However, it will not work in the present setting. To see this, suppose the
females know the male strategy f1 and consider the problem faced by a female of
age y. In order to know which males to accept, she needs to know her optimal future
return from searching. For this she needs to know (a) the rate at which she will be
matched (this is assumed to be known), (b) which males will accept her at her future
ages (this is known from f1 ), and (c) the age profile, call it a1 , of the males she will
be matched with. However, in the scheme we proposed above, she will not know
this, as it is not determined solely by the male strategy f1 , but also depends on what
the females are doing. In theory, we could determine f2 based on f1 and a previous
female strategy, say f0 , and determine a1 as the age profile corresponding to f1
and f0 . However, as we showed in Sect. 1.2.1, the determination of a1 is difficult.
So we use a different iterative procedure, described below.
Suppose we begin by positing an initial male strategy, denoted f_1 (where f_1 is a non-decreasing function), and any non-increasing initial male age profile, denoted a_1. We first compute the optimal response of a female to males who use the threshold profile f_1 and have age profile a_1; we denote this computation as

f_2 = H_1(f_1, a_1).
To continue the process, we then need to compute the function a2 defining the
probability that an individual female using f2 is still searching for a mate at age
x when the male age profile is a1 and the males adopt strategy f1 . Note that this is
not by definition the age profile of females when all males use f1 and all females
use f2 . We denote this computation (derived in Sect. 1.2.3) as
a2 = H2 ( f1 , a1 ) .
Define
( f2 , a2 ) = H ( f1 , a1 ) = (H1 ( f1 , a1 ) , H2 ( f1 , a1 )) .
Since the game is symmetric with respect to sex, we may define the optimal response of an individual, f_{i+1}, and the probability that an individual using this strategy is still searching at age x, a_{i+1}(x), when the members of the opposite sex use the threshold profile f_i and have age profile a_i, as follows:

(f_{i+1}, a_{i+1}) = H(f_i, a_i).
Theorem 1.1 (From Alpern et al. [4]). Suppose that for some initial strategy-age
profile pair ( f1 , a1 ) , the iterates ( fi+1 , ai+1 ) = H ( fi , ai ) converge to a limit ( f , a).
Then
• The strategy pair ( f , f ) is a symmetric equilibrium and
• Both sexes have the invariant age profile a = a (x) .
Proof. In the limit we will have
( f , a) = H ( f , a) .
It follows from the definition of H1 that f is the best response function of females
when males adopt f and their age profile is a. Similarly, f is the best response
function of males when females adopt f and have age profile a. Hence, it suffices to
show that when all individuals use the threshold strategy f , then the age profile in
both sexes is a. The second part of the iteration indicates that the probability that an
individual using f is still searching at age x is given by a(x). This individual is using
the same strategy as the rest of the population, thus a(x) is simply the proportion of
individuals of age x who are still in the mating pool, as required.
In order to simplify the notation used, we define f −1 (x) = 0 for x ≤ f (0), otherwise
f −1 (x) is the standard inverse function. First, we consider the best response fi+1
to the pair of profiles (f_i, a_i). Denote a_i = ∫_0^1 a_i(x) dx. Suppose an individual is
still searching at age x. The optimal expected future reward from search is equal
to the reward obtained by accepting the oldest acceptable prospective partner, i.e.
1 − fi+1 (x). Suppose the next encounter with an available mate occurs at age W .
The probability density function of W is given by p(w|W > x) = λ e^{−λ(w−x)}, for
x ≤ w < 1. It should be noted that there is an atom of probability at w = 1 of mass
equal to the probability that an individual does not meet another prospective partner
given that he/she is still searching at age x. In this case, the reward from search is
defined to be 0. Suppose the age of the prospective mate is y. The pairing is mutually acceptable if y ∈ [f_i^{-1}(w), f_{i+1}(w)]. If y ∈ [f_i^{-1}(w), w], then the searcher obtains a reward of 1 − w. If y ∈ (w, f_{i+1}(w)], then the searcher obtains a reward of 1 − y. In all other cases, the future expected reward from search is 1 − f_{i+1}(w). Conditioning on the age of the prospective mate and taking the expected value, it follows that

1 − f_{i+1}(x) = (λ e^{λx}/a_i) ∫_x^1 e^{−λw} { [1 − f_{i+1}(w)] ∫_0^{f_i^{-1}(w)} a_i(y) dy + ∫_{f_i^{-1}(w)}^{w} (1 − w) a_i(y) dy + ∫_w^{f_{i+1}(w)} (1 − y) a_i(y) dy + ∫_{f_{i+1}(w)}^{1} [1 − f_{i+1}(w)] a_i(y) dy } dw.  (1.4)
Equation (1.4) can be solved numerically, using the boundary condition f (1) = 1
and estimating fi+1 (x) sequentially at x = 1 − h, 1 − 2h, . . ., 0.
Once fi+1 has been estimated, we can estimate the corresponding age profile. The
calculations are analogous to the calculations carried out in the previous section.
We have

a′_{i+1}(x) = −(λ a_{i+1}(x)/a_i) ∫_{f_i^{-1}(x)}^{f_{i+1}(x)} a_i(y) dy.  (1.5)
Equation (1.5) can be solved numerically, using the boundary condition a(0) = 1
and estimating a(x) sequentially at x = h, 2h, . . . , 1.
A proof that the iteration procedure is well defined can be found in Alpern et al.
[4]. This proof may be adapted to show that the procedures proposed for the three
extensions considered below are also well defined.
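Below is a minimal sketch of one possible implementation of the whole iteration in Python (the author's programme, described in Sect. 1.6, was written in MATLAB; all numerical choices here are illustrative). For the backward sweep it uses the differential form of Eq. (1.4), obtained by dividing by e^{λx} and differentiating (the same manipulation that yields Eq. (1.12) in Sect. 1.5):

f′_{i+1}(x) = (λ/a_i) { [f_{i+1}(x) − x] ∫_{f_i^{-1}(x)}^{x} a_i(y) dy + ∫_x^{f_{i+1}(x)} [f_{i+1}(x) − y] a_i(y) dy }.

The forward sweep then integrates Eq. (1.5).

```python
import numpy as np

def trap(y, x):
    """Trapezium rule over the whole grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2))

def cum(y, x):
    """Running integral A[k] = int_0^{x_k} y(u) du on the grid."""
    return np.concatenate(([0.0], np.cumsum((y[1:] + y[:-1]) * np.diff(x) / 2)))

def H(f, a, x, lam):
    """One step (f_{i+1}, a_{i+1}) = H(f_i, a_i) of the policy iteration."""
    h, abar, A = x[1] - x[0], trap(a, x), cum(a, x)
    # f_i^{-1} by linear interpolation (ties broken so abscissae increase);
    # f_i^{-1}(x) = 0 for x <= f_i(0).
    finv = np.interp(x, f + 1e-12 * np.arange(len(x)), x, left=0.0)
    # Backward sweep for f_{i+1}, boundary condition f_{i+1}(1) = 1.
    fn = np.empty_like(x)
    fn[-1] = 1.0
    for k in range(len(x) - 1, 0, -1):
        fx = min(fn[k], 1.0)
        j = min(len(x) - 1, int(round(fx / h)))   # grid index closest to f(x_k)
        inner = trap((fx - x[k:j + 1]) * a[k:j + 1], x[k:j + 1]) if j > k else 0.0
        slope = lam / abar * ((fx - x[k]) * (A[k] - np.interp(finv[k], x, A)) + inner)
        fn[k - 1] = fx - h * slope
    fn = np.clip(fn, x, 1.0)                      # enforce x <= f_{i+1}(x) <= 1
    # Forward sweep for a_{i+1} via Eq. (1.5), boundary condition a_{i+1}(0) = 1.
    acc = np.interp(fn, x, A) - np.interp(finv, x, A)
    an = np.empty_like(x)
    an[0] = 1.0
    for k in range(len(x) - 1):
        an[k + 1] = an[k] * (1.0 - h * lam * acc[k] / abar)
    return fn, an

lam = 5.0
x = np.linspace(0.0, 1.0, 2001)
f, a = x.copy(), 1.0 - 0.5 * x                    # illustrative initial profiles
for _ in range(500):
    fn, an = H(f, a, x, lam)
    done = max(np.max(np.abs(fn - f)), np.max(np.abs(an - a))) < 1e-8
    f, a = fn, an
    if done:
        break
print(f"initial threshold f(0) = {f[0]:.3f}, expected reward = {1 - f[0]:.3f}")
```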
We now adapt the model presented above by assuming that mortality affects fertile
individuals at a constant rate of μ (independently of sex and status, single or mated).
We first derive the expected reward obtained by a pair composed of a male of age x
and a female of age y, denoted u(x, y). This is given by the expected time for which
both partners survive and are fertile. Note that the death of the first of the partners occurs at rate 2μ; denote the time until this death by Z. Suppose x ≤ y. If both partners survive for a period of 1 − y (i.e. until the female becomes infertile), then they both receive a payoff of 1 − y. Otherwise, the reward obtained is Z. It follows that

u(x, y) = (1 − y) ∫_{1−y}^{∞} 2μ e^{−2μz} dz + ∫_0^{1−y} 2μ z e^{−2μz} dz = [1 − e^{−2μ(1−y)}]/(2μ).  (1.6)

Similarly, for y ≤ x,

u(x, y) = [1 − e^{−2μ(1−x)}]/(2μ),

so that in general u(x, y) = [1 − e^{−2μ(1−max{x,y})}]/(2μ).
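As a quick plausibility check on Eq. (1.6): the reward of the pair is min{Z, 1 − y} with Z exponentially distributed with rate 2μ, so its mean can be compared with the closed form by simulation (a sketch; the parameter values and seed are arbitrary).

```python
import numpy as np

# Monte Carlo check of Eq. (1.6): E[min(Z, 1 - y)] with Z ~ Exp(2*mu).
rng = np.random.default_rng(0)
mu, y = 0.5, 0.3
z = rng.exponential(scale=1.0 / (2 * mu), size=1_000_000)
mc = np.minimum(z, 1.0 - y).mean()
closed = (1.0 - np.exp(-2.0 * mu * (1.0 - y))) / (2.0 * mu)
print(mc, closed)  # the two agree to Monte Carlo error (~1e-3)
```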
Let r_{i+1}(x) denote the optimal expected reward from future search of a male still searching at age x, given that females use the threshold profile f_i and have age profile a_i. Since the reward from mating is decreasing in the age of the partner, the threshold f_{i+1}(x) is defined by indifference between accepting the oldest acceptable prospective partner and continuing to search, i.e.

[1 − e^{−2μ(1−f_{i+1}(x))}]/(2μ) = r_{i+1}(x).  (1.7)

We now derive a differential equation for r_{i+1}(x) by conditioning on the time of the next event, where the death of the male and meeting a prospective partner are defined to be events. Events occur at rate λ + μ; thus, given the male is still searching at age x, the time at which the next event occurs, W, has density function p(w|W > x) = (λ + μ) e^{−(λ+μ)(w−x)} for x ≤ w < 1. Note that W has an atom of probability at w = 1 of mass equal to the probability that no event occurs before the male becomes infertile. Given that an event occurs before the male becomes infertile, this event is his death with probability μ/(λ + μ). If no event occurs before he reaches age 1 or the first event is his death, the reward of the male is 0. Considering the time of the next event, the type of this event and the age of the prospective partner, we obtain
r_{i+1}(x) = (λ/a_i) ∫_x^1 exp[−(λ + μ)(w − x)] { ∫_0^{f_i^{-1}(w)} r_{i+1}(w) a_i(y) dy + ∫_{f_i^{-1}(w)}^{w} [(1 − e^{−2μ(1−w)})/(2μ)] a_i(y) dy + ∫_w^{f_{i+1}(w)} [(1 − e^{−2μ(1−y)})/(2μ)] a_i(y) dy + ∫_{f_{i+1}(w)}^{1} r_{i+1}(w) a_i(y) dy } dw.  (1.8)
The functions fi+1 and ri+1 can be calculated numerically from Eqs. (1.7) and (1.8)
using the boundary conditions fi+1 (1) = 1 and ri+1 (1) = 0. Given ri+1 (x) and
fi+1 (x) for a sequence of values x ∈ {x0 , x0 + h, x0 + 2h, . . . , 1}, we can evaluate
ri+1 (x0 − h) using a numerical procedure to solve Eq. (1.8). We can then evaluate
fi+1 (x0 − h) directly from Eq. (1.7).
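Assuming the indifference condition (1.7), the threshold can be recovered from the value function in closed form, which is how the direct evaluation mentioned above can be implemented (a sketch):

```python
import numpy as np

# Inverting Eq. (1.7): (1 - exp(-2*mu*(1 - f)))/(2*mu) = r  gives
# f = 1 + log(1 - 2*mu*r)/(2*mu).  The argument of the logarithm is positive,
# since r never exceeds u(x, x) <= (1 - exp(-2*mu))/(2*mu).
def threshold_from_value(r, mu):
    return 1.0 + np.log(1.0 - 2.0 * mu * r) / (2.0 * mu)
```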
Having calculated fi+1 , we can then estimate ai+1 (x), the probability that a male
using the threshold profile fi+1 is still searching at age x given the threshold profile
and age profile of females, ( fi , ai ). A male of age x will leave the population of
searchers in the time interval [x, x + δ ] if he either finds a mate or dies in that
time interval. Analogous calculations to the ones used to obtain Eq. (1.1) lead to
the differential equation
fi+1 (x)
λ
ai+1 (x) = −ai+1 (x) μ + ai (y) dy . (1.9)
ai f i−1 (x)
Equation (1.9) can be solved numerically, using the boundary condition a_{i+1}(0) = 1 and evaluating a_{i+1}(x) sequentially at x = h, 2h, . . . , 1.
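A sketch of the corresponding forward sweep: compared with Eq. (1.5), mortality simply adds μ to the per-unit-time rate at which a searcher leaves the pool. The acceptance mass is assumed to have been precomputed on the grid (names are illustrative).

```python
import numpy as np

def age_profile_with_mortality(x, acc_mass, abar, lam, mu):
    """Euler sweep for Eq. (1.9); acc_mass[k] approximates
    int_{f_i^{-1}(x_k)}^{f_{i+1}(x_k)} a_i(y) dy and abar = int_0^1 a_i(x) dx."""
    h = x[1] - x[0]
    a_new = np.empty_like(x)
    a_new[0] = 1.0                      # boundary condition a_{i+1}(0) = 1
    for k in range(len(x) - 1):
        a_new[k + 1] = a_new[k] * (1.0 - h * (mu + lam * acc_mass[k] / abar))
    return a_new
```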
It should be noted that this model can be relatively easily modified to allow
the mortality rate of individuals to depend on their status (either single or paired),
but not on sex. In order to generalize this model to one in which the mortality
rate can depend on sex, we have to generalize the model considered above to
allow asymmetries between the sexes. Asymmetric problems will be considered in
Sect. 1.5.
The model presented in Sect. 1.2 assumes that as long as the OSR is equal to
1 individuals meet prospective mates at a constant rate regardless of the strategy
profile used (i.e. independently of the proportion of adult individuals who are
searching for a partner). One might think of this model as describing a population
in which all the singles are concentrated in a particular area (i.e. a type of “singles
bar” model). We might consider a model under which the adult population mixes
randomly. In this case, we assume that when a male meets a female the probability
of her being single is equal to the proportion of females searching for a mate. In
reality, it seems likely that prospective mates would be found at a rate that increases as the proportion of adults searching increases. However, it would
be realistic to assume that singles can concentrate their search in such a way that the
probability of an encounter being with another single is greater than the proportion
of the opposite sex who are single. Hence, the two models described above define
the two extremes of a spectrum for modelling encounters between searchers.
For ease of presentation, we only consider the “randomly mixing” model under
which the rate of meeting prospective mates is proportional to the fraction of
individuals of the opposite sex searching for a mate. As in Sect. 1.2, we only
consider symmetric equilibria of symmetric games of this form. We define an
iterative procedure ( fi+1 , ai+1 ) = H( fi , ai ), where fi+1 defines the best response of
an individual (without loss of generality we may assume a male) when the threshold
and age profiles of females are given by f_i and a_i, respectively. As before, define a_i = ∫_0^1 a_i(x) dx. It is assumed that the rate at which individuals meet prospective partners is λ a_i.
Firstly, we define the best response, fi+1 . Suppose a male is still searching at age
x. The optimal expected future reward from search is equal to the reward obtained by
accepting the presently oldest acceptable female, i.e. 1 − fi+1 (x). Suppose the next
encounter with a single female occurs at age W . The probability density function
of W is given by p(w|W > x) = λ a_i e^{−λ a_i (w−x)}, for x ≤ w < 1. As before, there is
an atom of probability at w = 1 of mass equal to the probability that the male does
not meet another available female given that he/she is still searching at age x. In this
case, the male’s reward from search is defined to be 0. Suppose the age of the female
is y. The pairing is mutually acceptable if y ∈ [f_i^{-1}(w), f_{i+1}(w)]. If y ∈ [f_i^{-1}(w), w], then they obtain a reward of 1 − w. If y ∈ (w, f_{i+1}(w)], then they obtain a reward of 1 − y. In all other cases, the future expected reward of the male from search is 1 − f_{i+1}(w). Conditioning on the age of the female and taking the expected value, it follows that

1 − f_{i+1}(x) = λ e^{λ a_i x} ∫_x^1 e^{−λ a_i w} { [1 − f_{i+1}(w)] ∫_0^{f_i^{-1}(w)} a_i(y) dy + ∫_{f_i^{-1}(w)}^{w} (1 − w) a_i(y) dy + ∫_w^{f_{i+1}(w)} (1 − y) a_i(y) dy + ∫_{f_{i+1}(w)}^{1} [1 − f_{i+1}(w)] a_i(y) dy } dw.  (1.10)
Using the boundary condition fi+1 (1) = 1, we can estimate fi+1 (x) for x = 1 − h, 1 −
2h, . . . , 0 by solving Eq. (1.10) numerically.
Having calculated fi+1 , we now estimate ai+1 , where ai+1 (x) is the probability
that a male using the optimal response is still searching at age x. This male finds
prospective mates at rate λ a_i. Given an optimally responding male of age x meets a female of age y, such a pairing is mutually acceptable if and only if f_i^{-1}(x) ≤ y ≤ f_{i+1}(x). Analogous calculations to the ones used to obtain Eq. (1.1) lead to

a′_{i+1}(x) = −λ a_{i+1}(x) ∫_{f_i^{-1}(x)}^{f_{i+1}(x)} a_i(y) dy.  (1.11)
We can estimate ai+1 (x) for x = h, 2h, . . . , 1 using the boundary condition ai+1 (0)=1
and solving Eq. (1.11) numerically.
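For comparison with Eq. (1.5), here is a sketch of the forward sweep for Eq. (1.11). Since the meeting rate is λ a_i while the probability of acceptability carries a factor 1/a_i, the two cancel and no 1/a_i appears in the update.

```python
import numpy as np

def age_profile_random_mixing(x, acc_mass, lam):
    """Euler sweep for Eq. (1.11); acc_mass[k] approximates
    int_{f_i^{-1}(x_k)}^{f_{i+1}(x_k)} a_i(y) dy."""
    h = x[1] - x[0]
    a_new = np.empty_like(x)
    a_new[0] = 1.0                      # boundary condition a_{i+1}(0) = 1
    for k in range(len(x) - 1):
        a_new[k + 1] = a_new[k] * (1.0 - h * lam * acc_mass[k])
    return a_new
```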
In this section we assume that males are fertile for a period of t units, while females
are fertile for 1 unit of time. Also, young males enter the adult population at a rate
R times the rate at which young females enter the adult population. Without loss of
generality, it may be assumed that t ≥ 1. The OSR r depends on the strategy profile
used. It is assumed that there is no mortality.
As stated earlier, there is an intrinsic problem with the formulation of the original
model. Although the ISR is one, when the strategy used depends on sex, the OSR
may differ from 1 (see Alpern et al. [4]). Suppose r ≠ 1 and individual males meet females at the same rate at which individual females meet males. It would follow that the ratio of the total number of times a male meets a female to the total number of times a female meets a male differs from 1. This is clearly a contradiction, since each such meeting involves exactly one male and one female.
In order to generalize the model, we assume that the rate at which singles meet other singles (of either sex) is λ_0. It follows that the rate at which single females meet prospective mates is λ_f, where λ_f = λ_0 r/(1 + r). Similarly, the rate at which single males meet prospective mates is λ_m, where λ_m = λ_0/(1 + r). This satisfies the Fisher condition (see Houston and McNamara [7]) that the ratio of the number of times a male meets a female to the number of times a female meets a male must be equal to 1. For example, if r = 3, then λ_f = 3λ_0/4 and λ_m = λ_0/4: each of the three-times-more-numerous males meets females a third as often as each female meets males.
In the case of the symmetric problem, this problem is sidestepped by the assumption that the equilibrium is symmetric with respect to sex, so that at such an equilibrium the OSR is equal to 1.

Consider first the best response of a female, defined by a threshold profile g_{i+1}, when males use the threshold profile f_i and have age profile a_i. Such a female is assumed to meet prospective mates at rate λ_i^f, defined from the OSR assumed at this step in the same way as λ_i^m below; her optimal expected reward from future search at age y is t − g_{i+1}(y), the reward from accepting the oldest acceptable male. As before, if acceptance is not mutual, both individuals continue searching. By conditioning on the age of the female at the next encounter with a male and his age, we obtain

t − g_{i+1}(y) = (λ_i^f/a_i) ∫_y^1 e^{−λ_i^f (w−y)} { ∫_0^{f_i^{-1}(w)} [t − g_{i+1}(w)] a_i(x) dx + ∫_{f_i^{-1}(w)}^{t−1+w} [1 − w] a_i(x) dx + ∫_{t−1+w}^{g_{i+1}(w)} [t − x] a_i(x) dx + ∫_{g_{i+1}(w)}^{t} [t − g_{i+1}(w)] a_i(x) dx } dw.
Dividing by e^{λ_i^f y} and differentiating with respect to y, after some simplification we obtain

g′_{i+1}(y) = (λ_i^f/a_i) { [g_{i+1}(y) + 1 − y − t] ∫_{f_i^{-1}(y)}^{t−1+y} a_i(x) dx + ∫_{t−1+y}^{g_{i+1}(y)} [g_{i+1}(y) − x] a_i(x) dx }.  (1.12)
Using Eq. (1.12) and the boundary condition gi+1 (1) = t, we can numerically
calculate gi+1 (y) for y ∈ {1 − h, 1 − 2h, . . ., 0}.
Now we consider the probability that a female using this optimal response will still be searching at age y. This is denoted by b_{i+1}(y). Such a female meets prospective partners at rate λ_i^f. If she meets a male of age x when she is y years old, mating occurs if and only if f_i^{-1}(y) ≤ x ≤ g_{i+1}(y). Using an argument analogous to the one used to derive Eq. (1.1), it follows that

b′_{i+1}(y) = −(λ_i^f b_{i+1}(y)/a_i) ∫_{f_i^{-1}(y)}^{g_{i+1}(y)} a_i(x) dx.  (1.13)
Using Eq. (1.13) and the boundary condition bi+1 (0) = 1, we can numerically
calculate bi+1 (y) for y ∈ {h, 2h, . . . , 1}.
We then calculate the optimal response of a male given that the female threshold and age profiles are g_{i+1} and b_{i+1}, respectively, and the OSR is assumed to be given by r_i^m = a_i/b_{i+1}. The rate at which males find females is thus λ_i^m = λ_0/(1 + r_i^m). It should be noted that this is not by definition the OSR when males use the threshold profile f_i and females use the profile g_{i+1}.
Suppose a male is still searching at age x. His optimal reward from future search is given by the length of time for which the presently oldest acceptable female remains fertile, i.e. 1 − f_{i+1}(x). Suppose the next prospective mate is of age y and appears when the male is of age w. From the definition of the inverse of the threshold function used here, the youngest female who will accept such a male is of age g_{i+1}^{-1}(w). A female should accept a male who will be fertile for a longer period than her. It follows that, for w ≤ t − 1, g_{i+1}^{-1}(w) = 0. Also, for w > t − 1, we have 1 − g_{i+1}^{-1}(w) ≥ t − w. Hence, g_{i+1}^{-1}(w) ≤ 1 + w − t. It follows that g_{i+1}^{-1}(w) ≤ max{0, 1 + w − t}. When g_{i+1}^{-1}(w) ≤ y ≤ 1 + w − t, then a pair is formed and the male obtains a reward of t − w, since in this case he becomes infertile before the female. If max{0, 1 + w − t} < y ≤ f_{i+1}(w), then a pair is formed and the male obtains a reward of 1 − y. In all other cases he continues searching, and his future expected reward is 1 − f_{i+1}(w). Arguing as in the derivation of Eq. (1.4), we obtain

1 − f_{i+1}(x) = (λ_i^m e^{λ_i^m x}/b_i) ∫_x^t e^{−λ_i^m w} { ∫_0^{g_{i+1}^{-1}(w)} [1 − f_{i+1}(w)] b_i(y) dy + ∫_{g_{i+1}^{-1}(w)}^{max{0, 1+w−t}} [t − w] b_i(y) dy + ∫_{max{0, 1+w−t}}^{f_{i+1}(w)} [1 − y] b_i(y) dy + ∫_{f_{i+1}(w)}^{1} [1 − f_{i+1}(w)] b_i(y) dy } dw,  (1.14)

where b_i = ∫_0^1 b_i(y) dy. Dividing by e^{λ_i^m x} and differentiating with respect to x, after some simplification we obtain

f′_{i+1}(x) = (λ_i^m/b_i) { [f_{i+1}(x) + t − x − 1] ∫_{g_{i+1}^{-1}(x)}^{max{0, 1+x−t}} b_i(y) dy + ∫_{max{0, 1+x−t}}^{f_{i+1}(x)} [f_{i+1}(x) − y] b_i(y) dy }.  (1.15)
Using Eqs. (1.14) and (1.15), together with the boundary condition fi+1 (t) = 1
and the continuity of fi+1 , we can numerically estimate fi+1 (x) for x ∈ {t − h,t −
2h, . . . , 0}.
We define a_{i+1}(x) to be the probability that a male using this best response is still searching at age x. Using an argument analogous to the one used in deriving Eq. (1.1), we obtain

a′_{i+1}(x) = −(λ_i^m a_{i+1}(x)/b_i) ∫_{g_{i+1}^{-1}(x)}^{f_{i+1}(x)} b_i(y) dy.  (1.16)
Using Eq. (1.16) and the boundary condition ai+1 (0) = R, we can numerically
calculate ai+1 (x) for x ∈ {h, 2h, . . .,t}.
We have thus updated each of the four profiles and defined the mapping H.
Suppose the mapping H has a fixed point ( f , a, g, b). In this case, the best response
of females to the male threshold and age profiles ( f , a) is to use the threshold profile
g. The probability that such an optimally behaving female is still searching at age y
is given by b(y). Since this female is using the same strategy as the other females,
b gives the age profile of the females. Using a similar argument, f is the optimal
response of a male to (g, b) and a gives the age profile of the males. It follows
that the OSR defined by the iterative procedure is equal to the actual OSR given
the quartet of profiles ( f , a, g, b). Hence, any fixed point of the mapping H is an
equilibrium of this asymmetric game.
It should be noted that the algorithm described above must be used when looking for an asymmetric equilibrium of a symmetric problem. The algorithm described in
Sect. 1.2 generally does not work in this case, since the OSR at such an equilibrium
may well differ from 1 and so the rate at which prospective mates are found is sex
dependent.
Also, the model presented above can easily be modified to introduce constant
mortality rates and encounter rates which are dependent on the proportion of adult
individuals who are searching for a mate (as described in Sects. 1.3 and 1.4,
respectively).
A MATLAB programme was written to estimate the equilibrium threshold rule and
age profiles at points 0, h, 2h, . . . , 1 based on the appropriate difference equations,
using the trapezium rule to calculate the required integrals and double precision
arithmetic. The inverse to a threshold rule was estimated at the same points using
linear interpolation. Comparison of different step sizes suggested that using a step
size of h = 10−4 allowed estimation of the threshold and age profile to at least
three decimal places for λ ≤ 50. The maximum value of the second derivative of
the threshold profile is increasing in λ and for larger values of λ a more accurate
procedure would be necessary to achieve the same accuracy.
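The kind of step-size check described above can be illustrated on the closed-form special case a(x) = e^{−λx} (the all-accepting strategy): successive reductions of h shrink the maximum Euler error proportionally, which indicates how many decimal places of the computed profiles can be trusted. A sketch:

```python
import numpy as np

# Euler error for a'(x) = -lam*a(x) against the exact solution exp(-lam*x);
# the error scales like O(h), so comparing runs at successive step sizes
# shows the attainable accuracy for a given interaction rate lam.
lam = 50.0
for n in (10**3, 10**4, 10**5):
    x = np.linspace(0.0, 1.0, n + 1)
    h = x[1] - x[0]
    a = np.empty(n + 1)
    a[0] = 1.0
    for k in range(n):
        a[k + 1] = a[k] * (1.0 - h * lam)
    print(h, np.max(np.abs(a - np.exp(-lam * x))))
```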
Table 1.1 gives the expected reward and initial threshold (in brackets) at equilibrium
for various mortality rates and interaction rates. The case μ = 0 corresponds to the
original model. Figure 1.2 illustrates the threshold rules evolved for λ = 20 and
various mortality rates. Figure 1.3 illustrates the corresponding age profiles. When
the mortality rate increases, we expect that individuals become less choosy. It would
thus seem that the threshold used would be increasing in the mortality rate. This
seems to be the case when the mortality rate is relatively low. However, the threshold
profile on its own does not tell us how choosy individuals are. Figure 1.2 shows that
Table 1.1 Expected rewards and initial threshold (in round brackets) at equilibrium for various mortality rates, μ, and interaction rates, λ

        μ = 2          μ = 1          μ = 0.5        μ = 0.2        μ = 0.1        μ = 0
λ = 2   0.109 (0.858)  0.205 (0.737)  0.292 (0.655)  0.365 (0.605)  0.395 (0.589)  0.426 (0.574)
λ = 5   0.168 (0.721)  0.311 (0.514)  0.445 (0.411)  0.563 (0.362)  0.611 (0.348)  0.665 (0.335)
λ = 10  0.201 (0.592)  0.366 (0.343)  0.527 (0.250)  0.675 (0.213)  0.736 (0.204)  0.805 (0.195)
λ = 20  0.221 (0.462)  0.395 (0.218)  0.574 (0.148)  0.740 (0.123)  0.810 (0.116)  0.895 (0.105)
Fig. 1.2 Effect of mortality rate on the equilibrium threshold profile (λ = 20)
Fig. 1.3 Effect of mortality rate on the equilibrium age profile (λ = 20)
Table 1.2 Expected rewards at equilibrium under the singles bar model and the model of a randomly mixing population

λ     Original (singles bar) model   Randomly mixing
2     0.4264                         0.3082
5     0.6645                         0.4636
10    0.8054                         0.5775
20    0.8954                         0.6774
for ages between 0.2 and 0.4 the threshold used at equilibrium when μ = 2, λ = 20
is lower than the threshold used in the case where μ = 0.5, λ = 20 (i.e. it seems that
increasing mortality increases the choosiness of individuals at some ages). However,
in the case μ = 2, λ = 20 (i.e. relatively high interaction and mortality rates), there
are virtually no individuals of age greater than 0.3 in the mating pool. Individuals
always accept a prospective mate of age below 0.462 and hence at equilibrium the
probability of an individual rejecting a prospective partner is virtually zero.
When the mortality rate is high, the age profile of the mating pool is very highly
concentrated on young ages. As long as a young individual survives, he/she will
almost certainly mate with the next prospective partner, and the expected payoff
obtained is much more dependent on the mortality rate than the maximum length
of time for which an individual can remain fertile. Thus young individuals will
increase their threshold only very slowly. This remains true until an individual
attains the age at which the youngest individuals begin rejecting him/her. At this
point the probability of rejection increases very rapidly due to the shape of the age
profile. It follows that for high mortality rates the equilibrium threshold profile is
similar to a step function. However, at equilibrium the probability of meeting an
unacceptable partner is virtually zero. Thus an individual who always mates with
the first prospective partner would have virtually the same reward at equilibrium as
an individual using the equilibrium threshold. Hence, at equilibrium there would be
very low selection pressure on the threshold used.
Table 1.2 gives the expected reward from search at equilibrium for the original
(singles bar) model and the model of a randomly mixing population for various
interaction rates. Since it is assumed that there is no mortality of fertile individuals,
the initial threshold is simply one minus the expected reward.
It should also be noted that if a proportion a of the adult population are searching
for a mate at equilibrium, such an equilibrium is also stable in a game where the
interaction rate is fixed to be λ a when the sex ratio is one. However, the dynamics
of the policy iteration procedure corresponding to these two problems are different.
Table 1.3 Expected reward of females, males (in round brackets) and OSR (in square brackets) when λ_0 = 4

        R = 0.5                 R = 1                   R = 2
T = 1   0.253 (0.506) [0.293]   0.427 (0.427) [1.000]   0.506 (0.253) [3.413]
T = 2   0.356 (0.714) [0.602]   0.585 (0.586) [2.412]   0.653 (0.327) [8.656]
T = 5   0.412 (0.826) [1.590]   0.658 (0.659) [7.148]   0.717 (0.359) [26.004]
Table 1.3 gives the expected reward of females, males (in round brackets) and OSR
[in square brackets] when λ0 = 4. It should be noted that the case T = R = 1
corresponds to the original symmetric model with λ = 2. Various values of λ0 were
used for this parameter set and the equilibrium found was always symmetric. Also,
the problem with T = 1 and R = 0.5 is equivalent to the problem with T = 1 and
R = 2 with the roles of the sexes reversed.
The sum of the rewards of females is by definition equal to the sum of the rewards
of males. It follows that the ratio of the expected reward of a female to the expected
reward of a male is equal to the ratio of the number of males entering the adult
population to the number of females entering the adult population (i.e. R). The minor
deviations from this rule are due to numerical errors in the iterative procedure.
1.7 Conclusion
This paper has generalized a model of mate choice with age based preferences
introduced by Alpern et al. [4] by (a) introducing a uniform mortality rate, (b)
allowing the rate at which prospective mates are found to depend on the proportion
of individuals searching, (c) considering models which are asymmetric with respect
to sex.
It may well be interesting to generalize the model to allow variable mortality
rates. It seems reasonable to assume that the mortality rate increases with age and in
this case it is expected that the equilibrium strategy will be of the same form, i.e. a
threshold strategy according to which younger mates are always preferred. However,
it is possible that the mortality rate is higher for young adults than for middle-aged
adults. In this case, the equilibrium strategy may well be of a more complex form,
since a middle-aged mate may be preferable to a young mate.
It would also be interesting to look at the interplay between resource holding
potential (RHP) (see Parker [16]) and age. For example, as a human ages his/her
RHP (i.e. qualifications, earnings, wealth, social position) increases. Hence, from
this point of view the attractiveness of an individual may well increase over time.
However, from the point of view of the model considered here, older individuals are
less attractive as mates as they will be fertile for a shorter period.
Mauck et al. [10] note that the average number of surviving offspring per brood
in a population of storm petrels is increasing in the age of the partners. There are
various explanations for this (e.g. fitter individuals may live longer, individuals (or
pairs) may become increasingly efficient at rearing offspring, or simply that older
pairs invest more in reproduction than in survival). However, this may mean that
age preferences may be to some degree homotypic. This is due to the fact that an
old individual may well prefer an old partner, since it is more important for them to
maximize their present reproduction rate. Young individuals may well prefer young partners, since such pairs have time to adapt to each other and perfect their method of
rearing. It would be interesting to see how the form of the equilibrium function
depends on these factors (by considering a wider range of payoff functions).
Also, our model assumes that the ages of prospective partners are independent. It
may well be that individuals concentrate their search on prospective partners of a
“suitable age”.
According to our model, individuals only mate once during their life, whereas
in reality pairs can divorce or an individual can remate after the death of a partner.
Since individuals may be of different qualities, it might be optimal for an individual
to divorce a partner who turns out not to be as good as expected (see McNamara and
Forslund [13], McNamara et al. [14]). It is intended that future work will extend the
model to allow individuals to remate after a partner dies or becomes infertile.
Of course, mate choice may depend on other factors, such as attractiveness
(common preferences) and compatibility (homotypic preferences). It would be
interesting to see how these factors might interact with age. Due to the necessarily
complex nature of such models, it would seem that simulations based on replicator
dynamics would be a sensible approach to such problems (see Nowak [15]).
Acknowledgements I would like to thank Prof. Steve Alpern for the conversations, advice and
encouragement that have aided my work on this paper.
References
1. Alpern, S., Katrantzi, I.: Equilibria of two-sided matching games with common preferences.
Eur. J. Oper. Res. 196(3), 1214–1222 (2009)
2. Alpern, S., Reyniers, D.: Strategic mating with homotypic preferences. J. Theor. Biol. 198,
71–88 (1999)
3. Alpern, S., Reyniers, D.: Strategic mating with common preferences. J. Theor. Biol. 237,
337–354 (2005)
4. Alpern, S., Katrantzi, I., Ramsey, D.: Partnership formation with age-dependent preferences.
Eur. J. Oper. Res. (2012)
5. Burdett, K., Coles, M.G.: Long-term partnership formation: marriage and employment. Econ.
J. 109, 307–334 (1999)
6. Collins, E.J., McNamara, J.M.: The job-search problem with competition: an evolutionarily
stable strategy. Adv. Appl. Prob. 25, 314–333 (1993)
7. Houston, A.I., McNamara, J.M.: A self-consistent approach to paternity and parental effort.
Phil. Trans. R. Soc. Lond. B 357, 351–362 (2002)
8. Janetos, A.C.: Strategies of female mate choice: a theoretical analysis. Behav. Ecol. Sociobiol.
7, 107–112 (1980)
9. Johnstone, R.A.: The tactics of mutual mate choice and competitive search. Behav. Ecol.
Sociobiol. 40, 51–59 (1997)
10. Mauck, R.A., Huntington, C.E., Grubb, T.C.: Age-specific reproductive success: evidence for
the selection hypothesis. Evolution 58(4), 880–885 (2004)
11. Mazalov, V., Falko, A.: Nash equilibrium in two-sided mate choice problem. Int. Game Theory
Rev. 10(4), 421–435 (2008)
12. McNamara, J.M., Collins, E.J.: The job search problem as an employer-candidate game.
J. Appl. Prob. 28, 815–827 (1990)
13. McNamara, J.M., Forslund, P.: Divorce rates in birds: prediction from an optimization model.
Am. Nat. 147, 609–640 (1996)
14. McNamara, J.M., Forslund, P., Lang, A.: An ESS model for divorce strategies in birds. Philos.
Trans. R. Soc. Lond. B Biol. Sci. 354, 223–236 (1999)
15. Nowak, M.: Evolutionary Dynamics: Exploring the Equations of Life. Belknap Press,
Cambridge, Massachusetts (2006)
16. Parker, G.A.: Assessment strategy and the evolution of animal conflicts. J. Theor. Biol. 47,
223–243 (1974)
17. Parker, G.A.: Mate quality and mating decisions. In: Bateson, P. (ed.) Mate Choice,
pp. 227–256. Cambridge University Press, Cambridge, UK (1983)
18. Ramsey, D.M.: A large population job search game with discrete time. Eur. J. Oper. Res. 188,
586–602 (2008)
19. Real, L.A.: Search theory and mate choice. I. Models of single-sex discrimination. Am. Nat.
136, 376–404 (1990)
20. Real, L.A.: Search theory and mate choice. II. Mutual interaction, assortative mating, and
equilibrium variation in male and female fitness. Am. Nat. 138, 901–917 (1991)
21. Shimer, R., Smith, L.: Assortative matching and search. Econometrica 68, 343–370 (2000)
22. Smith, L.: The marriage model with search frictions. J. Pol. Econ. 114, 1124–1144 (2006)
Chapter 2
Signalling Victory to Ensure Dominance:
A Continuous Model
M. Mesterton-Gibbons
Department of Mathematics, Florida State University, 1017 Academic Way,
Tallahassee, FL 32306-4510, USA
e-mail: [email protected]
T.N. Sherratt
Department of Biology, Carleton University, 1125 Colonel By Drive,
Ottawa, ON K1S 5B6, Canada
e-mail: [email protected]
2.1 Introduction
Bower [4] defines a victory display as a display performed by the winner of a contest
but not by the loser. He offers in essence two possible adaptive explanations of their
function: that they are an attempt to advertise victory to other members of a social
group that do not pay attention to contests, or cannot otherwise identify the winner,
and thus alter their behavior (“function within the network”), or that they are an
attempt to decrease the probability that the loser of a contest will initiate a new
contest with the same individual (“function within the dyad”). In an earlier paper
[20], we called the first rationale advertising, and the second one browbeating; and
we used game-theoretic models to explore the logic of both rationales. These models
showed that both rationales are logically sound; moreover, all other things being
equal, the intensity of victory displays will be highest through advertising in groups
where the reproductive advantage of dominating an opponent is low, and highest
through browbeating in groups where the reproductive advantage of dominance is
high.
Here we further consider the browbeating rationale, leaving the case of an
advertising rationale for future work. By the browbeating rationale, a victory display
is an attempt to decrease the probability that the loser of a contest will initiate a new
contest with the same individual. As long as there is a chance that the loser will
challenge the winner to another fight in the future, the winner has won a battle for
dominance, but not the war. If, on the other hand, the victory ensures that the loser
will never challenge, then victory is tantamount to dominance. Thus browbeating is
an attempt to ensure that victory equals dominance, and the essence of modelling
this phenomenon is to observe a distinction between losing and subordination.
Although we have previously demonstrated that browbeating is a plausible mech-
anism for victory displays, our earlier model assumed—as opposed to predicted—
that a loser does not display, and hence dodged the question of why victory displays
should be respected. Moreover, our original model assumed that all contests were
of equal length, which leaves open the question as to whether individuals should be
more or less likely to signal their dominance after long (close) fights than after short
(one-sided) fights.
Accordingly, our purpose here is twofold. First, it is to relax the assumption
that the loser does not display. Second, our purpose is also to address the context-
dependent nature of the display, which lies outside the scope of our original
browbeating model. Specifically, in a recent study investigating fighting behavior
in the spring field cricket, Gryllus veletis, Bertram et al. [3] found that the intensity
of post-conflict signals (aggressive song rate and body jerk rate) was dependent
on whether the individual was a winner or loser (with winners signalling more
intensely than losers) and on the duration of the contest (with short fights producing
less intense signals). Ting et al. (unpublished) came to similar conclusions after
analysing the outcomes of fights in the fall field cricket, Gryllus pennsylvanicus.
Likewise, post-conflict displays in the black-capped chickadee, Poecile atricapil-
lus—albeit more common among losers than among winners—were more likely
to occur after highly aggressive contests [15]. Collectively, these recent studies
suggest that context dependency might be a general feature of post-conflict displays.
Clearly, if mathematical models are to be of value in understanding victory displays,
then they should help explain not only the display, but also who displays, and with
what intensity. Here we present a simple model that addresses both phenomena.
Multiplying the above payoffs by 1/2 and adding, we find that the reward to a u-strategist in a population of v-strategists is
\[
f(u, v) \;=\; \tfrac{1}{2}\bigl\{(1-b)\,q(u_1, v_2) \;-\; b\,q(v_1, u_2) \;-\; c_w(u_1) \;-\; c_l(u_2)\bigr\} \;+\; b. \tag{2.2}
\]
We need to place conditions on the functions cw, cl and q. First, for cw and cl, it seems reasonable to suppose that cw(0) = 0, cw′(s) > 0, cw″(s) ≥ 0 (as in [20]) and cl(0) = 0, cl′(s) > 0, cl″(s) ≥ 0. For the sake of simplicity, we satisfy these conditions by taking
\[
c_w(s) \;=\; \gamma_w \theta s, \qquad c_l(s) \;=\; \gamma_l \theta s \tag{2.3}
\]
with
\[
\gamma_w \;<\; \gamma_l \tag{2.4}
\]
throughout, where θ (> 0) has the dimensions of INTENSITY⁻¹, so that γw (> 0) and γl (> 0) are dimensionless measures of the marginal cost of displaying for a winner and a loser, respectively.
Second, for q, the following seem reasonable: q(∞, l) = 1 for any finite l, and
q(w, l) = δ for all w ≤ l where δ is the base probability that winning will lead
to dominance—a winner cannot increase its chance of converting its win into
dominance unless it is displaying with at least as strong an intensity as the loser. The
shorter the contest, the more likely it is that the loser will feel heavily outgunned and
concede dominance; hence δ is a decreasing function of contest length T . For the
sake of simplicity, we take
δ = e−T /μ , (2.5)
where μ is a scaling factor (the length of a contest that would reduce the probability
of achieving dominance without a display from 1 to approximately 37 %). We also
require ∂q/∂w > 0 and ∂q/∂l < 0 for all w > l. Again for the sake of simplicity, we satisfy all conditions on q by taking
\[
q(w, l) \;=\;
\begin{cases}
\delta + (1-\delta)\bigl\{1 - e^{-\theta(w-l)}\bigr\} & \text{if } w \ge l, \\
\delta & \text{if } w < l,
\end{cases} \tag{2.6}
\]
throughout. Note the asymmetry here: a display by the loser is not a second chance
to win the fight. On the contrary, it is merely an attempt to reduce the probability
that losing implies subordination.
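For concreteness, the model components (2.2)–(2.6) can be evaluated directly. The following Python sketch is ours, not part of the original model, and the parameter values are purely illustrative:

```python
import math

def q(w, l, delta, theta):
    """Probability that victory converts into dominance, eq. (2.6)."""
    if w >= l:
        return delta + (1 - delta) * (1 - math.exp(-theta * (w - l)))
    return delta

def reward(u, v, b, delta, theta, gamma_w, gamma_l):
    """Reward f(u, v) of eq. (2.2) to a u-strategist among v-strategists,
    with the linear costs c_w(s) = gamma_w*theta*s and c_l(s) = gamma_l*theta*s
    of eq. (2.3)."""
    u1, u2 = u
    v1, v2 = v
    return 0.5 * ((1 - b) * q(u1, v2, delta, theta)
                  - b * q(v1, u2, delta, theta)
                  - gamma_w * theta * u1 - gamma_l * theta * u2) + b

# Illustrative values only: a contest of length T = 2*mu, so delta = e^{-2}.
delta = math.exp(-2.0)                      # eq. (2.5) with T/mu = 2
print(reward((0.5, 0.0), (0.5, 0.0), b=0.4,
             delta=delta, theta=1.0, gamma_w=0.1, gamma_l=0.3))
```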
From Appendix A, if the marginal cost of displaying is so high for a winner that
γw ≥ 1 − b, then v1 = 0 (i.e., not displaying) is a winner’s best reply to any v2 ; and
likewise, if the marginal cost of displaying is so high for a loser that γl ≥ b, then
v2 = 0 is a loser’s best reply to any v1 . These are not interesting cases. Accordingly,
we assume henceforward that γw < 1 − b and γl < b invariably hold. That is, we
assume min(ρ, ζ) > 1, where
\[
\rho \;=\; \frac{1-b}{\gamma_w}, \qquad \zeta \;=\; \frac{b}{\gamma_l}. \tag{2.7}
\]
Then, from Appendix A, and in particular from the discussion following (A8), the game defined by (2.2)–(2.6) has a unique ESS if
\[
\frac{T}{\mu} \;<\; \max\Bigl\{\ln\frac{\rho}{\rho-1},\; \ln\frac{\zeta}{\zeta-1}\Bigr\}, \tag{2.8}
\]
although it has no ESS if the above inequality is reversed. Subject to (2.8), if also
\[
\frac{T}{\mu} \;<\; \min\Bigl\{\ln\frac{\rho}{\rho-1},\; \ln\frac{\zeta}{\zeta-1}\Bigr\}, \tag{2.9}
\]
then from (2.5) and (A5) the ESS is v = (0, 0): neither a winner nor a loser displays. If, on the other hand, (2.8) holds with (2.9) reversed, then one of two cases arises. If
\[
\ln\frac{\zeta}{\zeta-1} \;<\; \frac{T}{\mu} \;<\; \ln\frac{\rho}{\rho-1}, \tag{2.10}
\]
then it follows from (A6) that the ESS is again v = (0, 0). If
\[
\ln\frac{\rho}{\rho-1} \;<\; \frac{T}{\mu} \;<\; \ln\frac{\zeta}{\zeta-1}, \tag{2.11}
\]
however, then it follows from (A7) and (A8) that the ESS is given by θv = (λ, 0), where
\[
\lambda \;=\; \ln\bigl\{\rho\,\bigl(1 - e^{-T/\mu}\bigr)\bigr\}. \tag{2.12}
\]
Thus the relative magnitudes of ρ and ζ determine the ESS. For ρ > ζ, or
\[
\frac{\gamma_l}{\gamma_w} \;>\; \frac{b}{1-b}, \tag{2.13}
\]
there is no ESS if T > μ ln(ζ/(ζ−1)); but if T < μ ln(ζ/(ζ−1)), then the unique ESS is given by v = (0, 0) for T < μ ln(ρ/(ρ−1)) and by θv = (λ, 0) for T > μ ln(ρ/(ρ−1)), with λ defined by (2.12). If (2.13) is reversed, or ρ < ζ, then the unique ESS for T < μ ln(ρ/(ρ−1)) is v = (0, 0); and for T > μ ln(ρ/(ρ−1)) there is no ESS.
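The case analysis above reduces to comparing T/μ with the two critical values; the following Python helper is our own summary of (2.8)–(2.13), with hypothetical inputs:

```python
import math

def ess_regime(rho, zeta, T_over_mu):
    """Classify the ESS via (2.8)-(2.13); requires min(rho, zeta) > 1."""
    assert min(rho, zeta) > 1
    t_w = math.log(rho / (rho - 1))    # critical length for the winner
    t_l = math.log(zeta / (zeta - 1))  # critical length for the loser
    if T_over_mu < min(t_w, t_l):
        return "ESS v = (0, 0): neither displays"
    if T_over_mu > max(t_w, t_l):
        return "no ESS (arms race)"
    if t_w < t_l:                      # rho > zeta: only the winner displays
        lam = math.log(rho * (1 - math.exp(-T_over_mu)))   # eq. (2.12)
        return f"ESS theta*v = ({lam:.3f}, 0)"
    return "ESS v = (0, 0): neither displays"   # rho < zeta, middle band

print(ess_regime(rho=1.5, zeta=1.2, T_over_mu=1.2))
```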
2.4 Discussion
Fighting behaviors and their associated signals have been the subject of extensive
empirical and theoretical study (see [10, 12]). However, much of this work has
focused on the behaviors that occur before and during aggressive interactions,
and relatively little is known about behaviors that occur after the outcomes have
been decided [4]. Here we have developed and explored a game-theoretic model
of post-conflict signalling, seeking to identify who should tend to signal following
termination of conflict, with what intensity, and the factors that shape this intensity.
We have focused on the hypothesis that post-conflict signalling by the victor serves
to reinforce dominance, reducing the chances that the loser will try it on again,
although there may be other complementary adaptive explanations for such displays,
including advertising of victory to bystanders, and non-adaptive explanations such
as emotional release [4].
Post-conflict victory displays [4] have been reported in a range of organisms,
including humans [22] and birds [9], but they have been most intensively researched
in crickets (Orthoptera). Crickets often perform aggressive songs and body jerks
both during and after an agonistic conflict [1, 3, 13]. In a study of the field cricket,
Teleogryllus oceanicus, Bailey and Stoddart [2] proposed that if the display of a
victorious male is sufficiently intense, then it may indicate to the loser that the
fight is unlikely to be reversed by further combat, enabling the victor to divert
its time and energy to other activities such as mating. Conversely, low signalling
intensity of the winner may suggest to the loser that re-engagement could potentially
produce a reversal, hence some future reward to the loser. This is precisely the
situation we have attempted to model here. Indeed, Bailey and Stoddart [2] went
further and argued that the winner’s post-conflict display could be used as an
indication of the winner’s position in a broader dominance hierarchy, showing that
hierarchies constructed using an index based on post-conflict signalling correlated
well with those produced by more classical methods. Intriguingly, Logue et al. [16]
recently reported that contests between male field crickets Teleogryllus oceanicus
that were unable to sing were more aggressive than interactions between males that
were free to signal, supporting the view that signalling can serve to mitigate the
costs of fighting in these species.
As predicted by our current model, there is now a considerable amount of
evidence from the cricket literature that eventual winners tend to signal far more
frequently than losers after fighting [1, 3]. One factor driving this basic result in
our model (and most likely in the experiments) is our assumption that the marginal
cost of signalling is lower for the winner than the loser, i.e., γw < γl ; see (2.4). We
consider this an entirely realistic condition given that the victor is likely to have
“more left in the tank” than the vanquished (see [4], pp. 121–122 for a similar
argument). Indeed, in these cases costly signals may serve as an honest indicator
of how much the victor has in reserve, and thereby intimidate the opponent into
submission. The “Ali shuffle” [8] is potentially one such example of an honest
demonstration of a fighter’s superiority. Analogous behaviors, which may have
[Figure 2.2 appears here: curves of the winner's scaled display intensity θv₁ (vertical axis, "intensity of display") against scaled contest length T/μ (horizontal axis, "contest length") for ρ = 1.1, 1.3, 1.5 and 1.7.]
Fig. 2.2 Scaled intensity of winner's victory display as a function of scaled contest length for various values of the parameter ρ = (1 − b)/γw (assumed to exceed 1) in the limit as ζ → 1 from above, where ζ = b/γl
In general, however, both critical values are finite. For contest lengths below the
first critical value, neither the winner nor the loser displays at the ESS. For contest
lengths between the two critical values, only the winner displays, with intensity that
increases with T . For contest lengths greater than the second critical value, the ESS
breaks down as described at the end of the appendix, and in such a way that a loser’s
optimal response will sometimes be to match the winner’s display. Thus, according
to our model, a loser should be expected to display only if the contest is so long
that its length exceeds the second critical value. Those unusual biological examples
in which only the loser displays (e.g., [15]) may potentially be explained by some
sort of subservient signal to assure dominance to the victor, thereby reducing future
conflict [4].
There is an intriguing parallel between one of our results on victory displays
and a result concerning winner effects that Mesterton-Gibbons [19] found, several
years before victory displays were first reviewed by Bower [4]. A winner effect is
an increased probability of victory in a later contest following victory in an earlier
contest [21], which in Mesterton-Gibbons [19] is mediated through increased self-
perception of strength. The greater the likelihood of a later victory, the more likely
it is that the earlier victory will eventually lead to dominance over the opponent.
Thus a winner effect may also be regarded as an attempt to convert victory into
dominance, even though there is no display. The result discovered by Mesterton-
Gibbons [19] is that there can be no winner effect unless b < 1/2, where b has exactly
the same interpretation as in our current model, i.e., an inverse measure of the
reproductive advantage of dominance. Thus, to the extent that victory displays and
winner effects can both be regarded as factors favoring dominance, such factors are
most operant when b < 1/2.
Finally, for the sake of tractability, we did not explicitly model the variation
of strength that supports any variation of contest length observed in nature. On
the contrary, we assumed that T is fixed for a theoretical population; and we
obtained an evolutionarily stable response to that T , which is likewise fixed for
the theoretical population. Over many such theoretical populations, each with a
different T , however, there will be many different ESS responses; and in effect
we have implicitly assumed that the variation of ESS with T thus engendered
will reasonably approximate the variation of signal intensity with contest length
observed within a single real population. Essentially this assumption—phrased more
generally, that ESS variation over many theoretical populations each characterized
by a different parameter value will reasonably approximate variation of behavior
with respect to that parameter within a single real population—is widely adopted
in the literature, although rarely made explicit, as here. Indeed essentially this
assumption is made whenever a game-theoretic model predicts the dependence of
an ESS on a parameter that varies within a real population, but whose variance is
not accounted for by the model.
Acknowledgements We are grateful to Lauren Fitzsimmons and two anonymous reviewers for
constructive feedback on earlier versions of the manuscript.
Appendix A

Strategy v is a strong, global evolutionarily stable strategy or ESS in the sense of [17] if (and only if) it is uniquely the best reply to itself, in the sense that f(v, v) > f(u, v) for all u ≠ v; or, equivalently for our model, if v1 is a winner's best reply to a loser's v2 and v2 is a loser's best reply to a winner's v1.¹
From (2.2) and (2.6) we have
\[
\frac{\partial f}{\partial u_1} \;=\;
\begin{cases}
-\tfrac{1}{2}\gamma_w \theta & \text{if } u_1 < v_2, \\[2pt]
\tfrac{1}{2}\bigl\{(1-b)(1-\delta)\, e^{-\theta(u_1 - v_2)} - \gamma_w\bigr\}\theta & \text{if } u_1 > v_2,
\end{cases} \tag{A1}
\]
with ∂²f/∂u₁² = −½θ²(1−b)(1−δ)e^{−θ(u₁−v₂)} < 0 for u1 > v2 but ∂²f/∂u₁² = 0 for u1 < v2. So, with respect to u1, f decreases from u1 = 0 to u1 = v2. What happens next depends on the limit of ∂f/∂u1 as u1 → v2 from above, which is ½{(1 − b)(1 − δ) − γw}θ. If this quantity is not positive, then f continues to decrease, and so the maximum of f with respect to u1 occurs at u1 = 0. So a winner's best reply is u1 = 0 whenever δ > 1 − γw/(1 − b) (which is true in particular if γw > 1 − b). If, on the other hand, δ < 1 − γw/(1 − b), then there is a local maximum for u1 > v2 where ∂f/∂u1 = 0 or
\[
\theta u_1 \;=\; \theta v_2 + \ln\frac{(1-b)(1-\delta)}{\gamma_w}. \tag{A2}
\]
This local maximum exceeds the value of f at u1 = 0 precisely when
\[
\theta v_2 \;<\; \frac{(1-b)(1-\delta)}{\gamma_w} - 1 - \ln\frac{(1-b)(1-\delta)}{\gamma_w}. \tag{A3}
\]
Note that the right-hand side of (A3) is always positive (because x − 1 − ln(x) > 0 for all x > 1). In sum, a winner's best reply is u1 = 0 unless δ < 1 − γw/(1 − b) and (A3) holds, in which case, the best reply is given by (A2). In particular, zero is always a winner's best reply if γw > 1 − b.
Similarly,
\[
\frac{\partial f}{\partial u_2} \;=\;
\begin{cases}
\tfrac{1}{2}\bigl\{b(1-\delta)\, e^{\theta(u_2 - v_1)} - \gamma_l\bigr\}\theta & \text{if } u_2 < v_1, \\[2pt]
-\tfrac{1}{2}\gamma_l \theta & \text{if } u_2 > v_1,
\end{cases} \tag{A4}
\]
with ∂²f/∂u₂² = ½θ²b(1−δ)e^{θ(u₂−v₁)} > 0 for u2 < v1 but ∂²f/∂u₂² = 0 for u2 > v1. Note that the limit of ∂f/∂u2 as u2 → v1 from below is ½{b(1 − δ) − γl}θ. Because
1 In general, strategy v is an ESS if it does not pay a potential mutant to switch from v to any other strategy, and v need not satisfy the strong condition f(v, v) > f(u, v) for all u ≠ v. If there is at least one alternative best reply u such that f(u, v) = f(v, v) but v is a better reply than u to all such u (f(v, u) > f(u, u)), then v is called a weak ESS. For our model, however, any ESS is a strong ESS, as is typical of continuous games ([18], p. 408).
∂²f/∂u₂² > 0, if the limit is negative, i.e., if δ > 1 − γl/b, then f decreases with respect to u2 and has its maximum where u2 = 0, so that a loser should not display. If, on the other hand, the limit is positive, i.e., δ < 1 − γl/b, then f at least partly increases with respect to u2 for u2 < v1; and so the maximum of f with respect to u2 occurs either at u2 = 0 or u2 = v1, depending on which has the higher value of f. Let xc denote the unique positive root of the equation (1 − δ)(1 − e^{−x}) = γl x/b. Then straightforward algebra reveals that the maximum is at 0 if θv1 > xc but at v1 if θv1 < xc. In sum, a loser's best reply is u2 = 0 unless δ < 1 − γl/b and θv1 < xc, in which case, the best reply is v1. Clearly, θv1 < xc holds for v1 = 0, so that u2 = 0 is in particular the best reply to v1 = 0; however, this result follows more readily directly from (A4). Also, note that zero is always a loser's best reply if γl > b.
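The threshold xc has no closed form, but it is easily computed; here is a minimal Python sketch of ours, assuming δ < 1 − γl/b so that the positive root exists:

```python
import math

def x_c(delta, gamma_l, b, tol=1e-12):
    """Unique positive root of (1 - delta)*(1 - exp(-x)) = gamma_l*x/b,
    found by bisection; assumes delta < 1 - gamma_l/b."""
    assert delta < 1 - gamma_l / b, "no positive root in this regime"
    g = lambda x: (1 - delta) * (1 - math.exp(-x)) - gamma_l * x / b
    lo, hi = 1e-12, 1.0
    while g(hi) > 0:           # expand until the linear term dominates
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

print(x_c(delta=0.2, gamma_l=0.3, b=0.5))   # illustrative values
```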
For v = (v1 , v2 ) to be an ESS it must be a best reply to itself, i.e., we require v1
to be a winner’s best reply to the loser’s v2 at the same time as v2 is a loser’s best
reply to the winner’s v1 . If
γw γl
δ > max 1 − ,1 − (A5)
1−b b
then the unique ESS is v = (0, 0), because it follows from the discussion after (A1)
that v1 = 0 is the best reply to any v2 , and hence to v2 = 0; and from the discussion
after (A4) that v2 = 0 is the best reply to any v1 , and hence to v1 = 0. If
γw γl
1− < δ < 1− (A6)
1−b b
then v = (0, 0) is still the ESS by the above discussion and the remark at the end of
the preceding paragraph. If instead
γl γw
1− < δ < 1− (A7)
b 1−b
then an ESS does not exist. Consider a population in which a winner displays with
small positive intensity v1 . Then θ v1 < xc ; and, from the discussion following (A4),
a loser’s best reply is to match the display. From (A2), a winner’s best reply is
now to increase the intensity of its display, because (A3) invariably holds; and a
loser’s best reply in turn is again to match the display. Continuing in this manner,
we observe an “arms race” of increasing display intensity, until either (A3) or
θ v1 < xc is violated. If the former, then a winner’s best reply is not to display,
which a loser matches, so that it pays for a winner to display at higher intensity;
if the latter, then a loser’s best reply becomes no display, but now a winner’s best
reply is to display with intensity λ /θ . Either way, the unstable cycle continues ad
infinitum. The study of victory displays is still in its infancy, and researchers are still trying to characterize when they occur and with what frequency. Therefore, there is as yet no study of their temporal dynamics (within or between generations). Furthermore, a full analysis of the evolutionary dynamics when no ESS exists is beyond the scope of this paper. Nevertheless, we broach this issue in Appendix B.
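To illustrate the cycle just described, the following Python sketch of ours iterates the best replies of Appendix A from a small initial display; the parameter values are illustrative and chosen inside the no-ESS region:

```python
import math

def best_reply_cycle(b, gamma_w, gamma_l, delta, steps=8):
    """Iterate the best replies of Appendix A from a small winner display;
    in the no-ESS regime delta < min(1 - gamma_w/(1-b), 1 - gamma_l/b)
    the scaled intensities cycle rather than settle."""
    K = (1 - b) * (1 - delta) / gamma_w
    # x_c via fixed-point iteration on x = (b/gamma_l)(1-delta)(1-exp(-x)).
    xc = 1.0
    for _ in range(200):
        xc = b * (1 - delta) / gamma_l * (1 - math.exp(-xc))
    v1, v2 = 0.0, 0.01               # theta*v1 (winner), theta*v2 (loser)
    for _ in range(steps):
        # Winner: display via (A2) while condition (A3) holds, else stop.
        v1 = v2 + math.log(K) if v2 < K - 1 - math.log(K) else 0.0
        # Loser: match the winner while theta*v1 < x_c, else stop.
        v2 = v1 if v1 < xc else 0.0
        print(f"theta*v1 = {v1:5.2f},  theta*v2 = {v2:5.2f}")

best_reply_cycle(b=0.5, gamma_w=0.1, gamma_l=0.2, delta=0.2)
```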
Appendix B

In this appendix we remark on why, when no ESS exists, the evolutionary dynamics require a more sophisticated approach than the one we have taken in this paper and cannot readily be addressed by the standard framework of discrete evolutionary games with replicator dynamics (e.g., [6, 11]). To make our point as expeditiously as possible, we explore circumstances in which b < 1/2 (and hence, by (2.4), ρ > ζ) but T/μ exceeds the second critical value ln(ζ/(ζ−1)) of Sect. 2.3 (corresponding to the shaded triangle within the no-ESS region of Fig. 2.1).
Accordingly, consider a mixture of three strategies that appear to evoke the
discussion towards the end of Appendix A, namely, a non-signalling strategy,
denoted by N or Strategy 1; the ESS signalling strategy for the dark shaded rectangle
of Fig. 2.1, denoted by S or Strategy 2; and a matching strategy, denoted by M or Strategy 3, which displays with the ESS intensity corresponding to ln(ρ/(ρ−1)) < T/μ < ln(ζ/(ζ−1)) after winning, but matches the winner's display after losing. From
the viewpoint of a focal u-strategist against a v-strategist, these three strategies are
defined, respectively, by u = (0, 0) for N; u = (λ /θ , 0) for S; and u = (λ /θ , v1 )
for M. Let the proportions of N, S and M be x1 , x2 and x3 , respectively (so that
x1 + x2 + x3 = 1); and let ai j be the reward to strategy i against strategy j (for
1 ≤ i, j ≤ 3). Let α := (1 − δ)⁻¹ = (1 − e^{−T/μ})⁻¹, so that λ = ln(ρ/α) by (2.12). Then from (2.2), (2.6) and (2.12) we have a11 = ½{(1 − 2b)q(0, 0) − cw(0) − cl(0)} + b = ½δ + (1 − δ)b, a12 = ½{(1 − b)q(0, 0) − bq(λ/θ, 0) − cw(0) − cl(0)} + b = ½{(1 − b)δ + b(1 + 1/ρ)}, and so on, yielding the reward matrix
\[
A \;=\;
\begin{pmatrix}
\tfrac{1}{2}\delta + (1-\delta)b & \dfrac{\rho(1-b)\delta + (\rho+1)b}{2\rho} & a_{12} \\[8pt]
\dfrac{\rho-1+b+(1-\delta)\rho b}{2\rho} - \tfrac{1}{2}\gamma_w \ln(\rho/\alpha) & \dfrac{\rho-1+2b}{2\rho} - \tfrac{1}{2}\gamma_w \ln(\rho/\alpha) & a_{12} - \tfrac{1}{2}\gamma_w \ln(\rho/\alpha) \\[8pt]
a_{21} & a_{21} - \tfrac{1}{2}\gamma_l \ln(\rho/\alpha) & a_{11} - \tfrac{1}{2}(\gamma_w + \gamma_l) \ln(\rho/\alpha)
\end{pmatrix} \tag{B1}
\]
Thus a11 < a21 , and Strategy 1 is not an ESS. However, because a11 − a21 + a22 −
a12 = 0, it also follows that a22 > a12 . So if Strategy 2 is also not an ESS, then it
must be Strategy 3 that invades. But a22 − a32 = ½γl{(1/ρ − 1/α)ζ + ln(ρ/α)} may
have either sign, and in particular will always be positive for sufficiently small ζ ,
that is, for ζ sufficiently close to α (ζ > α having been assumed). Furthermore, if
we suppose that the point (ρ , ζ ) in Fig. 2.1 has migrated from the signalling ESS
region (dark shaded rectangle) into the no-ESS region (shaded triangle immediately
above) because environmental pressures have increased the value of ζ
(by decreasing γl ), then it is precisely such sufficiently small values of ζ that are
relevant. Thus S will often be an ESS of the discrete game defined by the matrix A
even though it is no longer an ESS of the continuous game described in the main
body of our paper.
The upshot is that replicator dynamics cannot readily be used to describe what
happens when S is not an ESS of our continuous game; the dynamics described
verbally towards the end of Appendix A are not adequately reflected by a mix of M,
N and S. They require a much more sophisticated approach, and we leave the matter
open for future work.
Nevertheless, let us suppose that ζ is indeed large enough for M to invade S,
that is,
\[
\zeta \;>\; \frac{\rho\alpha}{\rho - \alpha}\, \ln(\rho/\alpha). \tag{B2}
\]
Then a22 − a32 < 0, and because a33 − a23 + a22 − a32 = 0, we must have a33 − a23 > 0. That is, from (B1), a33 > a13 − ½γw ln(ρ/α). Thus a33 >
max(a13 , a23 ) will hold for sufficiently small γw , making M the unique ESS
of the discrete game defined by A. Otherwise (i.e., for larger values of γw ),
a32 > a22 , a21 = a31 > a11 and a13 > a33 will hold simultaneously, so that M
can invade S, S or M can invade N and N can invade M. In these circumstances,
the population will eventually reach a polymorphism of N and M at which
x1 = (a13 − a33 )/(a13 − a33 + a31 − a11 ), x2 = 0 and x3 = 1 − x1 . All of these
results have been verified by numerical integration, for relevant parameter values,
of the replicator equations ẋi = xi {(Ax)i − x · Ax}, i = 1, 2, 3, where x = (x1 , x2 , x3 )
and an overdot denotes differentiation with respect to time (see, e.g., [11], p. 68).
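As a sketch of the kind of numerical check described here, one can assemble A from (B1) and integrate the replicator equations directly; the parameter values below are illustrative choices of ours, not those used in the paper:

```python
import math
import numpy as np
from scipy.integrate import solve_ivp

def reward_matrix(rho, zeta, b, delta):
    """Reward matrix (B1) for the strategies (N, S, M); gamma_w, gamma_l
    are recovered from rho = (1-b)/gamma_w and zeta = b/gamma_l, and
    lam = log(rho*(1-delta)) = log(rho/alpha) as in (2.12)."""
    gw, gl = (1 - b) / rho, b / zeta
    lam = math.log(rho * (1 - delta))
    a11 = 0.5 * delta + (1 - delta) * b
    a12 = (rho * (1 - b) * delta + (rho + 1) * b) / (2 * rho)
    a21 = (rho - 1 + b + (1 - delta) * rho * b) / (2 * rho) - 0.5 * gw * lam
    a22 = (rho - 1 + 2 * b) / (2 * rho) - 0.5 * gw * lam
    return np.array([[a11, a12, a12],
                     [a21, a22, a12 - 0.5 * gw * lam],
                     [a21, a21 - 0.5 * gl * lam, a11 - 0.5 * (gw + gl) * lam]])

def replicator(t, x, A):
    """Replicator dynamics x_i' = x_i((Ax)_i - x.Ax)."""
    Ax = A @ x
    return x * (Ax - x @ Ax)

# Illustrative point in the no-ESS triangle: b < 1/2, rho > zeta, T/mu = 2.
A = reward_matrix(rho=4.0, zeta=2.5, b=0.4, delta=math.exp(-2.0))
sol = solve_ivp(replicator, (0.0, 500.0), [0.4, 0.3, 0.3], args=(A,), rtol=1e-9)
print(sol.y[:, -1])   # long-run frequencies of N, S and M
```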
References
5. Caro, T.M.: The functions of stotting in Thomson’s gazelles: some tests of the predictions.
Anim. Behav. 34, 663–684 (1986)
6. Cressman, R.: Evolutionary Dynamics and Extensive Form Games. MIT Press, Cambridge
(2003)
7. Cresswell, W.: Song as a pursuit-deterrent signal, and its occurrence relative to other
anti-predation behaviours of skylark (Alauda arvensis) on attack by merlins (Falco
columbarius). Behav. Ecol. Sociobiol. 34, 217–223 (1994)
8. Golus, C.: Muhammad Ali. Twenty-First Century Books, Minneapolis (2006)
9. Grafe, T., Bitz, J.H.: An acoustic postconflict display in the duetting tropical boubou (Laniarius
aethiopicus): a signal of victory? BMC Ecol. 4(1) (2004). http://www.biomedcentral.com/
1472-6785/4/1
10. Hardy, I.C.W., Briffa, M. (eds.): Animal Contests. Cambridge University Press, Cambridge
(2013)
11. Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics. Cambridge
University Press, Cambridge (1998)
12. Huntingford, F., Turner, A.K.: Animal Conflict. Chapman & Hall, London (1987)
13. Jang, Y., Gerhardt, H.C., Choe, J.C.: A comparative study of aggressiveness in eastern North
American field cricket species (genus Gryllus). Behav. Ecol. Sociobiol. 62, 1397–1407 (2008)
14. Leal, M., Rodríguez-Robles, J.A.: Signalling displays during predator-prey interactions in a Puerto Rican anole, Anolis cristatellus. Anim. Behav. 54, 1147–1154 (1997)
15. Lippold, S., Fitzsimmons, L.P., Foote, J.R., Ratcliffe, L.M., Mennill, D.J.: Post-contest
behaviour in black-capped chickadees (Poecile atricapillus): loser displays, not victory
displays, follow asymmetrical countersinging exchanges. Acta Ethol. 11, 67–72 (2008)
16. Logue, D.M., Abiola, I.O., Rains, D., Bailey, N.W., Zuk, M., Cade, W.H.: Does signalling
mitigate the cost of agonistic interactions? A test in a cricket that has lost its song. Proc. R. Soc.
Lond. B 277, 2571–2575 (2010)
17. Maynard Smith, J.: Evolution and the Theory of Games. Cambridge University Press, Cambridge
(1982)
18. McGill, B.J., Brown, J.S.: Evolutionary game theory and adaptive dynamics of continuous
traits. Annu. Rev. Ecol. Syst. 38, 403–435 (2007)
19. Mesterton-Gibbons, M.: On the evolution of pure winner and loser effects: a game-theoretic
model. Bull. Math. Biol. 61, 1151–1186 (1999)
20. Mesterton-Gibbons, M., Sherratt, T.N.: Victory displays: a game-theoretic analysis. Behav.
Ecol. 17, 597–605 (2006)
21. Rutte, C., Taborsky, M., Brinkhof, M.W.G.: What sets the odds of winning and losing? Trends
Ecol. Evol. 21, 16–21 (2006)
22. Tracy, J.L., Matsumoto, D.: The spontaneous expression of pride and shame: evidence for
biologically innate nonverbal displays. Proc. Natl. Acad. Sci. U.S.A. 105, 11655–11660 (2008)
Chapter 3
Evolutionary Games for Multiple Access Control
† This material is based upon work supported in part by the U.S. Air Force Office of Scientific
Research (AFOSR) under grant number FA9550-09-1-0249, and in part by the AFOSR MURI
Grant FA9550-10-1-0573.
The material in this paper was partially presented in [9] and [10].
Q. Zhu () • T. Başar
Coordinated Science Laboratory and Department of Electrical and Computer Engineering,
University of Illinois at Urbana-Champaign, 1308 W. Main Street, Urbana, IL 61801, USA
e-mail: [email protected]; [email protected]
H. Tembine
Department of Telecommunications, École Supérieure d’Electricité (SUPELEC),
3 rue Joliot-Curie, 91192 Gif-Sur-Yvette Cedex, France
e-mail: [email protected]
G-function dynamics and Smith dynamics on rate control and channel selection,
respectively. We show that the evolving game has an equilibrium and illustrate these
dynamics with numerical examples.
3.1 Introduction
Recently much interest has been devoted to understanding the behavior of multiple
access controls under constraints. A considerable amount of work has been carried
out on the problem of how users can obtain an acceptable throughput by choosing
rates independently. Motivated by an interest in studying a large population of users
playing a game over time, evolutionary game theory was found to be an appropriate
framework for communication networks. It has been applied to problems such as
power control in wireless networks and mobile interference control [1, 5, 6, 11].
The game-theoretical models considered in previous studies on user behavior in
code division multiple access (CDMA) [4, 33] are static one-shot noncooperative
games in which users are assumed to be rational and optimize their payoffs
independently. Evolutionary game theory, on the other hand, studies games that are
played repeatedly and focuses on the strategies that persist over time, yielding the
best fitness of a user in a noncooperative environment on a large time scale.
In [19], an additive white Gaussian noise (AWGN) multiple-access-channel
problem was modeled as a noncooperative game with pairwise interactions, in which
users were modeled as rational entities whose only interests were to maximize their
own communication rates. The authors obtained the Nash equilibria (NEs) of the two-user game and introduced a two-player evolutionary game model with pairwise interactions based on replicator dynamics.
However, the case where interactions are not pairwise arises frequently in communi-
cation networks, such as the CDMA or the orthogonal frequency-division multiple
access (OFDMA) in a Worldwide Interoperability for Microwave Access (WiMAX)
environment [11].
In this work, we extend the study of [19] to wireless communication systems with
an arbitrary number of users corresponding to each receiver. We formulate a static
noncooperative game with m users subject to rate capacity constraints and extend
the constrained game to a dynamic evolutionary game with a large number of users
whose strategies evolve over time. Unlike evolutionary games with discrete and
finite numbers of actions, our model is based on a class of continuous games, known
as continuous-trait games. Evolutionary games with continuum action spaces are
encountered in a wide variety of applications in evolutionary ecology, such as
evolution of phenology, germination, nutrient foraging in plants, and predator–prey
foraging [7, 23].
3.1.1 Contribution
The main contributions of this work can be summarized as follows. We first intro-
duce a game-theoretic framework for local interactions between many users and a
single receiver. We show that the static continuous-kernel rate allocation game with
coupled rate constraints has a convex set of pure NEs, coinciding with the maximal
face of the polyhedral capacity region. All the pure equilibria are Pareto optimal
and are also strong equilibria, resilient to simultaneous deviation by coalitions of any
size. We show that the pure NEs in the rate allocation problem are 100 % efficient in
terms of price of anarchy (PoA) and constrained strong price of anarchy (CSPoA).
We study the stability of strong equilibria, normalized equilibria, and evolutionarily
stable strategies (ESSs) using evolutionary game dynamics such as Brown–von
Neumann–Nash dynamics, generalized Smith dynamics, and replicator dynamics.
We further investigate the correlated equilibrium of the multiple-access game where
the receiver can send signals to the users to mediate the behaviors of the transmitters.
Based on the single-receiver model, we then propose an evolutionary game-
theoretic framework for the hybrid additive white Gaussian noise multiple-access
channel. We consider a communication system of multiple users and multiple
receivers, where each user chooses a rate and splits it over the receivers. Users have
coupled constraints determined by the capacity regions. We characterize the NE of
the static game and show the existence of the equilibrium under general conditions.
Building upon the static game, we formulate a system of hybrid evolutionary game
dynamics using G-function dynamics and Smith dynamics on rate control and
channel selection, respectively. We show that the evolving game has an equilibrium
and illustrate these dynamics with numerical examples.
The rest of the paper is structured as follows. We present in Sect. 3.2.1 the evolu-
tionary game model of rate allocation in additive white Gaussian multiple-access
wireless networks and analyze its equilibria and Pareto optimality in Sect. 3.2.2. In
Sect. 3.2.3, we present strong equilibria and the PoA of the game. In Sect. 3.2.4,
we discuss how to select one specific equilibrium such as normalized equilibrium
and ESSs. Section 3.2.5 studies the stability of equilibria and the evolution of
strategies using game dynamics. Section 3.2.6 analyzes the correlated equilibrium
of the multiple-access game.
In Sect. 3.3.1, we present the hybrid rate control model where users can choose
the rates and the probability of the channels to use. In Sect. 3.3.2, we characterize
the NE of the constrained hybrid rate control game model, pointing out the existence
of the NE of the hybrid model and methods to find it. In Sect. 3.3.3, we apply
evolutionary dynamics to both rates and channel selection probabilities. We use
simulations to demonstrate the validity of these proposed dynamics and illustrate
the evolution of the overall evolutionary dynamics of the hybrid model. Section 3.4
concludes the paper. For the reader’s convenience, we summarize the notations in
Table 3.1 and the acronyms in Table 3.2.
Two fundamental concepts of evolutionary game theory are (a) the evolutionarily stable state [25], which is a
refinement of equilibrium, and (b) evolutionary game dynamics, such as replicator
dynamics [29], which describes the evolution of strategies or frequencies of use of
strategies in time [7, 21].
The single population evolutionary rate allocation game is described as follows:
there is one population of senders (users) and several receivers. The number of
senders is large. At each time, there are many one-shot finite games called local
interactions, which models the interactions among a finite number of users in
the population. Each sender of the population chooses from his or her set of
strategies Ai , which is a nonempty, convex, and compact subset of R. Without
loss of generality, we can suppose that user i chooses his or her rate in the interval
Ai = [0,C{i} ], where C{i} is the rate upper bound for user i (to be made precise
subsequently) as outside of the capacity region the payoff (to be defined later) will
be zero. Let Δ (Ai ) be the set of probability distributions over the pure strategy
set Ai . The set Δ (Ai ) can be interpreted as the set of mixed strategies for the N-
person game at the local interaction. In the case where the N-person local interaction
is identical at all local interactions in the population, the set Δ (Ai ) can also
be interpreted as the set of distributions of strategies among the population. Let
λi ∈ Δ(Ai) and E be a λi-measurable subset of R; then λi(E) represents the fraction of users choosing a strategy from E at time t. A distribution λi ∈ Δ(Ai) is sometimes called the "state" of the population. We denote by B(Ai) the Borel σ-algebra on Ai and by d(λ, λ′) the distance between two states measured with respect to the weak topology. An example of such a distance could be the classical Wasserstein distance or the Monge–Kantorovich distance between two measures.
Each user’s payoff depends on the opponents’ behavior through the distribution
of the opponents’ choices and of their strategies. The payoff of user i in a local
interaction with (N − 1) other users is given as a function ui : RN −→ R. The rate
profile α ∈ R₊ᴺ must belong to a common capacity region C ⊂ Rᴺ defined by 2ᴺ − 1
linear inequalities. The expected payoff of a sender i transmitting at a rate a when
the state of the population is μ ∈ Δ (Ai ) is given by Fi (a, μ ). The expected payoff
for user i is
\[
F_i(\lambda_i, \mu) \;:=\; \int_{\alpha \in C} u_i(\alpha)\, \lambda_i(d\alpha_i) \prod_{j \ne i} \mu(d\alpha_j).
\]
Local interaction refers to the problem setting of one receiver and its uplink
additive white Gaussian noise (AWGN) multiple-access channel with N senders
with coupled constraints (or actions). The signal at the receiver is given by Y =
ξ + ∑Ni=1 Xi , where Xi is a transmitted signal of user i and ξ is a zero-mean Gaussian
noise with variance σ₀². Each user has an individual power constraint E(Xᵢ²) ≤ Pᵢ and
channel gain hi . The optimal power allocation scheme is to transmit at the maximum
power available, i.e., Pi , for each user. Hence, we consider the case in which the
maximum power is attained. The decisions of the users, then, consist of choosing
their communication rates, and the receiver’s role is to decode, if possible. The
capacity region is the set of all vectors α ∈ RN+ such that users i ∈ N := {1, 2, . . . , N}
can reliably communicate at rate αi , i ∈ N. The capacity region C for this channel
is the set
Pj h j
C = α ∈ R+ ∑ αi ≤ log 1 + ∑ 2 .∀ 0/ Ω ⊆ N .
N
i∈Ω j∈Ω σ0
Example 3.1 (Example of capacity region with three users). In this example, we
illustrate the capacity region with three users. Let α1 , α2 , α3 be the rates of the users
and Pi = P, hi = h, ∀i ∈ {1, 2, 3}. Based on (3.1), we obtain a set of inequalities
\[
\begin{cases}
\alpha_1 \ge 0,\ \alpha_2 \ge 0,\ \alpha_3 \ge 0, & \\[2pt]
\alpha_i \le \log\bigl(1 + \tfrac{Ph}{\sigma_0^2}\bigr), & i = 1, 2, 3, \\[2pt]
\alpha_i + \alpha_j \le \log\bigl(1 + \tfrac{2Ph}{\sigma_0^2}\bigr), & i \ne j,\ i, j = 1, 2, 3, \\[2pt]
\alpha_1 + \alpha_2 + \alpha_3 \le \log\bigl(1 + \tfrac{3Ph}{\sigma_0^2}\bigr). &
\end{cases}
\]
Note that M₃ is a totally unimodular matrix. By letting Ph = 25, σ₀² = 0.1, we sketch
in Fig. 3.2 the capacity region with three users.
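As a quick aid (our own helper, not part of the chapter), membership in this symmetric region can be verified by enumerating the 2ᴺ − 1 subset constraints of (3.1); natural logarithms are assumed below, but the check works for any consistently used base:

```python
import math
from itertools import combinations

def in_capacity_region(alpha, Ph=25.0, sigma2=0.1):
    """Check the 2^N - 1 subset constraints of the symmetric region (3.1)."""
    n = len(alpha)
    if any(a < 0 for a in alpha):
        return False
    for k in range(1, n + 1):
        for omega in combinations(range(n), k):
            if sum(alpha[i] for i in omega) > math.log(1 + k * Ph / sigma2):
                return False
    return True

print(in_capacity_region([1.5, 1.5, 1.5]))  # True: 4.5 <= log(751) ~ 6.62
print(in_capacity_region([3.0, 3.0, 3.0]))  # False: 9.0 > 6.62
```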
The capacity region reveals the competitive nature of interactions among senders:
if a user i wants to communicate at a higher rate, then one of the other users must
lower his or her rate; otherwise, the capacity constraint is violated. We let
\[
r_{i,\Omega} \;:=\; \log\Bigl(1 + \frac{P_i h_i}{\sigma_0^2 + \sum_{i' \in \Omega,\, i' \ne i} P_{i'} h_{i'}}\Bigr), \qquad i \in N,\ \Omega \subseteq N,
\]
denote the bound on the rate of a user when the signals of the |Ω | − 1 other users
are treated as noise.
Due to the noncooperative nature of the rate allocation, we can formulate the
one-shot game
\[
\Xi \;=\; \bigl\langle N, (A_i)_{i \in N}, (u_i)_{i \in N} \bigr\rangle,
\]
where the set of users N is the set of players, Ai , i ∈ N, is the set of actions, and
ui , i ∈ N, are the payoff functions.
3.2.2 Payoffs

The payoff of user i is taken to be
\[
u_i(\alpha_i, \alpha_{-i}) \;=\; g_i(\alpha_i)\, \mathbf{1}_C(\alpha_i, \alpha_{-i}),
\]
where 1_C is the indicator function of the capacity region C, α₋ᵢ is a vector consisting of other players' rates, i.e., α₋ᵢ = [α₁, . . . , αᵢ₋₁, αᵢ₊₁, . . . , α_N], and gᵢ is a positive and strictly increasing function for each fixed α₋ᵢ. Since the game is subject to coupled constraints, the action set Aᵢ is coupled and dependent on other players' actions. Given the strategy profile α₋ᵢ of other players, the constrained action set Aᵢ is given by
\[
A_i(\alpha_{-i}) \;=\; \Bigl\{\alpha_i \ge 0 \;\Bigm|\; \alpha_i \le C_\Omega - \sum_{j \in \Omega,\, j \ne i} \alpha_j,\ \forall\, \Omega \subseteq N \text{ with } i \in \Omega \Bigr\},
\]
where C_Ω denotes the right-hand side of the corresponding constraint in (3.1).
We then have an asymmetric game. The minimum rate that user i can guarantee in
the feasible region is r_{i,N}, which in general differs from r_{j,N}.
Each user i maximizes ui (αi , α−i ) over the coupled constraint set. Owing to the
monotonicity of the function gi and the inequalities that define the capacity region,
we obtain the following lemma.
Lemma 3.1. Let BRᵢ(α₋ᵢ) be the best reply to the strategy α₋ᵢ, defined by
\[
\mathrm{BR}_i(\alpha_{-i}) \;=\; \min_{\Omega \in \Gamma_i}\Bigl(C_\Omega - \sum_{j \in \Omega,\, j \ne i} \alpha_j\Bigr),
\]
where Γᵢ = {Ω ∈ 2ᴺ, i ∈ Ω}.

Proposition 3.1. The set of NEs is
\[
\Bigl\{(\alpha_i, \alpha_{-i}) \;\Bigm|\; \alpha_i \ge r_{i,N},\ \sum_{i \in N} \alpha_i = C_N\Bigr\}.
\]
If β ∈ {(αᵢ, α₋ᵢ) | αᵢ ≥ r_{i,N}, ∑ᵢ₌₁ᴺ αᵢ = C_N}, then, from Lemma 3.1, BRᵢ(β₋ᵢ) = {βᵢ}. Hence, β is a strict equilibrium. Moreover, this strategy β is Pareto optimal
because the rate of each user is maximized under the capacity constraint. These
strategies are social welfare optimal if the total utility ∑Ni=1 ui (αi , α−i ) = ∑Ni=1 gi (αi )
is maximized subject to constraints.
Note that the set of pure NEs is a convex subset of the capacity region.
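Concretely, in the symmetric setting of Example 3.1, Proposition 3.1 can be checked numerically; the sketch below is ours, with illustrative values, and tests the lower bounds r_{i,N} together with the sum condition:

```python
import math

def is_pure_ne(alpha, Ph=25.0, sigma2=0.1):
    """Pure NE test from Proposition 3.1 in the symmetric case:
    alpha_i >= r_{i,N} for all i and sum(alpha) == C_N."""
    n = len(alpha)
    C_N = math.log(1 + n * Ph / sigma2)                # full-coalition capacity
    r_iN = math.log(1 + Ph / (sigma2 + (n - 1) * Ph))  # others treated as noise
    return (all(a >= r_iN for a in alpha)
            and math.isclose(sum(alpha), C_N, rel_tol=1e-9))

C_N = math.log(1 + 3 * 25.0 / 0.1)
print(is_pure_ne([C_N / 3] * 3))    # True: the symmetric equilibrium
print(is_pure_ne([C_N, 0.0, 0.0]))  # False: below r_{i,N} for users 2 and 3
```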
Suppose first that the members of a deviating coalition Dev choose rates (α′ᵢ)_{i∈Dev} with
\[
\sum_{i \in \mathrm{Dev}} \alpha'_i \;<\; C_N - \sum_{i \notin \mathrm{Dev}} \alpha_i.
\]
Then there exists i such that αᵢ > α′ᵢ. Since gᵢ is strictly increasing, this implies that gᵢ(αᵢ) > gᵢ(α′ᵢ): a user i who is a member of coalition Dev does not improve his or her payoff. If the rates of some of the deviants are increased, then the rates of some other users from the coalition must decrease. If (α′ᵢ)_{i∈Dev} satisfies
\[
\sum_{i \in \mathrm{Dev}} \alpha'_i \;=\; C_N - \sum_{i \notin \mathrm{Dev}} \alpha_i,
\]
then some users in coalition Dev have increased their rates compared with (αᵢ)_{i∈Dev} while others in Dev have decreased their rates of transmission (because the total rate is the constant C_N − ∑_{i∉Dev} αᵢ). The users in Dev with a lower rate α′ᵢ ≤ αᵢ do not benefit by being a member of the coalition (the Shapley criterion of membership of a coalition does not hold). And this holds for any ∅ ⊂ Dev ⊂ N. This completes the proof.

1 Note that the set of constrained strong equilibria is a subset of the set of NEs (by taking coalitions of size one) and any constrained strong equilibrium is Pareto optimal (by taking the coalition of full size).
Corollary 3.1. In the constrained rate allocation game, NEs and strong equilibria
in pure strategies coincide.
Define
\[
W(\alpha) \;=\; \mathbf{1}_C(\alpha) \sum_{i=1}^{N} g_i(\alpha_i).
\]
Then
\[
\frac{\partial}{\partial \alpha_i} W(\alpha) \;=\; g'_i(\alpha_i) \;=\; \frac{\partial}{\partial \alpha_i} u_i
\]
in the interior of the capacity region C, and W is a constrained potential function [3] in pure strategies.
Corollary 3.2. The local maximizers of W in C are pure NEs. Global maximizers
of W in C are both constrained strong equilibria and social optima for the local
interaction.
Throughout this subsection, we assume that the functions gi are the identity
function, i.e., gi (x) = id(x) := x. One metric used to measure how much the
performance of decentralized systems is affected by the selfish behavior of its
components is the price of anarchy (PoA). We present a similar price for strong
equilibria under the coupled rate constraints. This notion of PoA can be seen as
an efficiency metric that measures the price of selfishness or decentralization and
has been extensively used in the context of congestion games or routing games
where typically users have to minimize a cost function [37, 38]. In the context of
rate allocation in the multiple-access channel, we define an equivalent measure of
PoA for rate maximization problems. One of the advantages of a strong equilibrium
is that it has the potential to reduce the distance between the optimal solution and
the solution obtained as an outcome of selfish behavior, typically in cases where the
capacity constraint is violated at each time. Since the constrained rate allocation
game has strong equilibria, we can define the strong price of anarchy (SPoA),
introduced in [12], as the ratio between the payoff of the worst constrained strong
equilibrium and the social optimum value, which is C_N.
Theorem 3.2. The SPoA of the constrained rate allocation game is 1 for gi (x) = x.
Note that for gᵢ ≠ id, the constrained SPoA (CSPoA) can be less than one. However,
the optimistic PoA of the best constrained equilibrium, also called the price of
stability [13], is one for any function gi , i.e., the efficiency of the “best” equilibria is
100 %.
3.2.4 Equilibrium Selection

We showed in the previous sections that our rate allocation game has a continuum
of pure NEs and strong equilibria. We address now the problem of selecting
one equilibrium that has a certain desirable property: the normalized pure NE,
introduced in [26]; see also [20, 22, 28]. We introduce the problem of constrained
maximization faced by each user when the other rates are at the maximal face of
polytope C:
\[
\max_{\alpha}\ u_i(\alpha) \tag{3.5}
\]
\[
\text{s.t.}\quad \alpha_1 + \cdots + \alpha_N = C_N. \tag{3.6}
\]
For a fixed vector ζ with identical entries, define the normal form game Γ (ζ )
with N users, where actions are taken as rates and the payoffs given by L(α , ζ ).
A normalized equilibrium is an equilibrium of the game Γ (ζ ∗ ), where ζ ∗ is
normalized into the form ζᵢ* = c/τᵢ for some c > 0, τᵢ > 0. We now have the following result due to Goodman [20], which implies Rosen's condition on uniqueness for strictly concave games.
Theorem 3.3. Let ui be a smooth and strictly concave function in αi , with each ui
convex in α−i , and let there exist some ζ such that the weighted nonnegative sum
of the payoffs ∑Ni=1 ζi ui (α ) is concave in α . Then, the matrix G(α , ζ ) + GT (α , ζ )
is negative definite (which implies uniqueness), where G(α , ζ ) is the Jacobian with
respect to α of the pseudogradient [ζᵢ ∇_{αᵢ} uᵢ(α)]_{i∈N}.
3.2.5 Stability and Evolutionary Game Dynamics

In this subsection, we study the stability of equilibria and several classes of evolu-
tionary game dynamics under a symmetric case, i.e., Pi = P, hi = h, gi = g, Ai = A,
∀i ∈ N. We will drop subscript index i where appropriate. We show that the
associated evolutionary game has a unique pure constrained ESS.
Proposition 3.2. The collection of rates α = (C_N/N, . . . , C_N/N), i.e., the Dirac distribution concentrated on the rate C_N/N, is the unique symmetric pure NE.

Proof. Since the constrained rate allocation game is symmetric, there exists a symmetric (pure or mixed) NE. If such an equilibrium exists in pure strategies, each user transmits at the same rate r*. It follows from Proposition 3.1 of Sect. 3.2.2 and the bound r_{i,N} ≤ C_N/N that r* satisfies Nr* = C_N and r* is feasible.
Since the set of feasible actions is convex, we can define a convex combination of rates in the set of feasible rates. For example, εα + (1 − ε)α′ is a feasible rate if α and α′ are feasible. The symmetric rate profile (r, r, . . . , r) is feasible if and only if 0 ≤ r ≤ r* = C_N/N. We say that rate r is a constrained ESS if it is feasible and for every mutant strategy mut ≠ r there exists ε_mut > 0 such that
\[
r_\varepsilon := \varepsilon\,\mathrm{mut} + (1 - \varepsilon)r \in C \qquad \forall\, \varepsilon \in (0, \varepsilon_{\mathrm{mut}}),
\]
\[
u(r, r_\varepsilon, \ldots, r_\varepsilon) > u(\mathrm{mut}, r_\varepsilon, \ldots, r_\varepsilon) \qquad \forall\, \varepsilon \in (0, \varepsilon_{\mathrm{mut}}).
\]
Define the mixed capacity region M(C) as the set of measure profiles (μ₁, μ₂, . . . , μ_N) such that
\[
\sum_{i \in \Omega} \int_{\mathbb{R}_+} \alpha_i\, \mu_i(d\alpha_i) \;\le\; C_\Omega, \qquad \forall\, \emptyset \ne \Omega \subseteq N,
\]
where ν_k = ⊗ₗ₌₁ᵏ μ is the k-fold product measure on [0, ∞)ᵏ. The constraint set becomes the set of probability measures on R₊ such that
\[
0 \;\le\; E(\mu) := \int_{\mathbb{R}_+} \alpha\, \mu(d\alpha) \;\le\; \frac{C_N}{N} \;<\; C_{\{1\}}.
\]
N
(b2 , . . . , bN ) ∈ Da , where Da = {(b2 , . . . , bN ) ∈ RN−1 =1 bi ≤ DΩ , Ω ⊆ 2 }.
+ , ∑i∈Ω ,i
a
Thus, we have
F(a, μ ) = u(a, b2 , . . . , bN ) νN−1 (db)
RN−1
+
= g(a) νN−1 (db)
b∈RN−1
+ , (a,b)∈C
= [0,CN −(N−1)E(μ )] g(a) × νN−1 (db).
b∈Da
As we can see, β_{xa} is independent of a. Thus, in the unconstrained case, the first double integral becomes
\[
\int_{a \in A} \int_{x \in E} \beta_{xa}(\lambda_t)\, \mu(dx)\, \mu(da) \;=\; \int_{x \in E} \beta_{xa}(\lambda_t)\, \mu(dx),
\]
where
\[
V(x, \lambda_t) \;=\; K\Bigl[\int_{a \in A} \beta_{xa}(\lambda_t)\, \lambda_t(da) - \int_{a \in A} \beta_{ax}(\lambda_t)\, \lambda_t(da)\Bigr].
\]
We obtain
\[
V(x, \lambda_t) \;=\; K \int_{a \in A} \bigl[F(x, \lambda_t) - F(a, \lambda_t)\bigr]\lambda_t(da) \;=\; K\Bigl[F(x, \lambda_t) - \int_{a \in A} F(a, \lambda_t)\, \lambda_t(da)\Bigr],
\]
that is, up to the constant K, the difference between the payoff at x and the average payoff.
A common property that applies to all these dynamics is that the set of NEs is
a subset of rest points (stationary points) of the evolutionary game dynamics. Here
we extend the concepts of these dynamics to evolutionary games with a continuum
action space and coupled constraints, and interactions with more than two users.
The counterparts of these results in discrete action space can be found in [21, 27].
Theorem 3.5. Any NE of a game is a rest point of the following evolutionary game
dynamics: constrained Brown–von Neumann–Nash, generalized Smith dynamics,
and replicator dynamics. Furthermore, the ESS set is a subset of the rest points of
these constrained evolutionary game dynamics.
Proof. It is clear for pure equilibria using the revision protocols β of these
dynamics. Let λ be an equilibrium. For any rate a in the support of λ, β_{xa} = 0 if F(x, λ) ≤ F(a, λ). Thus, if λ is an equilibrium, then the difference between the microscopic inflow and outflow is V(a, λ) = 0 for every a in the support of the measure λ. The last assertion follows from the fact that the ESS (if it exists) is an equilibrium and hence a rest point of these dynamics.
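For intuition, here is a discretized sketch of ours (not a construction from the chapter) that runs Brown–von Neumann–Nash dynamics on a finite grid of rates for a symmetric two-user interaction with g = id, Ph = 25 and σ₀² = 0.1 (natural logarithms, so C_{1} ≈ 5.52 and C_N ≈ 6.21):

```python
import numpy as np

def bnn_step(x, payoffs, dt=0.002):
    """One step of discretized Brown-von Neumann-Nash dynamics:
    growth by positive excess payoff, uniform outflow by its total."""
    avg = x @ payoffs
    excess = np.maximum(0.0, payoffs - avg)
    return x + dt * (excess - x * excess.sum())

grid = np.linspace(0.0, 5.52, 56)       # candidate rates in [0, C_{1}]
C_N = 6.21                              # two-user sum capacity (illustrative)
x = np.ones_like(grid) / len(grid)      # population state over the grid
for _ in range(50000):
    # g = id: a rate a earns a if a + E(mu) <= C_N, else 0.
    payoffs = np.where(grid + grid @ x <= C_N, grid, 0.0)
    x = bnn_step(x, payoffs)
print(grid @ x)   # mean rate; the symmetric NE of Prop. 3.2 has E = C_N/2
```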
Let λ be a finite Borel measure on [0,C{i} ] with full support. Suppose g is
continuous on [0,C{i} ]. Then λ is a rest point of the Brown–von Neumann–Nash
dynamics if and only if λ is a symmetric NE. Note that the choice of topology is
an important issue when defining the convergence of dynamics and the stability
of the dynamics. The most used topology in this area is the topology of the
weak convergence, which measures the closeness of two states of the system.

Define a rule of assignment of user i as a map r̄ᵢ : βᵢ ↦ αᵢ from his signals to his action set. A CCE is then characterized by
\[
\int d\mu^*(\beta)\,\bigl[u_i(\alpha_i, \alpha_{-i} \mid \beta_i) - u_i(\bar r_i(\beta_i), \alpha_{-i})\bigr] \;\ge\; 0 \qquad \forall\, i \in N.
\]
Theorem 3.6. The set of constrained pure NEs of the MISO game is given by
\[
\text{max-face}(C) \;=\; \Bigl\{(\alpha_1, \ldots, \alpha_N) \;\Bigm|\; \alpha_i \ge 0,\ \sum_{k \in N} \alpha_k = C_N\Bigr\},
\]
and any element of max-face(C) is a CCE. Moreover, any probability distribution over the maximal face of the capacity region max-face(C) is a CCE distribution.
3.3 Multiple-Receiver Model

In this section, we extend the single-receiver case to one with multiple receivers. The
multi-input and multioutput (MIMO) channel access game has been studied in the
context of power allocation and control. For instance, the authors in [6] formulate
a two-player zero-sum game where the first player is the group of transmitters and
the second one is the set of MIMO subchannels. In [5], the authors formulate an N-
person noncooperative power allocation game and study its equilibrium under two
different decoding schemes.
3.3.1 Hybrid Rate Control Model

In this subsection, we establish a model for multiple users and multiple receivers.
Each user needs to decide the rate at which to transmit and the channel to pick.
We formulate a game Ξ = ⟨N, (Aᵢ)_{i∈N}, (Uⁱ)_{i∈N}⟩, in which the decision variable
is (αi , pi ), and pi = [pi j ] j∈J is a J-dimensional vector, where pi j is the probability
that user i ∈ N will choose channel j ∈ J and pi j needs to satisfy the probability
measure constraints
∑ pi j = 1, pi j ≥ 0, ∀i ∈ N. (3.11)
j∈J
The game Ξ is asymmetric in the sense that the sets of strategies of the users are
different and the payoffs are not symmetric.
Let C_{j,Ω} := log(1 + ∑_{i∈Ω} P_{ij}h_{ij}/σ₀²) be the capacity for a subset Ω ⊆ N of users at receiver j ∈ J and
\[
r_{ij,\Omega} \;:=\; \log\Bigl(1 + \frac{P_{ij} h_{ij}}{\sigma_0^2 + \sum_{i' \in \Omega,\, i' \ne i} P_{i'j} h_{i'j}}\Bigr)
\]
the bound on the rate of a user i when the signals of the |Ω| − 1 other users are treated as noise at receiver j. Each receiver j has a capacity region C(j) given by
\[
C(j) \;=\; \Bigl\{(\alpha, p_j) \in \mathbb{R}_+^N \times [0,1]^N \;\Bigm|\; \sum_{i \in \Omega_j} \alpha_i p_{ij} \;\le\; C_{j,\Omega_j},\ \forall\, \emptyset \ne \Omega_j \subseteq N\Bigr\}, \qquad j \in J. \tag{3.12}
\]
where α = (αi , α−i ) ∈ RN+ and P = (pi , p−i ) ∈ [0, 1]N×J , with pi ∈ [0, 1]J , p−i ∈
[0, 1](N−1)×J . Assume that the utility ui j of a user i transmitting to receiver j is only
dependent on the user himself and is described by a positive and strictly increasing
function gi : R+ → R+ , i.e., ui j = gi , ∀ j ∈ J, when capacity constraints are satisfied.
With the presence of coupled constraints (3.12) from each receiver and the probability measure constraint (3.11), each sender has his own individual optimization problem (IOP) given as follows:
\[
\max_{(\alpha_i,\, p_i)} \; U^i(\alpha, P)
\]
s.t. ∑_{j∈J} p_{ij} = 1, ∀i ∈ N; p_{ij} ≥ 0, ∀i ∈ N, j ∈ J; (α, p_j) ∈ C(j), ∀j ∈ J.
3.3.1.1 An Example
Suppose we have three users and three receivers, that is, N = {1, 2, 3} and
J = {1, 2, 3}. The capacity region at receiver 1 is given by
\[
C(1) \;=\; \left\{ (\alpha, p_1) \;\middle|\;
\begin{array}{l}
\alpha_i \ge 0, \quad i \in \{1, 2, 3\}, \\[2pt]
p_{11}\alpha_1 \le \log\bigl(1 + \tfrac{P_1 h_1}{\sigma_0^2}\bigr), \quad
p_{21}\alpha_2 \le \log\bigl(1 + \tfrac{P_2 h_2}{\sigma_0^2}\bigr), \quad
p_{31}\alpha_3 \le \log\bigl(1 + \tfrac{P_3 h_3}{\sigma_0^2}\bigr), \\[2pt]
p_{11}\alpha_1 + p_{21}\alpha_2 \le \log\bigl(1 + \tfrac{P_1 h_1 + P_2 h_2}{\sigma_0^2}\bigr), \quad
p_{11}\alpha_1 + p_{31}\alpha_3 \le \log\bigl(1 + \tfrac{P_1 h_1 + P_3 h_3}{\sigma_0^2}\bigr), \\[2pt]
p_{21}\alpha_2 + p_{31}\alpha_3 \le \log\bigl(1 + \tfrac{P_2 h_2 + P_3 h_3}{\sigma_0^2}\bigr), \\[2pt]
p_{11}\alpha_1 + p_{21}\alpha_2 + p_{31}\alpha_3 \le \log\bigl(1 + \tfrac{P_1 h_1 + P_2 h_2 + P_3 h_3}{\sigma_0^2}\bigr), \\[2pt]
0 \le p_{i1} \le 1, \quad i \in \{1, 2, 3\}
\end{array}
\right\}.
\]
3.3.2 Characterization of the NE

In this subsection, we characterize the NEs of the defined game Ξ under the given
capacity constraint. We use the following theorem to prove the existence of an NE
for the case where the rates are predetermined; this result is then used to solve the
game for the case where both the rates and the connection probabilities are (joint)
decision variables.
Theorem 3.7 (Başar and Olsder [34]). Let A = A₁ × A₂ × · · · × A_N be a closed, bounded, and convex subset of Rᴺ, and for each i ∈ N let the payoff functional Uⁱ : A → R be jointly continuous in A and concave in aᵢ for every aⱼ ∈ Aⱼ, j ∈ N, j ≠ i. Then, the associated N-person non-zero-sum game admits an NE in pure strategies.
Applying Theorem 3.7, we have the following results immediately.
Proposition 3.3. Suppose αi , i ∈ N, are predetermined feasible rates. Let feasible
set F be closed, bounded, and convex. If gi in the IOP are continuous on F and
concave in pi (without the assumption of their being positive and strictly increasing)
and the expected payoff functions U i : RN+ × [0, 1]N×J → R are concave in pi and
continuous on F , then the static game admits an NE.
The existence result in Proposition 3.3 only captures the case where the rates αᵢ are predetermined, and it relies on the convexity requirement of the utility functions. More generally, consider the constrained optimization problem (COP)
\[
\max_{\alpha, P}\ \Psi(\alpha, P), \qquad \text{s.t. } (\alpha, P) \in \mathcal{F}.
\]
Using the result in [3], we can conclude that if there exists a solution to the COP,
then there exists an NE to the game Ξ. Since F is compact and nonempty, and the objective function is continuous, there exists a solution to the COP and thus an NE to the game.
The foregoing problem is generally not convex, and the uniqueness of the NE
may not be guaranteed. However, we can still further characterize the NE through
the following propositions.
Proposition 3.5. Let β_{ij} := αᵢ p_{ij}. Without predetermining α, suppose that (p₋ᵢ, α₋ᵢ) is feasible. A best response strategy at receiver j ∈ J for user i must satisfy
\[
0 \;\le\; p_{ij}\alpha_i \;\le\; C_{j,\Omega_j} - \sum_{k \in \Omega_j,\, k \ne i} \alpha_k p_{kj}, \qquad \forall\, \Omega_j, \tag{3.18}
\]
and is given by
\[
p_{ij}\alpha_i \;=\; \min_{\Omega_j \ni i}\Bigl(C_{j,\Omega_j} - \sum_{k \in \Omega_j,\, k \ne i} \alpha_k p_{kj}\Bigr), \tag{3.19}
\]
where r_{ij,N} is the bound on the rate of user i when the signals of the |N| − 1 other users are treated as noise.

Proof. The proof is immediate by observing that the rate of user i at receiver j must satisfy (3.18) due to the coupled constraints. Thus, the maximum rate that user i can use to transmit to receiver j without violating the constraints is clearly the minimum of C_{j,Ω_j} − ∑_{i′≠i} α_{i′} p_{i′j} over all Ω_j. Since the payoff is a nondecreasing function, the best response for i at receiver j is given by (3.19).
Proposition 3.6. Let Kᵢ* = arg max_{j∈J} g_{ij}(β_{ij}). If Kᵢ* = {k*} is a singleton, then the best response for user i is to choose
\[
p_{ij} = 1 \ \text{if } j = k^*, \qquad p_{ij} = 0 \ \text{otherwise},
\]
and we can determine αᵢ by αᵢ = β_{ik*}/p_{ik*}. If |Kᵢ*| ≥ 2, then the best response correspondence is
\[
p_i \in \Delta(K_i^*), \qquad p_{ij} = 0 \ \text{for } j \notin K_i^*.
\]
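The two propositions suggest a simple best-response computation. The following Python sketch is ours; it assumes symmetric channels P_{ij}h_{ij} = Ph and g = id, checks only the subsets containing user i, caps the rate in the spirit of (3.19), and then concentrates the weight on the best receiver (the singleton case of Proposition 3.6):

```python
import math
from itertools import combinations

def best_response(i, alpha, P, Ph, sigma2, g=lambda x: x):
    """Best-response sketch for user i in the spirit of Props. 3.5-3.6
    (symmetric channels assumed, so C_{j,Omega} depends only on |Omega|)."""
    N, J = len(alpha), len(P[0])
    others = [k for k in range(N) if k != i]
    beta = []
    for j in range(J):
        caps = []
        for size in range(len(others) + 1):
            for omega in combinations(others, size):
                c = math.log(1 + (size + 1) * Ph / sigma2)   # C_{j,Omega}
                caps.append(c - sum(alpha[k] * P[k][j] for k in omega))
        beta.append(max(0.0, min(caps)))    # rate cap as in (3.19)
    best = max(range(J), key=lambda j: g(beta[j]))  # K_i^*, singleton case
    return beta[best], [1.0 if j == best else 0.0 for j in range(J)]

# Illustrative call: 2 users, 3 receivers, user 0 responds to user 1.
print(best_response(0, [0.0, 5.0], [[1/3] * 3, [0.5, 0.3, 0.2]], 25.0, 0.1))
```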
3.3.3 Evolutionary Dynamics

Interactions among users are dynamic, and users can update their rates and channel
selection with respect to their payoffs and the known coupled constraints. Such a
dynamic process can generally be modeled by an evolutionary process, a learning
process, or a trial-and-error updating process. In classical game theory, the focus
is on strategies that optimize payoffs to the players, whereas in evolutionary game
theory, the focus is on strategies that will persist through time. In this subsection,
we formulate evolutionary game dynamics based on the static game discussed
in Sect. 3.3.1. We use generalized Smith dynamics for channel selection and G-
function-based dynamics for rates. Combining them, we set up a framework of
hybrid dynamics for the overall system.
The action of each user has two components (αi , pi ) ∈ R+ × [0, 1]J . We use
pi as strategies that determine the fitness of user i’s rate αi to receiver j. The
rate selection evolves according to the channel selection strategy P. We may view
channel selection as an inner game that involves a process on a short time scale, but
rate selection is an outer game that represents the dynamical link via fitness on a
longer time scale [7, 8].
Let α be a fixed rate profile in the capacity region. We assume that user i occasionally tests the weights p_{ij} with alternative receivers, keeping the new strategy if and only if it leads to a strict increase in payoff. If the choice of receivers' weights of some users decreases the payoff or violates the constraints due to a strategy change by another user, then the user starts a random search for a new strategy, eventually settling on one with a probability that increases monotonically with its realized payoff. For the foregoing generating-function-based dynamics, the weight of switching from receiver j to receiver j′ is given by
\[
\eta^i_{jj'}(\alpha, P) \;=\; \bigl[\max\bigl(0,\ u_{ij'}(\alpha, P) - u_{ij}(\alpha, P)\bigr)\bigr]^{\theta}, \qquad \theta \ge 1,
\]
if the payoff obtained at receiver j′ is greater than the payoff obtained at receiver j and the constraints are satisfied; otherwise, η^i_{jj'}(α, P) = 0. The frequency of use of each receiver is then seen as the selection strategy over receivers.

The expected change at each receiver is the difference between the incoming and the outgoing flows. The dynamics, also called generalized Smith dynamics [2], is given by
\[
\dot p_{ij}(t) \;=\; \sum_{j' \in J} p_{ij'}(t)\, \eta^i_{j'j}(\alpha, P(t)) \;-\; p_{ij}(t) \sum_{j' \in J} \eta^i_{jj'}(\alpha, P(t)). \tag{3.20}
\]
Write χ_{ij} for the right-hand side of (3.20). Then the induced change in user i's expected payoff satisfies
\[
d \;:=\; \sum_{j \in J} \dot p_{ij}\, u_{ij}(\alpha, P) \;=\; \sum_{j \in J} \chi_{ij}\, u_{ij}(\alpha, P)
\;=\; \sum_{j, j' \in J} p_{ij'}\bigl[u_{ij}(\alpha, P) - u_{ij'}(\alpha, P)\bigr]\eta^i_{j'j}
\;=\; \sum_{j, j' \in J} p_{ij'}\, \max\bigl(0,\ u_{ij}(\alpha, P) - u_{ij'}(\alpha, P)\bigr)\eta^i_{j'j} \;\ge\; 0,
\]
if the pair (α, P) is in the hybrid capacity region. Notice that the term C_{j,N} − ∑_{i′≠i} p_{i′j}β_{i′j}(t) is the maximum rate of i using channel j at time t. Hence, the G-function-based dynamics is given by
\[
\dot\beta_{ij} \;=\; -\bar\mu\,\Bigl[p_{ij}\beta_{ij} - C_{j,N} + \sum_{i' \ne i} p_{i'j}\beta_{i'j}\Bigr]\, p_{ij}\beta_{ij}, \tag{3.21}
\]
with initial conditions β_{ij}(0) ≤ C_{j,{i}}, where β = [β_{ij}] is defined in Proposition 3.5 and is of the same dimension as α, and αᵢ(t) = ∑_{j∈J} β_{ij}(t); μ̄ is an appropriate parameter chosen for the rate of convergence.
We now combine the two evolutionary game dynamics described in the previous subsections. The variables $\alpha$ and $P$ both evolve in time. The overall dynamics are given by
$$\begin{cases}
\dot p_{ij}(t) = \sum_{j'\in J} p_{ij'}(t)\,\eta^i_{j'j}(\alpha(t), P(t)) - p_{ij}(t)\sum_{j'\in J}\eta^i_{jj'}(\alpha(t), P(t)),\\[4pt]
\dot\beta_{ij}(t) = -\bar\mu\bigl(p_{ij}(t)\beta_{ij}(t) - C_{j,N} + \sum_{i'\neq i} p_{i'j}(t)\beta_{i'j}(t)\bigr)\,p_{ij}(t)\beta_{ij}(t),\\[4pt]
\alpha_i(t) = \sum_{j\in J}\beta_{ij}(t), \qquad \beta_{ij}(0) \le C_{j,\{i\}}, \quad \forall j\in J,\ i\in N.
\end{cases} \tag{3.22}$$
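As an illustration of how (3.22) can be simulated, the following sketch combines one Euler step of each component. Here `payoff` is a user-supplied function returning the matrix $u_{ij}(\alpha, P)$ (an assumption, since its concrete form depends on the receiver model), `C` holds the capacities $C_{j,N}$, and `smith_step` is the routine sketched above. Note that the bracket in the $\beta$-equation collapses to $\sum_i p_{ij}\beta_{ij} - C_{j,N}$, the excess load on channel $j$.

```python
def hybrid_step(P, beta, C, payoff, mu_bar=0.9, dt=0.01, theta=1.0):
    """One Euler step of the hybrid dynamics (3.22) (sketch)."""
    alpha = beta.sum(axis=1)                    # alpha_i(t) = sum_j beta_{ij}(t)
    P_next = smith_step(P, payoff(alpha, P), theta=theta, dt=dt)
    pb = P * beta                               # admitted rates p_{ij} beta_{ij}
    excess = pb.sum(axis=0) - C                 # load on channel j minus C_{j,N}
    beta_next = beta - dt * mu_bar * excess[None, :] * pb
    return P_next, beta_next
```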
All the equilibria of the hybrid evolutionary rate control and channel selection are rest points of the preceding hybrid dynamics. The following result can be obtained directly from Proposition 3.7 and (3.21).

Proposition 3.9. Let $(\beta^*, P^*)$ be an interior rest point of the hybrid dynamics, i.e., $\beta^*_{ij} > 0$, $p^*_{ij} > 0$ and $\chi(\alpha^*, P^*) = 0$. Then for all $j$,
$$\sum_{i=1}^{N} p^*_{ij}\,\beta^*_{ij} = C_{j,N}; \qquad \chi\Bigl(\sum_{j\in J}\beta^*_{ij},\, P^*\Bigr) = 0.$$
(Figures: trajectories of the channel-selection weights $p_1$ and $p_2$ over time.)
In the second experiment, we assume that the probability matrix $P$ has been optimally chosen by the users using (3.20). Figures 3.7 and 3.8 show that the $\beta$ values converge to an equilibrium from which we can find the optimal value of $\alpha$. Since these dynamics are much slower than the Smith dynamics on $P$, our assumption that the optimal $P$ is known for a slowly varying $\alpha$ becomes valid.
(Figures 3.7 and 3.8: convergence of the $\beta$ values over time.)
In the next experiment, we simulate the hybrid dynamics in (3.22). Let the probability $p_{ij}$ of user $i$ choosing transmitter $j$ and the transmission rates be initialized as follows:
$$P(0) = \begin{pmatrix} 0.2 & 0.3 & 0.5 \\ 0.25 & 0.5 & 0.25 \end{pmatrix}, \qquad \alpha(0) = \begin{pmatrix} 0.2 \\ 0.1 \end{pmatrix}.$$
We let the parameter $\bar\mu = 0.9$. Figure 3.9 shows the evolution of the weights of user 1 on each of the receivers. The weights converge to $p_{1j} = 1/3$ for all $j$ within 2 s, leading to an unbiased choice among receivers. In Fig. 3.10, we show the evolution of the weights of the second user on each receiver. At equilibrium, $p_2 = [0.3484, 0.4847, 0.1669]^T$. It appears that user 2 favors the second transmitter over the other ones. Since the utilities $u_{ij}$ are of the same form, the optimal response set $K_i^*$ is naturally nonempty and contains all the receivers. As shown in Proposition 3.6, the probability of choosing a receiver at equilibrium is randomized among the three receivers and can be determined by the rates $\alpha$ chosen by the users.
The $\beta$-dynamics determine the evolution of $\alpha$ in (3.22). In Fig. 3.11, we see that the evolutionary dynamics yield $\alpha = [15.87, 23.19]^T$ at equilibrium. It is easy to verify that these rates satisfy the capacity constraints outlined in Sect. 3.2. The convergence takes about 5 s and is thus much slower than in Figs. 3.9 and 3.10. Hence the $P$-dynamics may be viewed as the inner-loop dynamics, whereas the $\beta$-dynamics act as outer-loop evolutionary dynamics; they evolve on two different time scales. In addition, thanks to Proposition 3.8, finding the rest points of the preceding dynamics ensures that we find the equilibrium.
Smith dynamics, and the replicator dynamics to study the stability of equilibria in the long run. In addition, we introduced a hybrid multiple-access game model and its corresponding evolutionary game-theoretic framework. We analyzed the NE of the static game and proposed an evolutionary-game-dynamics-based method to find it. It was found that the Smith dynamics for channel selection are considerably faster than the $\beta$-dynamics, and that the combined dynamics yield a rest point that corresponds to the NE. An interesting extension that we leave for future research is
References
1. Tembine, H., Altman, E., ElAzouzi, R., Hayel, Y.: Evolutionary games in wireless networks.
IEEE Trans. Syst. Man Cybern. B Cybern. 40(3), 634–646 (2010)
2. Tembine, H., Altman, E., El-Azouzi, R., Sandholm, W.H.: Evolutionary game dynamics with
migration for hybrid power control game in wireless communications. In: Proceedings of 47th
IEEE Conference on Decision and Control (CDC), Cancun, Mexico, pp. 4479–4484 (2008)
3. Zhu, Q.: A Lagrangian approach to constrained potential games: theory and example.
In: Proceedings of 47th IEEE Conference on Decision and Control (CDC), Cancun, Mexico,
pp. 2420–2425 (2008)
4. Saraydar, C.U., Mandayam, N.B., Goodman, D.J.: Efficient power control via pricing in
wireless data networks. IEEE Trans. Commun. 50(2), 291–303 (2002)
5. Belmega, E.V., Lasaulce, S., Debbah, M., Jungers, M., Dumont, J.: Power allocation games
in wireless networks of multi-antenna terminals. Springer Telecomm. Syst. J. 44(5–6),
1018–4864 (2010)
6. Palomar, D.P., Cioffi, J.M., Lagunas, M.A.: Uniform power allocation in MIMO channels: a
game theoretic approach. IEEE Trans. Inform. Theory 49(7), 1707–1727 (2003)
7. Vincent, T.L., Vincent, T.L.S.: Evolution and control system design. IEEE Control Syst. Mag.
20(5), 20–35 (2000)
8. Vincent, T.L., Brown, J.S.: Evolutionary Game Theory, Natural Selection, and Darwinian
Dynamics. Cambridge University Press, Cambridge (2005)
9. Zhu, Q., Tembine, H., Başar, T.: A constrained evolutionary Gaussian multiple access channel
game. In: Proceedings of the International Conference on Game Theory for Networks
(GameNets), Istanbul, Turkey, 13–15 May 2009
10. Zhu, Q., Tembine, H., Başar, T.: Evolutionary games for hybrid additive white Gaussian noise
multiple access control. In: Proceedings of GLOBECOM (2009)
11. Altman, E., El-Azouzi, R., Hayel, Y., Tembine, H.: Evolutionary power control games
in wireless networks. In: NETWORKING 2008 Ad Hoc and Sensor Networks, Wireless
Networks, Next Generation Internet, pp. 930–942. Springer, New York (2008)
12. Andelman, N., Feldman, M., Mansour, Y.: Strong price of anarchy. Games Econ. Behav. 65(2), 289–317 (2009)
13. Anshelevich, E., Dasgupta, A., Kleinberg, J., Tardos, E., Wexler, T., Roughgarden, T.:
The price of stability for network design with fair cost allocation. In: Proceedings of the FOCS,
pp. 59–73 (2004)
14. Tembine, H., Altman, E., El-Azouzi, R., Hayel, Y.: Multiple access game in ad-hoc networks.
In: Proceedings of 1st International Workshop on Game Theory in Communication Networks
(GameComm) (2007)
15. Forges, F.: Can sunspots replace a mediator? J. Math. Econ. 17, 347–368 (1988)
16. Forges, F.: Sunspot equilibrium as a game-theoretical solution concept. In: Barnett, W.A.,
Cornet, B., Aspremont, C., Gabszewicz, J.J., Mas-Colell, A. (eds.) Equilibrium Theory and
Applications: Proceedings of the 6th International Symposium in Economic Theory and
Econometrics, pp. 135–159. Cambridge University Press, Cambridge (1991)
17. Forges, F., Peck, J.: Correlated equilibrium and sunspot equilibrium. Econ. Theory 5, 33–50
(1995)
18. Aumann, R.J.: Acceptable points in general cooperative n-person games. In: Contributions to the Theory of Games IV, Annals of Mathematics Study 40, pp. 287–324. Princeton University Press, Princeton (1959)
19. Gajic, V., Rimoldi, B.: Game theoretic considerations for the Gaussian multiple access channel.
In: Proceedings of the IEEE International Symposium on Information Theory (ISIT) (2008)
20. Goodman, J.C.: A note on existence and uniqueness of equilibrium points for concave N-person
games. Econometrica 48(1), 251 (1980)
21. Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics. Cambridge
University Press, Cambridge (1998)
22. Ponstein, J.: Existence of equilibrium points in non-product spaces. SIAM J. Appl. Math. 14(1),
181–190 (1966)
23. McGill, B.J., Brown, J.S.: Evolutionary game theory and adaptive dynamics of continuous
traits. Annu. Rev. Ecol. Evol. Syst. 38, 403–435 (2007)
24. Shaiju, A.J., Bernhard, P.: Evolutionarily robust strategies: two nontrivial examples and
a theorem. In: Proceedings of 13-th International Symposium on Dynamic Games and
Applications (ISDG) (2006)
25. Maynard Smith, J., Price, G.R.: The logic of animal conflict. Nature 246, 15–18 (1973)
26. Rosen, J.B.: Existence and uniqueness of equilibrium points for concave N-person games.
Econometrica 33, 520–534 (1965)
27. Sandholm, W.H.: Population Games and Evolutionary Dynamics. MIT Press, Cambridge
(2010)
28. Takashi, U.: Correlated equilibrium and concave games. Int. J. Game Theory 37(1), 1–13
(2008)
29. Taylor, P.D., Jonker, L.: Evolutionarily stable strategies and game dynamics. Math. Biosci. 40,
145–156 (1978)
30. Tembine, H., Altman, E., El-Azouzi, R., Hayel, Y.: Evolutionary games with random number
of interacting players applied to access control. In: Proceedings of IEEE/ACM International
Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks
(WiOpt), March 2008.
31. Tembine, H., Altman, E., El-Azouzi, R.: Delayed evolutionary game dynamics applied to the
medium access control. In: Proceedings of the 4th IEEE International Conference on Mobile
Ad-hoc and Sensor Systems (MASS) (2007)
32. Alpcan, T., Başar, T.: A hybrid noncooperative game model for wireless communications. In:
Proceedings of 11th International Symposium on Dynamic Games and Applications, Tucson,
AZ, December 2004
33. Alpcan, T., Başar, T., Srikant, R., Altman, E.: CDMA uplink power control as a noncooperative
game. Wireless Networks 8, 659–670 (2002)
34. Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, SIAM Series in Classics in Applied Mathematics, 2nd edn. SIAM, Philadelphia (1999)
35. Dodis, Y., Halevi, S., Rabin, T.: A cryptographic solution to a game theoretic problem. In: Annual International Cryptology Conference, Santa Barbara, CA, USA. Lecture Notes in Computer Science, vol. 1880, pp. 112–130. Springer, London (2000)
36. Alpcan, T., Başar, T.: A hybrid system model for power control in multicell wireless data
networks. Perform. Eval. 57, 477–495 (2004)
37. Başar, T., Zhu, Q.: Prices of anarchy, information, and cooperation in differential games.
J. Dynamic Games Appl. 1(1), 50–73 (2011)
38. Roughgarden, T.: Selfish Routing and the Price of Anarchy. MIT Press, Cambridge (2005)
Chapter 4
Join Forces or Cheat: Evolutionary Analysis
of a Consumer–Resource System
4.1 Introduction
Among the many ecosystems found on Earth, one can easily identify examples of resource–consumer systems, such as the plant–grazer, prey–predator, or host–parasitoid systems known in biology [13]. Usually, individuals involved in such systems (bacteria, plants, insects, animals, etc.) have conflicting interests, and models describing such interactions are based on principles of game theory [2, 7, 8, 16]. Hence, the investigation of such models is of interest to both game theoreticians and behavioral and evolutionary biologists.
One of the main topics of evolutionary theory is addressing whether individuals
should behave rationally throughout their lifetime. Darwin’s statement of the
survival of the fittest indicates that evolution selects the best reproducers, so that
the evolutionary process should result in selecting organisms which appear to
behave rationally, even though they may know little about rationality. Evolutionary
processes may thus result in organisms which actually maximize their number
of descendants [19]; this is true in systems in which density dependence can be
neglected, or in which the relation between the organisms and their environment is
fairly simple [14]. Otherwise, such a rule may not apply and evolution is expected to
yield a population which employs an evolutionarily stable strategy; such a strategy
will not allow them to get the maximum possible number of descendants, but cannot
be beaten by any strategy a deviant organism may choose to follow [10, 21]. In the
following, since we will be concerned with populations in which some organisms
may deviate from the others, we will use the terminology from Adaptive Dynamics
[6] and designate by “mutants” the organisms adopting a strategy different from the
one of the main population, which will be referred to as the resident population.
In this work we study the fate of mutants based on an example of a seasonal
consumer–resource system with optimal consumers as introduced by [1] using a
semi-discrete approach [9]. In such a system, consumer and resource individuals
are active during seasons of fixed length T separated by winter periods. To give
an idea of what such a system could represent, the resource population could be
annual plants and the consumer population some univoltine phytophagous insect
species. All consumers and resources die at the end of the season and the size of
the next generation is determined by the number of offspring produced during the
previous season (i.e. offspring are made of seeds or eggs which mature into active
stages at the beginning of the season). We assume that consumers have to share their
time between foraging for resources, which increases their reproductive abilities, and reproducing. The reproduction of the resource population is assumed to occur at a
constant rate.
In nature several patterns of life-history can be singled out, but they frequently
contain two main phases: growth phase and reproduction phase. The transition
between these two phases is said to be strict when the consumers only feed at the beginning of their life and only reproduce at the end; alternatively, there can exist an intermediate phase between them where growth and reproduction occur simultaneously. Such types of behavior are called determinate and indeterminate growth patterns, respectively [17]. Time-sharing between laying eggs and feeding will be modeled by the variable $u$: $u = 1$ means feeding, while $u = 0$ means reproducing. Intermediate values $u \in (0, 1)$ describe a situation where, for some part of the time, the individual is feeding and, for the other part of the time, it is reproducing.
Firstly, we consider a population of consumers maximizing their common fitness,
all consumers being individuals having the same goal function and acting for the
common good; these will be the residents. We then suppose that a small fraction
of the consumer population starts to behave differently from the main population,
and accordingly will call them mutants. The aim of this paper is to investigate
how mutants will behave in the environment shaped by the residents, and what
consequences can be expected for multi-season consumer–resource systems.
Let us first consider a system of two populations: resources and consumers without
any mutant. The consumer population is modeled with two state variables: the
average energy of one individual p and the number of consumers c present in
the system, while the resource population is described solely by its density n. We
suppose that both populations are structured in mature (adult insects/plants) and
immature stages (eggs/seeds). During the season, mature consumers and resources
interact and reproduce themselves. Between seasons (during winter periods) all
mature individuals die and immature individuals become mature in the next season.
We suppose that no consumers have any energy (p = 0) at the beginning of the
season. The efficiency of reproduction is assumed to be proportional to the value
of p; it is thus intuitive that consumers should feed on the resource at the beginning
and reproduce at the end once they have gathered enough energy. The consumers
thus face a trade-off between investing their time in feeding (u = 1) or laying eggs
(u = 0). According to [1], the within-season dynamics are given by the system (4.1), in which we assume that neither population suffers from intrinsic mortality; $\kappa$, $\eta$ and $\delta$ are constants. After rescaling the time and state variables, the constants $\kappa$ and $\eta$ can be eliminated and the system of Eq. (4.1) can be rewritten in the simpler form (cf. the mutant–resident system (4.9) below with $\varepsilon = 0$):
$$\dot p = -p + n u, \qquad \dot n = -c\,n\,u. \tag{4.2}$$
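A few lines of code suffice to integrate the rescaled season dynamics for a given control profile; the sketch below uses the reading of (4.2) given above and an explicit Euler scheme, with all numerical values merely illustrative.

```python
import numpy as np

def season(u_of_t, c, T, n0=1.0, steps=4000):
    """Integrate p' = -p + n*u, n' = -c*n*u with p(0) = 0 over one season."""
    dt = T / steps
    p, n, eggs = 0.0, n0, 0.0
    for k in range(steps):
        u = u_of_t(k * dt)
        eggs += c * (1.0 - u) * p * dt      # accumulates the integrand of J in (4.3)
        p, n = p + dt * (-p + n * u), n + dt * (-c * n * u)
    return p, n, eggs

# example: bang-bang profile that feeds first and reproduces after t = T - tau1
profile = lambda t, T=2.0, tau1=0.8: 1.0 if t < T - tau1 else 0.0
print(season(profile, c=3.0, T=2.0))
```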
Fig. 4.1 Optimal collective behavior of the residents illustrated in the $(\tau, x)$ plane [see Eq. (4.4)], where $\tau$ is reverse time. In the figure, solutions are initiated at $(T, p(0)/n(0))$, where $T$ is the length of the season.
The amount of immature offspring produced during the season depends on the sizes of the populations:
$$J = \theta \int_0^T c\,(1 - u(t))\,p(t)\,\mathrm{d}t, \qquad J_n = \gamma \int_0^T n(t)\,\mathrm{d}t. \tag{4.3}$$
These solutions are not restricted to the case where consumers have no energy at the initial time. The region with $u = 1$ is separated from the region with $u = 0$ by a switching curve $S$ and a singular arc $S_\sigma$ such that
$$S:\; x = 1 - e^{-\tau}, \tag{4.5}$$
$$S_\sigma:\; \tau = -\log x + \frac{2}{xc} - \frac{4}{c}, \tag{4.6}$$
where $\tau = T - t$. They are shown in Fig. 4.1 by thick curves. Along the singular arc $S_\sigma$ the consumer uses the intermediate control $u = \hat u$:
$$\hat u = \frac{2x}{2 + xc}. \tag{4.7}$$
When $p(0) = 0$, one might identify a bang-bang control pattern for short seasons $T \le T_1$ and a bang-singular-bang pattern for long seasons $T > T_1$. The value $T_1$ is computed as
$$T_1 = \frac{\log(c + 1) + (c - 2)\log 2}{c - 1}. \tag{4.8}$$
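The threshold (4.8) is straightforward to evaluate numerically; the snippet below computes $T_1$ together with the singular control (4.7), for the illustrative value $c = 3$ used later in the chapter.

```python
import numpy as np

def T1(c):
    """Season length separating bang-bang from bang-singular-bang, Eq. (4.8)."""
    return (np.log(c + 1.0) + (c - 2.0) * np.log(2.0)) / (c - 1.0)

def u_hat(x, c):
    """Intermediate control on the singular arc, Eq. (4.7)."""
    return 2.0 * x / (2.0 + x * c)

print(T1(3.0))          # about 1.04 for c = 3
print(u_hat(0.5, 3.0))  # control at the junction point x = 1/2
```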
Suppose that there is a subpopulation of consumers that deviate from the residents’
behavior. Let us assume that these are selfish and maximize their own fitness, and
not the fitness of the whole population, taking into account that the main resident
population acts as if the mutants were kin (i.e. residents do not understand that
mutants are selfish). This means that the residents adjust their strategy by changing the control whenever its level is intermediate. Such an adjustment is possible only when certain conditions are satisfied and the mutant subpopulation is small enough (see Sect. 4.3.2).
Denote the proportion of mutants in the whole population of consumers by $\varepsilon$, and denote the variables describing the state of the mutant and resident populations by symbols with subscripts "m" and "r", respectively. Then the numbers of mutants and residents are $c_m = \varepsilon c$ and $c_r = (1 - \varepsilon)c$, and the dynamics of the system can be written as
$$\dot p_r = -p_r + n u_r, \qquad \dot p_m = -p_m + n u_m, \qquad \dot n = -n c\,[(1 - \varepsilon)u_r + \varepsilon u_m], \tag{4.9}$$
similarly to (4.2). The variable $u_m \in [0, 1]$ defines the decision pattern of the mutants. The control $u_r \in [0, 1]$ is the decision pattern of the residents and is defined by the solution of the optimal control problem (4.2)–(4.3).
The number of offspring in the next season is defined similarly to (4.3):
$$J_r = \theta\int_0^T (1 - u_r(t))\,c_r\,p_r(t)\,\mathrm{d}t, \qquad J_m = \theta\int_0^T (1 - u_m(t))\,c_m\,p_m(t)\,\mathrm{d}t, \qquad J_n = \gamma\int_0^T n(t)\,\mathrm{d}t, \tag{4.10}$$
where the mutant chooses its control um striving to maximize its fitness Jm .
We can see that the problem under consideration is described in terms of a two-
step optimal control problem (or a hierarchical differential game): in the first step
we define the optimal behavior of the residents (see Sect. 4.2.1), in the second step
we identify the optimal response of the mutants to this strategy.
Since $\theta$ and $\gamma$ are constants, they can be omitted from the description of the optimization problem $J_m \to \max_{u_m}$. In this case the functional $J_m/(\theta c_m)$ can be used instead of the functional $J_m$.
Let us introduce the Bellman function $\tilde U_m$ for the mutant population. It satisfies the Hamilton–Jacobi–Bellman (HJB) equation
$$\frac{\partial \tilde U_m}{\partial t} + \max_{u_m}\left[\frac{\partial \tilde U_m}{\partial p_r}(-p_r + n u_r) + \frac{\partial \tilde U_m}{\partial p_m}(-p_m + n u_m) - \frac{\partial \tilde U_m}{\partial n}\,n c\bigl((1-\varepsilon)u_r + \varepsilon u_m\bigr) + p_m(1 - u_m)\right] = 0. \tag{4.11}$$
where the components of the gradient of the Bellman function are denoted by $\partial U_m/\partial x_r = \lambda_r$, $\partial U_m/\partial x_m = \lambda_m$ and $\partial U_m/\partial \tau = \nu$, and the variable $\tau$ denotes backward time, $\tau = T - t$. The optimal control can be defined as $u_m = \mathrm{Heav}(A_m)$, where
We get the following equations for the state and conjugate variables and for the Bellman function: $x_r = x_r(T)\,e^{\tau}$, $x_m = x_m(T)\,e^{\tau}$, $\lambda_r = 0$, $\lambda_m = 1 - e^{-\tau}$, $U_m = x_m(1 - e^{-\tau})$.
From this solution we can see that there may exist a switching surface $S_m$:
$$S_m:\; x_m = 1 - e^{-(T-t)}, \tag{4.14}$$
such that $A_m = 0$ on it, where the mutant changes its control. Equation (4.14) is similar to (4.5). However, we should take into account the fact that there is also a hypersurface $S_r$, where the resident changes its control from $u_r = 0$ to $u_r = 1$ independently of the decision of the mutant. Hence it is important to determine which surface, $S_r$ or $S_m$, the characteristic intersects first; see Fig. 4.2. Suppose that this is the surface $S_r$. Since the control $u_r$ changes its value on $S_r$, the HJB equation (4.12) also changes and, as a consequence, the conjugate variables $\nu$, $\lambda_r$ and $\lambda_m$ could be discontinuous. Let us denote the incoming characteristic field (in backward time) by "$-$" and the outgoing field by "$+$". Consider a point of intersection of the characteristic and the surface $S_r$ with coordinates $(x_{r1}, x_{m1}, \tau_1)$. Thus $x_{r1} = 1 - e^{-\tau_1}$, and the normal vector $\vartheta$ to the switching surface can be written in the form $\vartheta = (1,\, 0,\, x_{r1} - 1)$.
Fig. 4.2 A family of optimal trajectories emanating from the terminal surface
From the incoming field we have the following information about the co-state: $\lambda_r^- = 0$, $\lambda_m^- = x_{r1}$, $\nu^- = x_{m1}(1 - x_{r1})$. Since the Bellman function is continuous on the surface $S_r$, we have $U_m^+ = U_m^- = U_m = x_{m1}x_{r1}$. The gradient $\nabla U_m$ has a jump in the direction of the normal vector $\vartheta$: $\nabla U_m^+ = \nabla U_m^- + k\vartheta$, where $k$ is an unknown scalar. Thus
$$A_m^+ = \lambda_r^+ x_{r1}\varepsilon c + \lambda_m^+(x_{m1}\varepsilon c + 1) - \varepsilon c\,U_m - x_{m1} = (x_{r1} - x_{m1})\,\frac{(1-\varepsilon)x_{r1}c + (1 - x_{r1})}{x_{r1}c + (1 - x_{r1})},$$
which is positive when xr1 > xm1 . In Fig. 4.2 this corresponds to the points of the
surface Sr which are below the line l1 : xr = xm = 1 − e−τ . For the optimal trajectories
which go through such points: ur (τ1 + δ ) = um (τ1 + δ ) = 1, where δ is arbitrarily
small. One can show that there will be no more switches of the control. However,
if we consider a trajectory going from a point above l1 , then ur (τ1 + δ ) = 1 and
um (τ1 + δ ) = 0; a switch of the control um from zero to one then takes place later
(in backward time). After that, there will be no more switches.
(Fig. 4.3: the surfaces $S_r$, $S_r^\sigma$, $S_1^\sigma$, $S_m$ and the line $l_1$ in $(x_r, x_m, \tau)$ space.)

Now consider a trajectory emitted from the terminal surface which first intersects the surface $S_m$ rather than the surface $S_r$. In this case the situation depicted in Fig. 4.3 takes place: one might expect the appearance of a singular arc $S_1^\sigma$ there. The following are necessary conditions for its existence:
$$H = 0 = H_0 + A_m u_m, \qquad H_0 = -\nu - \lambda_r x_r - \lambda_m x_m + x_m, \tag{4.17}$$
$$A_m = 0 = \lambda_r x_r\varepsilon c + \lambda_m(x_m\varepsilon c + 1) - \varepsilon c\,U_m - x_m, \tag{4.18}$$
$$\dot A_m = \{A_m\, H_0\} = 0 = A_{m1}, \tag{4.19}$$
where the curly brackets denote the Poisson (Jacobi) brackets. If $\xi$ is the vector of state variables and $\psi$ the vector of conjugate ones (in our case $\xi = (x_r, x_m, \tau)$ and $\psi = (\lambda_r, \lambda_m, \nu)$), then the Poisson bracket of two functions $F = F(\xi, \psi, U_m)$ and $G = G(\xi, \psi, U_m)$ is given by the formula
$$\{F\,G\} = \langle F_\xi + \psi F_{U_m},\; G_\psi\rangle - \langle F_\psi,\; G_\xi + \psi G_{U_m}\rangle.$$
Here $\langle\cdot,\cdot\rangle$ denotes the scalar product and, e.g., $F_\psi = \partial F/\partial\psi$.
After some algebra, (4.19) takes the form (4.20). We can derive the variable $\nu$ from (4.17) and substitute it into (4.20); we get $A_{m1} = x_m - 1 + \lambda_m = 0$. This leads to $\lambda_m = 1 - x_m$ and
$$\lambda_r = \frac{x_m + \varepsilon c\,U_m - (1 - x_m)(x_m\varepsilon c + 1)}{x_r\varepsilon c},$$
According to the computations done in Sect. 4.2.1, resident consumers must adopt a behavior $u_r$ which keeps the surface $S_r^\sigma$ invariant (see Fig. 4.3). In a mutant-free population, this is done by playing the singular control (4.7), but if mutants are present in the population, the dynamics of the system are modified and the mutant-free singular control (4.7) does not make $S_r^\sigma$ invariant any more. However, residents may still make $S_r^\sigma$ invariant by adopting a different behavior, denoted $\hat u_r$, as long as the mutants' influence, i.e. $\varepsilon$, is not too large. To compute $\hat u_r$, we notice that it should make $x_r$ follow the dynamics depicted in Fig. 4.1, i.e. $\dot x_r = -x_r(1 - c u_r) + u_r$ with $u_r = \hat u$ defined in Eq. (4.7). We get that $\hat u_r$ should be computed from
$$\dot x_r = -\frac{x_r^2 c}{2 + x_r c} = x_r\bigl(1 - c((1-\varepsilon)\hat u_r + \varepsilon u_m)\bigr) - \hat u_r,$$
so that
$$\hat u_r = \frac{2x_r(1 + x_r c)}{(1 + (1-\varepsilon)x_r c)(2 + x_r c)} - \frac{x_r\varepsilon c\,u_m}{1 + (1-\varepsilon)x_r c}. \tag{4.23}$$
Thus, the residents will be able to keep Srσ invariant provided ûr ∈ [0, 1] for all points
belonging to Srσ and for all possible values of um ∈ [0, 1].
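Since $\hat u_r$ in (4.23) is affine in $u_m$, checking its range over $u_m \in \{0, 1\}$ on a grid of the singular arc gives a quick numerical picture of when the residents can hold $S_r^\sigma$. The parameter values below are arbitrary illustrations, not the precise condition (4.24).

```python
import numpy as np

def u_hat_r(x, um, c, eps):
    """Resident feedback (4.23) keeping the singular surface invariant."""
    d = 1.0 + (1.0 - eps) * x * c
    return 2.0 * x * (1.0 + x * c) / (d * (2.0 + x * c)) - x * eps * c * um / d

c, eps = 3.0, 0.1
xs = np.linspace(1e-3, 0.5, 200)            # on the arc, x_r stays below 1/2
in_range = all(np.all((u_hat_r(xs, um, c, eps) >= 0.0) &
                      (u_hat_r(xs, um, c, eps) <= 1.0)) for um in (0.0, 1.0))
print(in_range)   # True here; a larger eps can push u_hat_r below zero
```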
To identify for which parameters of the model this is possible, we may notice that $\hat u_r$ is a linear and decreasing function of $u_m$. Moreover,
$$\hat u_r(u_m = 0) = \frac{2x_r(1 + x_r c)}{(1 + (1-\varepsilon)x_r c)(2 + x_r c)} \le 2x_r\,\frac{1 + x_r c}{2 + x_r c} \le 1,$$
It is interesting to notice that ûr (um = 0) is larger than the original û in (4.7), since
the residents must compensate for the non-eating mutants. Conversely, when um = 1,
ûr < û. The tipping point takes place when um = û, which ensures ûr = um ; mutants
behaving like the original residents allow the residents to behave as such.
In this paper we consider only the values of ε satisfying (4.24), i.e. such that
the residents are able to adopt their optimal behavior, in spite of the presence of
mutants. Otherwise, the influence of the mutants on the system may be too large,
and the residents would not have the possibility to stick to their fitness maximization
program.
The control $\hat u_r = \hat u_r(x_r, x_m, \tau, u_m)$ is defined in feedback form, i.e., it depends on the time and on the state of the system. The corresponding Hamiltonian (4.12) needs to be modified accordingly, which yields (4.25) and (4.26). This expression allows us to compute the optimal behavior of the mutants on the surface $S_r^\sigma$, but the calculations are quite involved. To make things simpler, let us first consider the particular case of vanishingly small values of $\varepsilon$ and study the optimal behavioral pattern.
If $\varepsilon \cong 0$, the mutants' influence on the system is negligible and, to make $S_r^\sigma$ invariant, the resident should apply the mutant-free singular behavior computed in (4.7): $\hat u_r = 2x_r/(2 + x_r c)$. In addition, Eqs. (4.25) and (4.26) take the following form:
$$\hat H = -\nu + \lambda_r\frac{x_r^2 c}{2 + x_r c} + \lambda_m\Bigl(-x_m\frac{2 - x_r c}{2 + x_r c} + u_m\Bigr) - \frac{2x_r c}{2 + x_r c}\,U_m + x_m(1 - u_m), \tag{4.27}$$
$$\hat A_m = \lambda_m - x_m. \tag{4.28}$$
If the trajectory originates (in backward time) from some point belonging to $S_r^\sigma$ such that $x_m^\sigma = x_m(\tau = \log 2) > 1/2$, then $u_m(\tau = \log 2) = 0$ and the system of characteristics for the Hamiltonian (4.27) is
$$\dot x_r = -\frac{x_r^2 c}{2 + x_r c}, \qquad \dot x_m = x_m\frac{2 - x_r c}{2 + x_r c}, \qquad \dot\lambda_m = -\lambda_m + 1, \qquad \dot U_m = -U_m\frac{2x_r c}{2 + x_r c} + x_m, \tag{4.29}$$
with boundary conditions $\tau = \log 2$, $x_r = 1/2$, $x_m = x_m^\sigma$, $\lambda_m = 1/2$, $U_m = x_m^2/2$. Thus $\lambda_m = 1 - e^{-\tau}$, and there exists a switching curve $\hat S$ defined by $x_m = 1 - e^{-\tau}$ together with $\tau = -\log x_r + 2/(x_r c) - 4/c$. Thus $\hat S = S_m \cap S_r^\sigma$.
The switching curve Ŝ ends at the point with coordinates (xr2 , xm2 , τ2 ) where the
characteristics become tangent to it and the singular arc Ŝσ appears (see Fig. 4.4).
Before determining the coordinates of this point, let us define the singular arc, denoted $\hat S^\sigma$. From (4.27)–(4.28) we get
$$\nu = \lambda_r\frac{x_r^2 c}{2 + x_r c} - \lambda_m x_m\frac{2 - x_r c}{2 + x_r c} - \frac{2x_r c}{2 + x_r c}\,U_m + x_m, \qquad \lambda_m = x_m \tag{4.30}$$
along the singular arc. Substitution of (4.30) into the equation $\dot{\hat A}_m = 0$ gives $x_m = (2 + x_r c)/4$.
In addition, the intermediate control $\hat u_m$ can be derived from $\hat A_m = 0$ and is equal to
$$\hat u_m = \frac{1}{2 + x_r c},$$
which is positive and belongs to the interval between zero and one.
We see that the coordinates $x_{r2}$, $x_{m2}$ and $\tau_2$ can be defined by the following equations:
$$x_{m2} = \frac{2 + x_{r2}c}{4} = 1 - e^{-\tau_2}, \qquad \tau_2 = -\log x_{r2} + \frac{2}{x_{r2}c} - \frac{4}{c},$$
which comes from the fact that the point $(x_{r2}, x_{m2}, \tau_2)$ belongs to $\hat S^\sigma$ and is located at the intersection of the curves $\hat S^\sigma$ and $\hat S$. This result is illustrated in Fig. 4.4.

(Fig. 4.4: the switching curve $\hat S$, the singular arcs $S_r^\sigma$, $\hat S^\sigma$, $S_1^\sigma$ and the point $(x_{r2}, x_{m2}, \tau_2)$ in $(x_r, x_m, \tau)$ space.)
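The pair of conditions above determines $(x_{r2}, x_{m2}, \tau_2)$ numerically by a one-dimensional root search; the sketch below does this with a standard bracketing solver for the illustrative value $c = 3$.

```python
import numpy as np
from scipy.optimize import brentq

def junction(c):
    """Solve (2 + x c)/4 = 1 - exp(-tau(x)) with tau from the singular arc (4.6)."""
    tau = lambda x: -np.log(x) + 2.0 / (x * c) - 4.0 / c
    g = lambda x: (2.0 + x * c) / 4.0 - (1.0 - np.exp(-tau(x)))
    x_r2 = brentq(g, 1e-6, 0.5)             # a sign change exists on (0, 1/2]
    return x_r2, (2.0 + x_r2 * c) / 4.0, tau(x_r2)

print(junction(3.0))    # (x_r2, x_m2, tau_2)
```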
If the state is outside the surface Srσ , things are a little easier since at least the
behavior of the residents, ur , is constant and equal to 0 or 1, depending on the
respective value of τ and xr .
We can actually show that the surface $S_1^\sigma$ (where $u_r = 0$) can be extended further by considering the situation in Fig. 4.3. Indeed, the following conditions are fulfilled in this region:
$$H\big|_{u_r = 0} = -\nu - \lambda_r x_r - \lambda_m x_m + x_m = 0, \qquad A_m = \lambda_m - x_m = 0, \qquad \dot A_m = 0.$$
(Fig. 4.5: structure of the solution, showing the surfaces $S_r^\sigma$, $\hat S^\sigma$, $S_r$, $S_1^\sigma$ and $S_m$ in $(x_r, x_m, \tau)$ space.)
Similarly, in the region where $u_r = 1$,
$$H\big|_{u_r = 1} = -\nu - \lambda_r\bigl(x_r(1 - c) - 1\bigr) - \lambda_m x_m(1 - c) - \varepsilon c\,U_m + x_m = 0, \tag{4.31}$$
$$A_m = \lambda_m - x_m = 0, \qquad \dot A_m = 0, \tag{4.32}$$
which give a possible candidate for a singular arc S2σ : xm = 1/(2 − c). We see that
its appearance is possible only for c < 1, since xm must belong to Sm . For c > 1
the structure of the solution in the domain below the surface Srσ is actually simpler
and consists only of the switching surface Sm , see Fig. 4.5. Notice that in the case
xr (0) = xm (0) = 0 investigated below, the existence of the singular arc S2σ is not
relevant, since it cannot be reached from such initial conditions.
Following [1], we assume that at the beginning of the season the energy of
consumers is zero: xr (0) = xm (0) = 0. Therefore, we should take into account only
the trajectories coming from these initial conditions. The phase space is reduced
in this case to the one shown in Fig. 4.6. One can see that there are three different
regions depending on the length of the season $T$. If it is short enough, i.e., $T \le T_1$ (see Eq. (4.8)), then the behavior of the mutant coincides with the behavior of the residents and the main population cannot be invaded. If the length of the season is between $T_1$ and $T_2$, there is a period in the lifetime of a resident when it applies an intermediate strategy and spares some amount of the resource for its future use. Mutants are able to exploit this fact, and there exists a strategy that guarantees them a better outcome.

Fig. 4.6 The reduced optimal pattern for trajectories satisfying the initial conditions $x_r(0) = x_m(0) = 0$ with $c = 3$
Let us introduce the analogue of the value function $\tilde U_m$ for the resident and denote it by $\tilde U_r$:
$$\tilde U_r(p_r, p_m, n, t) = \int_{T-t}^{T} p_r(s)\,(1 - u_r(s))\,\mathrm{d}s.$$
The value $\tilde U_r(0, 0, n(0), T)$ represents the amount of eggs laid by the resident during a season of length $T$. Its value depends on the state of the system, and the following transformation can be made: $\tilde U_r(p_r, p_m, n, t) = n\,U_r(x_r, x_m, t)$. In the following, we omit some arguments and write the value function in the simplified form $U_r(T) := U_r(0, 0, T)$, where the initial conditions $x_r(0) = x_m(0) = 0$ have been taken into account.
In the region A (see Fig. 4.6) the value functions of the two populations (mutants and residents) are equal to each other: $U_m(T) = U_r(T) = x_1 e^{-c(T - \tau_1)}$. Here the value $\tau_1$ can be defined from the intersection of the trajectory with the switching curve $S_r \cap S_m$:
$$1 - e^{-\tau_1} = \frac{e^{(c-1)(T - \tau_1)} - 1}{c - 1}.$$
To obtain the value functions in the regions B and C, one must solve the system of characteristics (4.29) in the case when the characteristics move along the surface $S_r^\sigma$ and $u_m = 1$. This leads to the following characteristic equations for the Hamiltonian (4.27):
$$\dot x_r = -\frac{x_r^2 c}{2 + x_r c}, \qquad \dot x_m = x_m\frac{2 - x_r c}{2 + x_r c} - 1, \qquad \dot U_m = -U_m\frac{2 x_r c}{2 + x_r c},$$
and consequently
$$x_m = C_1 x_r^2 e^{\tau} + x_r c + 1, \qquad U_m = C_2 x_r^2, \tag{4.33}$$
where $C_1$ and $C_2$ are defined from the boundary conditions, while Eq. (4.6) is also fulfilled.
Along the singular arc $\hat S^\sigma$ the mutant uses the intermediate strategy (4.21). In this case,
$$\dot U_m = -U_m c\,\hat u_r + x_m(1 - \hat u_m) = -U_m\frac{2 x_r c}{2 + x_r c} + \frac{1 + x_r c}{4}.$$
Since $\dot x_r = -\frac{x_r^2 c}{2 + x_r c}$, we have
$$\frac{\mathrm{d}U_m}{\mathrm{d}x_r} = \frac{2U_m}{x_r} - \frac{(1 + x_r c)(2 + x_r c)}{4 x_r^2 c}.$$
Thus
$$U_m = C_3 x_r^2 + \frac{4 + 3x_r c\,(3 + 2x_r c)}{24\,x_r c}, \qquad C_3 = \mathrm{const}. \tag{4.34}$$
We now undertake to compute the limiting season length $T_2$ that separates the region B from the region C. The coordinates of the point B were obtained in the previous section. To define the coordinates of the point $(x_{r2}^\sigma, x_{m2}^\sigma, \tau_2^\sigma)$ of intersection of the optimal trajectory with the curve AD, we use the dynamics of motion along the surface $S_r^\sigma$ with $u_r = \hat u_r$ and $u_m = 1$ (4.33): $x_m = C_1 x_r^2 e^\tau + x_r c + 1$, where the constant $C_1$ should be chosen such that $x_{m2} = C_1 x_{r2}^2 e^{\tau_2} + x_{r2}c + 1$, $x_{m2} = \frac{2 + x_{r2}c}{4} = 1 - e^{-\tau_2}$. Therefore $C_1 = \frac{(x_{r2}c - 2)(3x_{r2}c + 2)}{16\,x_{r2}^2}$. After that, the coordinates $x_{r2}^\sigma$, $x_{m2}^\sigma$ and $\tau_2^\sigma$ can be defined from the following conditions:
$$x_{m2}^\sigma = C_1\,(x_{r2}^\sigma)^2\,e^{\tau_2^\sigma} + x_{r2}^\sigma c + 1, \qquad \tau_2^\sigma = -\log x_{r2}^\sigma + \frac{2}{x_{r2}^\sigma c} - \frac{4}{c}. \tag{4.35}$$
The boundary value $T_2$ can then be obtained from $T_2 = \tau_2^\sigma + \log(x_{r2}^\sigma(c - 1) + 1)/(c - 1)$.
Now we compute the value functions $U_r(T)$ and $U_m(T)$ for the region B ($T_1 < T \le T_2$), where only the mutant uses a bang-bang control. For the resident population we have
$$U_r(T) = U_{r2}\,e^{-c(T - \tau_2)}, \qquad U_{r2} = x_{r2}(1 - x_{r2}) + \frac{1 - 2x_{r2}}{c}, \tag{4.36}$$
where the point with coordinates $(x_{r2}, x_{r2}, \tau_2)$ defines the intersection of the trajectory with the surface $S_r^\sigma$:
$$\tau_2 = -\log x_{r2} + \frac{2}{x_{r2}c} - \frac{4}{c}, \qquad x_{r2} = \frac{e^{(c-1)(T - \tau_2)} - 1}{c - 1}. \tag{4.37}$$
For the mutant population, the value function $U_m$ in the region with $u = \hat u$ and $u_m = 1$ satisfies the equation resulting from (4.33):
$$U_m^{(\hat u, 1)} = x_{m1}^2\,(x_r/x_{r1})^2, \tag{4.38}$$
where $(x_{r1}, x_{m1}, \tau_1)$ is the point of intersection of the trajectory with the curve AB (see Fig. 4.6). Using (4.38) and the notation from (4.37), we can write $U_m(T) = U_{m2}\,e^{-c(T - \tau_2)}$, $U_{m2} = x_{m1}^2\,(x_{r2}/x_{r1})^2$, which is analogous to (4.36).
In the region C the value function for the resident has the same form as in (4.36), but it has a different form for the mutant. Suppose that the optimal trajectory intersects the surface $\hat S^\sigma$ at the point with coordinates $(\tilde x_{r2}, \tilde x_{m2}, \tilde\tau_2)$. Then the Bellman function at this point is given by
$$\tilde U_{m2} = \tilde x_{r2}^2\left(\frac{c^2}{16} - \frac{4 + 3\tilde x_{r2}c}{24\,\tilde x_{r2}^3\,c}\right) + \frac{3\tilde x_{r2}c\,(2\tilde x_{r2}c + 3) + 4}{24\,\tilde x_{r2}\,c},$$
which is obtained from (4.34) with the constant $C_3$ determined by the given boundary conditions.
When the optimal trajectory moving along the surface $\hat S^\sigma$ intersects the curve AD at some point with coordinates $(\tilde x_{r2}^\sigma, \tilde x_{m2}^\sigma, \tilde\tau_2^\sigma)$ (see Fig. 4.6), the Bellman function can be expressed as follows: $\tilde U_{m2}^\sigma = \tilde U_{m2}\,\tilde x_{r2}^\sigma/\tilde x_{r2}$. Thus $U_m(T) = \tilde U_{m2}^\sigma\,e^{-c(T - \tilde\tau_2^\sigma)}$.
The difference in the value functions (number of offspring per mature individual) of the mutant and the optimally behaving resident is presented in Fig. 4.7. It shows that as soon as the season length is longer than $T_1$, residents may be out-competed by selfish "free-riding" mutants (see also Fig. 4.8). Otherwise the payoff functions of the mutants and residents are the same. Therefore, if the season length is shorter than $T_1$, the optimal strategy of the resident is evolutionarily stable in the sense that it cannot be beaten by any other strategy [11]. Thus, in the present example, collective optimal strategies of the bang-bang type are also evolutionarily stable, while those of the bang-singular-bang type may always be out-competed by alternative strategies. Whether such properties also hold in more general settings is an important topic for future research.
Fig. 4.7 Difference in the value functions of the resident and the mutant ($c = 3$)
(Fig. 4.8: trajectories $x_r$, $x_m$ and the control regimes $u_r$, $u_m$ over a season of length $T$, in forward time $t = T - \tau$.)
In this section we consider the case of non-zero ε such that the condition (4.24)
remains fulfilled. This means that the trajectory intersecting the singular surface
Srσ does not cross it, but moves along it due to the residents who make it invariant
through the behavior ûr (4.23).
In this case, the phase space can also be divided into two regions, according to whether $x_r$ is smaller or larger than on $S_r^\sigma$. In both of these regions the structure of the solution has properties similar to the case considered above, when $\varepsilon$ is arbitrarily small. On the surface $S_r^\sigma$ the optimal behavior is also similar to that of the previous case.
In the region with larger xr values than the ones on the surface Srσ , there is a
part of the switching surface Sm and a singular arc S1σ where the mutant uses an
intermediate strategy. The surface S1σ can be defined using the expression (4.22). In
the other region, we also have a part of Sm and a singular arc S2σ which is different
from S1σ and may not exist for some values of the parameters c and ε .
To identify the values for which the surface $S_2^\sigma$ is a part of the solution, let us write the necessary conditions as in (4.31)–(4.32): $H\big|_{u_r=1} = 0$, $A_m = 0$, $\dot A_m = \{A_m\,H\} = 0$. Using these equations, we are able to obtain the values of $\lambda_r$, $\lambda_m$ and $\nu$ on the surface $S_2^\sigma$ and substitute them into the second derivative $\ddot A_m = \{\{A_m\,H\}\,H\} = 0$ to derive the expression (4.39) for the singular control applied by the mutant on this surface.
There are several conditions which must be satisfied. First of all, the control (4.39) should be between zero and one; in addition,
$$\frac{\partial}{\partial u_m}\,\frac{\mathrm{d}^2}{\mathrm{d}t^2}\,\frac{\partial H}{\partial u_m} = \{A_m\,\{A_m\,H\}\} \le 0 \tag{4.40}$$
must hold, together with
$$2 - (1 - \varepsilon) + x_m\varepsilon c \ge 0. \tag{4.41}$$
Fig. 4.9 Structure of the optimal behavioral pattern for $c = 1.25$ and $\varepsilon = 0.35$
Model (4.2) was introduced in [1] as the intra-seasonal part of a more complex
multi-seasonal model of population dynamics in which consumers and resources
live for one season only. It was assumed that the (immature) offspring produced
by the consumers and resources in season i and defined by the system of Eq. (4.3),
mature during the inter-season to form the initial consumer and resource populations
of season $(i+1)$, up to some overwintering mortality. The consumer and resource population densities at the beginning of season $i+1$ are thus $c_{i+1} = \mu_c J_i$ and $n_{i+1}(t = 0) = \mu_n J_{n,i}$, with $J_i$ and $J_{n,i}$ defined in (4.3) ($\mu_n, \mu_c < 1$ allow for overwintering mortality).
In the presence of a mutant invasion, things differ slightly, as the total consumer population is structured into $c_{r_i} = (1 - \varepsilon_i)c_i$ residents and $c_{m_i} = \varepsilon_i c_i$ mutants that have different reproduction strategies. Assuming that reproduction is asexual and an offspring simply inherits the strategy of its parent, the inter-seasonal dynamics are as follows: $c_{r_{i+1}} = \alpha\tilde U_r(c_i, \varepsilon_i, n_i, T) = (1 - \varepsilon_{i+1})c_{i+1}$, $c_{m_{i+1}} = \alpha\tilde U_m(c_i, \varepsilon_i, n_i, T) = \varepsilon_{i+1}c_{i+1}$ and $n_{i+1} = \beta\tilde V(c_i, \varepsilon_i, n_i, T)$, where $\alpha = \mu_c\theta$, $\beta = \mu_n\gamma$, and the functions
$$\tilde U_r = (1 - \varepsilon_i)c_i\int_0^T (1 - u_r(t))\,p_r(t)\,\mathrm{d}t, \quad \tilde U_m = \varepsilon_i c_i\int_0^T (1 - u_m(t))\,p_m(t)\,\mathrm{d}t, \quad \tilde V = \int_0^T n(t)\,\mathrm{d}t$$
can be computed from the solution of the optimal control problem (4.10) with the dynamics given by (4.9). As stated earlier, the energies of both the mutants and the residents are zero at the beginning of each season ($p_r(0) = p_m(0) = 0$). For the particular case $\varepsilon = 0$, the values $\tilde U_r$ and $\tilde U_m$ were derived analytically in Sect. 4.3.4, but these are not useful in a multi-season study where the frequency of mutants is bound to evolve. In the following, we therefore resort to a numerical investigation in order to decipher the long-term fate of the mutants' invasion.
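The inter-season bookkeeping itself takes only a few lines; in the sketch below the within-season optimal-control solution is abstracted into a hypothetical callable `season_outputs` returning $(\tilde U_r, \tilde U_m, \tilde V)$, which in practice must be obtained by solving (4.9)–(4.10).

```python
def simulate_seasons(c0, eps0, n0, T, alpha, beta, season_outputs, n_seasons=250):
    """Iterate the season-to-season map of this section (sketch).

    season_outputs(c, eps, n, T) -> (U_r, U_m, V): within-season offspring
    totals for residents, mutants and the resource (assumed to be supplied).
    """
    c, eps, n, history = c0, eps0, n0, []
    for _ in range(n_seasons):
        U_r, U_m, V = season_outputs(c, eps, n, T)
        c_next = alpha * (U_r + U_m)         # c_{i+1} is the total consumer count
        eps = alpha * U_m / c_next           # mutant frequency eps_{i+1}
        n = beta * V                         # resource density n_{i+1}
        c = c_next
        history.append((c, eps, n))
    return history
```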
(Fig. 4.10: season-to-season densities of the resource $n$, the total consumer population $c$, the mutants $c_m = \varepsilon c$ and the residents $c_r = (1 - \varepsilon)c$.)
Here, we follow an adaptive dynamics type approach and assume that, among
all possible behaviors [1], the resident consumer and the resource population are at
a (globally stable) equilibrium. We investigate what happens when a small fraction
of mutants appear in the resident consumer population. We actually assume that
resident consumers are “naive” in the sense that even if the mutant population
becomes large through the season-to-season reproduction process, the resident
consumers keep their collective optimal strategy and treat mutants as cooperators,
even if they do not cooperate.
The case that we investigated is characterized by $\alpha = 2$, $\beta = 0.5$ and $T = 4$. Initially, the system is near the all-residents long-term stable equilibrium point $c = 0.9055$ and $n = 1.0848$. At the beginning of some season, a mutant population of small size $c_m = 0.001$ then appears ($\varepsilon \approx 1.1 \times 10^{-3} < 1/c$). We see in Fig. 4.10 that the mutant population increases its frequency within the consumer population and modifies the dynamics of the system. Despite this drastic increase, it is noteworthy that $c_i < 1$ in all seasons, so that $\varepsilon < 1/c_i$ holds and the analysis presented in this paper is valid for all seasons.
The naive behavior of the consumers is detrimental to their progeny: as the
seasons pass, mutant consumers progressively take the place of the collectively
optimal residents and even replace them in the long run (Fig. 4.10), making the
mutation successful. We should however point out that the mutants’ strategy, as
described in (4.10), is also a kind of “collective” optimum: in some sense, it is
assumed that mutants cooperate with other mutants. If the course of evolution
drives the resident population to 0 and only mutants survive in the long run, this
means that the former mutants become the new residents, with exactly the same
strategy as the one of the former residents they replaced. Hence, they are also prone
to being invaded by non-cooperating mutants. The evolutionary dynamics of this
Acknowledgements This research has been supported by grants from the Agropolis Foundation
and RNSC (project ModPEA, covenant support number 0902-013), and from INRA (call for
proposal “Gestion durable des résistances aux bio-agresseurs”, project Metacarpe, contract number
394576). A.R.A. was supported by a Post-Doctoral grant from INRIA Sophia Antipolis –
Méditerranée and by the grant of the Russian Federal Agency on Education, program 1.2.1,
contract P938.
References
1. Akhmetzhanov, A.R., Grognard, F., Mailleret, L.: Optimal life-history strategies in seasonal
consumer–resource dynamics. Evolution 65(11), 3113–3125 (2011). doi:10.1111/j.1558-
5646.2011.01381.x
2. Auger, P., Kooi, B.W., de la Parra, R.B., Poggiale, J.-C.: Bifurcation analysis of a predator–prey model with predators using hawk and dove tactics. J. Theor. Biol. 238(3), 597–607 (2006). doi:10.1016/j.jtbi.2005.06.012
3. Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)
4. Carathéodory, C.: Calculus of Variations and Partial Differential Equations of the First Order. Holden-Day, San Francisco (1965)
5. Carroll, L.: Through the Looking-Glass, and What Alice Found There. MacMillan and Co.,
London (1871)
6. Dercole, F., Rinaldi, S.: Analysis of Evolutionary Processes: The Adaptive Dynamics
Approach and its Applications. Princeton University Press, Princeton (2008)
7. Hamelin, F., Bernhard, P., Wajnberg, É.: Superparasitism as a differential game. Theor. Popul.
Biol. 72(3), 366–378 (2007). doi:10.1016/j.tpb.2007.07.005
8. Houston, A., Székely, T., McNamara, J.: Conflict between parents over care. Trends Ecol. Evol.
20(1), 33–38 (2005). doi:10.1016/j.tree.2004.10.008
9. Mailleret, L., Lemesle, V.: A note on semi-discrete modelling in life sciences. Phil. Trans.
R. Soc. A 367(1908), 4779–4799 (2009). doi:10.1098/rsta.2009.0153
10. Maynard Smith, J.: Evolution and the Theory of Games. Cambridge University Press,
Cambridge (1982)
11. Maynard Smith, J., Price, G.R.: The logic of animal conflict. Nature 246(5427), 15–18 (1973).
doi:10.1038/246015a0
12. Melikyan, A.A.: Generalized Characteristics of First Order PDEs: Applications in Optimal
Control and Differential Games. Birkhäuser, Boston (1998)
13. Murray, J.D.: Mathematical Biology. Springer, Berlin (1989)
14. Mylius, S.D., Diekmann, O.: On evolutionarily stable life histories, optimization and the need
to be specific about density dependence. Oikos 74(2), 218–224 (1995)
15. Noether, E.: Invariante Variationsprobleme. Nachrichten der Königlichen Gesellschaft der
Wissenschaften zu Göttingen. Math.-phys. Klasse 235–257 (1918)
16. Perrin, N., Mazalov, V.: Local competition, inbreeding, and the evolution of sex-biased
dispersal. Am. Nat. 155(1), 116–127 (2000). doi:10.1086/303296
17. Perrin, N., Sibly, R.M.: Dynamic-models of energy allocation and investment. Annu. Rev. Ecol.
Syst. 24, 379–410 (1993)
18. Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V., Mishchenko, E.F.: The Mathematical
Theory of Optimal Processes. Wiley, New York (1962)
19. Schaffer, W.M.: The application of optimal control theory to the general life history problem.
Am. Nat. 121, 418–431 (1983)
20. Van Valen, L.: A new evolutionary law. Evol. Theory 1, 1–30 (1973)
21. Vincent, T.L., Brown, J.S.: Evolutionary Game Theory, Natural Selection and Darwinian
Dynamics. Cambridge University Press, Cambridge (2005)
Chapter 5
Strong Strategic Support of Cooperative
Solutions in Differential Games
5.1 Introduction
Like the analysis in [9], in this paper the problem of strategic support of cooperation
in a differential m-person game with prescribed duration T and independent motions
is considered. Based on the initial differential game, a new associated differential
game (CD game) is designed. In addition to the initial game, it models players’
actions in connection with the transition from a strategic form of the game to a
cooperative one with the principle of optimality chosen in advance. The model
allows any coalition of players to refuse cooperation at any time instant $t$. As the cooperative principle of optimality, the Shapley value operator is
considered. Under certain assumptions, it is shown that the components of the
Shapley value along any admissible trajectory are absolutely continuous functions
of time. In the foundation of the CD-game construction lies the so-called imputation
distribution procedure described in [9] (see also [1]). The theorem established
by the authors states that if at each time instant along the conditionally optimal
(cooperative) trajectory future payments to each coalition of players according to
the imputation distribution procedure exceed the maximal guaranteed value that this
coalition can achieve in the CD game, then there exists a strong Nash equilibrium
in the class of recursive strategies first introduced in [2]. In other words, the
aforementioned equilibrium exists if in any subgame along the conditionally optimal
trajectory the Shapley value belongs to its core. The proof of this theorem uses
results and methods published in [2, 3]. The proved theorem is also true for other
value operators possessing the property of absolute continuity along admissible
trajectories of the differential game under consideration. The motions of players
in the game are independent. Thus the motion equations have the form
$$\frac{\mathrm{d}x^{(i)}}{\mathrm{d}t} = f^{(i)}(t, x^{(i)}, u^{(i)}), \qquad i \in I = [1:m], \tag{5.1}$$
$$x^{(i)} \in \mathbb{R}^{n(i)}, \qquad u^{(i)} \in P^{(i)} \in \mathrm{Comp}\,\mathbb{R}^{k(i)},$$
$$x^{(i)}(t_0) = x_0^{(i)}, \qquad i \in I. \tag{5.2}$$
Here $u(\cdot) = (u^{(1)}(\cdot), \ldots, u^{(m)}(\cdot))$ is a given $m$-vector of open-loop controls, and
$$x(t, t_0, x_0, u(\cdot)) = \bigl(x^{(1)}(t, t_0, x_0, u^{(1)}(\cdot)), \ldots, x^{(m)}(t, t_0, x_0, u^{(m)}(\cdot))\bigr),$$
where $x^{(i)}(\cdot) = x(\cdot, t_0, x_0^{(i)}, u^{(i)}(\cdot))$ is the solution of the Cauchy problem for the $i$th subsystem of (5.1) with the corresponding initial condition (5.2) and admissible open-loop control $u^{(i)}(\cdot)$ of player $i$.
Admissible open-loop controls of players $i \in I$ are Lebesgue measurable open-loop controls $u^{(i)}(\cdot): t \mapsto u^{(i)}(t) \in \mathbb{R}^{k(i)}$ such that $u^{(i)}(t) \in P^{(i)}$ for all $t \in [t_0, T]$.
It is supposed that all of the functions $f^{(i)}(t, x^{(i)}, u^{(i)})$ are continuous, locally Lipschitz with respect to $x^{(i)}$, and satisfy the following condition: there exists $\lambda^{(i)} > 0$ such that
$$\|f^{(i)}(t, x^{(i)}, u^{(i)})\| \le \lambda^{(i)}\bigl(1 + \|x^{(i)}\|\bigr) \qquad \forall x^{(i)} \in \mathbb{R}^{n(i)},\ \forall u^{(i)} \in P^{(i)}.$$
Each of the payoff integrands $h^{(i)}$ (see below) is also continuous.
It is supposed that at each time instant t ∈ [t0 , T ], the players have information
about the trajectory (solution) x(i) (τ ) = x(τ ,t0 , x0 , u(i) (·)) of the system (5.1), (5.2)
on the time interval [t0 ,t] and use recursive strategies [1, 2].
Recursive strategies were first introduced in [1] to justify the dynamic programming
approach in zero-sum differential games, known as the method of open-loop
iterations in nonregular differential games with a nonsmooth value function. The
ε -optimal strategies constructed with the use of this method are universal in the
sense that they remain ε -optimal in any subgame of the previously defined differen-
tial game (for every ε > 0). Exploiting this property it became possible to prove the
existence of ε -equilibrium (Nash equilibrium) in non-zero-sum differential games
(for every ε > 0) using the so-called “punishment strategies” [4].
The basic idea is that when one of the players deviates from the conditionally
optimal trajectory, other players after some small time delay start to play against the
deviating player. As a result, the deviating player is not able to obtain much more
than he could have gotten using the conditionally optimal trajectory. Punishment of
the deviating player at each time instant using the same strategy is possible because
of the universal character of ε -optimal strategies in zero-sum differential games.
In this paper the same approach is used to verify the stability of cooperative
agreements in the game Γ (t0 , x0 ) and, as in the aforementioned case, the principal
argument is the universal character of ε -optimal recursive strategies in specially
defined zero-sum games ΓS (t0 , x0 ), S ⊂ I, associated with the non-zero-sum game
Γ (t0 , x0 ).
The recursive strategies lie somewhere in between piecewise open-loop strategies
[6] and ε -strategies introduced by Pshenichny [10]. The difference from piecewise
open-loop strategies consists in the fact that, as in the case of Pshenichny’s
ε -strategies, the moments of correction of open-loop controls are not prescribed
from the beginning of the game but are defined during the course of the game. At the
same time, they differ from Pshenichny’s ε -strategies in the fact that the formation
of open-loop controls occurs in a finite number of steps.
The recursive strategy $U_i^{(n)}$ of player $i$ with the maximal number $n$ of control corrections is a procedure for the formation of an admissible open-loop control by player $i$ in the game $\Gamma(t_0, x_0)$, $(t_0, x_0) \in D$.
At the beginning of the game $\Gamma(t_0, x_0)$, player $i$, using the recursive strategy $U_i^{(n)}$, defines the first correction instant $t_1^{(i)} \in (t_0, T]$ and his admissible open-loop control $u^{(i)} = u^{(i)}(t)$ on the time interval $[t_0, t_1^{(i)}]$. Then, if $t_1^{(i)} < T$, possessing information about the state of the game at the time instant $t_1^{(i)}$, he chooses the next moment of correction $t_2^{(i)}$ and his admissible open-loop control $u^{(i)} = u^{(i)}(t)$ on the time interval $(t_1^{(i)}, t_2^{(i)}]$, and so on. Thus the admissible control on the time interval $[t_0, T]$ is either formed at the $k$th step ($k \le n - 1$), or player $i$ ends the process at step $n$ by choosing at the time instant $t_{n-1}^{(i)}$ his admissible control on the remaining time interval $(t_{n-1}^{(i)}, T]$.
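The procedure just described is essentially algorithmic, and a schematic rendering may help; in the sketch below the decision rules `next_correction` and `choose_control` are left abstract, since the chapter specifies only the information pattern, not their concrete form.

```python
def recursive_strategy(t0, T, state_at, next_correction, choose_control, n):
    """Form an admissible open-loop control with at most n corrections (sketch).

    state_at(t)            -> observed state of the game at time t
    next_correction(k, s)  -> correction instant in (t, T], chosen at step k
    choose_control(k, s)   -> open-loop control for the coming interval
    """
    t, pieces = t0, []
    for k in range(1, n + 1):
        s = state_at(t)
        t_next = T if k == n else next_correction(k, s)  # step n must reach T
        pieces.append(((t, t_next), choose_control(k, s)))
        t = t_next
        if t >= T:               # the control on [t0, T] may be completed early
            break
    return pieces                # concatenation of the pieces is the control
```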
For each given state $(t_*, x_*) \in D$ and nonvoid coalition $S \subset I$, consider the zero-sum differential game $\Gamma_S(t_*, x_*)$ between the coalitions $S$ and $I\setminus S$ with the same dynamics as in $\Gamma(t_*, x_*)$ and the payoff of coalition $S$ equal to the sum of the payoffs of the players $i \in S$ in the game $\Gamma(t_*, x_*)$:
$$\sum_{i\in S} H^{(i)}_{t_* x_*}\bigl(u^{(S)}(\cdot), u^{(I\setminus S)}(\cdot)\bigr) = \sum_{i\in S} H^{(i)}_{t_* x_*}(u(\cdot)) = \sum_{i\in S}\int_{t_*}^{T} h^{(i)}(t, x(t), u(t))\,\mathrm{d}t,$$
where $u^{(S)}(\cdot) = \{u^{(i)}(\cdot)\}_{i\in S}$ and $u^{(I\setminus S)}(\cdot) = \{u^{(j)}(\cdot)\}_{j\in I\setminus S}$. The value of this game is denoted by $\mathrm{val}\,\Gamma_S(t_*, x_*)$.
and call it the conditionally optimal cooperative trajectory. This trajectory is not necessarily unique. Then on the set $D$ the mapping $v(\cdot): D \to \mathbb{R}^{2^I}$ is defined with coordinate functions $v_S(\cdot): D \to \mathbb{R}$, $S \subset I$, $v_S(t_*, x_*) = \mathrm{val}\,\Gamma_S(t_*, x_*)$.
Along any admissible trajectory $x(\cdot)$, consider the functions $\varphi_S: [t_0, T] \to \mathbb{R}$, $S \subset I$, $\varphi_S(t) = v_S(t, x(t))$.
We shall connect the realization of the single-valued solution of the game $\Gamma(t_0, x_0)$ with the known imputation distribution procedure (IDP) [7, 8].
By the IDP of the solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ along the conditionally optimal trajectory $x^0(\cdot)$ we understand a function
$$\beta(t) = (\beta_1(t), \ldots, \beta_m(t)), \qquad t \in [t_0, T], \tag{5.4}$$
satisfying
$$M(t_0, x_0) = \int_{t_0}^{T}\beta(t)\,\mathrm{d}t \tag{5.5}$$
and
$$\int_{t}^{T}\beta(\tau)\,\mathrm{d}\tau \in E(t, x^0(t)) \qquad \forall t \in [t_0, T], \tag{5.6}$$
where $E(t, x^0(t))$ is the set of imputations in the game $(I, v(t, x^0(t)))$.
The IDP $\beta(t)$, $t \in [t_0, T]$, of the solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is called dynamically stable (time-consistent) along the conditionally optimal trajectory $x^0(\cdot)$ if
$$\int_{t}^{T}\beta(\tau)\,\mathrm{d}\tau = M(t, x^0(t)) \qquad \forall t \in [t_0, T]. \tag{5.7}$$
The solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is dynamically stable (time-consistent) if a dynamically stable IDP exists along at least one conditionally optimal trajectory.
Using the corollary of Theorem 5.1, we have the following result.
Theorem 5.2. For any conditionally optimal trajectory $x^0(\cdot)$, the following IDP of the solution $\mathrm{Sh}(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$:
$$\beta(t) = -\frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{Sh}(t, x^0(t)), \qquad t \in [t_0, T], \tag{5.8}$$
is dynamically stable along this trajectory. Therefore, the solution $\mathrm{Sh}(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is dynamically stable.
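The time consistency of (5.8) can be verified by a one-line computation, assuming (as is natural for the integral payoffs considered here) that the value of the subgame starting at the terminal instant vanishes, so that $\mathrm{Sh}(T, x^0(T)) = 0$:

```latex
\int_t^T \beta(\tau)\,\mathrm{d}\tau
  = -\int_t^T \frac{\mathrm{d}}{\mathrm{d}\tau}\,\mathrm{Sh}(\tau, x^0(\tau))\,\mathrm{d}\tau
  = \mathrm{Sh}(t, x^0(t)) - \mathrm{Sh}(T, x^0(T))
  = \mathrm{Sh}(t, x^0(t)),
```

which is exactly condition (5.7) with $M(t, x^0(t)) = \mathrm{Sh}(t, x^0(t))$.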
If a cooperative agreement is reached in the game and each player receives his payoff according to the IDP (5.8), then it is natural to suppose that those who violate this agreement are to be punished. The effectiveness of the punishment (sanctions) comes down to the question of the existence of a strong Nash equilibrium in the differential game $\Gamma^{\mathrm{Sh}}(t_0, x_0)$, which differs from $\Gamma(t_0, x_0)$ only in the player payoffs.
The payoff of player $i$ in $\Gamma^{\mathrm{Sh}}(t_0, x_0)$ is equal to
$$H^{(\mathrm{Sh}, i)}_{t_0, x_0}(u(\cdot)) = -\int_{t_0}^{t(u(\cdot))}\frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{Sh}_i(t, x^0(t))\,\mathrm{d}t + \int_{t(u(\cdot))}^{T} h^{(i)}\bigl(t, x(t, t_0, x_0, u(\cdot))\bigr)\,\mathrm{d}t,$$
In this paper we use the following definition of a strong Nash equilibrium.
Definition 5.1. Let $\gamma = \langle I, \{X_i\}_{i\in I}, \{K_i\}_{i\in I}\rangle$ be an $m$-person game in normal form; here $I = [1:m]$ is the set of players, $X_i$ the set of strategies of player $i$, and $K_i: X = X_1\times X_2\times\cdots\times X_m \to \mathbb{R}$ the payoff function of player $i$. We shall say that in the game $\gamma$ there exists a strong Nash equilibrium if for every $\varepsilon > 0$ there exists $x^\varepsilon = (x_1^\varepsilon, x_2^\varepsilon, \ldots, x_m^\varepsilon) \in X$ such that for all $S \subset I$ and all $x_S \in X_S = \prod_{i\in S}X_i$,
$$\sum_{i\in S} K_i\bigl(x_S, x^\varepsilon_{I\setminus S}\bigr) \le \sum_{i\in S} K_i(x^\varepsilon) + \varepsilon,$$
where $x^\varepsilon_{I\setminus S} = \{x_j^\varepsilon\}_{j\in I\setminus S}$ ($x^\varepsilon_{I\setminus S} \in X_{I\setminus S}$).
If condition (5.9) holds, then in the game $\Gamma^{\mathrm{Sh}}(t_0, x_0)$ there exists a strong Nash equilibrium.
The idea of the proof is as follows. Condition (5.9) can be rewritten in an equivalent form meaning that at each time instant $t \in [t_0, T]$, moving along the conditionally optimal trajectory $x^0(\cdot)$, no coalition can guarantee itself on $[t, T]$ a payoff larger than the one prescribed by the IDP (5.8), i.e., larger than
$$\sum_{i\in S}\int_{t}^{T}\beta_i(\tau)\,\mathrm{d}\tau = -\sum_{i\in S}\int_{t}^{T}\frac{\mathrm{d}}{\mathrm{d}\tau}\,\mathrm{Sh}_i(\tau, x^0(\tau))\,\mathrm{d}\tau = \sum_{i\in S}\mathrm{Sh}_i(t, x^0(t));$$
at the same time, on the time interval $[t_0, t]$, according to the IDP, the coalition has already received a payoff equal to
$$\sum_{i\in S}\int_{t_0}^{t}\beta_i(\tau)\,\mathrm{d}\tau = -\sum_{i\in S}\int_{t_0}^{t}\frac{\mathrm{d}}{\mathrm{d}\tau}\,\mathrm{Sh}_i(\tau, x^0(\tau))\,\mathrm{d}\tau,$$
so that in total no coalition can count on more than $\sum_{i\in S}\mathrm{Sh}_i(t_0, x_0)$. Moving always in the game $\Gamma^{\mathrm{Sh}}(t_0, x_0)$ along the conditionally optimal trajectory $x^0(\cdot)$, each
coalition will receive its payoff according to the Shapley value. Thus no coalition can benefit from deviating from the conditionally optimal trajectory, which in this case it is natural to call a "strong equilibrium trajectory."
5.6 Conclusion
Let us conclude with some remarks about the limits of our approach. The main
condition that guarantees a strong strategic support of the Shapley value in the
m-person differential game under consideration is the fact that the Shapley value
belongs to the core of any subgame along a cooperative trajectory. This can
be guaranteed only if the cores are nonvoid and the characteristic functions in the subgames are convex. At the same time, one can easily verify that if, instead of the Shapley value, any fixed imputation from the core is taken as the optimality principle, then for strong strategic support of this imputation the principal condition is that the cores in the subgames along the cooperative trajectory be nonempty.
In addition, strategic support of the cooperation proposed here based on the
notion of a strong Nash equilibrium is coalition proof in the sense that no coalition
can force its members to deviate from the cooperative trajectory because in any
deviating coalition there will be at least one player who is not interested in the
deviation.
References
1. Chistyakov, S.V.: To the solution of the game problem of pursuit. Prikl. Mat. i Mekh. 41(5), 825–832 (1977) (in Russian)
2. Chistyakov, S.V.: Operatory znacheniya antagonisticheskikh igr (Value Operators in Two-Person Zero-Sum Differential Games). St. Petersburg Univ., St. Petersburg (1999)
3. Chentsov, A.G.: On a game problem of converging at a given instant of time. Math. USSR Sbornik 28(3), 353–376 (1976)
4. Chistyakov, S.V.: O beskoalizionnikh differenzial'nikh igrakh (On coalition-free differential games). Dokl. Akad. Nauk 259(5), 1052–1055 (1981). English transl. in Soviet Math. Dokl. 24(1), 166–169 (1981)
5. Friedman, A.: Differential Games. Wiley, New York (1971)
6. Petrosjan, L.A.: Differential Games of Pursuit. World Scientific, Singapore (1993)
7. Petrosjan, L.A.: The Shapley value for differential games. In: Olsder, G.J. (ed.) Annals of the International Society of Dynamic Games, vol. 3, pp. 409–417. Birkhäuser (1995)
8. Petrosjan, L.A., Danilov, N.N.: Stability of solutions in nonzero-sum differential games with integral payoffs. Vestnik Leningrad Univ. 1, 52–59 (1979)
9. Petrosjan, L.A., Zenkevich, N.A.: Principles of stable cooperation. Math. Games Theory Appl. 1(1), 102–117 (2009) (in Russian)
10. Pshenichnyi, B.N.: ε-strategies in differential games. In: Topics in Differential Games, pp. 45–56. North-Holland, New York (1973)
Chapter 6
Characterization of Feedback Nash Equilibrium
for Differential Games
Yurii Averboukh
6.1 Introduction
Y. Averboukh ()
Institute of Mathematics and Mechanics UrB RAS, S. Kovalevskaya Street 16,
GSP-384, Ekaterinburg 620990, Russia
e-mail: [email protected]
constructed solutions [4]. Bressan and Shen investigated the Nash equilibrium using
a hyperbolic system of conservation laws [2, 3]. An approach based on singular
surfaces was considered by Olsder [13].
In this paper, we develop the approach of Kononenko [11], Kleimenov [10], and
Chistyakov [8]. The main result is the characterization of the set of Nash equilibrium
payoffs in terms of nonsmooth analysis. In addition, we obtain sufficient
conditions for a pair of continuous functions to provide a Nash equilibrium. This
result generalizes the method of systems of Hamilton–Jacobi equations.
Here u and v are the controls of Players I and II, respectively. Payoffs are terminal.
Player I wants to maximize σ1 (x(ϑ0 )), whereas Player II wants to maximize
σ2(x(ϑ0)). We assume that the sets P and Q are compact, the functions f, σ1, and σ2 are
continuous, and f is Lipschitz continuous with respect to the phase variable and
satisfies a sublinear growth condition with respect to x.
We use the control design suggested in [10]. This control design follows the
Krasovskii discontinuous feedback formalization. A feedback strategy of Player I
is a pair of functions U = (u(t, x, ε), β1(ε)). Here u(t, x, ε) is a function of the position
(t, x) ∈ [t0, ϑ0] × Rn and the precision parameter ε, and β1(ε) is a continuous function
of the precision parameter. We suppose that β1(ε) → 0 as ε → 0. Analogously, a
feedback strategy of Player II is a pair V = (v(t, x, ε), β2(ε)).
Let a position (t∗ , x∗ ) be chosen. The step-by-step motion is defined in the
following way. We suppose that the ith player chooses his own precision parameter
εi. Let Player I choose a partition Δ1 = {τj}j=0,...,r of the interval [t∗, ϑ0]. Assume
that the mesh of the partition Δ1 is less than ε1. Suppose that Player II chooses a
partition Δ2 = {ξk}k=0,...,ν with mesh less than ε2. The solution x[·] of Eq. (6.1)
with initial data x[t∗] = x∗ such that the control of Player I is equal to u(τj, x[τj], ε1)
on [τ j , τ j+1 ), and the control of Player II is equal to v(ξk , x[ξk ], ε2 ) on [ξk , ξk+1 )
is called a step-by-step motion. Denote it by x[·,t∗ , x∗ ;U, ε1 , Δ1 ;V, ε2 ; Δ2 ]. The set
of all step-by-step motions from the position (t∗ , x∗ ) under strategies U and V and
precision parameters ε1 and ε2 is denoted by X(t∗ , x∗ ;U, ε1 ;V, ε2 ). The step-by-step
motion is called consistent if ε1 = ε2 .
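A minimal simulation sketch may clarify this formalization: each player resamples his feedback only at the points of his own partition and holds the control constant in between. The dynamics and feedbacks below are hypothetical stand-ins, not objects from the text:

```python
import numpy as np

def f(t, x, u, v):
    return np.array([u, v])                  # toy dynamics in R^2

def u_fb(t, x, eps):  return np.clip(-x[0], -1.0, 1.0)   # Player I feedback
def v_fb(t, x, eps):  return np.clip(-x[1], -1.0, 1.0)   # Player II feedback

def step_by_step_motion(t0, x0, T, eps1, eps2, dt=1e-3):
    grid1 = np.arange(t0, T, eps1)           # Player I's partition (mesh <= eps1)
    grid2 = np.arange(t0, T, eps2)           # Player II's partition (mesh <= eps2)
    t, x = t0, np.array(x0, float)
    u = u_fb(t0, x, eps1); v = v_fb(t0, x, eps2)
    i1 = i2 = 0
    while t < T:
        if i1 < len(grid1) and t >= grid1[i1]:
            u = u_fb(grid1[i1], x, eps1); i1 += 1    # resample at tau_j
        if i2 < len(grid2) and t >= grid2[i2]:
            v = v_fb(grid2[i2], x, eps2); i2 += 1    # resample at xi_k
        x = x + dt * f(t, x, u, v)           # Euler step between sampling times
        t += dt
    return x

print(step_by_step_motion(0.0, [1.0, -0.5], 1.0, eps1=0.05, eps2=0.02))
```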
A limit of step-by-step motions x[·, t^k, x^k; U, ε1^k, Δ1^k; V, ε2^k, Δ2^k] is called a constructive
motion if t^k → t∗, x^k → x∗, ε1^k → 0, ε2^k → 0, as k → ∞. Denote by X(t∗, x∗; U, V) the
set of constructive motions. By the Arzelà–Ascoli theorem, the set of constructive
motions is nonempty. A consistent constructive motion is a limit of step-by-step
motions x[·,t k , xk ;U, ε k , Δ1k ;V, ε k , Δ2k ] such that t k → t∗ , xk → x∗ , ε1k → 0, ε2k → 0,
as k → ∞. Denote the set of all consistent constructive motions by X c (t∗ , x∗ ;U,V ).
This set is also nonempty.
The payoff (σ1 (x[ϑ0 ]), σ2 (x[ϑ0 ])) determined by a Nash equilibrium solution is
called a Nash equilibrium payoff of a game. In the typical case, there are many Nash
equilibria with different payoffs. The set of all Nash equilibrium payoffs is called a
Nash value of a game and is denoted by N(t∗ , x∗ ). One can consider a multivalued
map taking (t∗ , x∗ ) to N(t∗ , x∗ ).
The set N(t∗ , x∗ ) is nonempty under the Isaacs condition [10, 11]. The proof is
based on the punishment strategy technique. If the Isaacs condition is not fulfilled,
then the Nash equilibrium solution exists in the class of mixed strategies or in the
class of the pair counterstrategy/strategy [10].
Below we suppose that the Isaacs condition holds: for all t ∈ [t0 , ϑ0 ], x, s ∈ Rn
\[
\min_{u\in P}\max_{v\in Q}\,\bigl\langle s, f(t,x,u,v)\bigr\rangle \;=\; \max_{v\in Q}\min_{u\in P}\,\bigl\langle s, f(t,x,u,v)\bigr\rangle .
\]
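On discretized control sets the Isaacs condition can be checked numerically by comparing the two iterated extrema of ⟨s, f(t, x, u, v)⟩. A minimal sketch; the dynamics below are a hypothetical example, affine in (u, v), for which the condition holds:

```python
import numpy as np

P = np.linspace(-1.0, 1.0, 51)      # discretized control set of Player I
Q = np.linspace(-1.0, 1.0, 51)      # discretized control set of Player II

def f(t, x, u, v):
    return np.array([u + v, x[0] * v - u])

def check_isaacs(t, x, s):
    H = np.array([[np.dot(s, f(t, x, u, v)) for v in Q] for u in P])
    minmax = H.max(axis=1).min()    # min over u of max over v
    maxmin = H.min(axis=0).max()    # max over v of min over u
    return minmax, maxmin

print(check_isaacs(0.5, np.array([0.3, -0.2]), np.array([1.0, 2.0])))
```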
Remark 6.1. If the Isaacs condition does not hold, then one can consider the
solution in the class of mixed strategies. To this end, we consider the doubly
controlled system
\[
\dot{x} \;=\; \int_P\!\int_Q f(t,x,u,v)\,\nu(dv)\,\mu(du), \qquad t\in[t_0,\vartheta_0],\; x\in\mathbb{R}^n,\; \mu\in\mathrm{rpm}(P),\; \nu\in\mathrm{rpm}(Q). \tag{6.2}
\]
σ2(x(ϑ0)), whereas Player I wants to minimize it. Denote the value function of this
game by ω2 .
Consider the multivalued map
\[
T^{*}(t, x) \;=\; \bigcup_{\alpha \in I} T^{\alpha}(t, x).
\]
The multivalued map T ∗ is closed, has compact images, and satisfies conditions
(N1)–(N3). By T + denote the closure of the pointwise union of all upper semicon-
tinuous multivalued maps from [t0 , ϑ0 ] × Rn to R2 satisfying conditions (N1)–(N3).
It follows from [10] that T+(t, x) = N(t, x) for all (t, x) ∈ [t0, ϑ0] × Rn.
Condition (N1) is a boundary condition, and condition (N2) is connected with
the theory of zero-sum differential games. Further, we formulate condition (N3) in
terms of viability theory and obtain the infinitesimal form of this condition.
Theorem 6.1. Let the map T : [t0, ϑ0] × Rn → P(R2) be closed. Then condition (N3) is equivalent to the following one: for all (t∗, x∗) ∈ [t0, ϑ0] × Rn, (J1, J2) ∈ T(t∗, x∗) there exist θ > t∗ and y(·) ∈ Sol(t∗, x∗) such that (J1, J2) ∈ T(t, y(t)) for all t ∈ [t∗, θ].
\[
D_H T\bigl(t, x; (J_1, J_2), w\bigr) \;\triangleq\; \liminf_{\delta \downarrow 0,\; w' \to w} \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta,\, x+\delta w')\bigr]}{\delta}.
\]
Theorem 6.2. Let T : [t0 , ϑ0 ] × Rn → P(R2 ) be closed. Then condition (N3) at the
position (t∗ , x∗ ) ∈ [t0 , ϑ0 ] × Rn is equivalent to the following one:
Then for all (t, x) ∈ [t0 , ϑ0 ] × Rn the couple (c1 (t, x), c2 (t, x)) is a Nash equilibrium
payoff of the game.
Corollary 6.1 follows from the definition of modulus derivative and the
property of the upper solution of equation (6.4) [14]: ωi (t, x) ≤ ci (t, x) for all
(t, x) ∈ [t0 , ϑ0 ] × Rn.
Let us show that the proposed method is a generalization of the method based on
the system of Hamilton–Jacobi equations. This method provides a Nash solution in
the class of continuous strategies [1].
Proposition 6.2. Let the function (ϕ1 , ϕ2 ) : [t0 , ϑ0 ]×Rn → R2 be differentiable, and
(ϕ1 (ϑ0 , ·), ϕ2 (ϑ0 , ·)) = (σ1 (·), σ2 (·)). Suppose that the function (ϕ1 , ϕ2 ) satisfies the
following condition: for all positions (t, x) ∈ [t0 , ϑ0 ] × Rn there exist un ∈ P, vn ∈ Q
such that
\[
\max_{u\in P}\,\bigl\langle \nabla\varphi_1(t,x),\, f(t,x,u,v^n)\bigr\rangle \;=\; \bigl\langle \nabla\varphi_1(t,x),\, f(t,x,u^n,v^n)\bigr\rangle, \tag{6.5}
\]
\[
\max_{v\in Q}\,\bigl\langle \nabla\varphi_2(t,x),\, f(t,x,u^n,v)\bigr\rangle \;=\; \bigl\langle \nabla\varphi_2(t,x),\, f(t,x,u^n,v^n)\bigr\rangle, \tag{6.6}
\]
\[
\frac{\partial \varphi_i(t,x)}{\partial t} + \bigl\langle \nabla\varphi_i(t,x),\, f(t,x,u^n,v^n)\bigr\rangle \;=\; 0, \qquad i = 1, 2. \tag{6.7}
\]
In this case, condition (6.7) is equivalent to the following one: (ϕ1, ϕ2) is a solution of
the system
\[
\frac{\partial \varphi_i}{\partial t} + H_i\bigl(t, x, \nabla\varphi_1, \nabla\varphi_2\bigr) \;=\; 0, \qquad i = 1, 2.
\]
6.3 Example
t ∈ [0, 1], u, v ∈ [−1, 1]. Payoffs are determined by the formulas σ1(x, y) ≜ −|x − y|,
σ2(x, y) ≜ y. We recall that each player wants to maximize his payoff.
To determine the multivalued map N : [0, 1] × R2 → P(R2 ), we use auxiliary
multivalued maps Si : [0, 1] × R2 → P(R) such that
\[
S_i(t, x_*, y_*) \;\triangleq\; \bigl\{\, z \in \mathbb{R} : \omega_i(t, x_*, y_*) \le z \le c_i^{+}(t, x_*, y_*) \,\bigr\}.
\]
Here
\[
c_i^{+}(t, x, y) \;\triangleq\; \sup_{u\in \mathcal{U},\, v\in \mathcal{V}} \sigma_i\Bigl( x + \int_t^{\vartheta_0} u(\xi)\,d\xi,\;\; y + \int_t^{\vartheta_0} v(\xi)\,d\xi \Bigr).
\]
Obviously,
N(t, x∗ , y∗ ) ⊂ S1 (t, x∗ , y∗ ) × S2 (t, x∗ , y∗ ). (6.9)
First we determine the map S2. The value function of the game Γ2 is equal to
ω2(t, x∗, y∗) = y∗ + (1 − t). In addition, c2+(t, x∗, y∗) = y∗ + (1 − t). Consequently,
\[
S_2(t, x_*, y_*) \;=\; \{\, y_* + (1 - t) \,\}. \tag{6.10}
\]
Let us determine the set S1 . The programmed iteration method [7] yields that
ω1 (t, x∗ , y∗ ) = −|x∗ − y∗ |.
Moreover,
\[
c_1^{+}(t, x_*, y_*) \;=\; \min\bigl\{\, -|x_* - y_*| + 2(1 - t),\; 0 \,\bigr\}.
\]
We obtain that
\[
S_1(t, x_*, y_*) \;=\; \bigl[\, \omega_1(t, x_*, y_*),\; c_1^{+}(t, x_*, y_*) \,\bigr]. \tag{6.11}
\]
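Assuming, as the example suggests, the simple dynamics ẋ = u, ẏ = v with u, v ∈ [−1, 1] (the display of (6.8) is not reproduced above), the bounds of S1 can be evaluated directly; a minimal sketch under that assumption:

```python
def omega1(t, xs, ys):
    # Lower (guaranteed) payoff of Player I: the gap |x - y| cannot be forced shut.
    return -abs(xs - ys)

def c1_plus(t, xs, ys):
    # Best jointly reachable payoff by time 1: the gap closes at rate at most 2,
    # and the payoff -|x - y| never exceeds 0.
    return min(-abs(xs - ys) + 2.0 * (1.0 - t), 0.0)

def S1(t, xs, ys):
    return (omega1(t, xs, ys), c1_plus(t, xs, ys))

print(S1(0.25, 0.8, -0.4))   # interval of candidate payoffs for Player I
```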
Now we determine the map N(t, x∗ , y∗ ). The linearity of the right-hand side
of (6.8) and the convexity of control spaces yield that any measurable control
functions u(·) ∈ U and v(·) ∈ V can be substituted by the constant controls u ∈ P,
v ∈ Q. We have that for all (J1 , J2 ) ∈ S1 (t, x∗ , y∗ ) × S2 (t, x∗ , y∗ )
The equality is achieved only if u = 1. Condition (N1) yields that the inclusion
is fulfilled. Substituting the value (1, 1) for w in the formula for DH N(t, x∗, y∗;
(−|x∗ − y∗|, y∗ + (1 − t)), w), we claim that for y∗ ≥ x∗
\[
N(t, x_*, y_*) \;=\; \bigl\{\, \bigl(-|x_* - y_*|,\; y_* + (1 - t)\bigr) \,\bigr\}.
\]
Clearly, conditions (N1) and (N2) hold for this map. Let γ0 be the maximal number
in the segment [0, 2] such that −|x∗ − y∗| + γ0(1 − t) ≤ 0. If (J1, J2) ∈ N(t, x∗, y∗), then
J2 = y∗ + (1 − t), J1 = −|x∗ − y∗| + d(1 − t) for some d ∈ [0, γ0]. Let us prove that
there exists a number δ > 0 with the property
\[
J_1 \;=\; y_* - x_* + d(1 - t) \;\le\; y_* - x_* + \delta d + \gamma_0 (1 - t - \delta).
\]
Also,
\[
y_* - x_* + \delta d + \gamma_0 (1 - t - \delta) \;\le\; \min\bigl\{\, y_* - x_* + d\delta + 2(1 - t - \delta);\; 0 \,\bigr\}
\]
and
\[
y_* - x_* + \delta d + \gamma_0 (1 - t - \delta) \;\le\; y_* - x_* + d\delta + 2(1 - t - \delta).
\]
Since N(t, x∗, y∗) coincides with the set S1(t, x∗, y∗) × S2(t, x∗, y∗) in this case, we
claim that the set N(t, x∗ , y∗ ) is a Nash value of the game at the position (t, x∗ , y∗ ).
Let us compare the obtained result with the method based on the system of
Hamilton–Jacobi equations [1]. In the considered case the system of equations is
given by
\[
\begin{cases}
\dfrac{\partial \varphi_1}{\partial t} + \dfrac{\partial \varphi_1}{\partial x}\, u^n(t,x,y) + \dfrac{\partial \varphi_1}{\partial y}\, v^n(t,x,y) = 0,\\[6pt]
\dfrac{\partial \varphi_2}{\partial t} + \dfrac{\partial \varphi_2}{\partial x}\, u^n(t,x,y) + \dfrac{\partial \varphi_2}{\partial y}\, v^n(t,x,y) = 0.
\end{cases} \tag{6.13}
\]
Here the values u^n(t, x, y) and v^n(t, x, y) are determined by the following conditions:
\[
\frac{\partial \varphi_1(t,x,y)}{\partial x}\, u^n(t,x,y) \;=\; \max_{u\in P} \frac{\partial \varphi_1(t,x,y)}{\partial x}\, u,
\qquad
\frac{\partial \varphi_2(t,x,y)}{\partial y}\, v^n(t,x,y) \;=\; \max_{v\in Q} \frac{\partial \varphi_2(t,x,y)}{\partial y}\, v.
\]
In other words, the value (ϕ1 (t, x, y), ϕ2 (t, x, y)) is the maximal Nash equilibrium
payoff of the game at the position (t, x, y).
One can check that the pair of functions (ϕ1 , ϕ2 ) satisfies the conditions of
Corollary 6.1. Simultaneously, there exists a family of functions satisfying the
conditions of Corollary 6.1. Actually, for γ ∈ [0, 2] define
\[
c_1^{\gamma}(t, x_*, y_*) \;=\;
\begin{cases}
-|x_* - y_*|, & y_* \ge x_*,\\
\min\bigl\{\, -|x_* - y_*| + \gamma(1 - t);\; 0 \,\bigr\}, & y_* < x_*,
\end{cases}
\qquad
c_2^{\gamma}(t, x_*, y_*) \;=\; y_* + (1 - t).
\]
Let us show that the pair of functions (c1^γ, c2^γ) satisfies the conditions of Corollary 6.1 in our case. First we prove that the functions ci^γ are supersolutions of equations (6.4). By [14, condition U4] it suffices to show that for all (t, x, y) ∈ [t0, ϑ0] × R2 and all (a, sx, sy) ∈ D−ci^γ(t, x, y) the following inequality holds:
\[
a + H_i(s_x, s_y) \;\le\; 0, \qquad i = 1, 2. \tag{6.15}
\]
The closedness of the map T gives that for all k, (J1, J2) ∈ T(t, y∗(t)) for t ∈ [t∗, θk]. By
the same argument we claim that (J1, J2) ∈ T(τ, y∗(τ)). Denote x∗ = y∗(τ).
Let us show that τ = ϑ0. If τ < ϑ0, then there exist a motion ŷ(·) ∈ Sol(τ, x∗) and
a moment θ > τ such that (J1, J2) ∈ T(t, ŷ(t)), t ∈ [τ, θ]. Consider the motion
\[
\tilde{y}(t) \;\triangleq\;
\begin{cases}
y^{*}(t), & t \in [t_*, \tau],\\
\hat{y}(t), & t \in [\tau, \theta].
\end{cases}
\]
One can reformulate the condition of Theorem 6.1 in the following way: the
graph of T is weakly invariant under the differential inclusion
\[
\begin{pmatrix} \dot{x}\\ \dot{J}_1\\ \dot{J}_2 \end{pmatrix} \;\in\; \mathcal{F}(t, x) \;\triangleq\; \operatorname{co}\left\{ \begin{pmatrix} f(t,x,u,v)\\ 0\\ 0 \end{pmatrix} : u \in P,\; v \in Q \right\}.
\]
\[
D_t(\operatorname{gr} T)(t, x, J_1, J_2) \,\cap\, \mathcal{F}(t, x) \;\ne\; \emptyset \tag{6.17}
\]
for all (t, x) ∈ [t0 , ϑ0 ] × Rn , (J1 , J2 ) ∈ T (t, x). Here Dt denotes the right-hand
derivative in t. It is defined in the following way. Let G ⊂ [t0, ϑ0] × Rm, and let G[t] denote
the section of G at t:
\[
G[t] \;\triangleq\; \{\, w \in \mathbb{R}^m : (t, w) \in G \,\},
\]
and let the symbol d denote the Euclidean distance between a point and a set. Following
[9, 14], set
\[
(D_t G)(t, y) \;\triangleq\; \Bigl\{\, h \in \mathbb{R}^m : \liminf_{\delta \to 0} \frac{d\bigl(y + \delta h;\; G[t + \delta]\bigr)}{\delta} = 0 \,\Bigr\}.
\]
\[
b^{r} \;\triangleq\; \liminf_{\delta \downarrow 0,\; \gamma \in \mathbb{R}^n,\; \gamma \to 0} \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta,\, x+\delta(w^r+\gamma))\bigr]}{\delta}
\;=\; \lim_{k\to\infty} \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta^{r,k},\, x+\delta^{r,k}(w^r+\gamma^{r,k}))\bigr]}{\delta^{r,k}} .
\]
Further,
\[
\frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\hat\delta^{r},\, x+\hat\delta^{r}(w^{*}+\hat\gamma^{r}))\bigr]}{\hat\delta^{r}}
\;=\; \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta^{r,\hat k(r)},\, x+\delta^{r,\hat k(r)}(w^{*}+\gamma^{r,\hat k(r)}+w^r-w^{*}))\bigr]}{\delta^{r,\hat k(r)}}
\]
\[
\;=\; \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta^{r,\hat k(r)},\, x+\delta^{r,\hat k(r)}(w^r+\gamma^{r,\hat k(r)}))\bigr]}{\delta^{r,\hat k(r)}}
\;\le\; b^{r} + 2^{-r} \;\to\; \tilde b, \qquad r \to \infty.
\]
We have that in (6.19) the right- and left-hand sides are equal. This means that
condition (6.18) is valid.
Thus, condition (6.3) is equivalent to the following one: for all (J1, J2) ∈ T(t, x)
there exists w ∈ F(t, x) such that
\[
\liminf_{\delta \downarrow 0,\; \gamma \in \mathbb{R}^n,\; \gamma \to 0} \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta,\, x+\delta(w+\gamma))\bigr]}{\delta}
\;=\; \liminf_{\delta \downarrow 0,\; \gamma \in \mathbb{R}^n,\; \gamma \to 0} \inf\Bigl\{ \frac{|\zeta_1 - J_1| + |\zeta_2 - J_2|}{\delta} : (\zeta_1, \zeta_2) \in T(t+\delta,\, x+\delta(w+\gamma)) \Bigr\} \;=\; 0. \tag{6.20}
\]
• δ^k, γ^k, ε1^k, ε2^k → 0, as k → ∞;
• (t + δ^k, x + δ^k(w + γ^k), J1 + δ^k ε1^k, J2 + δ^k ε2^k) ∈ gr T.
Thus,
\[
\inf\Bigl\{ \frac{|\zeta_1 - J_1| + |\zeta_2 - J_2|}{\delta^k} : (\zeta_1, \zeta_2) \in T\bigl(t+\delta^k,\, x+\delta^k(w+\gamma^k)\bigr) \Bigr\} \;\le\; \varepsilon_1^k + \varepsilon_2^k,
\]
since
\[
\bigl( J_1 + \delta^k \varepsilon_1^k,\; J_2 + \delta^k \varepsilon_2^k \bigr) \;\in\; T\bigl(t+\delta^k,\; x+\delta^k(w+\gamma^k)\bigr).
\]
Consequently,
\[
d\bigl( (x + \delta^k w,\, J_1,\, J_2),\; \operatorname{gr}T[t+\delta^k] \bigr) \;\le\; \delta^k \sqrt{ \|\gamma^k\|^2 + (\varepsilon_1^k)^2 + (\varepsilon_2^k)^2 } .
\]
\[
\frac{\partial \varphi_i(t,x)}{\partial t} + H_i\bigl(t, x, \nabla\varphi_i(t,x)\bigr) \;\le\; 0, \qquad i = 1, 2.
\]
Since the function ϕ1 is differentiable, its subdifferential at the position (t, x) is equal
to {(∂ϕ1(t, x)/∂t, ∇ϕ1(t, x))}. Consequently, the function ϕ1 is the upper solution of
Eq. (6.4) for i = 1 [14, Condition (U4)]. Analogously, the function ϕ2 is the upper
solution of Eq. (6.4) for i = 2.
Now let us show that dabs(ϕ1, ϕ2)(t, x; w) = 0 for w ∈ F(t, x). Put w = f(t, x, u^n, v^n), and let {δ^k}k=1..∞ ⊂ R, {γ^k}k=1..∞ ⊂ Rn be a minimizing sequence. Then indeed
\[
\frac{\partial \varphi_1(t,x)}{\partial t} + \bigl\langle \nabla\varphi_1(t,x), w \bigr\rangle \;=\; \frac{\partial \varphi_2(t,x)}{\partial t} + \bigl\langle \nabla\varphi_2(t,x), w \bigr\rangle \;=\; 0.
\]
Acknowledgements This work was supported by the Russian Foundation for Basic Research
(Grant No. 09-01-00436-a), a grant of the president of the Russian Federation (Project MK-
7320.2010.1), and the Russian Academy of Sciences Presidium Programs of Fundamental
Research, Mathematical Theory of Control.
References
1. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory. SIAM, Philadelphia (1999)
2. Bressan, A., Shen, W.: Semi-cooperative strategies for differential games. Int. J. Game Theory 32, 561–593 (2004)
3. Bressan, A., Shen, W.: Small BV solutions of hyperbolic noncooperative differential games.
SIAM J. Control Optim. 43, 194–215 (2004)
4. Cardaliaguet, P.: On the instability of the feedback equilibrium payoff in a nonzero-sum
differential game on the line. Ann. Int. Soc. Dyn. Games 9, 57–67 (2007)
5. Cardaliaguet, P., Buckdahn, R., Rainer, C.: Nash equilibrium payoffs for nonzero-sum
stochastic differential games. SIAM J. Control Optim. 43, 624–642 (2004)
6. Cardaliaguet, P., Plaskacz, S.: Existence and uniqueness of a Nash equilibrium feedback for a
simple nonzero-sum differential game. Int. J. Game Theory 32, 33–71 (2003)
7. Chentsov, A.G.: On a game problem of converging at a given instant of time. Mat. USSR Sb. 28, 353–376 (1976)
8. Chistyakov, S.V.: On noncooperative differential games. Dokl. AN USSR 259, 1052–1055
(1981) (in Russian)
9. Guseinov, H.G., Subbotin, A.I., Ushakov, V.N.: Derivatives for multivalued mappings with
applications to game-theoretical problems of control. Problems Control Inform. Theory 14,
155–167 (1985)
10. Kleimenov, A.F.: Nonzero-Sum Differential Games. Nauka, Ekaterinburg (1993) (in Russian)
11. Kononenko, A.F.: On equilibrium positional strategies in nonantagonistic differential games.
Dokl. AN USSR 231, 285–288 (1976) (in Russian)
12. Krasovskii, N.N., Subbotin, A.I.: Game-Theoretical Control Problems. Springer, New York
(1988)
13. Olsder, G.J.: On open- and closed-loop bang-bang control in nonzero-sum differential games.
SIAM J. Control Optim. 49(4), 1087–1106 (2001)
14. Subbotin, A.I.: Generalized Solutions of First-Order PDEs. The Dynamical Perspective. Birkhäuser, Boston (1995)
Chapter 7
Nash Equilibrium Payoffs in Mixed Strategies
Anne Souquière
Abstract We consider two-player non zero sum differential games. We study
Nash equilibrium payoffs and publicly correlated equilibrium payoffs. If players
use deterministic strategies, it has been proved that the Nash equilibrium payoffs
are precisely the reachable and consistent payoffs. Referring to repeated games, we
introduce mixed strategies which are probability distributions over pure strategies.
We give a characterization of the set of Nash equilibrium payoffs in mixed strategies.
Unexpectedly, this set is larger than the closed convex hull of the set of Nash
equilibrium payoffs in pure strategies. Finally, we study the set of publicly correlated
equilibrium payoffs for differential games and show that it is the same as the set of
Nash equilibrium payoffs using mixed strategies.
7.1 Introduction
We study equilibria for non zero sum differential games. In general, for a given
equilibrium concept, existence and characterization of the equilibria highly depend
on the strategies used by the players. There are mainly three types of strategies:
• Non-anticipative strategies or memory-strategies where the control depends on
the entire past history of the game (trajectory and controls played so far).
A. Souquière ()
Institut TELECOM, TELECOM Bretagne, UMR CNRS 3192 Lab-STICC,
Technopole Brest Iroise, CS 83818, 29238 Brest Cedex, France
Laboratoire de Mathématiques, Université de Bretagne Occidentale,
UMR 6205, 6 Avenue Victor Le Gorgeu, CS 93837, 29238 Brest Cedex, France
e-mail: [email protected]
• Feed-back strategies where the current control depends only on the actual state
of the system.
• Open-loop controls where the control depends only on time.
Looking for Nash equilibrium payoffs in feedback strategies, one usually computes
Nash equilibrium payoffs as functions of time and space. This leads to a system
of non linear partial differential equations for which there is no general result for
existence nor uniqueness of a solution. If the system admits regular enough solu-
tions, they allow to compute the optimal feedbacks [3, 12]. There are few examples
for this approach, the results essentially deal with linear quadratic differential games
where solutions are sought amongst quadratic functions. For linear quadratic games,
there are conditions for existence of Nash equilibria in feedback strategies and
for existence and uniqueness of Nash equilibria in open-loops. Some numerical
methods can be applied to compute equilibria [11]. The drawback is that feedback
equilibria are highly unstable [5], except in some particular cases of one dimensional
games [6].
In the case of deterministic differential games where players use non-anticipative
strategies, there are existence and characterization results for Nash equilibrium
payoffs in [15, 16, 20]. Our aim is to extend this characterization to the case
where players use mixed non-anticipative strategies, namely random combinations
of memory-strategies. The disadvantage of using non-anticipative strategies is that
the associated equilibria lack weak consistency compared to feedback strategies.
Their main interest is that they allow one to characterize some kind of upper hull of all
Nash equilibrium payoffs using reasonable strategies.
The notion of mixed strategies is strongly inspired by repeated games. The folk
theorem for repeated games characterizes Nash equilibrium payoffs as feasible and
individually rational [2, 18]. As in repeated games, the difficulty is that mixed
strategies are unobservable [13].
Deterministic nonzero sum differential games are close to stochastic games,
where there is a characterization of the set of correlated equilibria in case the
punishment levels do not depend on the past history. This characterization, relying
on “rational payoffs” [19] is close to ours and to the characterization of Nash
equilibrium payoffs for stochastic games [10]. Our point is to give the link between
these two sets. However, in our case, the punishment level varies with time and the
specific conditions on the game comparable to the ones in [10] do not hold.
The notion of publicly correlated strategies has strong links with non zero sum
stochastic differential games. As for the deterministic case, there is a general result
of existence and characterization [7] in case players use non-anticipative strategies
which is quite close to ours. For non degenerate stochastic differential games,
there is a general result for existence of a Nash equilibrium in feedback strategies
[4] based on the existence of smooth enough solutions for the system of partial
differential equations defining the equilibrium. Another approach [14] uses BSDEs
to check the existence of solutions, to prove the existence of a Nash equilibrium, and to
construct optimal feedbacks. Note that the equilibria defined through this last approach are in
fact equilibria in non-anticipative strategies [17] when they both exist.
Here we deal with deterministic non zero sum differential games in mixed
strategies and we study Nash equilibria and publicly correlated equilibria. We now
describe the framework of our game.
We consider a two players non zero sum differential game in RN that runs for
t ∈ [t0 , T ]. The dynamics of the game is given by:
\[
\begin{cases}
\dot{x}(t) = f\bigl(x(t), u(t), v(t)\bigr), & t \in [t_0, T],\; u(t) \in U,\; v(t) \in V,\\
x(t_0) = x_0.
\end{cases} \tag{7.1}
\]
We first define open-loop controls: we denote by U(t0 ) (respectively V(t0 )) the set
of measurable controls of Player I (respectively Player II):
• A map α : ṼC(t0) → ŨC(t0) which is strongly non-anticipative with delay [7]:
there exists τ(α) > 0 such that for every (Ft)-stopping time S and for all ṽ1, ṽ2 ∈ ṼC(t0),
if ṽ1 ≡ ṽ2 on [t0, S], then α(ṽ1) ≡ α(ṽ2) on [t0, (S + τ(α)) ∧ T];
• A map β : ŨC(t0) → ṼC(t0) which is a strongly non-anticipative strategy with
delay.
We denote by Ac (t0 ) the set of publicly correlated strategies.
Note that our definition is somehow broader than the usual definition of correlated
strategies in repeated games where the correlation signal is given only at the
beginning of the game. Our correlation device is closer to the autonomous corre-
lation device described in [19]. We will call a C-correlated strategy any publicly
correlated strategy using the correlation device C. Note that we can associate a
unique pair of C-admissible controls to any C-correlated strategy and therefore
define a unique payoff associated to any publicly correlated strategy, as recalled in
Lemma 7.1.
We assume that the payoff functions g1 and g2 are Lipschitz continuous and
bounded, and assume Isaacs' condition: for all (x, ξ) ∈ RN × RN
In this case the two-player zero sum game whose payoff function is g1 (respectively
g2) has a value. We denote by
the value of the zero sum game with payoff function g1 where Player I aims at
maximizing his payoff and
the value of the zero sum game with payoff function g2 where Player II is the
maximizer. We recall that these definitions remain unchanged whether α ∈ A(t)
or Ar (t) and β ∈ B(t) or Br (t) [8]. Our assumptions also guarantee that these value
functions are Lipschitz continuous.
As we are interested in nonzero sum games, we need equilibrium concepts:
Definition 7.7 (Nash Equilibrium Payoff in Pure Strategies). The pair (e1, e2) ∈ R2 is a Nash equilibrium payoff in pure strategies (PNEP) for the initial conditions (t0, x0) if for all ε > 0, there exists (α, β) ∈ A(t0) × B(t0) such that:
1. For i = 1, 2, |Ji(t0, x0, α, β) − ei| ≤ ε.
2. For all α′ ∈ A(t0): J1(t0, x0, α′, β) ≤ J1(t0, x0, α, β) + ε.
3. For all β′ ∈ B(t0): J2(t0, x0, α, β′) ≤ J2(t0, x0, α, β) + ε.
We denote by Ep(t0, x0) the set of all PNEPs for the initial conditions (t0, x0).
where Vi refers to (7.2) or (7.3). Furthermore, the set of PNEPs is non empty.
In this paper, we study MNEPs. First of all, noticing that any pure strategy can be
considered as a trivial mixed strategy, the set Em (t0 , x0 ) is a non empty superset of
E p (t0 , x0 ). It appears that the set Em (t0 , x0 ) is in fact compact, convex and generally
strictly larger than the closed convex hull of the set E p (t0 , x0 ). Our main result
(Theorem 7.1 below) states that:
The payoff e = (e1, e2) ∈ R2 is a MNEP iff for all ε > 0, there exists a random
control ((Ω, F, P), (u^ε, v^ε)) such that for i = 1, 2:
• e is ε-reachable: |Ji(t0, x0, u^ε, v^ε) − ei| ≤ ε;
• (u^ε, v^ε) is ε-consistent: for all t ∈ [t0, T], denoting Ft = σ((u^ε, v^ε)(s), s ∈ [t0, t]),
\[
P\Bigl[ V_i\bigl(t, X_t^{t_0,x_0,u^{\varepsilon},v^{\varepsilon}}\bigr) \le E\bigl[ g_i\bigl(X_T^{t_0,x_0,u^{\varepsilon},v^{\varepsilon}}\bigr) \,\big|\, F_t \bigr] + \varepsilon \Bigr] \;\ge\; 1 - \varepsilon .
\]
The proof heavily relies on techniques introduced for repeated games in [1] known
as “jointly controlled lotteries” and on the fact that we work with non-anticipative
strategies with delay.
Finally, studying publicly correlated equilibria, we show that the set of PCEPs
is equal to the set of MNEPs. The idea of the proof uses the similarity between
correlated equilibrium payoffs and equilibrium payoffs of stochastic non zero sum
differential games.
We complete this introduction by describing the outline of the paper. In Sect. 7.2,
we recall the assumptions on the differential game we study. In Sect. 7.3, we give
the main properties of the set of MNEPs and present an example where the set of
7 Nash Equilibrium Payoffs in Mixed Strategies 133
MNEPs is strictly larger than the convex hull of the set of PNEPs. In Sect. 7.4, we
prove the equivalence between the sets of MNEPs and of PCEPs. We postpone to
the last section the proof of the characterization of the set of MNEPs.
7.2 Definitions
where:
• U and V are compact subsets of some finite dimensional spaces;
• U and V have infinite cardinality;
• f : RN × U × V → RN is bounded, continuous, and uniformly Lipschitz continuous with respect to x.
In order to study equilibrium payoffs of this game we have introduced pure and
mixed strategies. The major interest of working with non-anticipative strategies with
delay is the following useful result:
Lemma 7.1 (Controls Associated to a Pair of Strategies). 1. For any pair of pure
strategies (α , β ) ∈ A(t0 ) × B(t0 ) there is a unique pair of controls (uαβ , vαβ ) ∈
U(t0 ) × V(t0 ) such that α (vαβ ) = uαβ and β (uαβ ) = vαβ .
Notice that pure strategies are degenerate correlated strategies using some trivial
correlation device. Finally, note that in a zero sum game, using correlated strategies
with a fixed correlation device leads to the same value as using pure strategies.
Indeed, fix the device ((Ω, F, P), C) and denote by (C, α̃, β̃) any C-correlated
strategies and by (α, β) any pair of pure strategies. For i = 1, 2:
\[
\sup_{\tilde\alpha}\inf_{\tilde\beta} E\bigl[g_i\bigl(X_T^{t,x,\tilde\alpha,\tilde\beta}\bigr)\bigr]
\;\ge\; \sup_{\tilde\alpha}\inf_{\beta} E\bigl[g_i\bigl(X_T^{t,x,\tilde\alpha,\beta}\bigr)\bigr]
\;=\; \sup_{\alpha}\inf_{\beta} g_i\bigl(X_T^{t,x,\alpha,\beta}\bigr) \;=\; V_i(t,x)
\]
\[
\;=\; \inf_{\beta}\sup_{\alpha} g_i\bigl(X_T^{t,x,\alpha,\beta}\bigr)
\;=\; \inf_{\tilde\beta}\sup_{\alpha} E\bigl[g_i\bigl(X_T^{t,x,\alpha,\tilde\beta}\bigr)\bigr]
\;\ge\; \inf_{\tilde\beta}\sup_{\tilde\alpha} E\bigl[g_i\bigl(X_T^{t,x,\tilde\alpha,\tilde\beta}\bigr)\bigr].
\]
7.2.3 Definitions
will be called ε-optimal. Note that we just have to check the ε-optimality of α
(respectively β) against pure strategies β′ ∈ B(t0) (respectively α′ ∈ A(t0)) if α
and β are defined on a finite probability space.
7.3.1 Characterization
Note that the characterization could be given using trajectories following [20] rather
than controls, provided the trajectory stems from the dynamics (7.1).
We just give the idea of the proof which is postponed to Sect. 7.5.
136 A. Souquière
The fact that any MNEP satisfies such a characterization is in fact quite natural
if we extend the definition to any random control. Otherwise, there would exist
profitable deviations for one of the players. The way to restrict the definition only to
finite random controls is given through an appropriate projection, as shown in Sect. 7.4.
The sufficient condition is not intuitive. We have to build non anticipative
strategies with delay such that no unilateral deviation is profitable. The idea is to
build a trigger strategy: follow the same trajectory as the one defined through the
consistent controls (u , v ) as long as no deviation occurs and punish any deviation
in such a way that if a deviation occurred at the point (t, x(t)) the deviating player,
say i, will be rewarded with his guaranteed payoff Vi (t, x(t)). The unique difficulty
is to coordinate the choice of the trajectory to be followed each time there is some
node in the trajectories generated by (u , v ). To this end, players will use some
small delay at each node in order to communicate through a jointly controlled lottery.
Assume for example that the trajectory splits in two: one branch generated by ω1 with
probability 1/2 and another generated by ω2 with probability 1/2. During the small
communication delay, Player I chooses either the control u1 or u2 and Player II
selects v1 or v2 . If (u1 , v1 ) or (u2 , v2 ) are played, players will follow the trajectory
generated by ω1 and the one generated by ω2 otherwise. Note that if each player
selects each communication control with probability 1/2 no unilateral cheating in
the use of the control may change the probability of the outcome: each trajectory
will be followed with probability 1/2. This jointly controlled lottery procedure is
easily extended to any finite probability over the trajectories. Of course, if one player
does not use the communication control, he will be punished and get his guaranteed
payoff which, by assumption, is not profitable.
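A minimal sketch of the jointly controlled lottery: each player sends an independent uniform bit during the communication delay, and the parity of the two bits selects the branch, so a unilateral deviation cannot bias the outcome. The encoding below is illustrative:

```python
import random

def lottery(bit1, bit2):
    """Branch omega_1 if the bits agree, omega_2 otherwise."""
    return "omega1" if bit1 == bit2 else "omega2"

def frequency(strategy1, strategy2, trials=100_000):
    hits = sum(lottery(strategy1(), strategy2()) == "omega1"
               for _ in range(trials))
    return hits / trials

honest = lambda: random.randint(0, 1)
cheater = lambda: 1                     # always sends 1

print(frequency(honest, honest))        # ~0.5
print(frequency(honest, cheater))       # still ~0.5: unilateral cheating is useless
```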
Proposition 7.1. The set Em (t0 , x0 ) of all MNEPs for the initial conditions (t0 , x0 )
is convex and compact in R2 .
Proof. Compactness comes from the fact that the payoff functions are bounded.
Let (e¹, e²) ∈ R² × R² be a pair of Nash equilibrium payoffs in mixed strategies. We
will prove that λe¹ + (1 − λ)e² is a Nash equilibrium payoff in mixed strategies
for all λ ∈ (0, 1). We will simply build a finite random control satisfying the
characterization property of Theorem 7.1. As e^j, j = 1, 2, is a Nash equilibrium
payoff, we may choose random controls ((Ω^j, P(Ω^j), P^j), (u^j, v^j)) such that for all i, j = 1, 2:
• for all t ∈ [t0, T], denoting by Ft^j = σ((u^j, v^j)(s), s ∈ [t0, t]):
\[
P^j\Bigl[ V_i\bigl(t, X_t^{t_0,x_0,u^j,v^j}\bigr) \le E^j\bigl[ g_i\bigl(X_T^{t_0,x_0,u^j,v^j}\bigr) \,\big|\, F_t^j \bigr] + \frac{\varepsilon}{3} \Bigr] \;\ge\; 1 - \frac{\varepsilon}{3} .
\]
We need to build controls close to the initial pairs (u j , v j ), j = 1, 2, but with some
tag in order to distinguish them. Set some small delay δ > 0 such that for all x ∈
B(x0, δ‖f‖∞), for all (u, v) ∈ U(t0) × V(t0), for all i = 1, 2, and for all t ≥ t0 + δ:
\[
\begin{cases}
\bigl| V_i\bigl(t, X_t^{t_0,x_0,u,v}\bigr) - V_i\bigl(t - \delta, X_{t-\delta}^{t_0,x,u,v}\bigr) \bigr| \le \dfrac{\varepsilon}{3},\\[4pt]
\bigl| g_i\bigl(X_T^{t_0,x_0,u,v}\bigr) - g_i\bigl(X_{T-\delta}^{t_0,x,u,v}\bigr) \bigr| \le \dfrac{\varepsilon}{3}.
\end{cases} \tag{7.4}
\]
For i, j = 1, 2, denote by
\[
\Sigma_t^{ij} \;=\; \Bigl\{\, V_i\bigl(t, \bar X_t^j\bigr) \le E^j\bigl[ g_i\bigl(\bar X_T^j\bigr) \,\big|\, \bar F_t^j \bigr] + \varepsilon \,\Bigr\}.
\]
We now define a new finite random space Ω = {1, 2} × Ω1 × Ω2 endowed with the
probability P defined for all ω = (j, ω¹, ω²) by:
\[
P(j, \omega^1, \omega^2) \;=\;
\begin{cases}
\lambda\, P^1(\omega^1)\, P^2(\omega^2), & j = 1,\\
(1 - \lambda)\, P^1(\omega^1)\, P^2(\omega^2), & j = 2.
\end{cases}
\]
Therefore, assuming w.l.o.g. that the functions gi are nonnegative and using (7.6):
\[
E\bigl[ g_i(X_T) \,\big|\, F_t \bigr] \;\ge\; \bigl[ V_i(t, \bar X_t^1) - \varepsilon \bigr] \mathbf{1}_{\{1\}\times\Sigma_t^{i1}\times\Omega^2} + \bigl[ V_i(t, \bar X_t^2) - \varepsilon \bigr] \mathbf{1}_{\{2\}\times\Omega^1\times\Sigma_t^{i2}} .
\]
And finally:
\[
P\bigl( \Sigma_t^i \bigr) \;\ge\; P\Bigl( \{1\}\times\Sigma_t^{i1}\times\Omega^2 \,\cup\, \{2\}\times\Omega^1\times\Sigma_t^{i2} \Bigr) \;\ge\; \lambda(1-\varepsilon) + (1-\lambda)(1-\varepsilon) \;\ge\; 1-\varepsilon .
\]
We have just proven that the set of MNEPs is convex; therefore it contains the closed
convex hull of the set of PNEPs. When trying to compare these two sets, it appears
that in general, they are not equal. This result is not intuitive because the guaranteed
payoffs are exactly the same whether players use pure or mixed strategies. It appears
because players may correlate their strategies throughout the whole game and not
only at the beginning of it.
Proposition 7.2. There exist nonzero sum differential games in which the set of
MNEPs is larger than the convex hull of the set of PNEPs.
Proof. We will build a counter-example where an MNEP does not belong to the
closed convex hull of the PNEPs.
Consider the simple game in finite time in R2 with dynamics:
\[
\dot{x} \;=\; u + v, \qquad u, v \in [-1/2, 1/2]^2,
\]
starting from the origin O = (0, 0) at time t = 0 and ending at time t = T = 1. The
set of all reachable points in this game is the unit ball in R2 for the L1 norm.
The payoff functions are the Lipschitz continuous functions defined as follows:
\[
g_1:\quad
\begin{cases}
g_1(x) = 1 - 4|x_2| & \text{for } |x_2| \le 1/4 \text{ and } |x_2| \ge |x_1|,\\
g_1(x) = 1 - 4|x_1| & \text{for } |x_1| \le 1/4 \text{ and } |x_1| \ge |x_2|,\\
g_1(x) = x_2 + 2|x_1| - 1 & \text{for } x_2 \ge -2|x_1| + 1,\\
g_1(x) = 0 & \text{elsewhere.}
\end{cases}
\]
In fact, g1 is the nonnegative function defined on the unit square shown in Fig. 7.1, and
\[
g_2:\quad
\begin{cases}
g_2(x) = 0 & \text{for } x_2 \ge 0,\\
g_2(x) = -x_2 & \text{for } x_2 \le 0.
\end{cases}
\]
The game clearly fulfills the regularity assumptions listed in the introduction. We
will denote by Lg the larger of the Lipschitz constants of g1 and g2 for the L1-norm.
The set of all reachable payoffs is \([0, 2] \times \{0\} \,\cup\, \bigcup_{y \in (0,1]} \bigl([0, 1-y] \times \{y\}\bigr)\). It is also
clear that
\[
V_1(t, x) = g_1(x), \qquad V_2(t, x) = g_2(x).
\]
The initial values are V1 (0, O) = 1 and V2 (0, O) = 0, implying any Nash
equilibrium payoff has to reward Player I with at least 1 and Player II with a non-
negative payoff. In pure strategies, no trajectory can end up at time T at some x such
that x2 < 0 because this would cause Player I to earn strictly less than 1. We then
have e2 = 0 corresponding to x2 ≥ 0 for every PNEP. We can easily compute
and
\[
\begin{cases}
\text{for } t \in [0, 3/4]: & V_2(t, X_t) = 0 \text{ and } E\bigl[g_2(X_T)\,\big|\,F_t\bigr] = 1/8,\\
\text{for } t \in (3/4, 1]: & \text{either } V_2(t, X_t) = 0 \text{ and } E\bigl[g_2(X_T)\,\big|\,F_t\bigr] = 0,\\
& \text{or } V_2(t, X_t) = t - 3/4 \in [0, 1/4] \text{ and } E\bigl[g_2(X_T)\,\big|\,F_t\bigr] = 1/4.
\end{cases}
\]
This proves that the final payoff (e1 , e2 ) = (1, 1/8) is a MNEP.
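The arithmetic behind the payoff (1, 1/8) is a lottery over two terminal branches. A minimal sketch; the branch payoffs used below, (2, 0) and (0, 1/4) with probability 1/2 each, are an assumption consistent with the computations above rather than values stated explicitly in the text:

```python
# Each entry: ((g1, g2) at the terminal point of the branch, probability).
branches = [((2.0, 0.0), 0.5), ((0.0, 0.25), 0.5)]

e1 = sum(p * g[0] for g, p in branches)
e2 = sum(p * g[1] for g, p in branches)
print(e1, e2)    # 1.0 0.125 -- the claimed MNEP (1, 1/8)
```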
This characterization and Theorem 7.1 ensure that any MNEP is in fact a PCEP.
We now will prove that Em (t0 , x0 ) ⊇ Ec (t0 , x0 ). Note that the only difference
between the characterizations of MNEPs and PCEPs is that the latter relies on
a random control possibly defined on an infinite underlying probability space,
whereas MNEPs are characterized through finite random controls. We will consider
some PCEP satisfying the characterization of Proposition 7.3 and we will prove
that we are able to build a finite random control satisfying the characterization of
Theorem 7.1, implying it will be a MNEP.
Consider some PCEP e. Fix ε and consider the ε²-optimal random control
((Ω, F, P), (u^ε, v^ε)). Denote by X·^ε = X·^{t0,x0,u^ε,v^ε} and set for all ω ∈ Ω:
X·^ε(ω) = X·^{t0,x0,(u^ε,v^ε)(ω)}. Note that this random control satisfies
\[
\bigl| E\bigl[g_i(X_T^{\varepsilon})\bigr] - e_i \bigr| \;\le\; \varepsilon^2 .
\]
If Ω is finite, there is nothing left to prove. Else, we will build a finite random control
rewarding a payoff close to e and consistent.
We set h > 0 and h̄ > 0, to be fixed later, such that there exist Nh, Nh̄ ∈ N* with
T − t0 = Nh h and (T − t0)‖f‖∞ = Nh̄ h̄. We build the time partition Gh = {tk = t0 + kh}k=0,...,Nh and the grid in RN: Gh̄ = {x0 + ∑_{i=1}^{n} ki h̄ ei}_{(ki)∈{−Nh̄,...,0,...,Nh̄}^n}, where (ei)i=1...n is a basis of RN. We now introduce a projection on the grid:
\[
\Pi : \mathbb{R}^N \to G_{\bar h}, \qquad x \mapsto \min\bigl\{\, x^i \in G_{\bar h} : d_1(x, x^i) = \inf_{x^j \in G_{\bar h}} d_1(x, x^j) \,\bigr\},
\]
where the minimum is taken with respect to the lexicographic order and d1 is the
distance associated to the norm ‖·‖1.
To any (tk, x^i, x^j) ∈ Gh × Gh̄ × Gh̄ we associate, if it exists, some ϕ(tk, x^i, x^j) =
(x, u, v) ∈ RN × U(tk) × V(tk) such that Π(x) = x^i and Π(X_{tk+1}^{tk,x,u,v}) = x^j. We will
set ϕx(tk, x^i, x^j) = x and ϕc(tk, x^i, x^j) = (u, v).
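A minimal sketch of the projection Π with the L1 distance and lexicographic tie-breaking, on a hypothetical two-dimensional grid:

```python
import numpy as np
from itertools import product

def make_grid(x0, h_bar, N_bar, dim):
    offsets = product(range(-N_bar, N_bar + 1), repeat=dim)
    return [np.array(x0) + h_bar * np.array(k) for k in offsets]

def project(x, grid):
    d_min = min(np.sum(np.abs(x - g)) for g in grid)           # L1 distance
    nearest = [g for g in grid if np.isclose(np.sum(np.abs(x - g)), d_min)]
    return min(nearest, key=lambda g: tuple(g))                # lexicographic tie-break

grid = make_grid([0.0, 0.0], h_bar=0.5, N_bar=2, dim=2)
print(project(np.array([0.3, -0.2]), grid))                    # -> [0.5 0. ]
```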
We now are able to build a finite random control on (Ω, F , P). To any ω ∈ Ω we
associate (uη , vη )(ω ) in the following way:
• Fix (u0 , v0 ) ∈ U × V
• (uη , vη )(ω )|[t0 ,t1 ) = (u0 , v0 )
• For all k = 1, ..., Nh − 1, for all s ∈ [tk, tk+1):
\[
(u_\eta, v_\eta)(\omega)(s) \;=\; \varphi_c\bigl( t_{k-1},\, \Pi(X_{t_{k-1}}^{\varepsilon}(\omega)),\, \Pi(X_{t_k}^{\varepsilon}(\omega)) \bigr)(s - h).
\]
Note that the definition of (u_η, v_η) is non-anticipative. From now on, we will
denote by X·^η = X·^{t0,x0,u_η,v_η} and set for all ω ∈ Ω: X·^η(ω) = X·^{t0,x0,(u_η,v_η)(ω)}.
We now would like to prove that the finite random control (u_η, v_η) defined on
(Ω, F, P) satisfies, for i = 1, 2 and some constants C1, C2, C3:
• |E[gi(X_T^η)] − ei| ≤ C1 ε;
• for all t ∈ [t0, T], if we denote by Ft^η = σ{(u_η, v_η)(s), s ∈ [t0, t]}:
\[
P\Bigl[ E\bigl[ g_i(X_T^{\eta}) \,\big|\, F_t^{\eta} \bigr] \ge V_i(t, X_t^{\eta}) - C_2\,\varepsilon \Bigr] \;\ge\; 1 - C_3\,\varepsilon .
\]
First of all, we shall prove that the trajectories generated by (u_η, v_η) and (u^ε, v^ε)
are close for sufficiently small values of h and h̄.
For all k = 0, ..., Nh − 1, by induction, and noticing that ‖X_{t1}^η(ω) − X_{t0}(ω)‖1 ≤ ‖f‖∞ h, we have:
\[
\bigl\| X_{t_{k+1}}^{\eta}(\omega) - X_{t_{k+1}}^{\varepsilon}(\omega) \bigr\|_1 \;\le\; \bar h\,(1 + e^{L_f h}) \sum_{i=0}^{k-1} e^{i L_f h} + h\, e^{k L_f h}\, \|f\|_\infty
\;\le\; 2\bar h\, \frac{T - t_0}{h}\, e^{L_f (T-t_0)} + h\, e^{L_f (T-t_0)}\, \|f\|_\infty .
\]
In order to minimize the distance between X·^ε(ω) and X·^η(ω), we set for example
h̄ = h² in order to get, for all k = 0, ..., Nh:
\[
\bigl\| X_{t_k}^{\varepsilon}(\omega) - X_{t_k}^{\eta}(\omega) \bigr\|_1 \;\le\; h \Bigl( e^{L_f (T-t_0)} \bigl( 2(T - t_0) + \|f\|_\infty \bigr) + \|f\|_\infty \Bigr).
\]
It is now easy to check that the final payoff using (u_η, v_η) is close to the payoff
generated by (u^ε, v^ε). Indeed, for all i = 1, 2:
\[
\bigl| J_i(t_0,x_0,u^{\varepsilon},v^{\varepsilon}) - J_i(t_0,x_0,u_\eta,v_\eta) \bigr| \;\le\; \int_\Omega \bigl| g_i(X_T^{\varepsilon}(\omega)) - g_i(X_T^{\eta}(\omega)) \bigr| \, dP(\omega)
\;\le\; L_g \int_\Omega \bigl\| X_T^{\varepsilon}(\omega) - X_T^{\eta}(\omega) \bigr\|_1 \, dP(\omega) \;\le\; L_g\,\varepsilon
\]
and
\[
E\bigl[ g_i(X_T^{\varepsilon}) \,\big|\, F_t^{\eta} \bigr] \;\le\; E\bigl[ g_i(X_T^{\eta}) \,\big|\, F_t^{\eta} \bigr] + L_g\,\varepsilon . \tag{7.10}
\]
We now have to use the assumption (7.7) on (u^ε, v^ε): if we denote by
\[
\Sigma_t^{i,\varepsilon} \;:=\; \Bigl\{\, \omega : V_i(t, X_t^{\varepsilon}) \le E\bigl[ g_i(X_T^{\varepsilon}) \,\big|\, F_t \bigr] + \varepsilon^2 \,\Bigr\},
\]
we obtain
\[
V_i(t, X_t^{\eta}) \;\le\; E\bigl[ g_i(X_T^{\eta}) \,\big|\, F_t^{\eta} \bigr] + K\, P\bigl[ (\Sigma_t^{i,\varepsilon})^c \,\big|\, F_t^{\eta} \bigr] + (L_V + L_g + 1)\,\varepsilon \quad \text{due to (7.10).}
\]
Finally, for all ε > 0, we have built finitely many controls (u_η, v_η) defining a finite
random control satisfying, for ε < 1 and i = 1, 2:
\[
\bigl| E\bigl[g_i(X_T^{\eta})\bigr] - e_i \bigr| \;\le\; 2C^{*}\varepsilon .
\]
optimal mixed strategies (α^ε, β^ε). We will consider the random control defined
on Ω = Ωα × Ωβ with the probability P = Pα ⊗ Pβ by (u^ε, v^ε)(ωα, ωβ) =
(u^{ωα ωβ}, v^{ωα ωβ}). We will denote the associated trajectories by X·^ε = X·^{t0,x0,u^ε,v^ε}.
We will prove that these controls are ε-consistent. Suppose on the contrary that
there exists t̄ ∈ [t0, T] such that, for example,
\[
P\Bigl[ E\bigl[ g_1(X_T^{\varepsilon}) \,\big|\, F_{\bar t} \bigr] \ge V_1(\bar t, X_{\bar t}^{\varepsilon}) - \varepsilon \Bigr] \;<\; 1 - \varepsilon .
\]
Denote by
\[
\Sigma \;:=\; \Bigl\{\, (\omega_\alpha, \omega_\beta) : E\bigl[ g_1(X_T^{\varepsilon}) \,\big|\, F_{\bar t} \bigr] \ge V_1(\bar t, X_{\bar t}^{\varepsilon}) - \varepsilon \,\Bigr\}.
\]
Proof of Lemma 7.2. We will build the Maximin strategy αg,t(·) as a collection of
finitely many pure strategies with delay. For all x ∈ B(x0, (t − t0)‖f‖∞), there exists
some pure strategy αx ∈ A(t) such that:
\[
\inf_{v \in V(t)} g_1\bigl( X_T^{t,x,\alpha_x(v),v} \bigr) \;\ge\; V_1(t, x) - \varepsilon/2 .
\]
For continuity reasons, there exists a Borelian partition (Oi)i=1,...,I of the ball
B(x0, (t − t0)‖f‖∞) such that for any i there exists some xi ∈ Oi with
\[
\forall z \in O_i, \qquad \inf_{v \in V(t)} g_1\bigl( X_T^{t,z,\alpha_{x_i}(v),v} \bigr) \;\ge\; V_1(t, z) - \varepsilon,
\]
and for all x ∈ B(x0 , (t − t0 ) f ∞ ), we define the Maximin strategy αg,t (x) as the
strategy that associates to any v ∈ V(t) the control:
\[
\begin{aligned}
&\ge E\bigl( V_1(\bar t + \delta, X_{\bar t+\delta}^{\varepsilon}) \mathbf{1}_{\Sigma^c} \bigr) - \tfrac{\varepsilon}{4}\bigl(1 - P(\Sigma)\bigr) + E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma} \bigr)\\
&\ge E\bigl( V_1(\bar t, X_{\bar t}^{\varepsilon}) \mathbf{1}_{\Sigma^c} \bigr) - L(1 + \|f\|_\infty)\delta - \tfrac{\varepsilon}{4}\bigl(1 - P(\Sigma)\bigr) + E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma} \bigr)\\
&\ge E\bigl( E\bigl( g_1(X_T^{\varepsilon}) \,\big|\, F_{\bar t} \bigr) \mathbf{1}_{\Sigma^c} \bigr) + E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma} \bigr) + \tfrac{3\varepsilon}{4}\bigl(1 - P(\Sigma)\bigr) - \tfrac{\varepsilon^2}{4}\\
&\ge E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma^c} \bigr) + E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma} \bigr) + \tfrac{3\varepsilon}{4}\bigl(1 - P(\Sigma)\bigr) - \tfrac{\varepsilon^2}{4}\\
&> J_1(t_0, x_0, \alpha^{\varepsilon}, \beta^{\varepsilon}) + \tfrac{\varepsilon^2}{2} .
\end{aligned}
\]
We will set X·^η = X·^{t0,x0,u_η,v_η} and for any ω ∈ Ω: X·^η(ω) = X·^{t0,x0,(u_η,v_η)(ω)}.
If the random control is in fact deterministic, we already know a way to build
some pure strategies (α, β) that are ε-optimal and reward a payoff ε-close to e (cf.
the construction of Proposition 6.1 in [20], for example). If the controls (u_η, v_η) are
genuinely random, we have to build ε-optimal mixed strategies rewarding a payoff
ε-close to e. The idea of the ε-optimal strategies (α^ε, β^ε) is to build “trigger” mixed
strategies that are correlated in order to generate controls close to (u_η, v_η). We
will use some jointly controlled lottery at each “node” of the trajectories generated
by (u_η, v_η) and, if the opponent does not play the expected control, the player
who detected the deviation swaps to the “punitive strategy.” The proof proceeds
in several steps. First of all, we have to build jointly controlled lotteries for each
“node.” Then we build the ε-optimal strategies, and check that they reward a payoff
close to e and that they are ε-optimal.
To begin with, we introduce the explosions, which are a kind of “node” in the
trajectories generated by (u_η, v_η):
Definition 7.10 (Explosion). Consider a finite random control ((Ω, P(Ω), P),
(u^ε, v^ε)) associated to its natural filtration (Ft). We set F_{t0^-} = {∅, Ω}. An explosion
is any t ∈ [t0, T) such that F_{t^-} ≠ F_{t^+}.
Assume that (u_η, v_η) generates M̄ distinct pairs of deterministic controls with
M̄ ≥ 2 and M explosions with 1 ≤ M ≤ M̄ − 1, denoted by {τi}. We introduce an
auxiliary time step τ, to be fixed later, such that τ < min_{j≠k} |τj − τk|/2, τ < T −
max_j τj, and there exists N̄ ∈ N\{0, 1} such that N̄τ = δ. This ensures that there is no explosion
on [T − τ, T]. We introduce another time partition (t0, ..., tk = t0 + kτ, ..., t_{NδN̄} = T).
We now will explain how to correlate the strategies at each explosion using
jointly controlled lotteries.
First note that we can approximate the real probability P through a probability Q
taking rational values, in such a way that the random control ((Ω, FT, Q), (u_η, v_η))
rewards a payoff 2η-close to e and is 2η-consistent: for all t ∈ [t0, T],
\[
Q\Bigl[ V_i(t, X_t^{\eta}) \le E_Q\bigl[ g_i(X_T^{\eta}) \,\big|\, F_t \bigr] + 2\eta \Bigr] \;\ge\; 1 - 2\eta . \tag{7.15}
\]
Proof of Lemma 7.3. The proof is similar to the proof of Lemma 7.2.
We now have everything needed to define the ε-optimal strategies.
We recall that the idea of the strategy for Player I is to play the same control as
u_η(ω), ω ∈ Ω, as long as there is no explosion and as long as Player II plays
v_η(ω). If an explosion takes place on [tk, tk+1), meaning Ftk+1 is generated by the
atoms (Ωi)i∈I, play on this interval some correlation control as defined by the
corresponding explosion procedure. Then observe at tk+1 the control played by
the opponent on [tk, tk+1), deduce from the explosion procedure on which Ωi
the game is now correlated, and play u_η(ωi), ωi ∈ Ωi, from tk+1 on until the next
explosion, as long as Player II plays v_η(ωi). Player I repeats the same procedure at
each explosion. As soon as Player I detects that Player II played some unexpected
control, he swaps to the punitive strategy.
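The control flow of this trigger construction can be illustrated in a toy repeated-game analogue (this is not the differential-game strategy itself; actions and payoff conventions below are hypothetical):

```python
def trigger(agreed, punish_action, opponent_history):
    """Follow the agreed action while the opponent always has; otherwise punish."""
    keeping_faith = all(a == agreed for a in opponent_history)
    return agreed if keeping_faith else punish_action

def play(opponent_plan, rounds=6):
    mine, theirs = [], []
    for k in range(rounds):
        mine.append(trigger("C", "P", theirs))   # my move, given observed history
        theirs.append(opponent_plan[k])          # opponent's move this round
    return list(zip(mine, theirs))

print(play(["C"] * 6))                           # cooperation throughout
print(play(["C", "C", "D", "C", "C", "C"]))      # punished from round 3 on
```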
In order to define the strategy in a more convenient way, we have to introduce
some auxiliary random processes depending only on the past, namely Ω̄, keeping the
information on which trajectory generated by (u_η, v_η) is currently being followed,
and S, such that S = ∅ if no deviation was observed in the past, and S = (tk, x), where
tk ∈ {t0, ..., t_{NδN̄}}, means that some deviation occurred on [tk, tk+1) and the punitive
strategy is to be played from the state (tk+2, x), because there is a delay between
the time at which the deviation is detected and the time from which the punitive strategy is
played.
First of all, in order to build the strategy α^ε, for example, we will define the
associated underlying finite probability space. We will define it by induction on
the number of explosions. We will always assume that an explosion procedure is
defined using constant correlation controls that are not used in any other explosion
procedure. This allows us to build the set Ωα by backward induction, adding new
correlation controls for each explosion.
Any ωα ∈ Ωα prescribes one correlation control for any of the possible
explosion procedures. Fix any sequence of correlation controls (u^i) possibly leading
to the explosion τ̄ ∈ [tk, tk+1) associated to the atom Ωl of Ftk. Consider the set
of correlation controls {u^a} associated to this explosion. Then, the conditional
probability of each u^a given (u^i) is by definition 1/q(tk^l):
\[
P_\alpha\bigl[\, \omega_\alpha \mapsto u^a \,\big|\, \omega_\alpha \mapsto (u^i) \,\bigr] \;=\; \frac{1}{q(t_k^l)} \tag{7.16}
\]
and
\[
\bar\Omega^{\alpha} : \Omega_\alpha \times V(t_0) \times \{t_k\}_{k=0,\dots,N_\delta \bar N} \to F_T .
\]
At time t0, for any ωα and any control v ∈ V(t0), we set S_{t0}^α(ωα, v) = ∅ and
Ω̄_{t0}^α(ωα, v) = Ω, and fix u0 ∈ U. For all k ∈ {0, ..., NδN̄ − 1}, if α^ε(ωα)(v) is built
on [t0, tk), we define α^ε(ωα)(v) further by:
1. If S_{tk}^α(ωα, v) ≠ ∅, for example S_{tk}^α(ωα, v) = (ti, x), this means that Player II
did not play the expected control from ts ∈ [ti, ti+1) on; then play the punitive
strategy α^ε(v)|[tk,tk+1) = αp^{η,ti+2}(x)(v|[ti+2,T])|[tk,tk+1) as defined in Lemma 7.3, and
set Ω̄_{tk+1}^α(ωα, v) = ∅ and S_{tk+1}^α(ωα, v) = S_{tk}^α(ωα, v).
2. If S_{tk}^α(ωα, v) = ∅, then:
• If there is no explosion on [tk, tk+1) for (u_η, v_η)(ω), ω ∈ Ω̄_{tk}^α(ωα, v),
then play α^ε(ωα)(v)|[tk,tk+1) = u_η(ω)|[tk,tk+1) for some ω ∈ Ω̄_{tk}^α(ωα, v) and
set Ω̄_{tk+1}^α(ωα, v) = Ω̄_{tk}^α(ωα, v). If k ≥ 1 and if v|[tk−1,tk] ≢ v_η(ω)|[tk−1,tk]
for all ω ∈ Ω̄_{tk}^α(ωα, v), then set S_{tk+1}^α(ωα, v) = (tk−1, X_{tk+1}^{t0,x0,α^ε,v}); else set
S_{tk+1}^α(ωα, v) = ∅.
• If there is an explosion on [tk, tk+1) for (u_η, v_η)(ω), ω ∈ Ω̄_{tk}^α(ωα, v), play
one of the correlation controls prescribed by the corresponding explosion procedure.
By construction these processes are adapted; in particular {S_T^α(v) ∈ {tk} × RN} ∈ F_{tk}^{α,v}, where F_{tk}^{α,v} = σ((α^ε(v), v)(s), s ∈ [t0, tk]).
The strategy β^ε is built symmetrically, using the auxiliary random processes Ω̄^β
and S^β.
We will first study the controls generated if Player I plays α^ε and Player II plays
some pure strategy β with delay τ(β) such that β generates no deviation. We will
say that β generates no deviation as soon as, for all k ∈ {0, ..., NδN̄}, S_{tk}^α(β) = ∅
(equivalently S_T^α(β) = ∅), even if S_T^α(β) = ∅ does not imply that the control
generated by β on [T − τ, T] is one of the v_η(ω).
We will first consider the values taken by the process Ω̄^α(β).
Lemma 7.4. If the strategies (α^ε, β) are played, where β is some pure strategy with
delay such that for all k ∈ {0, ..., NδN̄}, S_{tk}^α(β) = ∅, then for all k ∈ {0, ..., NδN̄}
and all F ∈ Ftk:
\[
P_\alpha\bigl[\, \bar\Omega_{t_k}^{\alpha}(\beta) \subset F \,\bigr] \;=\; Q(F).
\]
Proof. We will prove the Lemma by induction on k, for all F such that F is an atom
of the filtration Ftk.
For k = 0, this is obviously true, for the filtration Ft0 is trivial and Ω̄_{t0}^α(β) = Ω.
Assume that the property of the Lemma is true at stage k, k < NδN̄ − 1, and that
Ftk is generated by the atoms {Ωi^k}i∈I. We know that for all k, Ω̄_{tk}^α(β) ∈ {Ωi^k}i∈I ∪ {∅}.
Assume now that Ftk+1 = σ({Ωj^{k+1}}j∈J), where the Ωj^{k+1} are the atoms of Ftk+1.
Assume that Ω̄_{tk}^α(β) = Ωi^k.
• If there exists j ∈ J such that Ωi^k = Ωj^{k+1}, this means that no explosion takes
place on [tk, tk+1) for the controls (u_η, v_η)(ω), ω ∈ Ωi^k. As S_{tk}^α(β) = ∅, the
strategy α^ε will generate on [tk, tk+1) the control u_η(ω) for any ω ∈ Ωi^k, and
we will get Ω̄_{tk+1}^α(β) = Ω̄_{tk}^α(β) = Ωi^k. This implies Pα[Ω̄_{tk+1}^α(β) = Ωi^k] ≥
Pα[Ω̄_{tk}^α(β) = Ωi^k]. On the other hand, the definition of the process Ω̄^α(β)
ensures that Ω̄_{tk+1}^α(β) ⊆ Ω̄_{tk}^α(β), leading to
\[
P_\alpha\bigl[\, \bar\Omega_{t_{k+1}}^{\alpha}(\beta) = \Omega_i^k \,\bigr] \;=\; P_\alpha\bigl[\, \bar\Omega_{t_k}^{\alpha}(\beta) = \Omega_i^k \,\bigr] \;=\; Q(\Omega_i^k).
\]
• Assume now that Ωi^k ≠ Ωj^{k+1} for all j ∈ J. This means there is an explosion
on [tk, tk+1) for the controls (u_η, v_η)(ω), ω ∈ Ωi^k, and Ωi^k = ⋃_{j=j0}^{ji} Ωj^{k+1}. Assume
that we have, for some ωα, Ω̄_{tk}^α(ωα, β) = Ωi^k. Recall that F_{tk}^{α,v} =
σ((α^ε(v), v)(s), s ∈ [t0, tk]). Note that S_{tk}^α(ωα, β) = ∅, implying that on [tk, tk+1)
the strategy α^ε will generate one of the correlation controls u^a ∈ Ωα prescribed
by the explosion procedure for Ωi^k. The conditional probability that the control
generated by α^ε at time tk is u^a, given all correlation controls played so far, is
\[
P_\alpha\Bigl[ \alpha^{\varepsilon}(\beta)|_{[t_k,t_{k+1})} = u^a \,\Big|\, F_{t_k}^{\alpha,\beta} \Bigr] \;=\; \frac{1}{q(t_k^i)} \times P_\alpha\Bigl[ \bar\Omega_{t_k}^{\alpha}(\beta) = \Omega_i^k \,\Big|\, F_{t_k}^{\alpha,\beta} \Bigr]
\]
due to (7.16), because every correlation control being unique, the only way to play
u^a is when Ω̄_{tk}^α(β) = Ωi^k. Given the controls played on [t0, tk), for any trajectory
such that Ω̄_{tk}^α(β) = Ωi^k, at time tk the pure strategy β, being a strategy with
delay, will generate on [tk, tk + τ(β)] the same control, say v^b, whatever
the control u^a chosen by Player I on [tk, tk+1). Note that we must have that v|[tk, tk+τ/2)
is equivalent to one of the constant correlation controls; else, Player I would
detect some deviation at time tk+1 and set S_{tk+2}^α(β) ≠ ∅. In the end, Player II has
to play on [tk, tk + τ/2) one of the correlation controls v^b, and always plays the
same control whatever the control u^a played by Player I. Finally, we will get for
all j = j0, ..., ji:
\[
P_\alpha\Bigl[ \bar\Omega_{t_{k+1}}^{\alpha}(\beta) = \Omega_j^{k+1} \,\Big|\, F_{t_k}^{\alpha,\beta} \Bigr]
\;=\; Q\bigl( \Omega_j^{k+1} \,\big|\, \Omega_i^k \bigr) \times P_\alpha\Bigl[ \bar\Omega_{t_k}^{\alpha}(\beta) = \Omega_i^k \,\Big|\, F_{t_k}^{\alpha,\beta} \Bigr]
\;=\; Q\bigl( \Omega_j^{k+1} \,\big|\, \Omega_i^k \bigr)\, Q(\Omega_i^k) \;=\; Q(\Omega_j^{k+1}).
\]
We have proven that for all k ∈ {0, ..., NδN̄ − 1} and every atom Ωi^k of the
filtration Ftk,
\[
P_\alpha\bigl[\, \bar\Omega_{t_k}^{\alpha}(\beta) = \Omega_i^k \,\bigr] \;=\; Q(\Omega_i^k).
\]
Noticing that there is no explosion on [T − τ, T], we get FT = F_{t_{NδN̄−1}}, and due to the
definition of the strategy and the fact that S^α(β) = ∅, we get Ω̄_T^α(β) = Ω̄_{t_{NδN̄−1}}^α(β);
hence the result.
We still assume that Player I plays α^ε and Player II plays some pure strategy β
such that β generates no deviation, and we will compute the payoff Ji(t0, x0, α^ε, β)
for i = 1, 2.
Lemma 7.5. If the strategies (α^ε, β) are played, where β is some pure strategy with
delay such that S_T^α(β) = ∅, then for all i = 1, 2:
\[
\bigl| J_i(t_0, x_0, \alpha^{\varepsilon}, \beta) - e_i \bigr| \;\le\; \frac{3\varepsilon}{N_\delta},
\]
and consequently
\[
\bigl| J_i(t_0, x_0, \alpha^{\varepsilon}, \beta^{\varepsilon}) - e_i \bigr| \;\le\; \int_{\Omega_\beta} \bigl| J_i(t_0, x_0, \alpha^{\varepsilon}, \beta^{\varepsilon}(\omega_\beta)) - e_i \bigr| \, dP_\beta(\omega_\beta) \;\le\; \frac{3\varepsilon}{N_\delta} .
\]
Assume that FT = σ({Ωj}j=1,...,M̄), where the Ωj are the atoms of FT, and players are
using (α^ε, β) as in the assumptions of the Lemma. Notice that for all ωα ∈ {Ω̄_T^α(β) = Ωj}:
\[
\bigl\| X_t^{t_0,x_0,(\alpha^{\varepsilon},\beta)(\omega_\alpha)} - X_t^{\eta}(\omega_j) \bigr\| \;\le\; M\tau\,(1 + \|f\|_\infty)\, e^{L_f (T - t_0)},
\]
so that
\[
\bigl| J_i(t_0,x_0,\alpha^{\varepsilon},\beta) - E_Q\bigl[ g_i(X_T^{\eta}) \bigr] \bigr| \;\le\; \sum_{j=1}^{\bar M} \eta\, Q(\Omega_j) \;=\; \eta .
\]
It remains to prove that the strategies (α^ε, β^ε) are ε-optimal. We will prove it for β^ε:
there exists some constant Cα satisfying
Consider some pure strategy with delay β. If β generates no deviation (S_T^α(β) = ∅),
then we have just proven that:
\[
J_2(t_0, x_0, \alpha^{\varepsilon}, \beta) \;\le\; e_2 + \frac{3\varepsilon}{N_\delta} \;\le\; J_2(t_0, x_0, \alpha^{\varepsilon}, \beta^{\varepsilon}) + \frac{6\varepsilon}{N_\delta} . \tag{7.19}
\]
It remains to prove the same kind of result as (7.18) for any pure strategy β
generating some unexpected controls (leading, for some ωα, to S_T^α(ωα, β) ≠ ∅).
The idea of the proof is first to build some pure strategy β̃ generating the same
controls as β against α^ε as long as no deviation occurs and generating no deviation
against α^ε, that is, some non-deviating extension of β. We then will compare the
payoffs induced by β and β̃.
Lemma 7.6 (Non-Deviating Extension β̃ of Some Pure Strategy β of Player II).
To any pure strategy with delay β, one can associate a pure strategy with delay β̃
satisfying:
1. S^α(β̃) = ∅.
2. The pairs of strategies (α^ε(ωα), β) and (α^ε(ωα), β̃) generate the same pairs of
controls on [t0, T − τ] × {S_T^α(β) = ∅} ∪ ⋃_{k∈{0,...,NδN̄}} [t0, tk] × {S_T^α(β) ∈ {tk} × RN}.
Proof. We just give the sketch of the proof. The strategy β̃ is built the following
way. We need auxiliary random processes in order to keep in mind:
• Which trajectory generated by (u_η, v_η) is followed.
• Whether β deviated in the past, i.e., whether there exists t ∈ (t0, tk) such that S^α(β̃)_t ≠ ∅.
• Whether the strategy played by Player I is α^ε.
For all time intervals [tk, tk+1]:
• If Player I deviated from α^ε, play any control.
• If β deviated, then play the expected control (either v_η(ω) for the ω corresponding
to the followed trajectory, or any expected correlation control in case there
is some explosion). Then check whether Player I played the expected controls
corresponding to α^ε.
• If β did not deviate, then: if there is no explosion, first play v_η(ω) for the ω to be
followed, then check if β deviated and if Player I played the expected strategy. If
there is some explosion, and if β is going to play some expected correlation control
on [tk, tk + τ(β)], then play this control on the first half of the time interval, then
check if β deviates on [tk, tk + τ/2]; if it does not deviate, play β for the remaining
time interval, and otherwise play any correct correlation control; then check if
Player I deviated.
In this way we are able to build a pure strategy with delay. Indeed, β̃ is anticipative
with respect to β but non-anticipative with respect to the control u of the opponent.
Furthermore, β̃ satisfies S_T^α(β̃) = ∅ and Ω̄_T^α(β̃) ≠ ∅. As long as β generates no
deviation, the controls generated by (α^ε, β) and (α^ε, β̃) are the same.
We have, for any deviating pure strategy β:
\[
J_2(t_0,x_0,\alpha^{\varepsilon},\beta) \;=\; \sum_{i=0}^{N_\delta \bar N - 1} E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \Bigr] + E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=\emptyset} \Bigr]. \tag{7.20}
\]
Assume for example that S_T^α(β) = (ti, x). This means that some deviation occurred
on [ti, ti+1). There exists k ∈ {1, ..., Nδ} such that [ti, ti+1) ⊂ [θk−1, θk). Using the
definition of the strategy α^ε and introducing the non-deviating extension β̃ of β:
\[
g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=(t_i,x)} \;=\; g_2\bigl( X_T^{t_{i+2},x,\alpha_p^{\eta,t_{i+2}}(x),\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=(t_i,x)} .
\]
We introduce this last inequality because our estimate of V2(ti, X_{ti}^{t0,x0,α^ε,β̃}) induces
some error term of length η; therefore we need to sum up at most Nδ such error
terms in order to bound the global error by some ε.
In the end we have, for all ti ∈ [θk−1, θk):
\[
g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \;\le\; E_\alpha\Bigl[ V_2\bigl( \theta_k, X_{\theta_k}^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr) \,\Big|\, F_{t_i}^{\alpha,\tilde\beta} \Bigr] + 3\varepsilon\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} . \tag{7.21}
\]
The point now is to get an estimate of V2(θk, X_{θk}^{t0,x0,α^ε,β̃}). We will prove the
following Lemma:
Lemma 7.7. For all t ∈ {tk}k=0,...,NδN̄ and all pure strategies β̃ generating no
deviation against α^ε, we have:
\[
P_\alpha\Bigl[ V_2\bigl( t, X_t^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr) \le E_\alpha\bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr) \,\big|\, F_t^{\alpha,\tilde\beta} \bigr] + 4\eta \Bigr] \;\ge\; 1 - 2\eta .
\]
Consequently,
\[
g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \;\le\; E_\alpha\bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr) \,\big|\, F_{t_i}^{\alpha,\beta} \bigr]\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N}
+ \|g\|_\infty\, P_\alpha\bigl[ (\Sigma_{\theta_k}^{\alpha,\tilde\beta})^c \,\big|\, F_{t_i}^{\alpha,\beta} \bigr]\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} + 7\varepsilon\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} .
\]
Using the fact that {S_T^α(β) ∈ {ti} × RN} is F_{ti}^{α,β}-measurable due to the definition
of the strategies, we now use this estimate to compute the expectation of the payoff
in case there is some deviation:
\[
\begin{aligned}
\sum_{i=0}^{N_\delta \bar N - 1} E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \Bigr]
&\le \sum_{i=0}^{N_\delta \bar N - 1} E_\alpha\Bigl[ E_\alpha\bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \,\big|\, F_{t_i}^{\alpha,\beta} \bigr] \Bigr]\\
&\quad + \sum_{k=1}^{N_\delta} \sum_{i=(k-1)\bar N}^{k\bar N - 1} E_\alpha\Bigl[ \|g\|_\infty\, E_\alpha\bigl[ \mathbf{1}_{(\Sigma_{\theta_k}^{\alpha,\tilde\beta})^c}\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \,\big|\, F_{t_i} \bigr] \Bigr] + 7\varepsilon \quad \text{due to (7.23)}\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + \|g\|_\infty \sum_{k=1}^{N_\delta} E_\alpha\Bigl[ \mathbf{1}_{(\Sigma_{\theta_k}^{\alpha,\tilde\beta})^c}\, \mathbf{1}_{S_T^{\alpha}(\beta)\in[\theta_{k-1},\theta_k)\times\mathbb{R}^N} \Bigr] + 7\varepsilon\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + \|g\|_\infty \sum_{k=1}^{N_\delta} P_\alpha\Bigl[ \bigl( \Sigma_{\theta_k}^{\alpha,\tilde\beta} \bigr)^c \Bigr] + 7\varepsilon\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + \|g\|_\infty \sum_{k=1}^{N_\delta} \frac{2\varepsilon}{N_\delta} + 7\varepsilon \quad \text{thanks to Lemma 7.7}\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + 2\|g\|_\infty\,\varepsilon + 7\varepsilon . \tag{7.24}
\end{aligned}
\]
\[
\begin{aligned}
J_2(t_0,x_0,\alpha^{\varepsilon},\beta)
&= \sum_{i=0}^{N_\delta \bar N - 1} E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \Bigr] + E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=\emptyset} \Bigr]\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + (2\|g\|_\infty + 7)\varepsilon + E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=\emptyset} \Bigr] \quad \text{due to (7.24)}\\
&= J_2(t_0,x_0,\alpha^{\varepsilon},\tilde\beta) + (2\|g\|_\infty + 7)\varepsilon \;\le\; J_2(t_0,x_0,\alpha^{\varepsilon},\beta^{\varepsilon}) + (13 + 2\|g\|_\infty)\varepsilon .
\end{aligned}
\]
This proves that β^ε is (13 + 2‖g‖∞)ε-optimal. The proof is symmetric to state
that α^ε is (13 + 2‖g‖∞)ε-optimal.
Finally, we have built mixed strategies (α^ε, β^ε) rewarding a payoff 3ε-close to e
and (13 + 2‖g‖∞)ε-optimal. This proves that e is a Nash equilibrium payoff.
References
1. Aumann, R.J., Maschler, M.B.: Repeated Games with Incomplete Information. MIT Press,
Cambridge (1995)
2. Aumann, R.J., Shapley, L.S.: Long-Term Competition – A Game-Theoretic Analysis. Mimeo, Hebrew University (1976); reprinted in Essays in Game Theory in Honor of Michael Maschler (N. Megiddo, ed.), pp. 1–15. Springer-Verlag (1994)
3. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd ed. Academic, London
(1995)
4. Bensoussan, A., Frehse, J.: Stochastic games for N players. J. Optim. Theory Appl. 105(3),
543–565 (2000)
5. Bressan, A., Shen, W.: Semi-cooperative strategies for differential games. Int. J. Game Theory
32, 561–593 (2004)
6. Bressan, A., Shen, W.: Small BV solutions of hyperbolic noncooperative differential games.
SIAM J. Control Optim. 43(1), 194–215 (2004)
7. Buckdahn, R., Cardaliaguet, P., Rainer, C.: Nash equilibrium payoffs for nonzero sum
stochastic differential games. SIAM J. Control Optim. 43(2), 624–642 (2004)
8. Cardaliaguet, P.: Representations formulae for some differential games with asymetric infor-
mation. J. Optim. Theory Appl. 138(1), 1–16 (2008)
9. Cardaliaguet, P., Quincampoix, M.: Deterministic differential games under probability knowl-
edge of initial condition. Int. Game Theory Rev. 10(1), 1–16 (2008)
10. Dutta, P.K.: A folk theorem for stochastic games. J. Econ. Theory 66(1), 1–32 (1995)
11. Engwerda, J.C.: LQ Dynamic Optimization and Differential Games. Wiley, New York (2005)
12. Friedman, A.: Differential Games, Wiley-Interscience, New York (1971)
7 Nash Equilibrium Payoffs in Mixed Strategies 159
13. Gossner, O.: The folk theorem for finitely repeated games with mixed strategies. Int. J. Game
Theory 24, 95–107 (1995)
14. Hamadène, S., Lepeltier, J.-P., Peng, S.: BSDEs with continuous coefficients and stochastic
differential games. In: El Karoui et al. (eds.) Backward Stochastic Differential Equations,
Pitman Res. Notes in Math. Series, vol. 364, pp.161–175. Longman, Harlow (1997)
15. Kleimenov, A.F.: Nonantagonist Differential Games. “Nauka” Uralprime skoje Otdelenie
Ekaterinburg (1993) (in russian)
16. Kononenko, A.F.: Equilibrium positional strategies in non-antagonistic differential games
Dokl. Akad. Nauk SSSR 231(2), 285–288 (1976). English translation: Soviet Math. Dokl.
17(6), 1557–1560 (1977) (in Russian)
17. Rainer, C.: On two different approaches to nonzero sum stochastic differential games. Appl.
Math. Optim. 56, 131–144 (2007)
18. Rubinstein, A.: Equilibrium in supergames with the overtaking criterion. J. Econ. Theory 31,
227–250 (1979)
19. Solan, E.: Characterization of correlated equilibria in stochastic games. Int. J. Game Theory
30, 259–277 (2001)
20. Tolwinski, B., Haurie, A., Leitmann, G.: Cooperative equilibria in differential games. J. Math.
Anal. Appl. 119, 182–202 (1986)
Chapter 8
A Penalty Method Approach for Open-Loop
Variational Games with Equality Constraints
8.1 Introduction
For the past several years, we have been studying a class of variational games which
may be viewed as an extension of the calculus of variations. In particular, our focus
has been on exploiting a direct solution method, originally due to G. Leitmann in [4],
to investigate sufficient conditions for open-loop Nash equilibria. The study of such
problems pre-dates J. Nash’s work in non-cooperative games, and their study can be
P. Cardaliaguet and R. Cressman (eds.), Advances in Dynamic Games, Annals of the 161
International Society of Dynamic Games 12, DOI 10.1007/978-0-8176-8355-9 8,
© Springer Science+Business Media New York 2012
162 D.A. Carlson and G. Leitmann
found in the 1920s with a series of mathematical papers by Roos [5–10] exploring
the dynamics of competition in economics. The last of Roo’s papers, provides an
extensive investigation into general variational games and provides analogues of
the standard first-order necessary conditions, such as the Euler–Lagrange equations,
the Weierstrass necessary condition, transversality conditions, Legendre’s necessary
condition and the Jacobi necessary condition. To date, most of these papers dealt
only with unconstrained problems (i.e., free problems of Lagrange type). In this
paper we investigate problems with equality constraints. Our approach is to consider
the feasibility of a penalty method for these problems which extends our recent
paper Carlson and Leitmann [1] from the case of a single-player game to an N-player
game. Penalty methods, of course, are not new and they have been used in a variety
of settings. However, in the study of games a quick search of MathSciNet produced
only 22 papers pertaining to penalty methods and games.
The remainder of the paper is organized as follows. In Sect. 8.2, we define
the class of games we consider and introduce the penalized game. In the next
section we digress to discuss some relevant results concerning growth conditions
and sequentially weak relative compactness. We prove our main result in Sect. 8.4.
In Sect. 8.5 we present an example illustrating our results and we conclude with
some brief remarks indicating how other known techniques might be useful.
over all of his/her possible admissible trajectories (see below), ẋ j (·) satisfying the
fixed end condition x(a) = xa and the equality constraint
Clearly, the trajectories of the other players influences the decision of the jth player
and so each player is unable to minimize independently of the other players. As
a consequence, the players seek to play a (open-loop) Nash equilibrium instead.
To introduce this concept we first introduce the following notation. For each fixed
j = 1, 2, . . . , N, x = (x1 , x2 , . . . , xN ) ∈ Rn , and y j ∈ Rn j we use the notation
.
[x j , y j ] = (x1 , x2 , . . . , x j−1 , y j , x j+1 , . . . xN )
x j (a) = xa j , j = 1, 2, . . . , N, (8.3)
satisfies the equality constraints (8.2), satisfies (t, x j (t), ẋ j (t)) ∈ A j for almost all
t ∈ [a, b] and such that I j (x(·)) exists for all j = 1, 2, . . . , N .
Definition 8.2. Given an admissible trajectory x(·) for the constrained variational
game (8.1), (8.2) we say a function y j (·) : [a, b] → Rn j is an admissible trajectory
for player j relative to x(·) if and only if the function [x j , y j ](·) is an admissible
trajectory for the constrained variational game.
With these definitions we can now give the definition of a Nash equilibrium.
Definition 8.3. An admissible trajectory for the constrained variational game (8.1),
(8.2) x∗ (·) : [a, b] → Rn is called a Nash equilibrium if and only if for each player
j = 1, 2, . . . , N and each function y j (·) : [a, b] → Rn j that is an admissible trajectory
for player j relative to x∗ (·) one has
b
I j (x∗ (·)) = L j (t, x∗ (t), ẋ∗j (t)) dt
a
b
≤ L j (t, [x∗ (t) j , y j (t)], ẏ j (t)) dt
a
Remark 8.2. From the above definitions it is clear that when all of the players “play”
a Nash equilibrium, then each player’s strategy is his best response to that of the
other players. In other words, if player j applies any other admissible trajectory
relative to the Nash equilibrium, than his equilibrium trajectory, his cost functional
will not decrease.
164 D.A. Carlson and G. Leitmann
Remark 8.3. The above dynamic game clearly is not the most general structure one
can imagine, even in a variational framework. In particular, the cost functionals
are coupled only through their state variables and not through their strategies (i.e.,
their time derivatives). While not the most general, one can argue that this form is
general enough to cover many cases of interest since in a “real-world setting,” an
individual player will not know the strategies of the other players (see e.g., Dockner
and Leitmann [3]).
To solve games of the type described above, one usually tries to solve the first-
order necessary conditions to obtain a candidate for the Nash equilibrium and then
apply a sufficient condition to verify that it is one. For the above constrained
problem, the first-order necessary conditions are complicated as a result of the
equality constraints. For such problems one must find a multiplier for each of the
constraints which in its most general form is a measure. As a consequence of this
fact we choose to consider a family of unconstrained games in which the objective
of each player incorporates the constraint multiplied by a positive constant. We now
describe this family of games.
To define the penalized games for each λ > 0 define the function Lλ , j : A j → R by
the formula
Lλ , j (t, x, p j ) = L(t, x, p j ) + λ g j (t, x, p j ), (8.5)
for each (t, x, p j ) ∈ A j . With this integrand we consider the unconstrained game in
which each player tries to minimize the integral functional
b
Iλ , j (x(·)) = Lλ , j (t, x(t), ẋ j (t)) dt, j = 1, 2, . . . N, (8.6)
a
over all of his admissible trajectories x j (·) satisfying the fixed end condition
x j (a) = xa j . Of course, the set of admissible trajectories for this family of un-
constrained games is larger than the set of admissible trajectories for the original
constrained game. For completeness we give the following definitions.
Definition 8.4. For a given λ > 0, a function x(·) : [a, b] → Rn is an admissible
trajectory for the unconstrained game (8.6) if it is absolutely continuous, satisfies
the fixed end condition (8.3), satisfies (t, x(t), ẋ j (t)) ∈ A j for almost all t ∈ [a, b] and
Iλ , j (x(·)) exists for all i = 1, 2, . . . , N.
Definition 8.5. Given an admissible trajectory x(·) for the unconstrained game
(8.6), we say a function y j (·) : [a, b] → Rn j is admissible for player j relative to
x(·) if the trajectory [x(·) j , y j (·)] is an admissible trajectory for the unconstrained
game (8.6).
8 A Penalty Method Approach for Open-Loop Variational Games 165
Definition 8.6. Given a fixed λ > 0, we say an admissible trajectory x∗λ (·) : [a, b] →
Rn for the unconstrained variational (8.6) is a Nash equilibrium of for each j =
1, 2, . . . N and any function y j (·) : [a, b] → Rn j that is admissible for player j relative
to x∗λ (·) one has
b
Iλ , j (x∗λ (·)) = Lλ , j (t, x∗λ (t), ẋ∗λ j (t)) dt
a
b
≤ Lλ , j (t, [x∗λ (t) j , y j (t)], ẏ j (t)) dt
a
We notice that if y(·) is an admissible trajectory for the constrained game (8.1),
(8.2) then it is an admissible trajectory for the unconstrained game (8.6) for any
value of λ ≥ 0 and I j (y(·)) = Iλ , j (y(·)). Thus, if it is the case that x∗λ (·) is both a
Nash equilibrium for the unconstrained game (8.6) and if it is also an admissible
trajectory for the constrained game (8.1), (8.2), then it is a Nash equilibrium for the
constrained game. Indeed if y(·) is admissible for player j for the constrained game
relative to x∗λ (·) (which implies that g j (t, [x∗λ (t) j , y(t) j ], ẏ(t)) = 0) then we have
∗j ∗j
I j (x∗λ (·)) = Iλ , j (x∗λ (·)) ≤ Iλ , j ([xλ , y](·)) = I j ([xλ , y](·)).
The above observation is useful only if we find that a Nash equilibrium for one
of the penalized games is an admissible trajectory for the constrained game. The
idea of a penalty method is that as the penalty parameter λ grows to infinity the
penalized term tends to zero. We now give conditions for when this occurs.
Lemma 8.1. Assume for each j = 1, 2, . . . , N that there exists constants A∗j and
B∗j such that for each admissible trajectory for the unconstrained games, x(·) one
has A∗j ≤ I j (x(·)) ≤ B∗j . Further assume that there exists a λ0 > 0 such that for
all λ > λ0 the unconstrained penalized games have Nash equilibria x∗λ (·) and that
corresponding to each there exists an absolutely continuous function yλ (·) such
that for each j = 1, 2, . . . , N the trajectories [x∗ j , yλ , j ](·) are admissible for the
constrained game (i.e., g(t, [x∗ j , yλ , j ](t), ẏλ j (t)) = 0 a.e. t ∈ [a, b]). Then one has,
b
lim g j (t, x∗λ (t), ẋ∗λ , j (t)) dt = 0, j = 1, 2, . . . , N.
λ →+∞ a
Proof. To prove this result we proceed by contradiction and assume that for some
j = 1, 2 . . . , N there exists a sequence {λk } and an 0 > 0 such that
b
g j t, x∗λk (t), ẋ∗λk , j (t) dt > 0 .
a
166 D.A. Carlson and G. Leitmann
≤ B∗j
for almost all t ∈ [a, b]. Furthermore, we also know nothing about the convergence
of the trajectories {x∗λ (·)} as λ → ∞.
Remark 8.5. The existence of the A∗j ’s can be realized by assuming that the
integrands L j (·, ·, ·) are bounded below, which is not an unusual assumption for
minimization problems. The existence of the admissible trajectories yλ (·) is much
more difficult to satisfy, but it is easy to see that such a trajectory exists if the
equality constraints are not dependent on the other players and if there exists feasible
trajectories for the original constrained game. That is, g j (·, ·, ·) : [a, b] × Rn j ×
Rn j → R and there exists at least one trajectory y(·) satisfying the fixed end
condition (8.3) such that
In this case one only needs to take yλ (·) = y(·) for all λ > λ0 . Finally, the existence
of the constants B∗j is perhaps the most difficult to verify, unless one assumes that
the integrands L j (·, ·, ·) are also bounded above. However, we note that in our proof,
this condition can be weakend slightly by assuming that for each j = 1, 2, . . . , N
one has I j (x(·)) ≤ B∗j for all feasible trajectories for the original constrained game
(8.1), (8.2).
We now begin to investigate the convergence properties of the family of Nash
equilibria x∗λ (·) .
8 A Penalty Method Approach for Open-Loop Variational Games 167
In this section we begin by reviewing some classical notions concerning the weak
topology of absolutely continuous functions and criteria for compactness of a
sequence of absolutely continuous functions. Following this discussion we apply
these ideas to our game model and the compactness of the set of Nash equilibria
{xλ (·)}λ >0 . Following this result we discuss the lower semicontinuity properties
of the integral functionals I j (·) and Iλ , j (·) with respect to the weak topology of
absolutely continuous funcntions. This will allow us to present our main result in
the next section. These questions have their roots in the classical existence results
of the calculus of variations.
The existence theory of the calculus of variations is a delicate balance between
the compactness properties of sets of admissible trajectories and the conditions
imposed on the integral functional to insure lower semicontinuity. Fortunately, this
is a well studied problem for the cases we consider here and indeed the results are
now classical. We begin first by discussing growth conditions and the weak topology
in the class of absolutely continuous functions.
The space of absolutely continuous functions, denoted as AC([a, b]; Rm ), is a
subspace of the set of continuous functions z(·) : [a, b] → Rm with the property that
their first derivatives ż(·) are Lebesgue integrable. Clearly they include the class of
piecewise smooth trajectories. Further, we also know that the fundamental theorem
of calculus holds, i.e.,
t
z(t) = z(a) + ż(s) ds
a
for every t ∈ [a, b] and moreover, whenever
t
z(t) = z(a) + ξ (s) ds, a ≤ t ≤ b,
a
holds for some Lebesgue integrable function ξ (·) : [a, b] → Rm , then necessarily we
have ż(t) = ξ (t) for almost all t ∈ [a, b]. The convergence structure imposed on this
space of functions is the usual weak topology which we define as follows.
Definition 8.7. A sequence {zk (·)}+∞ k=1 in AC([a, b]; R ) converges weakly to a
m
We make the following observations concerning the above definition. First, since we
are only interested in absolutely continuous functions satisfying the fixed endpoint
conditions (8.3), for any sequence of interest for us here we can take tk = a so
that the first condition in the above definition is automatically satisfied. Secondly,
the convergence property of the sequence of derivatives is referred to as weak
convergence in L1 ([a, b]; Rn ) (the space of Lebesgue integrable functions) of the
derivatives. As a consequence of these two observations, we need to consider the
weak compactness of a set of integrable functions. To this end we have the following
well known theorem.
Theorem 8.1. Let {h(·) : [a, b] → Rm } be a family of Lebesgue integrable functions.
The following two statements are equivalent.
1. The family {h(·)} is sequentially weakly relatively compact in L1 ([a, b]; Rm ).
2. There is a constant M ∈ R and a function Φ (·) : [0, ∞) → R such that
b
Φ (ζ )
lim = +∞ and Φ (h(s)) ds ≤ M
ζ →∞ ζ a
has been established in Theorem 2. Moreover, as a result of Lemma 1 (see also the
remarks following its proof) we also have that
b
lim g j (t, x∗λk , j (t), ẋ∗λk , j (t)) dt = 0, j = 1, 2, . . . N.
k→∞ a
Now, since the functions g j (·, ·, ·) are nonnegative and convex in their last n j
arguments we have that the integral functionals
b
G j ((z(·), p(·))) = g j (t, z(t), p(·)) dt, j = 1, 2, . . . N,
a
are lower semicontinuous on C([a, b]; Rn j ) × L([a, b]; Rn j ). In particular this means
that
b b
0 = lim g j (t, x∗λk , j (t), ẋ∗λk , j (t)) dt ≥ g j (t, x∗j (t), ẋ∗j (t)) dt ≥ 0,
k→+∞ a a
which implies that g j (t, x∗j (t), x∗j (t)) = 0 for almost all t ∈ [a, b]. This of course
says that x∗ (·) is an admissible trajectory for the constrained variational game. It
remains to show that it is a Nash equilibrium. To see this fix j = 1, 2, . . . N and
let y j (·) : [a, b] → Rn j be an admissible trajectory for player j relative to x∗ (·) and
consider the following inequalities for λk
b
I j (x∗λk (·)) = L j (t, x∗λ (t), ẋ∗λk , j (t)) dt
a
≤ Iλk , j (x∗λk (·))
b
= L j (t, x∗λk (t), ẋ∗ λk , j(t)) + λk g j (t, x∗λk , j (t), ẋ∗λk , j (t)) dt
a
where the first inequality is a result of the nonnegativeness of g j (·, ·, ·), the second
inequality is a consequence of the fact that x∗λ (·) is a Nash equilibrium for the
k
unconstrained variational game with λ = λk and the last equality follows because
g j (t, y j (t), ẏ j (t)) = 0 for almost all t ∈ [a, b] by the definition of y j (·). Letting k → ∞
in the above gives
172 D.A. Carlson and G. Leitmann
= I j ([x∗ j , y j ](t)).
in which f (k(t)) is a production rate and c(t) denotes a rate of external investments
required for the production (i.e., amount of raw materials). The goal of the firm is
to maximize its profit. The price per unit of each unit is given by a demand p =
p(k(t)), a function that depends on the available inventory of the firm, and the cost
of production C(c(t)) depends on the external investment rate at time t. Thus the
objective of the firm is to maximize a functional of the form
b
I(k(·)) = p(k(t))k(t) − C(c(t)) dt
0
b
= p(k(t))k(t) − C(k̇(t) − f (k(t))) dt. (8.8)
a
generates a pollutant s(t) at each time t which the government has mandated must
be reduced to a fraction of its initial level (i.e. to α s(0), α ∈ (0, 1)) over the time
interval [0, b]. That is, s(b) = α s(0). Each firm generates pollution according to the
following process:
in which g(k(t)) denotes the rate of production of the pollutant by the firm and
μ > 0 is a constant representing the “natural” abatement of the pollutant. This gives
our differential constraint. Thus the problem for the firm is to maximize its profit
given by (8.8) while satisfying the pollution constraint (8.9) and the end conditions
(k(0), s(0)) = (k0 , s0 ) and s(b) = sb (here of course we interpret sb = α s0 but this is
not necessary for the formulation of the problem).
If we choose for specificity p(k) = π k, f (k) = α k + β , g(k) = γ k and C(c) = 12 c2
with all of the coefficients positive constants, the above calculus of variations
problem becomes one in which the objective functional is quadratic and the
differential side constraint becomes linear.
To apply our theory we consider the family of unconstrained variational problems
(Pλ ) of minimizing
b
1
[k̇(t) − α k(t)) − β ]2 − π k(t)2 + λ [ṡ(t) + μ s(t) − γ k(t)]2 dt, (8.10)
0 2
over all piecewise continuous (k(·), s(·)) : [0, b] → R2 satisfying the end conditions
1
Lλ ((k, s), (p, q)) = [p − α k − β ]2 − π k2 + λ [q + μ s − γ k]2.
2
Further we note that since k(b) is unspecified the solution (kλ (·), sλ (·)) must satisfy
the transversality condition
∂ Lλ
= k̇λ (b) − α kλ (b) − β = 0.
∂ p ((kλ (b),sλ (b)),(k̇λ (b),ṡλ (b)))
This supplies us with the terminal condition for the state kλ (t) at t = b.
174 D.A. Carlson and G. Leitmann
d
[k̇(t) − α k(t) − β ] = −α (k̇(t) − α k(t) − β ) − 2π k(t)
dt
−2λ γ (ṡ(t) + μ s(t) − γ k(t))
d
2λ [ṡ(t) + μ s(t) − γ k(t)] = 2μλ (ṡ(t) + μ s(t) − γ k(t))).
dt
For a solution (kλ (·), sλ (·)) of the above system define Λ (·) = ṡ(·) + μ s(·) − γ k(·)
and observe that the second equation becomes
d
Λ (t) = μΛ (t), t ∈ (0, b),
dt
which has the general solution Λ (t) = Λ0λ eμ t , where Λ0λ is a constant to be
determined. Observe that the constant Λ0λ does depend on λ since in general kλ (·)
will. Substituting Λ (·) for (ṡλ (t) + μ sλ (t) − γ kλ (t)) into the first equation gives us
the uncoupled equation,
d λ
[k̇ (t) − α kλ (t) − β ] = −α (k̇λ (t) − α kλ (t) − β ) − 2π kλ (t) − 2λ γΛ0λ eμ t ,
dt
or after simplifying becomes
The general solution of this equation has the form kλ (t) = kc (t)+k1 (t)+k2 (t) where
kc (·) is the general solution of the homogeneous equation k̈(t) + (2π − α 2 )k(t) = 0,
k1 (·) solves the nonhomogeneous equation k̈(t) + (2π − α 2 )k(t) = αβ and k2 (·)
solves the nonhomogeneous equation k̈(t) + (2π − α 2 )k(t) = −2λ γΛ0λ eμ t . Each
of these equations are easy to solve. For simplicity we assume α 2 − 2π > 0 (to
insure real roots of the characteristic equation). Using elementary techniques we
have
αβ 2λ γΛ0λ
kc (t) = A∗ ert + B∗e−rt , k1 (t) = , and k2 (t) = eμt .
α 2 − 2π α − 2π − μ 2
2
giving us
αβ 2λ γΛ0λ
kλ (t) = − + eμ t + A∗ ert + B∗ e−rt ,
α 2 − 2π α 2 − 2π − μ 2
√
in which r = α 2 − 2π . Using the initial value for k(0) = k0 and the transversality
condition we obtain the following two equations for A∗ and B∗ .
8 A Penalty Method Approach for Open-Loop Variational Games 175
αβ 2λ γΛ0λ
A ∗ + B ∗ = k0 + −
α 2 − 2π α 2 − 2π − μ 2
α 2β 2λ γΛ (μ − α ) μ b
(r − α ) erb A∗ + (−r − α ) e−rb B∗ = β − − 2 0λ e .
α − 2π
2 α − 2π − μ 2
Using Cramer’s rule we get the following expressions for A∗ and B∗ :
1 αβ
A∗ = k0 (−r − α ) e−rb + 2 (−r − α ) e−rb + α
Δ α − 2π
2λ γΛ0λ
μb −rb
+ 2 (μ − α ) e − (−r − α ) e
α − 2π − μ 2
∗ 1 αβ
B = β − k0 (r − α ) erb − 2 α + (r − α ) erb
Δ α − 2π
2λ γΛ0λ
μb
(r − α ) e rb
− ( μ − α ) e ,
α 2 − 2π − μ 2
αβ
kλ (t) = − + EΛ0λ λ eμ t + (A + BΛ0λ λ ) ert + (C + DΛ0λ λ ) e−rt ,
α 2 − 2π
in which E is also a constant that is independent of Λ0λ and λ .
We now determine sλ (·). To this end we use the definition of Λ (t) = Λ0λ eμ t to
obtain the differential equation
αβ 1
sλ (t) = s0 e−μ t − (1 − e−μ t ) + (EΛ0λ λ + Λ0λ )(eμ t − e−μ t )
μ (α 2 − 2π ) 2μ
1 1
(A + BΛ0λ λ )(ert − e−μ t ) + (C + DΛ0λ λ )(e−rt − e−μ t ).
r+μ −r + μ
176 D.A. Carlson and G. Leitmann
lim ṡλ (t) + μ sλ (t) − γ kλ (t) = lim Λ0λ eμ t = 0, for all t ∈ [a, b],
λ →+∞ λ →+∞
which implies that in the limit the constraint is satisfied. Further, we notice that
Λ0λ λ → F /H as λ → +∞ from which one easily sees that (kλ (·), sλ (·)) →
(k∗ (·), s∗ (·)) as λ → +∞ where
αβ EF μ t BF DF
k∗ (t) = − + e + A + e rt
+ C + e−rt
α 2 − 2π H H H
αβ F E μt
s∗ (t) = s0 e−μ t − (1 − e−μ t ) + (e − e−μ t )
μ (α 2 − 2π ) 2μ H
1 BF 1 DF
A+ (ert − e−μ t ) + C+ (e−rt − e−μ t ).
r+μ H −r + μ H
If we extend the above example to two players each firm produces an equivalent
good whose quantity at time t ∈ [a, b] is given by a production process
8 A Penalty Method Approach for Open-Loop Variational Games 177
over all inventory streams t → k j (t) satisfying a given initial level k j (a) = ka j . This
gives a simple unconstrained variational game. The production processes of each
firm generate the same pollutant s(t) = s1 (t)+ s2 (t) at each time t (here si (t) denotes
the pollutant level due to the ith player) which the government has mandated must
be reduced to a fraction of its initial level (i.e. to α s(a), α ∈ (0, 1)) over the time
interval [a, b]. That is, s(b) = α s(a). Each firm generates pollution according to the
following process:
8.6 Conclusions
In this paper we explored the use of a penalty method to find open-loop Nash
equilibria for a class of variational games. We showed that using classical as-
sumptions, with roots in the calculus of variations, it was possible to establish our
results. We presented an example of a single-player game in detail and gave some
indication of the difficulties encountered when treating the multi-player case. Our
analysis suggests that a new extension of Leitmann’s direct method to problems
with unspecified right endpoint conditions could prove useful in using this penalty
method to determine Nash equilibria.
Acknowledgements The authors would like to dedicate this paper to the memory of our friend
and colleague Thomas L. Vincent. Additionally, the first author would like to offer his best wishes
to his friend and co-author George Leitmann on the occasion of his eighty-fifth birthday.
References
1. Carlson, D.A., Leitmann, G.: An equivalent problem approach to absolute extrema for calculus
of variations problems with differential constraints. Dyn. Contin. Discrete Impuls. Syst. Ser. B
Appl. Algorithms 18(1), 1–15 (2011)
2. Cesari, L.: Optimization-theory and applications: problems with ordinary differential equa-
tions. In: Applications of Applied Mathematics, vol. 17. Springer, New York (1983)
3. Dockner, E.J., Leitmann, G.: Coordinate transformation and derivation of open-loop Nash
equilibrium. J. Optim. Theory Appl. 110(1), 1–16 (2001)
4. Leitmann, G.: A note on absolute extrema of certain integrals. Int. J. Non-Linear Mech. 2,
55–59 (1967)
5. Roos, C.F.: A mathematical theory of competition. Am. J. Math. 46, 163–175 (1925)
6. Roos, C.F.: Dynamical economics. Proc. Natl. Acad. Sci. 13, 145–150 (1927)
7. Roos, C.F.: A dynamical theory of economic equilibrium. Proceedings of the National
Academy of Sciences 13, 280–285 (1927)
8. Roos, C.F.: A dynamical theory of economics. J. Polit. Econ. 35(5), 632–656 (1927)
9. Roos, C.F.: Generalized Lagrange problems in the calculus of variations. Trans. Am. Math.
Soc. 30(2), 360–384 (1928)
10. Roos, C.F.: Generalized Lagrange problems in the calculus of variations. Trans. Am. Math.
Soc. 31(1), 58–70 (1929)
11. Wagener, F.O.O.: On the Leitmann equivalent problem approach. J. Optim. Theory Appl.
142(1), 229–242 (2009)
Chapter 9
Nash Equilibrium Seeking for Dynamic Systems
with Non-quadratic Payoffs
P. Cardaliaguet and R. Cressman (eds.), Advances in Dynamic Games, Annals of the 179
International Society of Dynamic Games 12, DOI 10.1007/978-0-8176-8355-9 9,
© Springer Science+Business Media New York 2012
180 P. Frihauf et al.
9.1 Introduction
Consider a noncooperative game with N players and a dynamic mapping from the
players’ actions ui to their payoff values Ji , which the players wish to maximize.
Specifically, we consider a general nonlinear model,
Assumption 9.2. For each u ∈ RN , the equilibrium x = l(u) of the system (9.1) is
locally exponentially stable.
Hence, we assume that for any action by the players, the plant is able to stabilize
the equilibrium. We can relax the requirement for each u ∈ RN as we need to only be
concerned with the action sets of the players, namely, u ∈ U = U1 × · · · ×UN ⊂ RN .
The following assumptions are central to our Nash seeking scheme as they ensure
that at least one stable Nash equilibrium exists.
Assumption 9.3. There exists at least one, possibly multiple, isolated stable Nash
equilibria u∗ = [u∗1 , . . . , u∗N ] such that
∂ (hi ◦ l) ∗
(u ) = 0, (9.4)
∂ ui
∂ 2 (hi ◦ l) ∗
(u ) < 0, (9.5)
∂ u2i
dx
ω = f (x, u∗ + ũ + μ (τ )), (9.10)
dτ
dũi
= ε Ki μi (τ )hi (x), (9.11)
dτ
For the averaging analysis, we first “freeze” x in (9.10) at its quasi-steady state
x = l(u∗ + ũ + μ (τ )) (9.12)
dũi
= ε Ki μi (τ )pi (u∗ + ũ + μ (τ )), (9.13)
dτ
where pi (u∗ + ũ + μ (τ )) = (hi ◦ l)(u∗ + ũ + μ (τ )). This system’s form allows for the
use of general averaging theory [12, 21] and leads to the result:
Theorem 9.1. Consider the system (9.13) for an N-player game under Assumptions
9.3 and 9.4 and where ωi = ω j , ωi
= ω j + ωk , 2 ωi
= ω j + ωk , and ωi = 2 ω j + ωk
for all distinct i, j, k ∈ {1, . . . , N}. There exist M, m > 0 and ε̄ , ā such that, for all
ε ∈ (0, ε̄ ) and ai ∈ (0, ā), if |Δ (0)| is sufficiently small, then for all τ ≥ 0,
where
Δ (τ ) = ũ1 (τ ) − ∑Nj=1 c1j j a2j , . . . , ũN (τ ) − ∑Nj=1 cNjj a2j , (9.15)
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 185
and
⎡ ⎤
∂ 3 p1
⎡ ⎤ ∂ u1 ∂ u2j
(u∗ )
⎢ ⎥
c1j j ⎢ .. ⎥
⎢ . ⎥ ⎢ ⎥
⎢ .. ⎥ ⎢ . ⎥
⎢ ⎥ ⎢ ∂ 3 p j−1 ∗ ⎥
⎢ ⎥
⎢ j−1 ⎥
⎢c j j ⎥ ⎢ ∂ u j−1 ∂ u2j (u )⎥
⎢ j ⎥ ⎢ ⎥
⎢ c ⎥ = − 1 Λ −1 ⎢ 1 ∂ pj ∗ ⎥
3
⎢ jj ⎥ ⎢ (u ) ⎥. (9.16)
⎢ j+1 ⎥ 4 ⎢ 2 ∂uj 3
⎥
⎢c j j ⎥ ⎢ ⎥
⎢ ∂ 2 p j+1 (u∗ )⎥
3
⎢ . ⎥ ⎢ ∂ ∂ ⎥
⎢ . ⎥ u
⎢ j j+1
u
⎥
⎣ . ⎦ ⎢ .. ⎥
N ⎢ . ⎥
cjj ⎣ 3 ⎦
∂ pN ∗
∂u ∂u
2 (u )
j N
Proof. As already noted, the form of (9.13) allows for the application of general
averaging theory, which yields the average system,
T
dũave 1
i
= ε Ki lim μi (τ )pi (u∗ + ũave + μ (τ )) dτ . (9.17)
dτ T →∞ T 0
for all i ∈ {1, . . . , N}, and we postulate that ũe has the form,
N N N
ũei = ∑ bij a j + ∑∑ cijk a j ak + O max ai .
3
i
(9.19)
j=1 j=1 k≥ j
k
Dα pi (u∗ ) e
pi (u∗ + ũe + μ (τ )) = ∑ α !
(ũ + μ (τ ))α
|α |=0
Dα pi (ζ ) e
+ ∑ α !
(ũ + μ (τ ))α ,
|α |=k+1
k
Dα pi (u∗ ) e
= ∑ α! (ũ + μ (τ ))α
+ O max
i
a k+1
i , (9.20)
|α |=0
186 P. Frihauf et al.
where ζ is a point on the line segment that connects the points u∗ and u∗ + ũe + μ (τ ).
In (9.20), we have used multi-index notation, namely, α = (α1 , . . . , αN ), |α | = α1 +
· · · + αN , α ! = α1 ! · · · αN !, uα = uα1 1 · · · uαNN , and Dα (hi ◦ l) = ∂ |α | pi /∂ uα1 1 · · · ∂ uαNN .
The second term on the last line of (9.20) follows by substituting the postulated form
of ũe (9.19).
We select k = 3 to capture the effect of the third order derivative on the system
as a representative case. The effect of higher-order derivatives can be studied if
the third order derivative is zero. Substituting (9.20) into (9.18) and computing the
average of each term gives
a2i e ∂ 2 pi ∗ N
∂ 2 pi
0=
2
ũi
∂ ui
2
(u ) + ∑ ũej
∂ u i ∂ u j
(u∗ )
=i
j
1 e 2 a2i ∂ 3 pi ∗ N
∂ 3 pi
+ (ũi ) + (u ) + ũei ∑ ũej 2 (u∗ )
2 8 ∂ ui 3
=i
j ∂ u i ∂ u j
1 e
2 a j
2
N
∂ 3 pi
+∑ ũ j + (u∗ )
=i
j
2 4 ∂ ui ∂ u2j
N N
∂ 3 pi
+ ∑ ∑ ũ j ũk e e ∗
(u ) + O(max a5i ), (9.21)
=i
j k > j ∂ u i ∂ u j ∂ u k i
k
=i
where we have noted (9.4), utilized (9.19), and computed the integrals shown in the
appendix.
Substituting (9.19) into (9.21) and matching first order powers of ai gives
⎡ ⎤ ⎡ 1⎤ ⎡ 1⎤
0 b1 b1
⎢ .. ⎥ ⎢ .. ⎥ ⎢ .. ⎥
⎣ . ⎦ = a1 Λ ⎣ . ⎦ + · · · + aN Λ ⎣ . ⎦ , (9.22)
0 bN1 bNN
which implies that bij = 0 for all i, j since Λ is nonsingular by Assumption 9.4.
Similarly, matching second order terms of ai , and substituting bij = 0 to simplify the
resulting expressions, yields
⎛ ⎡ 3 ⎤⎞
∂ p1 ∗
(u )
⎢ ∂ u1 ∂ u j
2
⎜ ⎥⎟
⎜ ⎢ .. ⎥⎟
⎜ ⎢ ⎥⎟
⎜ ⎢ . ⎥⎟
⎡ ⎤ ⎜ ⎢ ∂ 3 p j−1 ∗ ⎥⎟
⎡ ⎤ ⎜ ⎡ ⎤ ⎢ ⎥⎟
0 c1jk ⎜ c1j j ⎢ ∂ u j−1 ∂ u2j (u )⎥⎟
⎢ .. ⎥ N N ⎢ . ⎥ N 2 ⎜ ⎢ . ⎥ 1 ⎢ 1 ∂3pj ∗ ⎥
⎜ ⎢ ⎟
⎥⎟
⎣ . ⎦ = ∑ ∑ a j ak Λ ⎢ . ⎥ + ∑ a ⎜Λ ⎣ . ⎦ + ⎢
⎣ . ⎦ j=1 j ⎜ . (u ) ⎥⎟. (9.23)
4 ⎢ 2 ∂uj ⎥⎟
3
j=1 k> j
⎜ ⎢ ⎥⎟
⎢ ∂ 2 p j+1 (u∗ )⎥⎟
N N 3
0 c jk ⎜ cjj
⎜ ⎢ ∂ u j ∂ u j+1 ⎥⎟
⎜ ⎢ ⎥⎟
⎜ ⎢ .. ⎥⎟
⎜ ⎢ . ⎥⎟
⎝ ⎣ 3 ⎦⎠
∂ pN ∗
∂u ∂u2 (u )
j N
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 187
N
ũei = ∑ cij j a2j + O max ai .
i
3
(9.24)
j=1
By again utilizing a Taylor polynomial approximation, one can show that the
Jacobian Ψ ave = [ψi, j ]N×N of (9.17) at ũe has elements given by
T
1 ∂ pi ∗
ψi, j = ε Ki lim μi (τ )
(u + ũe + μ (τ )) dτ ,
T →∞ T 0 ∂ uj
2 ∂ pi
1 2
∗
= ε Ki ai (u ) + O ε max ai ,3
(9.25)
2 ∂ ui ∂ u j i
and is Hurwitz by Assumptions 9.3 and 9.4 for sufficiently small ai , which implies
that the equilibrium (9.24) of the average system (9.17) is exponentially stable, i.e.,
there exist constants M, m > 0 such that
From Theorem 9.1, we see that u of reduced system (9.13) converges to a region
that is biased away from the Nash equilibrium u∗ . This bias is in proportion to the
perturbation magnitudes ai and the third derivatives of the payoff functions, which
are captured by the coefficients cij j . Specifically, ûi of the reduced system converges
to u∗i + ∑Nj=1 cij j a2j + O(ε + maxi a3i ) as t → ∞.
For a two-player game, Theorem 9.1 holds with the obvious change to omit any
reference to ωk and with the less obvious inclusion of the requirement ωi = 3ω j . The
requirement ωi = 3ω j is not explicitly stated in Theorem 9.1 since the combination
of ωi = ω j and ωi = 2ω j + ωk for all distinct i, j, k implies that ωi
= 3ω j . If the
payoff functions were quadratic, rather than non-quadratic, the requirements for the
perturbation frequencies would be simply, ωi = ω j , ωi
= ω j + ωk for the N-player
game, and ωi = ω j , ωi = 2ω j for the two-player game.
188 P. Frihauf et al.
We analyze the full system (9.10)–(9.11) in the time scale τ = ω t using singular
perturbation theory. First, we note that by [12, Theorem 14.4] and Theorem 9.1 there
exists an exponentially stable almost periodic solution ũa = [ũa1 , . . . , ũaN ] such that
dũai
= ε Ki μi (τ )pi (u∗ + ũa + μ (τ )). (9.28)
dτ
Moreover, ũa is unique within a neighborhood of the average solution ũave [16].
We define zi = ũi − ũai and obtain
dzi
= ε Ki μi (τ ) [hi (x) − pi (u∗ + ũa + μ (τ ))] , (9.29)
dτ
dx
ω = f (x, u∗ + z + ũa + μ (τ )), (9.30)
dτ
which from Assumption 9.1, the quasi-steady state is
dzi
= ε Ki μi (τ ) [pi (u∗ + z + ũa + μ (τ )) − pi(u∗ + ũa + μ (τ ))] , (9.32)
dτ
which has an equilibrium at z = 0 that is exponentially stable for sufficiently small
ai as shown in Sect. 9.4.
To formulate the boundary layer model, let y = x − l(u∗ + z + ũa + μ (τ )), and
then in the time scale t = τ /ω ,
dy
= f (y + l(u∗ + z + ũa + μ (τ )), u∗ + z + ũa + μ (τ )),
dt
= f (y + l(u), u), (9.33)
From the convergence properties of ũ(τ ) and because y(t) is exponentially decaying,
x(τ ) − l(u∗ ) exponentially converges to an O(ω + ε + maxi ai )-neighborhood of
the origin. Thus, Ji = hi (x) exponentially converges to an O(ω + ε + maxi ai )-
neighborhood of the payoff value (hi ◦ l)(u∗ ).
We summarize with the following theorem:
Theorem 9.2. Consider the system (9.1)–(9.2), (9.7)–(9.8) for an N-player game
under Assumptions 9.1–9.4 and where ωi = ω j , ωi = ω j + ωk , 2 ωi
= ω j + ωk , and
= 2ω j + ωk for all distinct i, j, k ∈ {1, . . . , N}. There exists ω ∗ > 0 and for any
ωi
ω ∈ (0, ω ∗ ) there exist ε ∗ , a∗ > 0 such that for the given ω and any ε ∈ (0, ε ∗ ),
maxi ai ∈ (0, a∗ ), the solution (x(t), u1 (t), . . . , uN (t)) converges exponentially to
an O(ω + ε + maxi ai )-neighborhood of the point (l(u∗ ), u∗1 , . . . , u∗N ), provided the
initial conditions are sufficiently close to this point.
Due to the Nash seeking strategy’s continual perturbation of the players’ actions,
we achieve exponential convergence to a neighborhood of u∗ , rather than u∗ itself.
The size of this neighborhood depends directly on the selected Nash seeking
parameters, as seen by Theorem 9.2. Thus, smaller parameters lead to a smaller
convergence neighborhood, but they also lead to slower convergence rates. (The
reader is referred to [41] for detailed analysis of this design trade-off for extremum
seeking controllers.) If another algorithm were used in parallel to detect convergence
of a player’s actions on the average, a player could either decrease the size of its
perturbation, making the convergence neighborhood smaller, or choose a constant
action based on its convergence detection. However, with a constant action, a player
will not be able to adapt to any future changes in the game.
By achieving exponential convergence, the players are able to achieve con-
vergence in the presence of a broader class of perturbations to the game than if
convergence were merely asymptotic (see [21, Chap. 9]).
For an example game with players that employ the extremum seeking strategy
(9.7)–(9.8), we consider the system,
190 P. Frihauf et al.
4u1 1
x̄1 = , x̄2 = u2 . (9.40)
16 − u2 4
which is Hurwitz for u2 < 16. Thus, (x̄1 , x̄2 ) is locally exponentially stable, but
not for all (u1 , u2 ) ∈ R2 , violating Assumption 9.2. However, as noted earlier, this
restrictive requirement of local exponential stability for all u ∈ RN was done merely
for notational convenience; we actually only require this assumption to hold for the
players’ action sets. In this example, we restrict the players’ actions to the set
which, on U, yield a Nash equilibrium: (u∗1 , u∗2 ) = (1, 1). At u∗ , this game satisfies
Assumptions 9.3 and 9.4, implying the stability of the reduced model average
system, which can be found explicitly according to (9.17) to be
dũave 1 ave
1
= ε K1 a1 −ũ1 + ũ2
2 ave
, (9.45)
dτ 2
dũave 2 3 ave ∗ ave 3 ave 2 3 2
2
= ε K2 a2 ũ − 3u2ũ2 − (ũ2 ) − a2 , (9.46)
dτ 2 1 2 8
with equilibria,
1 1
ũe1 = (1 − 4u∗2) ± (1 − 4u∗2)2 − 4a22, (9.47)
8 8
ũe2 = 2ũe1 , (9.48)
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 191
action û i
1
0.8 û 1
1
0.6
0.98
0.4
0.96
900 950 1000
0.2
0 200 400 600 800 1000
time (sec)
4
ũe,p
2 = a2 + O(max a3i ). (9.50)
1 − 4u∗2 2 i
For sufficiently small a2 , ũe ≈ (0, 0), (−3/4, −3/2), whereas the postulated form
ũe,p ≈ (0, 0) only. The equilibrium at (−3/4, −3/2) corresponds to the point
(1/4, −1/2), which lies outside of U and is an intersection of the extremals
∂ J1 /∂ u1 = 0, ∂ J2 /∂ u2 = 0.
The Jacobian Ψ ave of the average system is
⎡ ⎤
1
⎢−κ1 κ1 ⎥
Ψ ave = ⎣ 3 2 ⎦, (9.51)
∗
κ2 −3κ2 (ũ2 + u2)
e
2
where κ1 = ε K1 a21 and κ2 = ε K2 a22 , and its characteristic equation is given by,
1
λ 2 + (κ1 + 3κ2 (ũe2 + u∗2)) λ + 3κ1κ2 ũe2 + u∗2 − = 0.
4
α1
α2
Thus, Ψ ave is Hurwitz if and only if α1 and α2 are positive. For sufficiently small a2
(so that ũe ≈ (0, 0)), α1 , α2 > 0, which implies that u∗ is a stable Nash equilibrium.
For the simulations, we select k1 = 1.5, k2 = 2, a1 = 0.09, a2 = 0.03, ω1 = 0.5,
and ω2 = 1.3, where the parameters are chosen to be small, in particular the
perturbation frequencies ωi , since the perturbation must occur at a time scale
that is slower than fast time scale of the nonlinear system. Figures 9.2 and 9.3
192 P. Frihauf et al.
û 1 û 2
action û i
0.6
1
0.4 0.98
0.96
0.2
2900 2950 3000
0
0 500 1000 1500 2000 2500 3000
time (sec)
depict the evolution of the players’ actions û1 and û2 initialized at (u1 (0), u2 (0)) =
(û1 (0), û2 (0)) = (0.25, 1.5) and (0.1, 0.05). The state (x1 , x2 ) is initialized at the
origin in both cases. We show ûi instead of ui to better illustrate the convergence of
the players’ actions to a neighborhood about the Nash strategies since ui contains
the additive signal μi (t).
The slow initial convergence in Fig. 9.3 can best be explained by examining
the phase portrait of the average of the reduced model û-system, which can be
shown to be
2 1 1 ave
1 = ε K1 a1 − û1 + û2 ,
˙ûave ave
(9.52)
2 2
2 3 ave 3 ave 2 3 2
2 = ε K2 a2 û − (û2 ) − a2 ,
˙ûave (9.53)
2 1 2 8
For a2 = 0.03, (ûe1 , ûe2 ) = (0.9999, 0.9998) and (0.2501, −0.4998). Figure 9.4 is the
phase portrait of this system with the stable Nash equilibrium represented by a green
circle, and the point (1/4, −1/2) a red square, which is an unstable equilibrium in
the phase portrait. The boundary of U is denoted by dashed red lines. The initial
condition for Fig. 9.3 lies near the unstable point, so the trajectory travels almost
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 193
û 2ave
0
2
2 1 0 1 2 3
û ave
1
entirely along the eigenvector that points towards the stable equilibrium. We also
see that the trajectories remain in U for initial conditions suitably close to the Nash
equilibrium.
9.7 Conclusions
Acknowledgements This research was made with Government support under and awarded by
DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate
(NDSEG) Fellowship, 32 CFR 168a, and by grants from National Science Foundation, DOE, and
AFOSR.
194 P. Frihauf et al.
Appendix
The following integrals are computed to obtain (9.21), where we have assumed the
frequencies satisfy ωi = ω j , 2 ωi = ω j , 3 ωi = ω j , ωi
= ω j + ωk , ωi
= 2 ω j + ωk ,
2 ωi
= ω j + ωk , for distinct i, j, k ∈ {1, . . . , N} and defined γi = ωi / mini {ωi }:
T
T
1 ai
lim μi (τ ) dτ = lim sin(γi τ + ϕi ) dτ
T →∞ T 0 T →∞ T 0
= 0, (9.55)
1 T a2i T
lim μi2 (τ ) dτ = lim [1 − cos(2γi τ + 2ϕi )] dτ ,
T →∞ T 0 T →∞ 2T 0
a2i
= , (9.56)
2
T
T
1 a3i
lim μi3 (τ ) dτ = lim [3 sin(γi τ + ϕi ) − sin(3γi τ + 3ϕi )] dτ ,
T →∞ T 0 T →∞ 4T 0
= 0, (9.57)
T
1 T a4i
lim μi4 (τ ) dτ = lim [3 − 4 cos(2γi τ + 2ϕi )
T →∞ T 0 T →∞ 8T 0
+ cos(4γi τ + 4ϕi )] dτ ,
3a4i
= , (9.58)
8
T
T
1 ai a j
lim μi (τ )μ j (τ ) dτ = [cos((γi − γ j )τ + ϕi − ϕ j )
T →∞ T 0 2T 0
− cos((γi + γ j )τ + ϕi + ϕ j )] dτ ,
= 0, (9.59)
T
1 T a2i a j
lim μi2 (τ )μ j (τ ) dτ = [sin(γ j τ + ϕ j )
T →∞ T 0 2T 0
− cos(2γi τ + 2ϕi ) sin(γ j τ + ϕ j )] dτ ,
T
a2i a j
= [2 sin(γ j τ + ϕ j )
4T 0
− sin((2γi + γ j )τ + 2ϕi + ϕ j )
+ sin((2γi − γ j )τ + 2ϕi − ϕ j )] dτ ,
= 0, (9.60)
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 195
T
T
1 a3i a j
lim μi3 (τ )μ j (τ ) dτ = [3 sin(γi τ + ϕ j ) sin(γ j τ + ϕ j )
T →∞ T 0 4T 0
− sin(3γi τ + 3ϕi ) sin(γ j τ + ϕ j )] dτ ,
T
a3i a j
= [3 cos((γi − γ j )τ + ϕi − ϕ j )
8T 0
−3 cos((γi + γ j )τ + ϕi + ϕ j )
− cos((3γi − γ j )τ + 3ϕi − ϕ j )
+ cos((3γi + γ j )τ + 3ϕi + ϕ j )] dτ ,
= 0, (9.61)
a2i a2j
T
1 T
lim μi2 (τ )μ 2j (τ ) dτ = lim [2 − 2 cos(2γi τ + 2ϕi )
T →∞ T 0 T →∞ 8T 0
−2 cos(2γ j τ + 2ϕ j )
+ cos(2(γi − γ j )τ + 2(ϕi − ϕ j ))
+ cos(2(γi + γ j )τ + 2(ϕi + ϕ j ))] dτ ,
a2i a2j
= , (9.62)
4
T
T
1 ai a j ak
lim μi (τ )μ j (τ )μk (τ ) dτ , = lim [cos((γi − γ j )τ + ϕi − ϕ j )
T →∞ T 0 T →∞ 2T 0
− cos((γi + γ j )τ + ϕi + ϕ j )] sin(γk τ + ϕk ) dτ ,
ai a j ak
= lim
T →∞ 4T
T
× [sin((γi − γ j + ωk )τ + ϕi − ϕ j + ϕk )
0
− sin((γi − γ j − γk )τ + ϕi − ϕ j − ϕk )
− sin((γi + γ j + γk )τ + ϕi + ϕ j + ϕk )
+ sin((γi + γ j − γk )τ + ϕi + ϕ j − ϕk )] dτ ,
= 0, (9.63)
T ai a2j ak
T
1
lim μi (τ )μ 2j (τ )μk (τ ) dτ , = lim sin(γi τ + ϕi )
T →∞ T 0 T →∞ 2T 0
×(1 − cos(2γ j τ + 2ϕ j )) sin(γk τ + ϕk ) dτ ,
ai a2j ak
T
= lim [cos((γi − γk )τ + ϕi − ϕk )
T →∞ 4T 0
196 P. Frihauf et al.
− cos((γi + γk )τ + ϕi + ϕk )]
×(1 − cos(2γ j τ + 2ϕ j )) dτ
ai a2j ak
T
= lim [2 cos((γi − γk )τ + ϕi − ϕk )
T →∞ 8T 0
− cos((γi − 2γ j − γk )τ + ϕi − 2ϕ j − ϕk )
− cos((γi + 2γ j − γk )τ + ϕi + 2ϕ j − ϕk )
+ cos((γi − 2γ j + γk )τ + ϕi − 2ϕ j + ϕk )
+ cos((γi + 2γ j + γk )τ + ϕi + 2ϕ j + ϕk )
−2 cos((γi + γk )τ + ϕi + ϕk )] dτ ,
= 0. (9.64)
References
1. Altman, E., Başar, T., Srikant, R.: Nash equilibria for combined flow control and routing in
networks: asymptotic behavior for a large number of users. IEEE Trans. Autom. Control 47,
917–930 (2002)
2. Apostol, T.M.: Mathematical Analysis, 2nd ed. Addison-Wesley, Reading (1974)
3. Ariyur, K.B., Kristic, M.: Real-Time Optimization by Extremum-Seeking Control. Wiley-
Interscience, Hoboken (2003)
4. Başar, T.: Control and game-theoretic tools for communication networks (overview). Appl.
Comput. Math. 6, 104–125 (2007)
5. Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd ed. SIAM, Philadelphia
(1999)
6. Bauso, D., Giarré, L., Pesenti, R.: Consensus in noncooperative dynamic games: a multiretailer
inventory application. IEEE Trans. Autom. Control 53, 998–1003 (2008)
7. Becker, R., King, R., Petz, R., Nitsche, W.: Adaptive closed-loop separation control on a high-
lift configuration using extremum seeking, AIAA J. 45, 1382–1392 (2007)
8. Carnevale, D., Astolfi, A., Centioli, C., Podda, S., Vitale, V., Zaccarian, L.: A new extremum
seeking technique and its application to maximize RF heating on FTU. Fusing Eng. Design 84,
554–558 (2009)
9. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press,
New York (2006)
10. Cochran, J., Kanso, E., Kelly, S.D., Xiong, H., Krstic, M.: Source seeking for two nonholo-
nomic models of fish locomotion. IEEE Trans. Robot. 25, 1166–1176 (2009)
11. Cochran, J., Krstic, M.: Nonholonomic source seeking with tuning of angular velocity. IEEE
Trans. Autom. Control 54, 717–731 (2009)
12. Fink, A.M.: Almost Periodic Differential Equations, Lecture Notes in Mathematics, vol. 377.
Springer, New York (1974)
13. Foster, D.P., Young, H.P.: Regret testing: learning to play Nash equilibrium without knowing
you have an opponent. Theor. Econ. 1, 341–367 (2006)
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 197
14. Fudenberg, D., Levine, D.K.: The Theory of Learning in Games. The MIT Press, Cambridge
(1998)
15. Guay, M., Perrier, M., Dochain, D.: Adaptive extremum seeking control of nonisothermal
continuous stirred reactors. Chem. Eng. Sci. 60, 3671–3681 (2005)
16. Hale, J.K.: Ordinary Differential Equations. Wiley-Interscience, New York (1969)
17. Hart, S., Mansour, Y.: How long to equilibrium? The communication complexity of uncoupled
equilibrium procedures. Games Econ. Behav. 69, 107–126 (2010)
18. Hart, S., Mas-Colell, A.: Uncoupled dynamics do not lead to Nash equilibrium. Am. Econ.
Rev. 95, 1830–1836 (2003)
19. Hart, S., Mas-Colell, A.: Stochastic uncoupled dynamics and Nash equilibrium. Games Econ.
Behav. 57, 286–303 (2006)
20. Jafari, A., Greenwald, A., Gondek, D., Ercal, G.: On no-regret learning, fictitious play, and
Nash equilibrium. In: Proceedings of the 18th International Conference on Machine Learning
(2001)
21. Khalil, H.K.: Nonlinear Systems, 3rd ed. Prentice Hall, Upper Saddle River (2002)
22. Killingsworth, N.J., Aceves, S.M., Flowers, D.L., Espinosa-Loza, F., Krstic, M.: HCCI engine
combustion-timing control: optimizing gains and fuel consumption via extremum seeking.
IEEE Trans. Control Syst. Technol. 17, 1350–1361 (2009)
23. Krstic, M., Frihauf, P., Krieger, J., Başar, T.: Nash equilibrium seeking with finitely- and
infinitely-many players. In: Proceedings of the 8th IFAC Symposium on Nonlinear Control
Systems, Bologna (2010)
24. Li, S., Başar, T.: Distributed algorithms for the computation of noncooperative equilibria.
Automatica 23, 523–533 (1987)
25. Luenberger, D.G.: Complete stability of noncooperative games. J. Optim. Theory Appl. 25,
485–505 (1978)
26. Luo, L., Schuster, E.: Mixing enhancement in 2D magnetohydrodynamic channel flow by
extremum seeking boundary control. In: Proceedings of the American Control Conference,
St. Louis (2009)
27. MacKenzie, A.B., Wicker, S.B.: Game theory and the design of self-configuring, adaptive
wireless networks. IEEE Commun. Mag. 39, 126–131 (2001)
28. Marden, J.R., Arslan, G., Shamma, J.S.: Cooperative control and potential games. IEEE Trans.
Syst. Man Cybern. B Cybern. 39, 1393–1407 (2009)
29. Moase, W.H., Manzie, C., Brear, M.J.: Newton-like extremum-seeking part I: theory. In:
Proceedings of the IEEE Conference on Decision and Control, Shanghai (2009)
30. Naimzada, A.K., Sbragia, L.: Oligopoly games with nonlinear demand and cost functions: two
boundedly rational adjustment processes. Chaos Solitons Fract. 29, 707–722 (2006)
31. Nešić, D., Tan, Y., Moase, W.H., Manzie, C.: A unifying approach to extremum seeking:
adaptive schemes based on estimation of derivatives. In: Proceedings of the IEEE Conference
on Decision and Control, Atlanta (2010)
32. Peterson, K., Stefanopoulou, A.: Extremum seeking control for soft landing of an electrome-
chanical valve actuator. Automatica 29, 1063–1069 (2004)
33. Rao, S.S., Venkayya, V.B., Khot, N.S.: Game theory approach for the integrated design of
structures and controls. AIAA J. 26, 463–469 (1988)
34. Rosen, J.B.: Existence and uniqueness of equilibrium points for concave N-person games.
Econometrica 33, 520–534 (1965)
35. Scutari, G., Palomar, D.P., Barbarossa, S.: The MIMO iterative waterfilling algorithm. IEEE
Trans. Signal Process. 57, 1917–1935 (2009)
36. Semsar-Kazerooni, E., Khorasani, K.: Multi-agent team cooperation: a game theory approach.
Automatica 45, 2205–2213 (2009)
37. Shamma, J.S., Arslan, G.: Dynamic fictitious play, dynamic gradient play, and distributed
convergence to Nash equilibria. IEEE Trans. Autom. Control 53, 312–327 (2005)
38. Sharma, R., Gopal, M.: Synergizing reinforcement learning and game theory—a new direction
for control. Appl. Soft Comput. 10, 675–688 (2010)
198 P. Frihauf et al.
39. Stanković, M.S., Johansson, K.H., Stipanović, D.M.: Distributed seeking of Nash equilibria
in mobile sensor networks. In: Proceedings of the IEEE Conference on Decision and Control,
Atlanta (2010)
40. Stanković, M.S., Stipanović, D.M.: Extremum seeking under stochastic noise and applications
to mobile sensors. Automatica 46, 1243–1251 (2010)
41. Tan, Y., Nešić, D., Mareels, I.: On non-local stability properties of extremum seeking control.
Automatica 42, 889–903 (2006)
42. Young, H.P.: Learning by trial and error. Games Econ. Behav. 65, 626–643 (2009)
43. Zhang, C., Arnold, D., Ghods, N., Siranosian, A., Krstic, M.: Source seeking with nonholo-
nomic unicycle without position measurement and with tuning of forward velocity. Syst.
Control Lett. 56, 245–252 (2007)
44. Zhu, M., Martı́nez, S.: Distributed coverage games for mobile visual sensors (I): Reaching
the set of Nash equilibria. In: Proceedings of the IEEE Conference on Decision and Control,
Shanghai, China (2009)
Chapter 10
A Uniform Tauberian Theorem in Optimal
Control
10.1 Introduction
Finite horizon problems of optimal control have been studied intensively since the
pioneer work of Stekhov, Pontryagin, Boltyanskii [27], Hestenes [18], Bellman
[9] and Isaacs [19, 20] during the cold war—see for instance [7, 22, 23] for major
references, or [14] for a short, clear introduction. A classical model considers the
following controlled dynamic over R+
M. Oliu-Barton ()
Institut Mathématique de Jussieu, UFR 929, Université Paris 6, Paris, France
e-mail: [email protected]
G. Vigeral
CEREMADE, Université Paris-Dauphine, Paris, France
e-mail: [email protected]
P. Cardaliaguet and R. Cressman (eds.), Advances in Dynamic Games, Annals of the 199
International Society of Dynamic Games 12, DOI 10.1007/978-0-8176-8355-9 10,
© Springer Science+Business Media New York 2012
200 M. Oliu-Barton and G. Vigeral
y (s) = f (y(s), u(s))
(10.1)
y(0) = y0
It is quite natural to define, whenever the trajectories considered are infinite, for any
discount factor λ > 0, the λ -discounted value of the optimal control problem, as
+∞
Wλ (y0 ) = inf λ e−λ s g(y(s, u, y0 ), u(s)) ds. (10.3)
u∈U 0
In this framework the problem was initially to know whether, for a given finite
horizon T and a given starting point y0 , a minimizing control u existed, solution
of the optimal control problem (T, y0 ). Systems with large, but fixed horizons
were considered and, in particular, the class of “ergodic” systems (that is, those
in which any starting point in the state space Ω is controllable to any point in Ω )
has been thoroughly studied [2, 3, 5, 6, 8, 11, 25]. These systems are asymptotically
independent of the starting point as the horizon goes to infinite. When the horizon is
infinite, the literature on optimal control has mainly focussed on properties of given
trajectories as the time tends to infinity. This approach corresponds to the uniform
approach in a game theoretical framework and is often opposed to the asymptotic
approach (described below), which we have considered in what follows, and which
has received considerably less attention.
In a game-theoretical, discrete time framework, the same kind of problem was
considered since [29], but with several differences in the approach: (1) the starting
point may be chosen at random (a probability μ may be given on Ω , which
randomly determines the point from which the controller will start the play); (2)
the controllability-ergodicity condition is generally not assumed; (3) because of the
inherent recursive structure of the process played in discrete time, the problem is
generally considered for all initial states and time horizons.
For these reasons, what is called the ”asymptotic approach”—the behavior of
Vt (·) as the horizon t tends to infinity, or of Wλ (·) as the discount factor λ tends
to zero—has been more studied in this discrete-time setup. Moreover, when it is
10 A Uniform Tauberian Theorem in Optimal Control 201
considered in Optimal Control, in most cases [4, 10] an ergodic assumption is made
which not only ensures the convergence of Vt (y0 ) to some V , but also forces the limit
function V to be independent of the starting point y0 . The general asymptotic case,
in which no ergodicity condition is assumed, has been to our knowledge studied
for the first time recently. In [11, 28] the authors prove in different frameworks the
convergence of Vt (·) and Wλ (·) to some non-constant function V (y0 ).
Some important, closely related questions are the following : does the con-
vergence of Vt (·) imply the convergence of Wλ (·)? Or vice versa? If they both
converge, does the limit coincide? A partial answer to these questions goes back
to the beginning of the twentieth century, when Hardy and Littlewood proved (see
[17]) that for any sequence of bounded real numbers, the convergence of the Cesaro
means is equivalent to the convergence of their Abel means, and that the limits are
then the same :
Theorem 10.1 ([17]). For any bounded sequence of reals {an }n≥1 , define Vn =
+∞
n ∑i=1 ai and Wλ = λ ∑i=1 (1 − λ )
1 n i−1 a . Then,
i
Moreover, if the central inequality is an equality, then all inequalities are equalities.
Noticing that {an} can be viewed as a sequence of costs for some deterministic
(uncontrolled) dynamic in discrete-time, this results gives the equivalence between
the convergence
of Vt and the convergence
of Wλ , to the same limit. In 1971, setting
Vt = 1t 0t g(s) ds and Wλ = λ 0+∞ e−λ s g(s) ds, for a given Lebesgue-measurable,
bounded, real function g, Feller proved that the same result holds for continuous-
time uncontrolled dynamics (particular case of Theorem 2, p. 445 in [15]).
Theorem 10.2 ([15]).
Moreover, if the central inequality is an equality, then all inequalities are equalities.
In 1992, Lehrer and Sorin [24] considered a discrete-time controlled dynamic,
defined by a correspondence Γ : Ω ⇒ Ω , with nonempty values, and by g, a bounded
real cost function defined on Ω . A feasible play at z ∈ Ω is an infinite sequence
y = {yn }n≥1 such that y1 = z and yn+1 ∈ Γ (yn ). The average and discounted
value functions are defined respectively by Vn (z) = inf 1n ∑ni=1 g(yi ) and Wλ (y0 ) =
inf λ ∑+∞
i=1 (1 − λ )
i−1 g(y ), where the infima are taken over the feasible plays at z.
i
the general case where the limit may depend on the starting point y0 . The uniform
condition is necessary: in the same article, the authors provide an example where
only pointwise convergence holds and the limits differs.
In 1998, Arisawa (see [4]) considered a continuous-time controlled dynamic and
proved the equivalence between the uniform convergence of Wλ and the uniform
convergence of Vt in the specific case of limits independent of the starting point.
Theorem 10.4 ([4]). Let d ∈ R, then
This does not settle the general case, in which the limit function may depend on the
starting point.1 For a continuous-time controlled dynamic in which Vt (y0 ) converges
to some function V (y0 ), dependent on the state variable y0 , as t goes to infinity, we
prove the following
Theorem 10.5. Vt (y0 ) converges to V (y0 ) uniformly on Ω , if and only if Wλ (y0 )
converges to V (y0 ) uniformly on Ω .
In fact, we will prove this result in a more general framework, as described
in Sect. 10.2. Some basic lemmas which occur to be important tools will also be
proven on that section. Section 10.3 will be devoted to the proof of our main result.
Section 10.4 will conclude by pointing out, via an example, the fact that uniform
convergence is a necessary requirement for the Theorem 10.5 to hold. A very simple
dynamic is described, in which the pointwise limits of Vt (·) and Wλ (·) exist but
differ. It should be noted that our proofs (as well as the counterexample in Sect. 10.4)
are adaptations in this continuous-time framework of ideas employed in a discrete-
time setting in [24]. In the appendix we also point out that an alternative proof of our
theorem is obtained using the main theorem in [24] as well as a discrete/continuous
equivalence argument.
For completeness, let us mention briefly this other approach, mentioned
above as the uniform approach, and which has also been deeply studied, see
for exemple [12, 13, 16]. In these models, the optimal average cost value is not
taken over a finite period of time [0,t], which is then studied for t growing
to infinite, as in [4, 15, 17, 24, 28] or in our framework. On the contrary, only
infinite trajectories
are considered, among which the value Vt is defined as
infu∈U supτ ≥t τ1 0τ g(y(s, u, y0 ), u(s)) ds, or some other closely related variation.
The asymptotic behavior, as t tends to infinity, of the function Vt has also been
studied in that framework. In [16], both λ -discounted and average evaluations of an
infinite trajectory are considered and their limits are compared. However, we stress
1 Lemma 6 and Theorem 8 in [4] deal with this general setting, but we believe them to be incorrect
since they are stated for pointwise convergence and, consequently, are contradicted by the example
in Sect. 10.4.
10 A Uniform Tauberian Theorem in Optimal Control 203
out that the asymptotic behavior of those quantities is in general2 not related to the
asymptotic behavior of Vt and Wλ .
Finally, let us point out that in the framework of zero-sum differential games, that
is when the dynamic is controlled by two players with opposite goals, a Tauberian
theorem is given in the ergodic case by Theorem 2.1 in [1]. However, to our
knowledge the general, non ergodic case is still an open problem.
10.2 Model
Given x ∈ Ω and a feasible trajectory X ∈ Γ(x), define the average payoff γt(X) = (1/t) ∫_0^t g(X(s)) ds and the discounted payoff νλ(X) = λ ∫_0^{+∞} e^{−λs} g(X(s)) ds. This is defined for t, λ ∈ ]0, +∞[. Naturally, we define the values as Vt(x) = inf_{X∈Γ(x)} γt(X) and Wλ(x) = inf_{X∈Γ(x)} νλ(X).
2 The reader may verify that this is indeed not the case in the example of Sect. 10.4.
We follow the ideas of [24], and start by proving two simple yet important lemmas that will be used in the proof. The first establishes that the value increases along the trajectories. Then, we prove a convexity result linking the finite horizon average payoffs and the discounted evaluations on any given trajectory.
Lemma 10.1. Monotonicity (compare with Proposition 1 in [24]). For all X ∈ T, for all s ≥ 0, we have, with x := X(0) and y := X(s),

lim sup_{t→+∞} Vt(x) ≤ lim sup_{t→+∞} Vt(y)   and   lim sup_{λ→0} Wλ(x) ≤ lim sup_{λ→0} Wλ(y),

and the same inequalities hold for the inferior limits.
Proof. Set y := X(s) and x := X(0). For ε > 0, take T ∈ R+ such that s/(s + T) < ε. Let t > T and take an ε-optimal trajectory for Vt, i.e. Y ∈ Γ(y) such that γt(Y) ≤ Vt(y) + ε. Define the concatenation of X and Y at time s as in (10.4); X ∘s Y is in Γ(x) by assumption. Hence

Vt+s(x) ≤ γt+s(X ∘s Y) = (s/(t + s)) γs(X) + (t/(t + s)) γt(Y)
        ≤ ε + γt(Y)
        ≤ 2ε + Vt(y).

This is true for any t > T, and the first part of the result follows. Similarly, choose λ0 > 0 such that 1 − e^{−λs} ≤ ε for every λ ∈ ]0, λ0].
Let λ ∈ ]0, λ0] and take Y ∈ Γ(y) an ε-optimal trajectory for Wλ(y). Then:

Wλ(x) ≤ νλ(X ∘s Y) = λ ∫_0^s e^{−λr} g(X(r)) dr + λ ∫_s^{+∞} e^{−λr} g(Y(r − s)) dr
      ≤ ε + e^{−λs} νλ(Y)
      ≤ 2ε + Wλ(y).
Again, this is true for any λ ∈]0, λ0 ], and the result follows.
Lemma 10.2. Convexity (compare with Eq. (10.1) in [24]). For any play X ∈ T, for any λ > 0:

νλ(X) = ∫_0^{+∞} γs(X) μλ(s) ds,    (10.12)

where μλ(s) ds := λ² s e^{−λs} ds is a probability density on [0, +∞[.
Proof. It is enough to notice that the following relation holds, by integration by parts:

νλ(X) = λ ∫_0^{+∞} e^{−λs} g(X(s)) ds = λ² ∫_0^{+∞} s e^{−λs} ( (1/s) ∫_0^s g(X(r)) dr ) ds,

and that ∫_0^{+∞} λ² s e^{−λs} ds = 1.
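The identity (10.12) is also easy to check numerically. Below is a small sketch (our own; the trajectory cost s ↦ g(X(s)) is an arbitrary bounded function) comparing both sides of (10.12) by trapezoidal quadrature.

import numpy as np

lam = 0.3
s = np.linspace(1e-9, 80.0, 400001)          # e^{-lam s} is negligible past s = 80
gX = 0.5 + 0.5 * np.sin(s)                   # stand-in for s -> g(X(s)), valued in [0, 1]

def trap(f, x):                               # trapezoidal rule
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

# left side of (10.12): nu_lam(X) = lam * int e^{-lam s} g(X(s)) ds
lhs = trap(lam * np.exp(-lam * s) * gX, s)

# right side: int gamma_s(X) mu_lam(s) ds, with gamma_s the running average of g
cumulative = np.concatenate([[0.0], np.cumsum(0.5 * (gX[1:] + gX[:-1]) * np.diff(s))])
gamma = cumulative / s                        # gamma_s(X); the s -> 0 endpoint carries
mu = lam**2 * s * np.exp(-lam * s)            # negligible weight under mu_lam
rhs = trap(gamma * mu, s)

print(lhs, rhs)                               # the two sides agree to quadrature error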
The probability measure μλ plays an important role in the rest of the paper.
Denoting

M(α, β; λ) := ∫_α^β μλ(s) ds = e^{−λα} (1 + λα) − e^{−λβ} (1 + λβ),
we prove here two estimates that will be helpful in the next section.
Lemma 10.3. The two following results hold (compare with Lemma 3 in [24]):
(i) ∀t > 0, ∃ε0 such that ∀ε ≤ ε0, M((1 − ε)t, t; 1/t) ≥ ε/(2e).
(ii) ∀δ > 0, ∃ε0 such that ∀ε ≤ ε0, ∀t > 0, M(εt, (1 − ε)t; 1/(t√ε)) ≥ 1 − δ.
10.3.1 From Vt to Wλ
Assume (B) : Vt (·) converges to some V (·) as t goes to infinity, uniformly on Ω . Our
proof follows Proposition 4 and Lemmas 8 and 9 in [24].
Proposition 10.1. For all ε > 0, there exists λ0 > 0 such that Wλ (x) ≥ V (x) − ε for
every x ∈ Ω and for all λ ∈]0, λ0 ].
Proof. Let T be such that ‖Vt − V‖∞ ≤ ε/2 for every t ≥ T. Choose λ0 > 0 such that

λ² ∫_T^{+∞} s e^{−λs} ds = (1 + λT) e^{−λT} ≥ 1 − ε/4

for every λ ∈ ]0, λ0]. Fix λ ∈ ]0, λ0] and take a play Y ∈ Γ(x) which is ε/4-optimal for Wλ(x). Since γs(Y) ≥ 0, the convexity formula (10.12) from Lemma 10.2 gives:

Wλ(x) + ε/4 ≥ νλ(Y) ≥ 0 + λ² ∫_T^{+∞} s e^{−λs} γs(Y) ds
            ≥ λ² ∫_T^{+∞} s e^{−λs} Vs(x) ds
            ≥ (1 − ε/4)(V(x) − ε/2)
            = V(x) − (ε/4) V(x) − ε/2 + ε²/8
            ≥ V(x) − 3ε/4.
Lemma 10.4. ∀ε > 0, ∃M such that for all t ≥ M, ∀x ∈ Ω , there is a play X ∈ Γ (x)
such that γs (X) ≤ V (x) + ε for all s ∈ [ε t, (1 − ε )t].
Proof. By (B) there exists M such that ‖Vr − V‖ ≤ ε²/3 for all r ≥ εM. Given t ≥ M and x ∈ Ω, let X ∈ Γ(x) be a play (from x) such that γt(X) ≤ Vt(x) + ε²/3. For any s ≤ (1 − ε)t, we have that t − s ≥ εt ≥ εM, so Lemma 10.1 (Monotonicity) implies that

V_{t−s}(X(s)) ≥ V(X(s)) − ε²/3 ≥ V(x) − ε²/3.    (10.13)
Since V(x) + ε²/3 ≥ Vt(x), we also have:

t (V(x) + 2ε²/3) ≥ t (Vt(x) + ε²/3) ≥ t γt(X) = ∫_0^s g(X(r)) dr + ∫_s^t g(X(r)) dr
                ≥ s γs(X) + (t − s) V_{t−s}(X(s))
                ≥ s γs(X) + (t − s)(V(x) − ε²/3),

where the second inequality uses that the trajectory r ↦ X(s + r) is feasible at X(s), and the last one uses (10.13). Rearranging,

γs(X) ≤ V(x) + t ε²/s ≤ V(x) + ε, for s ≥ εt,

which completes the proof.
Proposition 10.2. ∀δ > 0, ∃λ0 such that ∀x ∈ Ω , for all λ ∈]0, λ0 ], we have
Wλ (x) ≤ V (x) + δ .
Proof. By Lemma 10.3(ii), one can choose ε small enough such that

M(εt, (1 − ε)t; 1/(t√ε)) ≥ 1 − δ/2

for any t. In particular, we can take ε ≤ δ/2. Using Lemma 10.4 with δ/2, we get that for t ≥ t0 (and thus for λ(t) := 1/(t√ε) ≤ 1/(t0√ε)) and for any x ∈ Ω, there exists a play X ∈ Γ(x) such that

ν_{λ(t)}(X) ≤ δ/2 + λ(t)² ∫_{εt}^{(1−ε)t} s e^{−sλ(t)} γs(X) ds
           ≤ δ/2 + V(x) + δ/2.

Hence W_{λ(t)}(x) ≤ V(x) + δ for every t ≥ t0, which proves the claim.
Propositions 10.1 and 10.2 establish the first part of Theorem 10.5: (B) ⇒ (A).
10.3.2 From Wλ to Vt
Lemma 10.5. For every ε > 0, t > 0 and x ∈ Ω, there exist a play Y ∈ Γ(x) and a time L ∈ [0, t(1 − ε/2)] such that, for all T ∈ [0, t − L], (1/T) ∫_L^{L+T} g(Y(s)) ds ≤ Vt(x) + ε.

Proof. Fix Y ∈ Γ(x) some ε/2-optimal play for Vt(x). The function s → γs(Y) is
continuous on ]0,t] and satisfies γt (Y ) ≤ Vt (x) + ε /2. The bound on g implies that
γr (Y ) ≤ Vt (x) + ε for all r ∈ [t(1 − ε /2),t].
Consider now the set {s ∈ ]0, t] | γs(Y) > Vt(x) + ε}. If this set is empty, then take L = 0 and observe that for any r ∈ ]0, t],

(1/r) ∫_0^r g(Y(s)) ds ≤ Vt(x) + ε.
Otherwise, let L be the supremum of this set. Notice that L < t(1 − ε/2) and that by continuity γL(Y) = Vt(x) + ε. Now, for any T ∈ [0, t − L],

Vt(x) + ε ≥ γ_{L+T}(Y)
          = (L/(L + T)) γL(Y) + (T/(L + T)) (1/T) ∫_L^{L+T} g(Y(s)) ds
          = (L/(L + T)) (Vt(x) + ε) + (T/(L + T)) (1/T) ∫_L^{L+T} g(Y(s)) ds,

hence (1/T) ∫_L^{L+T} g(Y(s)) ds ≤ Vt(x) + ε, as claimed.
Proposition 10.3. ∀ε > 0, ∃T such that for all t ≥ T we have Vt (x) ≥ W (x) − ε , for
all x ∈ Ω .
Proof. Proceed by contradiction and suppose that ε > 0 is such that for every T, there exists t0 ≥ T and a state x0 ∈ Ω such that V_{t0}(x0) < W(x0) − ε. Let λ be such that ‖Wλ − W‖ ≤ ε/8, and T such that

λ² ∫_{Tε/4}^{+∞} s e^{−λs} ds < ε/8.

Using Lemma 10.5 with ε/2, we get a play Y ∈ Γ(x0) and a time L ∈ [0, t0(1 − ε/4)] such that, ∀s ∈ [0, t0 − L] (and, in particular, ∀s ∈ [0, t0 ε/4]),

(1/s) ∫_L^{L+s} g(Y(r)) dr ≤ V_{t0}(x0) + ε/2 < W(x0) − ε/2.
Thus,

W(Y(L)) − ε/8 ≤ Wλ(Y(L))
             ≤ λ ∫_0^{+∞} e^{−λs} g(Y(L + s)) ds
             ≤ λ² ∫_0^{t0 ε/4} s e^{−λs} ( (1/s) ∫_L^{L+s} g(Y(r)) dr ) ds + ε/8
             ≤ W(x0) − ε/2 + ε/8
             = W(x0) − 3ε/8.

Hence W(Y(L)) ≤ W(x0) − ε/4 < W(x0), which contradicts the monotonicity of the limit value along trajectories (Lemma 10.1), since W(Y(L)) ≥ W(x0).
Proposition 10.4. ∀ε > 0, ∃T such that for all t ≥ T we have Vt (x) ≤ W (x) + ε , for
all x ∈ Ω .
Proof. Otherwise, ∃ε > 0 such that ∀T, ∃t ≥ T and x ∈ Ω with Vt(x) > W(x) + ε. For any X ∈ Γ(x) consider the (continuous in s) payoff function γs(X) = (1/s) ∫_0^s g(X(r)) dr. Of course, γt(X) ≥ Vt(x) > W(x) + ε. Furthermore, because of the bound on g,

γs(X) ≥ W(x) + ε/2 for every s ∈ R := [(1 − ε/2)t, t]

holds. We set δ := ε/(4e), so that, by Lemma 10.3(i), the set R has μ_λ̃-mass at least δ for λ̃ := 1/t and t large enough. By Proposition 10.3, there is a K such that Vt ≥ W(x) − δε/8 for all t ≥ K. Fix K and consider the decomposition of [0, +∞[ into K := [0, K], R, and the complement (K ∪ R)^c; then

γs(X)|K ≥ 0,
γs(X)|_{(K∪R)^c} ≥ W(x) − δε/8,
γs(X)|R ≥ W(x) + ε/2.

Integrating these estimates against μ_λ̃ via the convexity formula (10.12) yields, for t large enough, ν_λ̃(X) ≥ W(x) + δε/4. This is true for any play, so its infimum also satisfies W_λ̃(x) ≥ W(x) + δε/4, which is a contradiction, for we assumed that W_λ̃(x) < W(x) + δε/5.
Propositions 10.3 and 10.4 establish the second half of Theorem 10.5: (A) ⇒ (B).
3 We thank Marc Quincampoix for pointing out this example to us, which is simpler than our original one.
W(x0, y0) = 1                                        if y0 > 0 or x0 > 2,
W(x0, y0) = 0                                        if y0 = 0 and 1 ≤ x0 ≤ 2,
W(x0, y0) = 1 − (1 − x0)^{1−x0} / (2 − x0)^{2−x0}    if y0 = 0 and x0 < 1.
Here we only prove that V(0, 0) = 1/2 and W(0, 0) = 3/4; the proof for y0 = 0 and 0 < x0 < 1 is similar and the other cases are easy.
First of all we prove that for any t or λ and any admissible trajectory (that is, any function X(t) = (x(t), y(t)) compatible with a control u(t)) starting from (0, 0), γt(X) ≥ 1/2 and νλ(X) ≥ 3/4. This is clear if x(t) is identically 0, so assume this is not the case. Since the speed y(t) is increasing, we can define t1 and t2 as the times at which x(t1) = 1 and x(t2) = 2 respectively, and moreover we have t2 ≤ 2t1. Then,

γt(X) = (1/t) ( ∫_0^{min(t,t1)} ds + ∫_{min(t,t2)}^t ds )
      = 1 + min(1, t1/t) − min(1, t2/t)
      ≥ 1 + min(1, t2/(2t)) − min(1, t2/t)
      ≥ 1/2
and

νλ(X) = λ ∫_0^{t1} e^{−λs} ds + λ ∫_{t2}^{+∞} e^{−λs} ds
      = 1 − exp(−λt1) + exp(−λt2)
      ≥ 1 − exp(−λt1) + exp(−2λt1)
      ≥ min_{a>0} (1 − a + a²)
      = 3/4.
On the other hand, one can prove [28] that lim supVt (0, 0) ≤ 1/2 : in the problem
with horizon t, consider the control “u(s) = 1 until s = 2/t and then 0”. Similarly
one proves that lim supWλ (0, 0) ≤ 3/4: in the λ -discounted problem, consider the
control “u(s) = 1 until s = λ /ln 2 and then 0”.
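These values are easy to reproduce numerically. The sketch below is our own check; it assumes, consistently with the computations above, that the dynamics of the example are ẋ = y, ẏ = u with u ∈ [0, 1] and running cost g = 1 outside {1 ≤ x ≤ 2}, and it evaluates γt(X) and νλ(X) for the two controls just quoted.

import numpy as np

def cost_along(t, tau):
    # trajectory of xdot = y, ydot = u from (0,0) with "u = 1 until tau, then 0":
    # y(s) = min(s, tau), hence x in closed form below; g = 1 outside [1, 2]
    x = np.where(t <= tau, t**2 / 2.0, tau**2 / 2.0 + tau * (t - tau))
    return np.where((x >= 1.0) & (x <= 2.0), 0.0, 1.0)

for T in (10.0, 100.0, 1000.0):               # gamma_T for "u = 1 until s = 2/T"
    t = np.linspace(0.0, T, 200001)
    print("gamma_%g = %.3f" % (T, np.mean(cost_along(t, 2.0 / T))))   # -> 1/2

for lam in (0.1, 0.01, 0.001):                # nu_lam for "u = 1 until s = lam/ln 2"
    t = np.linspace(0.0, 60.0 / lam, 400001)
    g = cost_along(t, lam / np.log(2.0))
    dt = t[1] - t[0]
    print("nu_%g = %.3f" % (lam, lam * np.sum(np.exp(-lam * t) * g) * dt))  # -> 3/4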
So the functions Vt and Wλ converge pointwise on Ω, but their limits V and W are different, since we have just shown V(0, 0) ≠ W(0, 0). One can verify that neither convergence is uniform on Ω by considering Vt(1, ε) and Wλ(1, ε) for small positive ε.
Remark 10.1. One may object that this example is not very regular since the payoff g is not continuous and the state space is not compact. However a related, smoother example can easily be constructed:
1. The set of controls is still [0, 1].
2. The continuous cost g(x) is equal to 1 outside the segment [0.9, 2.1], to 0 on [1, 2], and linear on the two remaining intervals.
3. The compact state space is Ω = {(x, y) | 0 ≤ y ≤ √(2x) ≤ 2√2}.
4. The dynamic is the same as in the original example for x ∈ [0, 3], and f(x, y, u) = ((4 − x)y, (4 − x)u) for 3 ≤ x ≤ 4. The inequality y(t)y′(t) ≤ x′(t) is thus satisfied on any trajectory, which implies that Ω is forward invariant under this dynamic.
With these changes the values Vt(·) and Wλ(·) still both converge pointwise on Ω to some V(·) and W(·) respectively, and V(0, 0) ≠ W(0, 0).
• We considered the finite horizon problem and the discounted one, but it should
be possible to establish similar Tauberian theorems for other, more complex,
evaluations of the payoff. This was settled in the discrete time case in [26].
• It would be very fruitful to establish necessary or sufficient conditions for
uniform convergence to hold. In this direction we mention [28] in which
sufficient conditions for the stronger notion of Uniform Value (meaning that
there are controls that are nearly optimal no matter the horizon, provided it is
large enough) are given in a general setting.
• In the discrete case an example is constructed in [26] in which there is no uniform
value despite uniform convergence of the families Vt and Wλ . It would be of
interest to construct such an example in continuous time, in particular in the
framework of Sect. 10.1.
• It would be very interesting to study Tauberian theorems for dynamic systems
that are controlled by two conflicting controllers. In the framework of differential
games this has been done recently (Theorem 2.1 in [1]): an extension of
Theorem 10.4 has been accomplished for two player games in which the limit
of VT or Wλ is assumed to be independent of the starting point. The analogous result in the discrete time framework is a consequence of Theorems 1.1 and 3.5 in [21]. Existence of Tauberian theorems in the general setup of two-person zero-sum games with no ergodicity condition remains open in both the discrete and the continuous settings.
Acknowledgements This article was written as part of the PhD of the first author. Both authors wish to express their many thanks to Sylvain Sorin for his numerous comments and his great help. We also thank Hélène Frankowska and Marc Quincampoix for helpful remarks on earlier drafts.
Appendix
We give here another proof⁴ of Theorem 10.5 by using the analogous result in discrete time [24] as well as an argument of equivalence between discrete and continuous dynamics.
Consider a deterministic dynamic programming problem in continuous time as defined in Sect. 10.2.1, with a state space Ω, a payoff g and a dynamic Γ. Recall that, for any ω ∈ Ω, Γ(ω) is the non empty set of feasible trajectories starting from ω. We construct an associated deterministic dynamic programming problem in discrete time as follows.
Let Ω̃ = Ω × [0, 1] be the new state space and let g̃ be the new cost function, given by g̃(ω, x) = x. We define a multivalued function with nonempty values Γ̃ : Ω̃ ⇒ Ω̃ by

(ω, x) ∈ Γ̃(ω′, x′) ⟺ ∃X ∈ Γ(ω′), with X(1) = ω and ∫_0^1 g(X(t)) dt = x.
The associated values are

vn(ω̃) = inf (1/n) ∑_{i=1}^n g̃(ω̃i),
wλ(ω̃) = inf λ ∑_{i=1}^{+∞} (1 − λ)^{i−1} g̃(ω̃i),

where the infima are taken over the set of sequences {ω̃i}i∈N such that ω̃0 = ω̃ and ω̃_{i+1} ∈ Γ̃(ω̃i) for every i ≥ 0.
Theorem 10.5 is then the consequence of the following three facts.
Firstly, the main theorem of Lehrer and Sorin in [24], which states that uniform convergence (on Ω̃) of vn to some v is equivalent to uniform convergence of wλ to the same v.
Secondly, the concatenation hypothesis (10.4) on Γ implies that for any (ω, x) ∈ Ω̃,

vn(ω, x) = Vn(ω),

where Vt(ω) = inf_{X∈Γ(ω)} (1/t) ∫_0^t g(X(s)) ds, as defined in equation (10.7). Consequently, because of the bound on g, for any t ∈ R+ we have

|Vt(ω) − v_{⌊t⌋}(ω, x)| ≤ 2/t.
Thirdly, by equation (10.8) and the bound on the cost function, for any λ ∈ ]0, 1],

|Wλ(ω) − wλ(ω, x)| ≤ λ ∫_0^{+∞} |(1 − λ)^{⌊t⌋} − e^{−λt}| dt,

and the right-hand side converges to 0 as λ tends to 0, as the following lemma shows.

Lemma. λ ∫_0^{+∞} |(1 − λ)^{⌊t⌋} − e^{−λt}| dt converges to 0 as λ tends to 0.
Proof. Since λ ∫_0^{+∞} (1 − λ)^{⌊t⌋} dt = λ ∑_{i=0}^{+∞} (1 − λ)^i = 1 = λ ∫_0^{+∞} e^{−λt} dt for any λ > 0, the lemma is equivalent to the convergence to 0 of

E(λ) := λ ∫_0^{+∞} [(1 − λ)^{⌊t⌋} − e^{−λt}]_+ dt,

where [x]_+ denotes the positive part of x. Now, from the relation 1 − λ ≤ e^{−λ}, true for any λ, one can easily deduce that, for any λ > 0, t ≥ 0, the relation (1 − λ)^{⌊t⌋} e^{λt} ≤ e^λ holds. Hence,

E(λ) = λ ∫_0^{+∞} e^{−λt} [(1 − λ)^{⌊t⌋} e^{λt} − 1]_+ dt
     ≤ λ ∫_0^{+∞} e^{−λt} (e^λ − 1) dt
     = e^λ − 1,

which converges to 0 as λ tends to 0.
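The bound E(λ) ≤ e^λ − 1 is also easy to confirm numerically; the sketch below (our own, with the floor function supplied by numpy) evaluates E(λ) on a truncated grid.

import numpy as np

def E(lam, horizon=40.0):
    # integrate lam * [(1-lam)^floor(t) - e^{-lam t}]_+ on [0, horizon/lam]
    t = np.linspace(0.0, horizon / lam, 1_000_001)
    diff = (1.0 - lam) ** np.floor(t) - np.exp(-lam * t)
    return lam * np.sum(np.maximum(diff, 0.0)) * (t[1] - t[0])

for lam in (0.5, 0.1, 0.01):
    print(lam, E(lam), np.exp(lam) - 1.0)   # E(lam) below e^lam - 1, both tend to 0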
References
1. Alvarez, O., Bardi, M.: Ergodic Problems in Differential Games. Advances in Dynamic Game
Theory, pp. 131–152. Ann. Int’l. Soc. Dynam. Games, vol. 9, Birkhäuser Boston (2007)
2. Alvarez, O., Bardi, M.: Ergodicity, stabilization, and singular perturbations for Bellman-Isaacs
equations. Mem. Am. Math. Soc. 960(204), 1–90 (2010)
3. Arisawa, M.: Ergodic problem for the Hamilton-Jacobi-Bellman equation I. Ann. Inst. Henri
Poincare 14, 415–438 (1997)
4. Arisawa, M.: Ergodic problem for the Hamilton-Jacobi-Bellman equation II. Ann. Inst. Henri
Poincare 15, 1–24 (1998)
5. Arisawa, M., Lions, P.-L.: On ergodic stochastic control. Comm. Partial Diff. Eq. 23(11–12),
2187–2217 (1998)
6. Artstein, Z., Gaitsgory, V.: The value function of singularly perturbed control systems. Appl.
Math. Optim. 41(3), 425–445 (2000)
7. Bardi, M., Capuzzo-Dolcetta, I.: Optimal Control and Viscosity Solutions of Hamilton-Jacobi-
Bellman Equations. Systems & Control: Foundations & Applications. Birkhäuser Boston, Inc.,
Boston, MA (1997)
8. Barles, G.: Some homogenization results for non-coercive Hamilton-Jacobi equations.
Calculus Variat. Partial Diff. Eq. 30(4), 449–466 (2007)
9. Bellman, R.: On the theory of dynamic programming. Proc. Natl. Acad. Sci. U.S.A, 38,
716–719 (1952)
10. Bettiol, P.: On ergodic problem for Hamilton-Jacobi-Isaacs equations. ESAIM: COCV 11,
522–541 (2005)
11. Cardaliaguet, P.: Ergodicity of Hamilton-Jacobi equations with a non coercive non convex
Hamiltonian in R2 /Z2 . Ann. l’Inst. Henri Poincare (C) Non Linear Anal. 27, 837–856 (2010)
12. Carlson, D.A., Haurie, A.B., Leizarowitz, A.: Optimal Control on Infinite Time Horizon.
Springer, Berlin (1991)
13. Colonius, F., Kliemann, W.: Infinite time optimal control and periodicity. Appl. Math. Optim.
20, 113–130 (1989)
14. Evans, L.C.: An Introduction to Mathematical Optimal Control Theory. Unpublished Lecture Notes, U.C. Berkeley (1983). Available at http://math.berkeley.edu/~evans/control.course.pdf
15. Feller, W.: An Introduction to Probability Theory and its Applications, vol. II, 2nd ed. Wiley,
New York (1971)
16. Grüne, L.: On the Relation between Discounted and Average Optimal Value Functions. J. Diff.
Eq. 148, 65–99 (1998)
17. Hardy, G.H., Littlewood, J.E.: Tauberian theorems concerning power series and Dirichlet’s
series whose coefficients are positive. Proc. London Math. Soc. 13, 174–191 (1914)
18. Hestenes, M.: A General Problem in the Calculus of Variations with Applications to the
Paths of Least Time, vol. 100. RAND Corporation, Research Memorandum, Santa Monica,
CA (1950)
19. Isaacs, R.: Games of Pursuit. Paper P-257. RAND Corporation, Santa Monica (1951)
20. Isaacs, R.: Differential Games. A Mathematical Theory with Applications to Warfare and
Pursuit, Control and Optimization. Wiley, New York (1965)
21. Kohlberg, E., Neyman, A.: Asymptotic behavior of nonexpansive mappings in normed linear
spaces. Isr. J. Math. 38, 269–275 (1981)
22. Kirk, D.E.: Optimal Control Theory: An Introduction. Englewood Cliffs, N.J. Prentice Hall
(1970)
23. Lee, E.B., Markus, L.: Foundations of Optimal Control Theory. SIAM, Philadelphia (1967)
24. Lehrer, E., Sorin, S.: A uniform Tauberian theorem in dynamic programming. Math. Oper.
Res. 17, 303–307 (1992)
25. Lions, P.-L., Papanicolaou, G., Varadhan, S.R.S.: Homogenization of Hamilton-Jacobi
Equations. Unpublished (1986)
26. Monderer, M., Sorin, S.: Asymptotic Properties in Dynamic Programming. Int. J. Game Theory
22, 1–11 (1993)
27. Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V.: The Mathematical Theory of Optimal Processes. Nauka, Moscow (1962) (Engl. Trans. Wiley)
28. Quincampoix, M., Renault, J.: On the existence of a limit value in some non expansive optimal
control problems. SIAM J. Control Optim. 49, 2118–2132 (2011)
29. Shapley, L.S.: Stochastic games. Proc. Natl. Acad. Sci. 39, 1095–1100 (1953)
Chapter 11
E-Equilibria for Multicriteria Games
11.1 Introduction
We know that Game Theory studies conflicts, behaviour and decisions of more
than one rational agent. Players choose their actions in order to achieve preferred
outcomes. Often players have to “optimize” not one but more than one objective and
these are often not comparable, so multicriteria games help us to make decisions in
multi-objective problems. The first observation to make in studying these topics is
that in general there is not an optimal solution from all points of view.
Let us consider, for example, an interactive decision between a seller and a buyer.
The latter wishes to buy a car and he chooses among many models. He has to take
into account the price, the power, the petrol consumption and the dimension of the
car: it must be large enough to be comfortable for his family but not too large to
L. Pusillo ()
Dima – Department of Mathematics, University of Genoa, Via Dodecaneso 35,
16146 Genoa, Italy
e-mail: [email protected]
S. Tijs
CentER and Department of Econometrics and Operations Research, Tilburg University,
P.O. Box 90153, 5000 LE Tilburg, The Netherlands
e-mail: [email protected]
have parking problems. The seller wishes to maximize his gain and to sell a good and useful car in order to satisfy the buyer, so that in the future he will come back or recommend this car dealer to his friends, and thus the seller will have more buyers. We can consider this interaction as a game where the two players have to "optimize" many criteria. Starting from Vector Optimization (see [3–5]), the theory of multi-objective games helps us in these multi-objective situations.
Shapley, in 1959, gave a generalization of the classical definition of Nash
equilibrium (the most widely accepted solution for non cooperative scalar games), to
Pareto equilibrium (weak and strong), for non cooperative games with many criteria.
Intuitively a feasible point in Rn is a Pareto equilibrium if there is no other feasible
point which is larger in every coordinate (we will be more precise in the subsequent
pages).
Let us consider the following game in matrix form, where player I has one criterion and player II has two criteria (in each cell the first entry is player I's payoff and the pair is player II's vector payoff):

        C                D
A   (2), (5, 0)      (0), (0, 1)
B   (0), (−1, 0)     (0), (0, 1)

Consider now the following game, where player II has countably many strategies (columns):

(1, 2)   (1, 2 + 1/2)   (1, 2 + 3/4)   ···   (1, 2 + (n − 1)/n)   ···
(0, 0)   (0, 0)         (0, 0)         ···   (0, 0)               ···
In this game there are no Nash equilibria (player II does not reach the payoff 3) but there is an infinite number of ε-NE. Intuitively, approximate equilibria mean that deviations improve the payoff by at most ε.
There are also games without approximate equilibria, for example the game:

(1, 2)   (1, 3)   (1, 4)   (1, 5)   ···
(0, 0)   (0, 1)   (0, 2)   (0, 3)   ···

This game has neither NE nor ε-NE.
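For the first of the two games above this can be checked mechanically. The sketch below is our own (we parametrize player II's strategies by the column index n, with payoff pairs as displayed): no profile is a Nash equilibrium, while every profile (A, n) with 1/n ≤ ε is an ε-equilibrium.

# Payoffs of the first infinite game above: player I gets 1 in row A, 0 in row B;
# player II gets 2 + (n-1)/n in column n of row A (supremum 3, never attained).
def u1(row, n):
    return 1.0 if row == "A" else 0.0

def u2(row, n):
    return 2.0 + (n - 1.0) / n if row == "A" else 0.0

def is_eps_equilibrium(row, n, eps):
    gain1 = max(u1(r, n) for r in ("A", "B")) - u1(row, n)   # player I's best gain
    gain2 = (3.0 if row == "A" else 0.0) - u2(row, n)        # sup over all columns
    return gain1 <= eps and gain2 <= eps

print([n for n in range(1, 101) if is_eps_equilibrium("A", n, 0.05)])  # n >= 20
print([n for n in range(1, 101) if is_eps_equilibrium("A", n, 0.0)])   # empty: no NE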
Many auction situations lead to games without NE but with approximate NE (see
[6]); in [18], the author proved some theorems about approximate solutions.
a ≽ b ⟺ ai ≥ bi ∀ i = 1, . . . , n;
a ≥ b ⟺ a ≽ b and a ≠ b;
a > b ⟺ ai > bi ∀ i = 1, . . . , n.
Obviously we mean:
Given a game G = ⟨(Xi)i∈N, (ui)i∈N⟩ and a strategy profile x ∈ X = ∏k∈N Xk, let (x̂i, x−i) denote the profile (x1, . . . , xi−1, x̂i, xi+1, . . . , xn).
We recall the definitions of weak and strong Pareto equilibrium:
Definition 11.1. Let G = ⟨(Xi)i∈N, (ui)i∈N⟩ be a multicriteria game with n players, where Xi is the strategy space for player i ∈ N, X = ∏k∈N Xk, and ui : X → R^{mi} is the utility function for player i, who has mi criteria "to optimize". A strategy profile x̂ ∈ X is a
1. Weak Pareto equilibrium if, for each player i, there is no xi ∈ Xi such that ui(xi, x̂−i) > ui(x̂);
2. Strong Pareto equilibrium if, for each player i, there is no xi ∈ Xi such that ui(xi, x̂−i) ≥ ui(x̂).
A potential function is

P:   (1, 0, 0)   (0, 2, 1)
     (0, 5, 1)   (0, 2, 1)
Given the game G = ⟨(Xi)i∈N, (ui)i∈N⟩, we denote by GP = ⟨(Xi)i∈N, P⟩ the game where the utility function is P for all players.
For a finite multicriteria potential game G the set of weak Pareto equilibria is not empty, wPE(G) ≠ ∅ (see [13]).
Let us give the definition of optimal points of a function with respect to an
improvement set.
Definition 11.6. Given a function P : ∏ Xi → Rm, X = Xi × X−i, we say that a ∈ OE(P), that is, a is an optimal point for the function P with respect to the improvement set E, if (a + E) ∩ P(X) = ∅.
Definition 11.7. A strategy profile x̂ ∈ X is an E-equilibrium for the multicriteria game G, where E = (E1, . . . , Ei, . . . , En) and Ei is the improvement set for player i, if for each player i and for each xi ∈ Xi it turns out that ui(xi, x̂−i) ∉ ui(x̂) + Ei.
We write x̂ ∈ OE(G).
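Definition 11.7 can be tested directly on finite games. The following sketch is our own illustration (the 2 × 2 bicriteria payoffs and the improvement sets are arbitrary choices, with Ei of the type used in Example 11.4(f) below): a profile x̂ is reported as an E-equilibrium when no unilateral deviation lands in ui(x̂) + Ei.

import numpy as np

# u[i][a][b] = player i's two-criteria payoff when players choose (a, b)
u = [
    np.array([[[2.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [0.0, 0.0]]]),      # player 1 (arbitrary values)
    np.array([[[5.0, 0.0], [0.0, 1.0]],
              [[-1.0, 0.0], [0.0, 1.0]]]),     # player 2 (arbitrary values)
]
eps = np.array([0.25, 0.25])                   # E_i = {y : y_k >= eps_k, k = 1, 2}

def improves(delta):
    # membership of a payoff difference in the improvement set E_i
    return bool(np.all(delta >= eps))

def is_E_equilibrium(a, b):
    if any(improves(u[0][a2][b] - u[0][a][b]) for a2 in range(2)):
        return False                           # player 1 has an E-improving deviation
    if any(improves(u[1][a][b2] - u[1][a][b]) for b2 in range(2)):
        return False                           # player 2 has an E-improving deviation
    return True

print([(a, b) for a in range(2) for b in range(2) if is_E_equilibrium(a, b)])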
Theorem 11.1. Let G = ⟨(Xi)i∈N, (ui)i∈N⟩ be a multicriteria potential game and suppose that the potential function is upper bounded. Let us suppose that there is a hyperplane which separates E from 0.
Then OE(P(X)) ≠ ∅.
Proof. The proof follows by considering the known results about separation theorems (see for example [3]).
Remark that the condition "P upper bounded" is not equivalent to ui upper bounded, as the following example shows.
Example 11.3. Let us consider G = (R, R, u1, u2) where u1(x, y) = min{x, y} − y and u2(x, y) = u1(y, x). So u1(x, y) ≤ 0 and u2(x, y) ≤ 0, but the potential function P(x, y) = min{x, y} is not upper bounded.
For multicriteria potential games we have the following existence theorem:
Theorem 11.2. Let G = ⟨(Xi)i∈N, (ui)i∈N⟩ be a multicriteria potential game and suppose that the potential function P is w-upper bounded. Furthermore, suppose that w strongly separates E from {0}. Then OE(G) ≠ ∅.
Proof. To make the notation easier, we write the proof only in the case of two players and two objectives, but the proof is analogous for n players with more than two objectives.
Let y = P(x̂) with ⟨w, y⟩ ≥ sup{⟨w, a⟩ : a ∈ P(X)} − t/2.
We want to prove that x̂ = (x̂1, x̂2) ∈ OE(G).
In this section we study E-optimal points for multicriteria games. In general it is not easy to find the E-optimal points of a multicriteria game G, but in some important classes of problems we can reduce the search for E-optimal points to the search for classical equilibria, as shown in the following example.
Example 11.4. (a) If E1 = E2 = (0, +∞) and m1 = m2 = 1 (that is, the game has one criterion), it turns out that the E-optimal points are the Nash equilibria of the game G, for short OE(G) = NE.
(b) If E1 = (ε, +∞) = E2, ε > 0, it turns out that the E-optimal points are the approximate Nash equilibria, for short OE(G) = εNE.
(c) If E1 = E2 = Rn+ \ {0}, then the E-optimal points are the strong Pareto equilibria of the game G, for short OE(G) = sPE(G).
(d) If E1 = E2 = Rn++, then the E-optimal points are the weak Pareto equilibria of the game G, for short OE(G) = wPE(G).
(e) In the paper [13], the improvement set considered for multicriteria potential games is E = R2+ \ ([0, ε] × [0, ε]) (see Fig. 11.3).
(f) In the paper [11] the improvement set considered for multicriteria games is E = {x ∈ R2 : xi ≥ εi, εi ∈ R++, i = 1, 2} (see Fig. 11.4).
∀ xi ∈ Xi, C ∩ (ui(x̂) + Ei) = ∅, where C = {z = ui(xi, x̂−i)} ⊂ Rn.
Proof. The proof follows from the definition of E-optimal point of a set.
We recall that a relation >E defined on a set E is a preorder if the transitivity property is valid:
Proof. The proof follows from the definitions and the properties given.
References
1. Borm, P.E.M., Tijs, S.H., van den Aarssen, J.C.M.: Pareto equilibria in multiobjective games.
Methods Oper. Res. 60, 303–312 (1989)
2. Chicco, M., Mignanego, F., Pusillo, L., Tijs, S.: Vector optimization problems via improvement
sets. JOTA 150(3), 516–529 (2011)
3. Ehrgott, M.: Multicriteria Optimization, 2nd ed. Springer, Berlin (2005)
4. Gutierrez, C., Jimenez, B., Novo, V.: A unified approach and optimality conditions for approximate solutions of vector optimization problems. SIAM J. Optim. 17(3), 688–710 (2006)
5. Loridan, P.: Well posedness in vector optimization. In: Lucchetti, R., Revalski, J. (eds.)
Recent Developments in Well Posed-Variational Problems, pp. 171–192. Kluwer Academic,
Dordrecht, The Netherlands (1995)
6. Klemperer, P.: Auctions: Theory and Practice. Princeton University Press, Princeton, USA
(2004)
7. Mallozzi, L., Pusillo, L., Tijs, S.: Approximate equilibria for Bayesian games. JMAA 342(2),
1098–1102 (2007)
8. Margiocco, M., Pusillo, L.: Stackelberg well posedness and hierarchical potential games. In:
Jorgensen, S., Quincampoix, M., Vincent, T. (eds.) Advances in Dynamic Games – Series:
Annals of the International Society of Dynamic Games, pp. 111–128. Birkhauser, Boston
(2007)
9. Margiocco, M., Pusillo, L.: Potential games and well-posedness. Optimization 57, 571–579
(2008)
10. Monderer, D., Shapley, L.S.: Potential games. Games Econ. Behav. 14, 124–143 (1996)
11. Morgan, J.: Approximations and well-posedness in multicriteria games. Ann. Oper. Res. 137,
257–268 (2005)
12. Owen, G.: Game Theory, 3rd ed. Academic, New York (1995)
13. Patrone, F., Pusillo, L., Tijs, S.: Multicriteria games and potentials. TOP 15, 138–145 (2007)
14. Pieri, G., Pusillo, L.: Interval values for multicriteria cooperative games. Auco Czech Econ.
Rev. 4, 144–155 (2010)
15. Puerto Albandoz, J., Fernandez Garcia F.: Teoria de Juegos Multiobjetivo, Imagraf Impresores
S.A., Sevilla (2006)
16. Pusillo, L.: Interactive decisions and potential games. J. Global Optim. 40, 339–352 (2008)
17. Shapley, L.S.: Equilibrium points in games with vector payoffs. Naval Res. Logistic Quart. 6,
57–61 (1959)
18. Tijs, S.H.: ε-equilibrium point theorems for two-person games. Methods Oper. Res. 26, 755–766 (1977)
19. Voorneveld, M.: Potential games and interactive decisions with multiple criteria. Dissertation
series n. 61. CentER of Economic Research- Tilburg University (1999)
Chapter 12
Mean Field Games with a Quadratic
Hamiltonian: A Constructive Scheme
Olivier Guéant
Abstract Mean field games models describing the limit case of a large class of
stochastic differential games, as the number of players goes to +∞, were introduced
by Lasry and Lions [C R Acad Sci Paris 343(9/10) (2006); Jpn. J. Math. 2(1)
(2007)]. We use a change of variables to transform the mean field games equations
into a system of simpler coupled partial differential equations in the case of a
quadratic Hamiltonian. This system is then used to exhibit a monotonic scheme
to build solutions of the mean field games equations.
12.1 Introduction
Mean field games (MFG) equations were introduced by Lasry and Lions [4–6]
to describe the dynamic equilibrium of stochastic differential games involving a
continuum of players.
Formally, we consider a continuum of agents, each agent being described by a
position Xt ∈ Ω [Ω being typically (0, 1)d ] following a stochastic process dXt =
at dt + σ dWt . In this stochastic process, at is controlled by the agent and Wt is
a Brownian motion specific to the agent under investigation—this independence
hypothesis being central in the sequel. These agents will interact in a mean field
O. Guéant ()
UFR de Mathématiques, Laboratoire Jacques-Louis Lions, Université Paris-Diderot,
175 rue du Chevaleret, 75013 Paris, France
e-mail: [email protected]
fashion, and we denote by m(t, ·) the probability distribution function describing the
distribution of the agents at time t.1
Each agent optimizes the same objective function, though possibly starting from
a different position x0 :
sup_{(at)t} E [ ∫_0^T ( f(Xt, m(t, Xt)) − C(at) ) dt + uT(XT) | X0 = x0 ],
where f and uT are functions whose regularity will be described subsequently and
where C is a convex (cost) function.
To this problem we associate the so-called MFG equations. These equations consist in a backward Hamilton–Jacobi–Bellman (HJB) equation coupled with a forward Kolmogorov (K) transport equation, with prescribed initial condition m(0, ·) = m0(·) ≥ 0 and terminal condition u(T, ·) = uT(·), where H is the Legendre transform of the cost function C (the system is written out in Sect. 12.2 in the quadratic case).
2
In this paper, we focus on the particular case of the quadratic cost C(a) = a2 and
2
hence a quadratic Hamiltonian H(p) = p2 . In this special case, a change of variables
was introduced by Guéant et al. in [3] to write the MFG equations as two coupled
heat equations with similar source terms. If indeed we introduce φ = exp σu2 and
ψ = m exp − σu2 , then the system reduces to
σ2 1
∂t φ + Δ φ = − 2 f (x, φ ψ )φ ,
2 σ
σ2 1
∂t ψ − Δ ψ = 2 f (x, φ ψ )ψ ,
2 σ
with φ (T, ·) = exp uσT (·)
2
0 (·)
and ψ (0, ·) = φm(0,·) .
We use this system to exhibit a constructive scheme for solutions to the MFG equations. This constructive scheme, proposed by Lasry and Lions in [7], starts with ψ⁰ = 0 and builds recursively two sequences (φ^{n+1/2})n and (ψ^{n+1})n using the following equations:
1 In our case, this assumption consists only in assuming that the initial datum is a probability
distribution function m0 .
∂t φ^{n+1/2} + (σ²/2) Δφ^{n+1/2} = −(1/σ²) f(x, φ^{n+1/2} ψⁿ) φ^{n+1/2},
∂t ψ^{n+1} − (σ²/2) Δψ^{n+1} = (1/σ²) f(x, φ^{n+1/2} ψ^{n+1}) ψ^{n+1},

with φ^{n+1/2}(T, ·) = exp(uT(·)/σ²) and ψ^{n+1}(0, ·) = m0(·)/φ^{n+1/2}(0, ·).
Then, φ and ψ are obtained as the monotonic limit of the two sequences (φ^{n+1/2})n and (ψⁿ)n under the usual assumptions on f.
In Sect. 12.2, we recall the change of variables and derive the associated system
of coupled parabolic equations. Section 12.3 is devoted to the introduction of the
functional framework, and we prove the main monotonicity properties of the system.
Section 12.4 presents a constructive scheme and proves that we can have two
monotonic sequences converging toward φ and ψ . Section 12.5 then gives additional
properties on the constructive scheme regarding the absence of mass conservation.
(HJB) ∂t u + (σ²/2) Δu + (1/2) |∇u|² = −f(x, m),
(K)   ∂t m + ∇·(m ∇u) = (σ²/2) Δm,

with:
• Boundary conditions: ∂u/∂n = ∂m/∂n = 0 on (0, T) × ∂Ω;
• Terminal condition: u(T, ·) = uT(·), a given payoff whose regularity is to be specified;
• Initial condition: m(0, ·) = m0(·) ≥ 0, a given positive function in L¹(Ω), typically a probability distribution function.
The change of variables introduced in [3] is recalled in the following proposition:
Proposition 12.1. Let us consider a smooth solution (φ, ψ) of the following system (S), with φ > 0:

∂t φ + (σ²/2) Δφ = −(1/σ²) f(x, φψ) φ    (Eφ),
∂t ψ − (σ²/2) Δψ = (1/σ²) f(x, φψ) ψ    (Eψ),

with:
• Boundary conditions: ∂φ/∂n = ∂ψ/∂n = 0 on (0, T) × ∂Ω.
• Terminal condition: φ(T, ·) = exp(uT(·)/σ²).
• Initial condition: ψ(0, ·) = m0(·)/φ(0, ·).
Then (u, m) = (σ² ln(φ), φψ) solves the MFG system (HJB)–(K).
Proof. For the (HJB) equation, we compute
∂t u = σ² (∂t φ)/φ,   ∇u = σ² (∇φ)/φ,   Δu = σ² (Δφ)/φ − σ² |∇φ|²/φ².

Hence

∂t u + (σ²/2) Δu + (1/2) |∇u|² = (σ²/φ) ( ∂t φ + (σ²/2) Δφ )
                               = (σ²/φ) ( −(1/σ²) f(x, φψ) φ )
                               = −f(x, m).
For the (K) equation,

∂t m = ∂t φ ψ + φ ∂t ψ,   ∇·(∇u m) = σ² ∇·(ψ ∇φ) = σ² [Δφ ψ + ∇φ·∇ψ],
Δm = Δφ ψ + 2 ∇φ·∇ψ + φ Δψ.

Hence

∂t m + ∇·(∇u m) = ∂t φ ψ + φ ∂t ψ + σ² [Δφ ψ + ∇φ·∇ψ]
= ψ (∂t φ + σ² Δφ) + φ ∂t ψ + σ² ∇φ·∇ψ
= ψ ( (σ²/2) Δφ − (1/σ²) f(x, φψ) φ )
  + φ ( (σ²/2) Δψ + (1/σ²) f(x, φψ) ψ ) + σ² ∇φ·∇ψ
= (σ²/2) Δφ ψ + σ² ∇φ·∇ψ + (σ²/2) φ Δψ
= (σ²/2) Δm.
This proves the result since the boundary conditions and the initial and terminal
conditions are coherent.
Now we will focus our attention on the study of the preceding system of
equations (S) and use it to design a constructive scheme for the couple (φ , ψ ) and
thus for the couple (u, m) under regularity assumptions.
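In one space dimension, the algebra of Proposition 12.1 can be verified symbolically. The following sketch is our addition, using sympy; F(t, x) stands for the composed coupling f(x, m(t, x)), which enters both equations only as a scalar field. It substitutes the time derivatives prescribed by (Eφ) and (Eψ) into the (HJB) and (K) residuals and checks that both vanish.

import sympy as sp

t, x, sigma = sp.symbols('t x sigma', positive=True)
phi = sp.Function('phi')(t, x)
psi = sp.Function('psi')(t, x)
F = sp.Function('F')(t, x)          # stands for the scalar field f(x, m(t, x))

u = sigma**2 * sp.log(phi)          # the change of variables
m = phi * psi

# time derivatives prescribed by (E_phi) and (E_psi)
dphi_dt = -sigma**2/2 * sp.diff(phi, x, 2) - F/sigma**2 * phi
dpsi_dt = sigma**2/2 * sp.diff(psi, x, 2) + F/sigma**2 * psi
rules = {sp.Derivative(phi, t): dphi_dt, sp.Derivative(psi, t): dpsi_dt}

hjb = sp.diff(u, t) + sigma**2/2 * sp.diff(u, x, 2) \
      + sp.Rational(1, 2) * sp.diff(u, x)**2 + F
kol = sp.diff(m, t) + sp.diff(m * sp.diff(u, x), x) - sigma**2/2 * sp.diff(m, x, 2)

print(sp.simplify(hjb.subs(rules)))   # prints 0
print(sp.simplify(kol.subs(rules)))   # prints 0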
Consider first, for ψ given, the equation

∂t φ + (σ²/2) Δφ = −(1/σ²) f(x, φψ) φ    (Eφ),

with ∂φ/∂n = 0 on (0, T) × ∂Ω and φ(T, ·) = exp(uT(·)/σ²).
Hence Φ : ψ ∈ P0 ↦ φ ∈ P is well defined.
Moreover, ∀ψ ∈ P0, φ = Φ(ψ) ∈ P_η for η = exp( −(1/σ²) ( ‖uT‖∞ + ‖f‖∞ T ) ).
2 In terms of the initial MFG problem, the optimal control ∇u and the subsequent distribution m are not changed if we subtract ‖f‖∞ from f.
12.3.1 Compactness
Common energy estimates [1] give that there exists a constant C that only depends on ‖uT‖∞, σ, and ‖f‖∞ such that the corresponding bounds hold for all (ψ, ϕ) ∈ P0 × L²(0, T, L²(Ω)).
Hence Fψ maps the closed ball B_{L²(0,T,L²(Ω))}(0, C) to a compact subset of B_{L²(0,T,L²(Ω))}(0, C).
12.3.2 Continuity
compactness result that we can extract from (φn)n a new sequence, still denoted (φn)n, that converges in the L²(0, T, L²(Ω)) sense toward a function φ. To prove that Fψ is continuous, we then need to show that φ cannot be different from Fψ(ϕ).
Now, because of the energy estimates, we know that φ is in P and that we can extract another subsequence (still denoted (φn)n) such that:
• φn → φ in the L²(0, T, L²(Ω)) sense;
• ∇φn ⇀ ∇φ weakly in L²(0, T, L²(Ω));
• ∂t φn ⇀ ∂t φ weakly in L²(0, T, H⁻¹(Ω));
and
• ϕn → ϕ almost everywhere.
By definition, we have that ∀w ∈ L²(0, T, H¹(Ω)):

∫_0^T ⟨∂t φn(t, ·), w(t, ·)⟩_{H⁻¹(Ω),H¹(Ω)} dt − (σ²/2) ∫_0^T ∫_Ω ∇φn(t, x)·∇w(t, x) dx dt
= −(1/σ²) ∫_0^T ∫_Ω f(x, ϕn(t, x) ψ(t, x)) φn(t, x) w(t, x) dx dt.
Hence φ = Fψ (ϕ ).
I(t) = −∫_Ω ∂t φ(t, x) φ(t, x)⁻ dx
     = −∫_Ω [ (σ²/2) ∇φ(t, x)·∇(φ(t, x)⁻) − (1/σ²) f(x, φ(t, x)ψ(t, x)) φ(t, x) φ(t, x)⁻ ] dx
     = −∫_Ω [ −(σ²/2) |∇φ(t, x)|² 1_{φ(t,x)≤0} + (1/σ²) f(x, φ(t, x)ψ(t, x)) (φ(t, x)⁻)² ] dx
     = (σ²/2) ∫_Ω |∇φ(t, x)|² 1_{φ(t,x)≤0} dx − (1/σ²) ∫_Ω f(x, φ(t, x)ψ(t, x)) (φ(t, x)⁻)² dx
     ≥ 0.
⋯ + (1/σ²) ∫_Ω f(x, φ(t, x)ψ(t, x)) φ(t, x) (φ̄(t) − φ(t, x))⁺ dx
 ≥ (1/σ²) ∫_Ω ( ‖f‖∞ φ̄(t) + f(x, φ(t, x)ψ(t, x)) φ(t, x) ) (φ̄(t) − φ(t, x))⁺ dx
 ≥ (1/σ²) ∫_Ω ( ‖f‖∞ + f(x, φ(t, x)ψ(t, x)) ) φ(t, x) (φ̄(t) − φ(t, x))⁺ dx
 ≥ 0.
≥ (1/σ²) ∫_Ω ( f(x, φ1(t, x)ψ1(t, x)) φ1(t, x) − f(x, φ2(t, x)ψ2(t, x)) φ2(t, x) ) (φ2(t, x) − φ1(t, x))⁺ dx
≥ (1/σ²) ∫_Ω ( f(x, φ1(t, x)ψ2(t, x)) φ1(t, x) − f(x, φ2(t, x)ψ2(t, x)) φ2(t, x) ) (φ2(t, x) − φ1(t, x))⁺ dx
≥ 0.
∂t ψ − (σ²/2) Δψ = (1/σ²) f(x, φψ) ψ    (Eψ)
The scheme we consider involves two sequences (φ^{n+1/2})n and (ψⁿ)n that are built using the following recursive equations:

ψ⁰ = 0,
∀n ∈ N, φ^{n+1/2} = Φ(ψⁿ),
∀n ∈ N, ψ^{n+1} = Ψ(φ^{n+1/2}).

Theorem 12.1. The two sequences are well defined, (φ^{n+1/2})n is nonincreasing, (ψⁿ)n is nondecreasing, and they converge monotonically toward a solution (φ, ψ) of the system (S).
Proof. By immediate induction, we obtain from Propositions 12.2 and 12.4 that the two sequences are well defined and in the appropriate spaces.
Now, as far as monotonicity is concerned, we have that ψ¹ = Ψ(φ^{1/2}) ≥ 0 = ψ⁰. Hence, if for a given n ∈ N we have ψ^{n+1} ≥ ψⁿ, then Proposition 12.3 gives

φ^{n+3/2} = Φ(ψ^{n+1}) ≤ Φ(ψⁿ) = φ^{n+1/2}.
Passing to the limit in the weak formulation, for all w ∈ L²(0, T, H¹(Ω)),

∫_0^T ⟨∂t φ^{n+1/2}(t, ·), w(t, ·)⟩ dt − (σ²/2) ∫_0^T ∫_Ω ∇φ^{n+1/2}·∇w dx dt
= −(1/σ²) ∫_0^T ∫_Ω f(x, φ^{n+1/2}(t, x) ψⁿ(t, x)) φ^{n+1/2}(t, x) w(t, x) dx dt

becomes, as n → +∞,

∫_0^T ⟨∂t φ(t, ·), w(t, ·)⟩ dt − (σ²/2) ∫_0^T ∫_Ω ∇φ·∇w dx dt
= −(1/σ²) ∫_0^T ∫_Ω f(x, φ(t, x) ψ(t, x)) φ(t, x) w(t, x) dx dt.
In this chapter, we exhibited a monotonic way to build a solution to the system (S). To understand the nature of the change of variables and of the constructive scheme we used, let us introduce the sequence (m^{n+1})n, where m^{n+1} = φ^{n+1/2} ψ^{n+1}. From Theorem 12.1, we know that (m^{n+1})n converges almost everywhere and in L¹ toward the function m = φψ, for which we have the conservation of mass along the trajectory. However, this property is not true for m^{n+1}, as stated in the following proposition.
Proposition 12.6. Let us consider n ∈ N and let us denote by M^{n+1}(t) = ∫_Ω m^{n+1}(t, x) dx the total mass of m^{n+1} at date t.
Then, there may be a loss of mass along the trajectory in the sense that:

(d/dt) M^{n+1}(t) = (1/σ²) ∫_Ω ψ^{n+1}(t, x) φ^{n+1/2}(t, x) [ f(x, ψ^{n+1}(t, x) φ^{n+1/2}(t, x)) − f(x, ψⁿ(t, x) φ^{n+1/2}(t, x)) ] dx ≤ 0.
Proof. We have

(d/dt) M^{n+1}(t) = ⟨∂t φ^{n+1/2}(t, ·), ψ^{n+1}(t, ·)⟩_{H⁻¹(Ω),H¹(Ω)} + ⟨∂t ψ^{n+1}(t, ·), φ^{n+1/2}(t, ·)⟩_{H⁻¹(Ω),H¹(Ω)}
= (σ²/2) ∫_Ω ∇φ^{n+1/2}(t, x)·∇ψ^{n+1}(t, x) dx
  − (1/σ²) ∫_Ω φ^{n+1/2}(t, x) ψ^{n+1}(t, x) f(x, φ^{n+1/2}(t, x) ψⁿ(t, x)) dx
  − (σ²/2) ∫_Ω ∇φ^{n+1/2}(t, x)·∇ψ^{n+1}(t, x) dx
  + (1/σ²) ∫_Ω φ^{n+1/2}(t, x) ψ^{n+1}(t, x) f(x, φ^{n+1/2}(t, x) ψ^{n+1}(t, x)) dx
= (1/σ²) ∫_Ω ψ^{n+1}(t, x) φ^{n+1/2}(t, x) [ f(x, ψ^{n+1}(t, x) φ^{n+1/2}(t, x)) − f(x, ψⁿ(t, x) φ^{n+1/2}(t, x)) ] dx
≤ 0.
This property shows that the constructive scheme is rather original since it
basically consists in building probability distribution functions using sequences
of functions in L1 that only have the right total mass asymptotically. Despite this
absence of mass conservation, a discrete counterpart of this constructive scheme is
developed in a work in progress [2] to numerically compute approximations of the
solutions.
References
1. Evans, L.C.: Partial Differential Equations (Graduate Studies in Mathematics, vol. 19).
American Mathematical Society, Providence, RI (2009)
2. Guéant, O.: Mean field games equations with quadratic Hamiltonian: a specific approach. Math.
Models Methods Appl. Sci., 22, (2012)
3. Guéant, O., Lasry, J.M., Lions, P.L.: Mean field games and applications. In: Paris Princeton
Lectures on Mathematical Finance (2010)
4. Lasry, J.-M., Lions, P.-L.: Jeux à champ moyen. I. Le cas stationnaire. C. R. Acad. Sci. Paris 343(9), 619–625 (2006)
5. Lasry, J.-M., Lions, P.-L.: Jeux à champ moyen. II. Horizon fini et contrôle optimal. C. R. Acad. Sci. Paris 343(10), 679–684 (2006)
6. Lasry, J.-M., Lions, P.-L.: Mean field games. Jpn. J. Math. 2(1), 229–260 (2007)
7. Lasry, J.-M., Lions, P.-L.: Cours au Collège de France: théorie des jeux à champ moyen. http://www.college-de-france.fr/default/EN/all/equ_der/audio_video.jsp (2008)
Chapter 13
Differential Game-Theoretic Approach
to a Spatial Jamming Problem
13.1 Introduction
In the past few years, considerable research has been done to deploy multiple
UAVs in a decentralized manner to carry out tasks in military as well as civilian
scenarios. UAVs have shown promise in a wide range of applications. The recent
availability of low-cost UAVs suggests the use of teams of vehicles to perform
various tasks such as mapping, surveillance, search and tracking operations [10,43].
For these applications, there has been much focus to deploy teams of multiple UAVs
In this section, we first introduce a communication model between two mobile nodes
in the presence of a jammer. Then we present the mobility models for the nodes. We
conclude the section by formally formulating the problems we study in the paper.
Consider a mobile node (receiver) receiving messages from another mobile node
(transmitter) at some frequency. Both communicating nodes are assumed to be
lying on a plane. Consider a third node that is attempting to jam the communication
channel in between the transmitter and the receiver by sending a high power noise
at the same frequency. This kind of jamming is referred to as trivial jamming. Two
other types of jamming are:
1. Periodic jamming: This refers to a periodic noise pulse being generated by the
jammer irrespective of the packets that are put on the network.
2. Intelligent jamming: In this mode of jamming a jammer is put in a promiscuous
mode to destroy primarily the control packets.
A variety of metrics can be used to compare the effectiveness of various jamming
attacks. Some of these metrics are energy efficiency, low probability of detection,
and strong denial of service [27, 29]. In this paper, we use the ratio of the jamming-
power to the signal-power (JSR) as the metric. From [32], we have the following
models for the JSR (ξ ) at the receiver’s antenna.
1. Rn model
PJT GJR GRJ n log10 ( DDTR )
ξ= 10 JR
PT GTR GRT
3. Nicholson
PJT GJR GRJ 4 log10 ( DDTR )
ξ= 10 JR
PT GTR GRT
where PJT is the power of the jammer transmitting antenna, PT is the power of the transmitter, GTR is the antenna gain from transmitter to receiver, GRT is the antenna gain from receiver to transmitter, GJR is the antenna gain from jammer to receiver, GRJ is the antenna gain from receiver to jammer, hJ is the height of the jammer antenna above the ground, hT is the height of the transmitter antenna above the ground, DTR is the Euclidean distance between the transmitter and the receiver, and DJR is the Euclidean distance between the jammer and the receiver. All the above models are based on the propagation loss depending on the distance of the jammer and the transmitter from the receiver. In all the above models JSR is dependent on the ratio DTR/DJR.
For digital signals, the jammer’s goal is to raise the ratio to a level such that the
bit error rate [33] is above a certain threshold. For analog voice communication,
(Fig. 13.1: the configuration (xi, yi) of UAVi in the global coordinate frame with origin O and axes x, y)
the goal is to reduce the articulation performance so that the signals are difficult to
understand. Hence we assume that the communication channel between a receiver
and a transmitter is considered to be jammed in the presence of a jammer if ξ ≥ ξtr
where ξtr is a threshold determined by many factors including application scenario
and communication hardware. If all the parameters except the mutual distances
between the jammer, transmitter and receiver are kept constant, we can conclude the following from all the above models: if the ratio DTR/DJR ≥ η, then the communication channel between a transmitter and a receiver is considered to be jammed. Here η is a function of ξtr, PJT, PT, GTR, GRT, GJR, GRJ and DTR. Hence if the transmitter is not within a disc of radius η DJR centered around the receiver, then the communication channel is considered to be jammed. We call this disc the perception range. The perception range for any node depends on the distance between the jammer and the node. For effective communication between two nodes, each node should be able to transmit as well as receive messages from the other node. Hence two nodes can communicate if they lie in each other's perception range.
We will adopt the above jamming and communication model, for the rest of the
paper.
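In code, the resulting link test is immediate. The following sketch is our own (positions and the threshold η are arbitrary choices): a link exists between nodes i and j when each lies in the other's perception range, i.e. when D_ij ≤ η D_{J,i} and D_ij ≤ η D_{J,j}, and connectivity of the induced graph G is then checked by search.

import numpy as np
from itertools import combinations

eta = 0.8                                               # threshold ratio (assumed)
nodes = np.array([[0., 0.], [2., 0.], [1., 1.5], [3., 1.]])   # UAV positions
jammer = np.array([4., 0.])

def adjacency(nodes, jammer, eta):
    n = len(nodes)
    A = np.zeros((n, n), dtype=int)
    dJ = np.linalg.norm(nodes - jammer, axis=1)         # D_{J,i} for each node
    for i, j in combinations(range(n), 2):
        dij = np.linalg.norm(nodes[i] - nodes[j])       # D_ij = D_TR
        # link iff each node lies in the other's perception range
        if dij <= eta * dJ[i] and dij <= eta * dJ[j]:
            A[i, j] = A[j, i] = 1
    return A

def connected(A):
    # depth-first search from node 0
    n = len(A); seen = {0}; stack = [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if A[i, j] and j not in seen:
                seen.add(j); stack.append(j)
    return len(seen) == n

A = adjacency(nodes, jammer, eta)
print(A, connected(A))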
We now describe the kinematic model of the nodes. In our analysis, each node is a UAV. We assume that the UAVs maintain constant-altitude flight. This assumption helps to simplify our analysis to a planar case. Referring to Fig. 13.1, the configuration of the ith UAV in the network can be expressed in terms of the variables (xi, yi, φi) in the global coordinate frame. The pair (xi, yi) represents the position of a reference point on UAVi with respect to the origin of the global
reference frame and φi denotes the instantaneous heading of UAVi in the global reference frame. Hence the state space for UAVi is Xi = R² × S¹. In our analysis, we assume that the UAVs are a kinematic system and hence the dynamics of the UAVs are not taken into account in the differential equation governing the evolution of the system. The kinematics of the UAVs are assumed to be the following:

Ẋi := [ẋi, ẏi, φ̇i]ᵀ = [cos φi, sin φi, σi1]ᵀ =: fi(xi, σi1)    (13.1)
where σi1 is the angular speed of UAVi. We assume that σi1 ∈ Ui = {φ : [0, t] → [−1, +1] | φ(·) is measurable}. Since the jammer is also an aerial vehicle, we model its kinematics as a UAV. The motion of the jammer is governed by a set of equations similar to (13.1). The configuration of each jammer is expressed in terms of the variables (xi, yi, φi) and the kinematics is given by the following equation:

Ẋi := [ẋi, ẏi, φ̇i]ᵀ = [cos φi, sin φi, σi2]ᵀ =: fi(xi, σi2)

where σi2 ∈ Ui as defined earlier. The state space for jammer i is Xi = R² × S¹. The state space of the entire system is X = X1 × ··· × Xn × X1 × ··· × Xm = R^{2(n+m)} × (S¹)^{n+m}. We use the following notation in order to represent the dynamics of the entire system:
From the communication model presented in the previous section, the connectivity
of the network of UAVs depends on their position relative to the jammers. Given
m UAVs in the network, we define a graph G on m vertices as follows. The
vertex corresponding to UAVi is labeled as i. An edge exists between vertices i
and j iff there is a communication link between UAVi and UAV j . We define the
communication network to be disconnected when G has more than one component.
In this problem, the existence of a communication link between two nodes depends
on the relative distance of the two nodes from the jammers. Using the above model
for the communication network, we present the following problem statement.
Assume that G is initially connected. The jammers intend to move in order
to disconnect the communication network in minimum amount of time possible.
The UAVs must move in order to maintain the connectivity of the network for the
maximum possible amount of time. We want to compute the motion strategies for
the UAVs in the network. Our interest lies in understanding the spatial reconfigura-
tion of the formation so that the jammers can be evaded.
In this section, we introduce the concept of optimal strategies for the vehicles. Given the control histories of the vehicles, {σi1(·)}ⁿi=1, {σi2(·)}ᵐi=1, the outcome of the game is denoted by π : X × U^{n+m} → R and is defined as the time of termination of the game:

π(x0, {σi1(·)}ⁿi=1, {σi2(·)}ᵐi=1) = tf,

where tf denotes the time of termination of the game when the players play ({σi1(·)}ⁿi=1, {σi2(·)}ᵐi=1) starting from an initial point x0 ∈ X. The game terminates when the communication network gets disconnected. The objective of the jammer is to minimize the termination time and the objective of the UAVs is to maximize it.
Since the objective function of the team of UAVs is in conflict with that of the team of jammers, the problem can be formulated as a multi-player zero-sum team game. A play ({σi1*(·)}ⁿi=1, {σi2*(·)}ᵐi=1) is said to be optimal for the players if it satisfies the following conditions:

{σi1*}ⁿi=1 = arg max_{{σi1}ⁿi=1} π[x0, {σi1(·)}ⁿi=1, {σi2*(·)}ᵐi=1],
{σi2*}ᵐi=1 = arg min_{{σi2}ᵐi=1} π[x0, {σi1*(·)}ⁿi=1, {σi2(·)}ᵐi=1].
In the above expressions σ¹−j is used to represent the controls of all the UAVs except UAVj. Similarly, σ²−j is used to represent the controls of all the jammers except the jth jammer. From (13.5), we can conclude that there is no motivation for a player to deviate from its equilibrium strategy. In general, there may be multiple sets of strategies for the players that are in Nash equilibrium. Assuming the existence of a value, as defined in (13.4), and the existence of a unique Nash equilibrium, we can conclude that the Nash equilibrium concept of person-by-person optimality given in (13.5) is a necessary condition to be satisfied for the set of optimal strategies for the players; furthermore, computing the set of strategies that are in Nash equilibrium also gives us the set of optimal strategies. In the following analysis, we assume the aforementioned conditions in order to compute the optimal strategies.
The following theorem provides a relation between the optimal strategy of each
player and the gradient of the value function, ∇J.
Theorem 13.1. Assuming that J(x) is a smooth function of x, the optimal strategies
({σi1∗ }ni=1 , {σi2∗ }m
i=1 ) satisfy the following condition:
Proof. The proof essentially follows the two-player version as provided in [19].
Let us consider a play after time t has elapsed from the beginning of the game at
which point the players are at a state x. The outcome functional is provided by the
following expression:
π(x(t), {σi1(·)}ⁿi=1, {σi2(·)}ᵐi=1) = ∫_t^{t+h} dt + J(x(t + h))
Using a Taylor series approximation of J we obtain the relation (13.7), where the remainder is a vector with each entry belonging to o(h). Let δ denote the increment of the state over [t, t + h]. Using the Taylor series approximation for J around the point x(t), we get the following expression for the RHS of (13.7):

= ∑ᵢ₌₁^{n+m} ∇J · δ + |δ| o(|δ|)
= h [ ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1) + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2) ] + α(h)
(13.8)
First, let us consider the controls of the jammer. From the Nash property, we can conclude that if σj1 = σj1*, ∀j ∈ {1, . . . , n}, and σ²−i = σ²−i*, then σi2 = σi2* minimizes the left hand side of the above equation. Therefore, we can conclude the following:
1. The optimal control of the jammers satisfies the condition (13.9). In a similar manner the controls of the UAVs satisfy the condition (13.10).
2. In the case when σij = σij* ∀i and j = 1, 2 in (13.8), we obtain the following relation:

π(x(t), {σi1*}ⁿi=1, {σi2*}ᵐi=1) = J(x(t)) + h [ 1 + ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1*)
  + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2*) ] + α(h).

Since π(x(t), {σi1*}ⁿi=1, {σi2*}ᵐi=1) = J(x(t)), this gives

h [ 1 + ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1*) + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2*) ] + α(h) = 0

⟹ 1 + ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1*) + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2*) = 0.    (13.11)
Equations (13.9), (13.10) and (13.11) extend the Isaacs conditions that provide the optimal controls for two-player zero-sum differential games to the case of three-player zero-sum differential games.
From [19], the Hamiltonian of the system is given by the following expression:

H(x, ∇J, {σi1*}ⁿi=1, {σi2*}ᵐi=1) = 1 + ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1*)
  + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2*),

which is the left side of (13.11). Hence, (13.11) can equivalently be expressed as:

H(x, ∇J, {σi1*}ⁿi=1, {σi2*}ᵐi=1) = 0.    (13.12)
In addition to the above conditions, the value function also satisfies the PDE
given in the following theorem.
Theorem 13.2. The value function follows the following partial differential equa-
tion (PDE) along the optimal trajectory, namely the retrogressive path equation
(RPE)
(∇J)ᵒ = ∂H/∂x    (13.13)

J̊xi = 0,  J̊yi = 0,  J̊φi = −Jxi sin φi + Jyi cos φi    (13.14)
J˚xi = 0, J˚yi = 0
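Equations (13.13) and (13.14), combined with the sign form of the optimal controls from Theorem 13.1, suggest a direct way of tracing candidate optimal trajectories: integrate the state and the gradient components in retrograde time from a terminal state. A minimal sketch follows (our own; the terminal states and terminal gradients ∇J⁰ are illustrative placeholders, to be supplied from the terminal-manifold analysis of Sect. 13.4).

import numpy as np

def retrograde_path(state, grad, jammer, dt=0.01, steps=1000):
    # state = (x, y, phi); grad = (Jx, Jy, Jphi)
    # retrograde equations (13.14): Jx, Jy constant, Jphi' = -Jx sin phi + Jy cos phi
    path = [state.copy()]
    for _ in range(steps):
        x, y, phi = state
        Jx, Jy, Jphi = grad
        sigma = -np.sign(Jphi) if jammer else np.sign(Jphi)   # sign controls
        state = state + dt * np.array([-np.cos(phi), -np.sin(phi), -sigma])
        grad = grad + dt * np.array([0.0, 0.0, -Jx * np.sin(phi) + Jy * np.cos(phi)])
        path.append(state.copy())
    return np.array(path)

# illustrative terminal data (placeholders, not taken from the chapter)
uav = retrograde_path(np.array([20.0, -10.0, 0.0]), np.array([0.0, 1.0, 0.1]), False)
jam = retrograde_path(np.array([50.0, -10.0, -0.17]), np.array([0.0, 1.0, -0.1]), True)
print(uav[-1], jam[-1])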
In many problems the value functions are not smooth enough to satisfy the Isaacs
equations. Many papers have worked around this difficulty, especially Fleming
[15, 16], Friedman [17], Elliott and Kalton [11, 12], Krassovski and Subbotin [21],
and Subbotin [41]. In [9], the authors present a new notion of "viscosity" solution for Hamilton-Jacobi equations and prove the uniqueness of such solutions in a wide variety of situations. In [24], the author shows that the dynamic programming optimality condition for the value function in differential control theory problems implies that this value function is the viscosity solution of the associated HJB PDE.
The foregoing conclusions turn out to extend to differential game theory. In [36], the
authors show that in the context of differential games, the dynamic programming
optimality conditions imply that the values are viscosity solutions of appropriate
partial differential equations. In [13], the authors present a simplification of the
previous work. This work is based on the smoothness assumption of the value
function.
In the next section, we discuss the computation of the terminal manifold for the
game.
G̃ = {G | λ2(L(G)) = 0}
G̃ = {G | Fλ(λ, G)|λ=0 = 0}

where Mii is the minor of L(G) corresponding to the diagonal element in the ith row. Substituting the above relation in Theorem 13.3 leads to the following equation for the variables {aij}ᵐi,j=1 at the terminal manifold:

∑ᵢ₌₁ᵐ Mii = 0    (13.16)
Each disconnection condition can be written as an inequality

gij(xi, yi, xj, yj, {xk}ᵐk=1, {yk}ᵐk=1) ≥ 0.

The set of states of the UAVs and the jammers that represent a disconnected communication network is given by the following expression:

R = ⋃_{i,j} { gij(xi, yi, xj, yj, {xk}ᵐk=1, {yk}ᵐk=1) ≥ 0 }
The terminal manifold of the game is given by the boundary of the region R, ∂R. The above expression characterizes the terminal manifold of the game. The value of the game at termination is identically zero. In this analysis, we compute the gradient of the value at an interior point in a connected component of the terminal manifold. Since the terminal manifold is discontinuous, optimal trajectories emanating from them will give rise to singular surfaces [1, 23], which is a topic of ongoing research. Assuming that a single connected component of the terminal manifold, M, is a hypersurface in R^{3(n+m)}, we can parametrize it using 3(m + n) − 1 variables. Therefore, the tangent space at any point on M has a basis containing 3(m + n) − 1 elements, ti. Since J ≡ 0 along M, we obtain the following set of 3(m + n) − 1 equations:

∇J⁰ · ti = 0  ∀ i    (13.17)
(13.18)
Given a termination condition, we can compute ∇J 0 from (13.17) and (13.18). This
provides the boundary conditions for the RPE presented in the previous section.
In the next section, we present some examples.
13.5 Examples
In this section, we compute the optimal trajectories for the aerial vehicles in two
scenarios involving different numbers of UAVs and jammers.
(Fig. 13.2: feedback control loop; a controller computes u = umax sign(Jxi fi(x)) from the fed-back state, and the plant evolves as ẋ = f(x, u, t))
13.5.1 n = 3, m = 1
First, we consider the case when a single jammer tries to disconnect the communi-
cation network in a team containing three UAVs. As stated in the previous section,
the cost function of the game is the time of termination. The equations of motion of
the vehicles are given as follows:
1. UAVs:
ẋi = cos φi,  ẏi = sin φi,  φ̇i = σi1,  i = 1, 2, 3
2. Jammer:
ẋ = cos φ,  ẏ = sin φ,  φ̇ = σ2
Under appropriate assumptions on the value function, let J(x) represent the value at
the point x in the state space.
From Theorem 13.1, the expression for the optimal controls can be given as follows:

σi1* = sign(Jφi), i = 1, 2, 3;   σ2* = −sign(Jφ)    (13.20)
This control is then fed into the plant of the respective UAV. The plant updates the
state variables based on the kinematic equations governing the UAV. Finally the
sensors feed back the state variables into the controllers. In this case the sensors
measure the position and the orientation of each UAV (Table 13.1).
The Laplacian of the graph representing the connectivity of the communication
network is given by the following matrix:
L(G) =
⎡ −(a12 + a13)    a12             a13           ⎤
⎢ a12             −(a12 + a23)    a23           ⎥
⎣ a13             a23             −(a13 + a23)  ⎦
From the above form of the Laplacian, we obtain the following expression for
F(λ , G).
The set of triples (a12 , a23 , a13 ) that satisfy Eq. (13.22) are (1, 0, 0), (0, 1, 0),
(0, 0, 1) and (0, 0, 0); see Table 13.1. The first three values represent the situation
in which the communication exists only between a pair of UAVs at termination
(Fig. 13.3). The last triple represents the scenario in which there is no communica-
tion between any pair of UAVs (Fig. 13.4).
Let us consider a termination situation corresponding to the triple (1, 0, 0). Let Di^r represent the closed disk of radius r centered at UAVi. Let ∂R denote the boundary of a region R. From the jamming model, we can infer that the jammer must lie in the region R1 = (D3^{ηr31} ∪ D1^{ηr31}) ∩ (D3^{ηr32} ∪ D2^{ηr32}) \ (D1^{ηr12} ∪ D2^{ηr12}). The termination manifold is represented by the hypersurfaces ∂R1. An example of such a situation is when the jammer is on the boundary ∂D3^{ηr}, where r = min{r31, r32}. The terminal manifold is characterized by the following equation:

(x − x3)² + (y − y3)² = r²    (13.23)
Fig. 13.3 The jammers can lie in the shaded region for a network graph of the form shown on the right hand side
Fig. 13.4 The jammers can lie in the shaded region for a network graph of the form shown on the right hand side
∂y/∂x₃ = (x − x₃)/√(r² − (x − x₃)²)
∂y/∂y₃ = 1
∂y/∂x = (x − x₃)/√(r² − (x − x₃)²) (13.25)
Substituting the above values of the gradients in Eq. (13.23), we get the following expression for J⁰_y:
J⁰_y = 1 / ( cos φ₃ (∂y/∂x₃) + sin φ₃ (∂y/∂y₃) + cos φ (∂y/∂x) − sin φ ) (13.26)
J⁰_y = 1 / ( ((x − x₃)/√(r² − (x − x₃)²)) (cos φ₃ + cos φ) + (sin φ₃ − sin φ) ) (13.27)
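For convenience, the boundary value (13.27) can be packaged as a small function; this is our sketch, with the function name and argument order being assumptions:

```python
import numpy as np

def Jy0(x, x3, phi3, phi, r):
    """Boundary value J_y^0 on the terminal manifold, Eq. (13.27)."""
    slope = (x - x3) / np.sqrt(r**2 - (x - x3)**2)   # the gradient of (13.25)
    return 1.0 / (slope * (np.cos(phi3) + np.cos(phi)) + (np.sin(phi3) - np.sin(phi)))
```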
For the triple (0, 0, 0), the jammer must lie in the region R₂ = (D₃^{ηr₃₁} ∪ D₁^{ηr₃₁}) ∩ (D₃^{ηr₃₂} ∪ D₂^{ηr₃₂}) ∩ (D₁^{ηr₁₂} ∪ D₂^{ηr₁₂}). An analysis similar to the above can be carried out in order to compute the trajectories emanating back from the terminal conditions.
Figure 13.5 shows a simulation of the trajectories of the UAVs and the jammer from a terminal state. The final states (x, y, φ) of the three UAVs are given by (20, −10, 0), (40, 30, 0.14) and (−20, 10, 0.15). The final state of the jammer is given by (50, −10, −0.17). The figure on the left shows the trajectories of the UAVs. The jammer traces the path shown on the extreme right. The figure on the right shows the connectivity of the UAVs. The network of UAVs is initially connected. At termination, the jammer disconnects the network by isolating one of the UAVs.
Next, we consider the case when two jammers try to disconnect a team of four UAVs.
13.5.2 n = 4, m = 2
As in the earlier section, the equations of motion of the vehicles are given as follows:
1. UAVs
ẋᵢ = cos φᵢ, ẏᵢ = sin φᵢ, φ̇ᵢ = σᵢ¹, i = 1, 2, 3, 4
2. Jammers
ẋᵢ = cos φᵢ, ẏᵢ = sin φᵢ, φ̇ᵢ = σᵢ², i = 1, 2
UAVᵢ is used to represent the ith UAV in the formation and UAVJᵢ is used to represent the ith jammer. Under appropriate assumptions on the value function, as discussed in Sect. 13.3, let J(x) represent the value at the point x in the state space.
From Theorem 13.1, the expression for the optimal controls can be given as
follows:
σᵢ¹∗ = sign(Jφᵢ), i = 1, 2, 3, 4
σᵢ²∗ = −sign(Jφᵢ), i = 1, 2 (13.28)
where "˚" denotes the derivative with respect to retrograde time. The Laplacian of the graph is given by the following:
L(G) = ⎡ −(a₁₂ + a₁₃ + a₁₄)   a₁₂                  a₁₃                  a₁₄                ⎤
       ⎢ a₁₂                  −(a₁₂ + a₂₃ + a₂₄)   a₂₃                  a₂₄                ⎥
       ⎢ a₁₃                  a₂₃                  −(a₁₃ + a₂₃ + a₃₄)   a₃₄                ⎥
       ⎣ a₁₄                  a₂₄                  a₃₄                  −(a₁₄ + a₂₄ + a₃₄) ⎦
From the above form of the Laplacian, we obtain the following expression for F(λ, G).
Table 13.2 The value of Fλ(0, G) for all combinations of (a₁₂, a₁₃, a₁₄, a₂₃, a₂₄, a₃₄) for which Fλ(0, G) = 0

a₁₂  a₁₃  a₁₄  a₂₃  a₂₄  a₃₄   Fλ(0, G)
 0    0    0    0    0    0       0
 1    0    0    0    0    0       0
 0    1    0    0    0    0       0
 0    0    1    0    0    0       0
 0    0    0    1    0    0       0
 0    0    0    0    1    0       0
 0    0    0    0    0    1       0
 1    1    0    0    0    0       0
 1    0    1    0    0    0       0
 1    0    0    1    0    0       0
 1    0    0    0    1    0       0
 1    0    0    0    0    1       0
 0    1    1    0    0    0       0
 0    1    0    1    0    0       0
 0    1    0    0    1    0       0
 0    1    0    0    0    1       0
 0    0    1    1    0    0       0
 0    0    1    0    1    0       0
 0    0    1    0    0    1       0
 0    0    0    1    1    0       0
 0    0    0    1    0    1       0
 0    0    0    0    1    1       0
 0    0    0    1    1    1       0
 0    1    1    0    0    1       0
 1    0    1    0    1    0       0
 1    1    0    1    0    0       0
Fλ(0, G) = 4(a₁₂a₁₃a₁₄ + a₁₂a₁₃a₂₄ + a₁₂a₁₃a₃₄ + a₁₂a₁₄a₂₃ + a₁₂a₁₄a₃₄
+ a₁₂a₂₃a₂₄ + a₁₂a₂₃a₃₄ + a₁₂a₂₄a₃₄ + a₁₃a₁₄a₂₃ + a₁₃a₁₄a₂₄
+ a₁₃a₂₃a₂₄ + a₁₃a₂₃a₃₄ + a₁₃a₂₄a₃₄ + a₁₄a₂₃a₂₄ + a₁₄a₂₃a₃₄ + a₁₄a₂₄a₃₄) = 0
Table 13.2 enumerates all the combinations of (a12 , a13 , a14 , a23 , a24 , a34 ) for
which Fλ (0, G) = 0. From these values of the edge variables, we can construct all
the graphs that are disconnected. Figure 13.6 shows all the equivalence classes of
disconnected graphs under the equivalence relation of isomorphism.
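This enumeration is easy to reproduce mechanically. The sketch below is our illustration, relying on the matrix-tree theorem rather than on the explicit polynomial above; it recovers exactly the edge combinations of Table 13.2:

```python
import itertools
import numpy as np

EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # a12, a13, a14, a23, a24, a34

def laplacian(a):
    L = np.zeros((4, 4))
    for w, (i, j) in zip(a, EDGES):
        L[i, j] = L[j, i] = w
        L[i, i] -= w
        L[j, j] -= w
    return L

# Matrix-tree theorem: any cofactor of -L(G) equals the number of spanning trees,
# which vanishes exactly when the graph is disconnected, i.e. when F_lambda(0, G) = 0.
disconnected = [a for a in itertools.product((0, 1), repeat=6)
                if abs(np.linalg.det(-laplacian(a)[1:, 1:])) < 1e-9]
print(len(disconnected))   # 26 combinations, matching Table 13.2
```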
Fig. 13.6 All the equivalence classes of graphs that are disconnected under the equivalence
relation of isomorphism
Fig. 13.7 The jammers can lie in the shaded region for a network graph of the form shown on the right-hand side
Now we consider a situation in which the network graph G has the edge structure
as shown in Fig. 13.6(4) at termination time. In order to attain the target network
structure, at least one of the jammers has to lie in the shaded region shown in
Fig. 13.7. Let R = (D₁^{ηr₁₂} ∩ D₂^{ηr₁₂} ∩ D₃^{ηr₃₄} ∩ D₄^{ηr₃₄}) \ (D₁^{ηr₁₃} ∪ D₂^{ηr₂₄} ∪ D₃^{ηr₁₃} ∪ D₄^{ηr₂₄}). Consider a termination situation in which UAVJ₁ ∈ ∂D₂^{ηr₁₂} ∩ ∂R and UAVJ₂ ∉ (D₁^{ηr₁₂} ∪ D₂^{ηr₁₂} ∪ D₃^{ηr₃₄} ∪ D₄^{ηr₃₄}). In other words, only UAVJ₁ is responsible for disconnecting the communication network.
The terminal manifold is characterized by the following equation:
(x₁ − x₂)² + (y₁ − y₂)² = r² (13.30)
From (13.30), we obtain the following expressions for the derivatives of the dependent variable y₁:
∂y₁/∂x₂ = (x₁ − x₂)/√(r² − (x₁ − x₂)²)
∂y₁/∂y₂ = 1
∂y₁/∂x₁ = (x₁ − x₂)/√(r² − (x₁ − x₂)²) (13.32)
Substituting the above values of the gradients in Eq. (13.11), we get the following expression for J⁰_y₁:
J⁰_y₁ = 1 / ( cos φ₂ (∂y₁/∂x₂) + sin φ₂ (∂y₁/∂y₂) + cos φ₁ (∂y₁/∂x₁) − sin φ₁ ) (13.33)
J⁰_y₁ = 1 / ( ((x₁ − x₂)/√(r² − (x₁ − x₂)²)) (cos φ₂ + cos φ₁) + (sin φ₂ − sin φ₁) ) (13.34)
Figure 13.8 shows a simulation of the trajectories of the UAVs and the jammers from a terminal state. The final states (x, y, φ) of the four UAVs are given by (40, 10, 0), (20, 20, 0.14), (20, −20, −0.25) and (40, −10, −0.55). The final states of the two jammers are given by (30, 0, 0.15) and (10, 0, −0.17). The figure on the left shows the trajectories of the UAVs. The two vehicles on the extreme right represent the jammers. The figure on the right shows the connectivity of the UAVs. The network of UAVs is initially connected. At termination, the jammers disconnect the network into two disjoint components.
13.6 Conclusions
Chapter 14
Study of Linear Game with Two Pursuers
and One Evader: Different Strength of Pursuers
Abstract The paper deals with a problem of pursuit-evasion with two pursuers and one evader having linear dynamics. The pursuers try to minimize the final miss (the ideal situation is exact capture), while the evader counteracts them. Results of numerical construction of level sets (Lebesgue sets) of the value function are given. A feedback method for producing optimal controls is suggested. The paper also includes numerical simulations of optimal motions of the objects in various situations.
14.1 Introduction
Group pursuit-evasion games (several pursuers and/or several evaders) are studied
intensively in the theory of differential games [2, 4, 6, 7, 11, 16, 17, 20].
From a general point of view, a group pursuit-evasion game (without any hierarchy among players) can often be treated as an antagonistic differential game where all pursuers are joined into one player, whose objective is to minimize some functional, and, similarly, all evaders are joined into another player, who is the opponent of the first one. The theory of differential games gives an existence theorem for the value function of such a game. But, usually, any more concrete results
In Fig. 14.1, one can see a possible initial location of the pursuers and evader when they move towards each other. The evader can also move away from both pursuers, or away from one of them but towards the other pursuer.
Let us assume that the initial velocities are parallel and quite large, and that the control accelerations affect only the lateral components of the object velocities. Thus, one can suppose that the instants of passage of the evader by each of the pursuers are fixed. Below, we call them termination instants and denote them by T_f1 and T_f2, respectively. We consider both cases of equal and different termination instants. The players' controls define the lateral deviations of the evader from the first and second pursuers at the termination instants. The minimum of the absolute values of these deviations is called the resulting miss. The objective of the pursuers is to minimize the resulting miss; the evader maximizes it. The pursuers generate their controls by a coordinated effort (from one control center).
In the relative linearized system, the dynamics is the following (see [12, 13]):
Here, y1 and y2 are the current lateral deviations of the evader from the first and
second pursuers; aP1 , aP2 , aE are the lateral accelerations of the pursuers and evader;
u1 , u2 , v are the players’ command controls; AP1 , AP2 , AE are the maximal values
of the accelerations; l_P1, l_P2, l_E are the time constants describing the inertia of the servomechanisms.
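For concreteness, a plausible form of the relative system, assembled from the variable definitions above in the standard DGL manner of [12, 13], is shown below; this is our hedged reconstruction, and the original display (14.1) may differ in signs and notation:

```latex
\ddot y_i = a_E - a_{P_i}, \qquad
\dot a_{P_i} = \frac{A_{P_i}\,u_i - a_{P_i}}{l_{P_i}}, \qquad
\dot a_E = \frac{A_E\,v - a_E}{l_E}, \qquad i = 1, 2 .
```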
The controls have bounded absolute values: |u₁| ≤ 1, |u₂| ≤ 1, |v| ≤ 1. Below we use the notation
τᵢ = T_fᵢ − t, h(α) = e^(−α) + α − 1.
We have xᵢ(T_fᵢ) = yᵢ(T_fᵢ).
Passing to a new dynamics in “equivalent” coordinates x1 and x2 (see [12, 13]),
we obtain
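A reconstruction consistent with the definitions of τᵢ and h(α) above is the following; again, this is our sketch rather than the authors' verbatim display:

```latex
x_i = y_i + \dot y_i\,\tau_i
      - a_{P_i}\,l_{P_i}^2\, h\!\left(\tfrac{\tau_i}{l_{P_i}}\right)
      + a_E\,l_E^2\, h\!\left(\tfrac{\tau_i}{l_E}\right),
\qquad
\dot x_i = -A_{P_i}\,l_{P_i}\, h\!\left(\tfrac{\tau_i}{l_{P_i}}\right) u_i
           + A_E\,l_E\, h\!\left(\tfrac{\tau_i}{l_E}\right) v, \qquad i = 1, 2 .
```

Since h(0) = 0, this gives xᵢ(T_fᵢ) = yᵢ(T_fᵢ), matching the statement above; the drift cancels, leaving control-only dynamics.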
Join both pursuers P1 and P2 into one player which will be called the first player.
The evader E is the second player. The first player governs the controls u1 and u2 ;
the second one governs the control v. We introduce the following payoff functional:
ϕ(x₁(T_f1), x₂(T_f2)) = min{ |x₁(T_f1)|, |x₂(T_f2)| }. (14.3)
It is minimized by the first player and maximized by the second one. Thus, we get
a standard antagonistic game with dynamics (14.2) and payoff functional (14.3).
This game has [1, 8–10] a value function V(t, x), where x = (x₁, x₂). For each initial position (t₀, x₀), the value V(t₀, x₀) equals the payoff guaranteed for the first (second) player by its optimal feedback control. Each level set
Fig. 14.2 Various variants of the stable bridge evolution in an individual game
Wc = {(t, x) : V(t, x) ≤ c}
of the value function coincides with the maximal stable bridge (see [9, 10]) built
from the target set
Mc = {(t, x) : t = T_f1, |x₁| ≤ c} ∪ {(t, x) : t = T_f2, |x₂| ≤ c}.
The set Wc can be treated as the solvability set for the pursuit-evasion game with the
result c.
When c = 0, we have the situation of exact capture. Exact capture means that at least one of the deviations yᵢ equals zero at the corresponding instant T_fᵢ, i = 1, 2.
The works [12, 13] consider only cases with the exact capture, and pursuers
“stronger” than the evader. The latter means that the parameters APi , AE , and lPi ,
lE (i = 1, 2) are such that the maximal stable bridges in the individual games (P1 vs.
E and P2 vs. E) grow monotonically in the backward time.
Considering individual games of each pursuer vs. the evader, one can introduce
parameters [18] μi = APi /AE and εi = lE /lPi . They and only they define the structure
of the maximal stable bridges in the individual games. Namely, depending on values
of μi and μi εi , there are four cases of the bridge evolution (see Fig. 14.2):
• Expansion in the backward time (a strong pursuer)
• Contraction in the backward time (a weak pursuer)
• Expansion of the bridge until some backward time instant and further contraction
• Contraction of the bridge until some backward time instant and further expansion
(if the bridge still has not broken).
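These four cases can be separated by two threshold tests. The sketch below is our summary of that classification (the thresholds μ > 1 away from termination and με > 1 near termination follow from the sign of the bridge-boundary derivative, consistent with the role of the parameters in [18]); it is not code from the paper:

```python
def bridge_case(mu, eps):
    """Classify the evolution of the maximal stable bridge in an individual game.

    mu = A_P / A_E, eps = l_E / l_P.  Near termination (small backward time) the
    bridge grows iff mu * eps > 1; far from termination it grows iff mu > 1.
    """
    near, far = mu * eps > 1.0, mu > 1.0
    if near and far:
        return "expansion in backward time (strong pursuer)"
    if not near and not far:
        return "contraction in backward time (weak pursuer)"
    if near:
        return "expansion until some backward time instant, then contraction"
    return "contraction until some backward time instant, then expansion"

# Example: the 'two weak pursuers' variant below uses A_P1 = 0.9, l_P1 = 1/0.7, l_E = 1
print(bridge_case(mu=0.9, eps=0.7))   # -> contraction in backward time (weak pursuer)
```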
Respectively, given the combinations of the pursuers' capabilities and the individual game durations (equal/different), there is a significant number of variants of the problem with two pursuers and one evader. Some of them are considered below.
The main objective of this paper is to construct the sets Wc for typical cases
of the game under consideration. The difficulty of the problem is that the time
sections Wc (t) of these sets are non-convex. Constructions are made by means of
an algorithm for constructing maximal stable bridges worked out by the authors for problems with a two-dimensional state variable. The algorithm is similar to the one used in [15]. Another objective is to build optimal feedback controls of the first player
(that is, of the pursuers P1 and P2) and the second one (the evader E).
As it was mentioned above, a level set Wc of the value function is the maximal
stable bridge for dynamics (14.2) built in the space t, x from the target set Mc . A time
section Wc (t) of the bridge Wc at the instant t is a set in the plane of two-dimensional
variable x.
To be definite, let T f 1 ≥ T f 2 . Then for any t ∈ (T f 2 , T f 1 ], the set Wc (t) is a vertical
strip around the axis x2 . Its width along the axis x1 equals the width of the bridge in
the individual game P1 vs. E at the instant τ = T f 1 − t of the backward time. At the
instant t = T f 1 , the half-width of Wc (T f 1 ) is equal to c.
Denote by Wc(T_f2 + 0) the right limit of the set Wc(t) as t → T_f2 + 0. Then the set Wc(T_f2) is cross-like, obtained as the union of the vertical strip Wc(T_f2 + 0) and a horizontal strip around the axis x₁ whose width along the axis x₂ equals 2c.
When t ≤ T f 2 , the backward construction of the sets Wc (t) is made starting from
the set Wc (T f 2 ).
The algorithm suggested by the authors for constructing the approximating sets W̄c(t) uses a time grid in the interval [0, T_f1]: t_N = T_f1, t_{N−1}, ..., t_S = T_f2, t_{S−1}, t_{S−2}, .... For any instant t_k from the taken grid, the set W̄c(t_k) is built on the basis of the previous set W̄c(t_{k+1}) and a dynamics obtained from (14.2) by fixing its value at the instant t_{k+1}. So, dynamics (14.2), which varies in the interval (t_k, t_{k+1}], is replaced by a dynamics with simple motions [8]. The set W̄c(t_k) is regarded as the collection of all positions at the instant t_k from which the first player guarantees guiding the system to the set W̄c(t_{k+1}) under the "frozen" dynamics (14.2) and discrimination of the second player, that is, when the second player announces its constant control v, |v| ≤ 1, on the interval [t_k, t_{k+1}].
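A minimal grid realization of one such backward step is sketched below. This is our illustration, not the authors' implementation: the bridge section is stored as a 0/1 indicator on a rectangular grid, and only the extreme values of the frozen controls are sampled, which makes it a coarse approximation of the construction just described:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def bridge_step(inside, xs, ys, D1, D2, E, dt):
    """One backward step: a grid point stays in W(t_k) iff for every frozen
    control v of the evader there are pursuer controls u1, u2 steering the
    point into W(t_{k+1}).  `inside` is a 0/1 indicator of W(t_{k+1});
    D1, D2, E are the 2-vectors of the dynamics frozen at t_{k+1}."""
    interp = RegularGridInterpolator((xs, ys), inside,
                                     bounds_error=False, fill_value=0.0)
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    pts = np.stack([X, Y], axis=-1)
    result = np.ones(inside.shape)
    for v in (-1.0, 1.0):                      # second player announces v
        reach = np.zeros(inside.shape)
        for u1 in (-1.0, 1.0):                 # first player answers with u1, u2
            for u2 in (-1.0, 1.0):
                shift = dt * (D1 * u1 + D2 * u2 + E * v)
                reach = np.maximum(reach, (interp(pts + shift) > 0.5).astype(float))
        result = np.minimum(result, reach)
    return result
```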
Due to the symmetry of dynamics (14.2) and of the sets Wc(T_f1), Wc(T_f2) with respect to the origin, for any t ≤ T_f1 the time section Wc(t) is also symmetric.
To date, different research groups have suggested many algorithms for constructing the value function in differential games of quite generic type (see, for example, [3, 5, 14, 21]). The problem under consideration has linear dynamics and a two-dimensional phase variable. Due to this, we use a specific method, which allows us to perform very fast computations for many variants of the game.
Fig. 14.3 Two strong pursuers, equal termination instants: time sections of the bridge W
target set M0 which is the union of two coordinate axes. Further, at the instants
t = 4, 2, 0, the cross thickens, and two triangles are added to it. The widths of
the vertical and horizontal parts of the cross correspond to sizes of the maximal
stable bridges in the individual games with the first and second pursuers. These
triangles are located in the II and IV quadrants (where the signs of x1 and x2 are
different, in other words, when the evader is between the pursuers) and give the
zone where the capture is possible only under collective actions of both pursuers (trying to avoid one of the pursuers, the evader is captured by the other one).
These additional triangles have a simple explanation from the point of view of
problem (14.1). Their hypotenuses have slope equal to 45◦ , that is, are described
by the equation |x₁| + |x₂| = const. Consider the instant τ when the hypotenuse reaches a point (x₁, x₂). It corresponds to the instant when the pursuers have jointly covered the distance |x₁(0)| + |x₂(0)| separating them at the initial instant t = 0. Therefore, at this instant, both pursuers arrive at the same point. Since the evader was initially between the pursuers, it is captured at this instant.
The set W (maximal stable bridge) built in the coordinates of system (14.2)
coincides with the description of the solvability set obtained analytically in [12,
13]. The solvability set for system (14.1) is defined as follows: if in the current
position of system (14.1) at the instant t, the forecasted coordinates x1 , x2 are
inside the time section W (t), then under the controls u1 , u2 the motion is guided
to the target set M0 ; on the contrary, if the forecasted coordinates are outside the
set W (t), then there is an evader’s control v which deviates system (14.2) from
the target set. Therefore, there is no exact capture in the original system (14.1).
Time sections Wc (t) of other bridges Wc , c > 0, have the shape similar to W (t).
In Fig. 14.4, one can see the sections Wc (t) at t = 2 (τ = 4) for a collection {Wc }
corresponding to some series of values of the parameter c. For other instants t,
the structure of the sections Wc (t) is similar. The sets Wc (t) describe the value
function x → V (t, x).
2. Feedback control of the first player. Rewrite system (14.2) as
ẋ = D₁(t)u₁ + D₂(t)u₂ + E(t)v.
We see that the vector D₁(t) (D₂(t)) is directed along the horizontal (vertical) axis; when T_f1 = T_f2, the angle between the axis x₁ and the vector E(t) equals 45°; when T_f1 ≠ T_f2, the angle changes in time.
Analyzing the change of the value function V along a horizontal line in the
plane x1 , x2 for a fixed instant t, one can conclude that the minimum of the
function is reached in the segment of intersection of this line and the set W (t).
Fig. 14.4 Two strong pursuers, equal termination instants: level sets of the value function, t = 2
Moreover, the function is monotone on both sides of the segment. For points to the right (to the left) of the segment, the control u₁ = 1 (u₁ = −1) directs the vector D₁(t)u₁ towards the minimum.
Splitting the plane into horizontal lines and extracting for each line the seg-
ment of minimum of the value function, one can gather these segments into a set
in the plane and draw a switching line through this set which separates the plane
into two parts at the instant t. At the right from this switching line, we choose the
control u1 = 1, and at the left the control is u1 = −1. On the switching line, the
control u1 can be arbitrary obeying the constraint |u1 | ≤ 1. The easiest way is to
take the vertical axis x2 as the switching line.
In the same way, using the vector D2 (t), we can conclude that the horizontal
axis x1 can be taken as the switching line for the control u2 .
Thus,
u∗ᵢ(t, x) = 1 if xᵢ > 0; −1 if xᵢ < 0; any uᵢ ∈ [−1, 1] if xᵢ = 0. (14.4)
The switching lines (the coordinate axes) at any t divide the plane x1 , x2 into
4 cells. In each of these cells, the optimal control (u∗1 , u∗2 ) of the first player is
constant.
The vector control (u∗₁(t, x), u∗₂(t, x)) is applied in a discrete scheme (see [9, 10]) with some time step Δ: the chosen control is kept constant during a step Δ; then, on the basis of the new position, a new control is chosen, etc. As Δ → 0, this control guarantees the first player a result not greater than V(t₀, x₀) for any initial position (t₀, x₀).
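In code, the discrete scheme might look as follows; this is a sketch in which the callables `D1`, `D2`, `E` for the dynamics coefficients and `v_of_tx` for the opponent's feedback are hypothetical:

```python
import numpy as np

def u_star(x):
    """Feedback (14.4): bang-bang with the coordinate axes as switching lines."""
    return np.sign(x)    # component-wise; any value in [-1, 1] is admissible on an axis

def simulate(x0, D1, D2, E, v_of_tx, t0, tf, dt):
    """Discrete scheme: the chosen control is held constant over each step dt."""
    t, x = t0, np.asarray(x0, dtype=float)
    while t < tf:
        u1, u2 = u_star(x)
        x = x + dt * (D1(t) * u1 + D2(t) * u2 + E(t) * v_of_tx(t, x))
        t += dt
    return x
```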
3. Feedback control of the second player. Now let us describe the optimal control
of the second player. When T_f1 = T_f2, the vectogram {E(t)v : v ∈ [−1, 1]} of the second player in system (14.2) is a segment parallel to the diagonal of the I and III quadrants. Thus, the second player can shift the system along this line only.
Using the sets Wc (t) at some instant t, let us analyze the change of the func-
tion x → V (t, x) along the lines parallel to this diagonal. Consider an arbitrary
line from this collection such that it passes through the II quadrant. One can
see that local minima are attained at points where the line crosses the axes Ox1
and Ox2 , and a local maximum is in the segment where the line passes through the
rectilinear diagonal part of the boundary of some level set of the value function.
The situation is similar for lines passing through the IV quadrant.
Thus, the switching lines for the second player’s control v can be constructed
from three parts: the axes Ox1 and Ox2 , and some slope line Π (t). The latter
has two half-lines passing through the middles of the diagonal parts on the level
set boundaries in the II and IV quadrants. In our case, when the position of the
system is on the switching line, the control v can take arbitrary values |v| ≤ 1.
Inside each of the 6 cells into which the plane is divided by the switching lines, the control is either v = +1 or v = −1. Such a control pulls the system towards
the points of maximum. Applying this control in a discrete scheme with time
step Δ , the second player guarantees that the result will be not less than V (t0 , x0 )
for any initial position (t0 , x0 ) as Δ → 0.
Note. Since W(t) ≠ ∅, the global minimum of the function x → V(t, x) is attained at any x ∈ W(t) and equals 0. Thus, when the position (t, x) of the system is such that x ∈ W(t), the players can choose, generally speaking, any controls within their constraints. If x ∉ W(t), the choices should be made according to the rules described above, based on the switching lines.
4. Optimal motions. In Fig. 14.5, one can see the results of optimal motion simula-
tions. This figure contains time sections W (t) (thin solid lines; the same sections
as in Fig. 14.3), switching lines Π (0) at the initial instant and Π (6) at the ter-
mination instant of the direct time (dotted lines), and two trajectories for two
different initial positions: ξI (t) (thick solid line) and ξII (t) (dashed line). The
motion ξI (t) starts from the point x01 = 40, x02 = −25 (marked by a black circle)
which is inside the initial section W (0) of the set W . So, the evader is captured:
the endpoint of the motion (also marked by a black circle) is at the origin. The
initial point of the motion ξII (t) has coordinates x01 = 25, x02 = −50 (marked by a
star). This position is outside the section W (0), and the evader escapes from the
exact capture: the endpoint of the motion (also marked by a star) has non-zero
coordinates.
Fig. 14.5 Two strong pursuers, equal termination instants: result of optimal motion simulation
Fig. 14.6 Two strong pursuers, equal termination instants: trajectories in the original space
Figure 14.6 gives the trajectories of the objects in the original space. Values of
longitudinal components of the velocities are taken such that the evader moves
towards the pursuers. For all simulations here and below, we take
Solid lines correspond to the first case when the evader is successfully captured
(at the termination instant, the positions of both pursuers coincide with the position of the evader). Dashed lines show the case when the evader escapes: at the termination instant, neither of the pursuers coincides with the evader. In this case, one can see that the evader aims at the middle point between the terminal positions of the pursuers (this guarantees the maximum of the payoff functional ϕ).
Take the parameters as in the previous section, except the termination instants. Now
they are T f 1 = 7 and T f 2 = 5. Investigation results are shown in Figs. 14.7–14.9.
The maximal stable bridge W = W0 for system (14.2) with the taken target set
M0 = {t = T f 1 , x1 = 0} ∪ {t = T f 2 , x2 = 0}
is built in the following way. At the instant τ₁ = 0 (that is, t = T_f1), the section of the bridge coincides with the vertical axis x₁ = 0. At the instant τ₁ = 2 (that is, t = T_f2), we add the horizontal axis x₂ = 0 to the bridge expanded during the elapsed time period. Further, the time sections of the bridge are constructed using the standard procedure under the relation τ₂ = τ₁ − 2.
In the same way, bridges Wc , c > 0, corresponding to the target sets
Mc = {t = T f 1 , |x1 | ≤ c} ∪ {t = T f 2 , |x2 | ≤ c}
can be built.
Fig. 14.7 Two strong pursuers, different termination instants: the bridge W and optimal motions
Fig. 14.8 Two strong pursuers, different termination instants: level sets of the value function, t = 2
Fig. 14.9 Two strong pursuers, different termination instants: trajectories in the original space
Results of construction of the set W are given in Fig. 14.7. When τ1 > 2, time
sections W (t) grow both horizontally and vertically; two additional triangles appear,
but now they are curvilinear. An analytical description of these curvilinear parts of the boundary is difficult. Due to this, [12, 13] give only an upper estimate of the solvability set for this variant of the game.
The overall structure of the sections Wc(t) at t = 2 (τ₁ = 5, τ₂ = 3) is shown in Fig. 14.8. Optimal feedback controls of the pursuers and evader are constructed in the same way as in the previous example, except that the switching line Π(t) for the evader is formed by the corner points of the additional curvilinear triangles of the sets Wc(t), c ≥ 0.
In Fig. 14.7, the trajectory for the initial point x⁰₁ = 50, x⁰₂ = −25 is shown as a solid line between two points marked by stars. The trajectories in the original space are shown in Fig. 14.9. One can see that at the beginning the evader escapes from the second pursuer and goes down; after that, the evader's control changes to escape from the first pursuer, and the evader goes up.
Now we consider a variant of the game when both pursuers are weaker than the
evader. Let us take the parameters
AP1 = 0.9, AP2 = 0.8, AE = 1, lP1 = lP2 = 1/0.7, lE = 1,
Fig. 14.10 Two weak pursuers, different termination instants: time sections of the maximal stable
bridge W2.0
beginning of the pursuit, the evader closes on the first (lower) pursuer. This is done to increase the miss from the second (upper) pursuer at the instant T_f2. Further closing is not reasonable, and the evader switches its control to increase the miss from the first pursuer at the instant T_f1.
Fig. 14.11 Two weak pursuers, different termination instants: switching lines and optimal controls
for the first player (the pursuers), t = 0
Let us change the parameters of the second pursuer in the previous example and
take the following parameters of the game:
Now the evader is more maneuverable than the second pursuer, and exact capture by this pursuer is impossible. Assume T_f1 = 5, T_f2 = 7.
In Fig. 14.14, sections of the maximal stable bridge W₅.₀ (that is, for c = 5.0) are shown for six instants: t = 7.0, 5.0, 2.5, 1.4, 1.0, 0.0. The horizontal part of its time section W₅.₀(τ) decreases as τ grows and eventually breaks. The vertical part grows. Even after the individual stable bridge of the second pursuer breaks (with the respective collapse of the horizontal part of the cross), additional capture zones still exist and persist in time.
Fig. 14.12 Two weak pursuers, different termination instants: switching lines and optimal controls
for the second player (the evader), t = 0
Fig. 14.13 Two weak pursuers, different termination instants: trajectories of the objects in the
original space
Switching lines of the first and second players for the instant t = 1 are given in Figs. 14.15 and 14.16. These lines are obtained by processing the collection Wc(t = 1) computed for different values of c. In comparison with the previous case of two weak pursuers, the switching lines for the first player have a simpler structure.
Fig. 14.14 One strong and one weak pursuers, different termination instants: time sections of the
maximal stable bridge W5.0
Here, as in the previous section, the trajectories of the objects are drawn in the original space only (see Fig. 14.17). For the simulations, the initial lateral deviations are taken as x⁰₁ = 20, x⁰₂ = −20. The longitudinal components of the velocities are such that the evader moves towards one pursuer but away from the other.
Fig. 14.15 One strong and one weak pursuers, different termination instants: switching lines and
optimal controls for the first player (the pursuers), t = 1
Another interesting case is when the pursuers have equal capabilities such that, at the beginning of the backward time, the bridges in the individual games contract and further expand. That is, at the beginning of the direct time, the pursuers have an advantage over the evader, but at the final stage the evader is stronger.
Parameters of the game are taken as follows:
Fig. 14.16 One strong and one weak pursuers, different termination instants: switching lines and
optimal controls for the second player (the evader), t = 1
Fig. 14.17 One strong and one weak pursuers, different termination instants: trajectories of the
objects in the original space
Fig. 14.18 Varying advantage of the pursuers, equal termination instants: time sections of the
maximal stable bridge W1.5
only the triangles constitute the time section of the bridge (the central left subfigure).
Further, the triangles continue to contract, so they become two pentagons separated
by an empty space near the origin (the central right subfigure in Fig. 14.18). Trans-
formation to pentagons can be explained in the following way: the first player using
Fig. 14.19 Varying advantage of the pursuers, equal termination instants: switching lines and
optimal controls for the first player (the pursuers), t = 0
its controls expands the triangles vertically and horizontally, and the second player contracts them in the diagonal direction. So, vertical and horizontal edges appear, but the diagonal part becomes shorter. Also, in general, the size of each figure slowly decreases.
Due to the action of the second player, at some instant the diagonal disappears, and the pentagons convert to squares (this is not shown in Fig. 14.18). After that, the pursuers gain the advantage, and the overall contraction is replaced by growth: the squares start to enlarge. After some time, due to the growth, the squares touch each other at the origin (the lower-left subfigure in Fig. 14.18). As the enlargement continues, their sizes grow, and the squares start to overlap, forming one "eight-like" shape (the lower-right subfigure in Fig. 14.18).
Figures 14.19 and 14.20 show time sections of a collection of maximal stable
bridges and switching lines for the first and second players, respectively, for the
instant t = 0.
As above, the simulated trajectories are shown in the original space only. For the simulation, the following initial conditions are taken: x⁰₁ = 5, x⁰₂ = −20. The longitudinal components of the velocities are such that the evader moves away from both pursuers.
Fig. 14.20 Varying advantage of the pursuers, equal termination instants: switching lines and
optimal controls for the second player (the evader), t = 0
Fig. 14.21 Varying advantage of the pursuers, equal termination instants: trajectories of the
objects in the original space
The computed trajectories are given in Fig. 14.21. As was said earlier, since at the final stage of the interception the pursuers are weaker than the evader, they cannot guarantee exact capture, but only some non-zero level of the miss.
14.9 Conclusion
The presence of two pursuers acting together and minimizing the miss from the evader leads to non-convexity of the time sections of the level sets of the value function when the situation is considered as a standard antagonistic differential game in which both pursuers are joined into one player. In the paper, results of a numerical study of this problem are given for several variants of the parameters. The structure of the solution depends on the presence or absence of a dynamic advantage of one or both pursuers over the evader. Optimal feedback control methods for the pursuers and evader are built by preliminary construction and processing of the level (Lebesgue) sets of the value function (maximal stable bridges) for some quite fine grid of values of the payoff. The switching lines obtained for each scalar component of the controls depend on time, and only they, not the level sets, are used for generating the controls. Optimal controls are produced at any current instant depending on the location of the state point relative to the switching lines at this instant. An accurate proof of the suggested optimal control method requires additional study.
Acknowledgements This work was supported by Program of Presidium RAS “Dynamic Systems
and Control Theory” under financial support of UrB RAS (project no. 12-Π -1-1002) and also by
the Russian Foundation for Basic Research under grants nos. 10-01-96006 and 11-01-12088.
References
1. Bardi, M., Capuzzo-Dolcetta, I.: Optimal Control and Viscosity Solutions of Hamilton–Jacobi–
Bellman Equations. Birkhauser, Boston (1997)
2. Blagodatskih, A.I., Petrov, N.N.: Conflict Interaction Controlled Objects Groups. Udmurt State
University, Izhevsk, Russia (2009). (in Russian)
3. Cardaliaguet, P., Quincampoix, M., Saint-Pierre, P.: Set-valued numerical analysis for optimal
control and differential games. In: Bardi, M., Raghavan, T.E., Parthasarathy, T. (eds.) Annals of
the International Society of Dynamic Games, vol. 4, pp. 177–247. Birkhauser, Boston (1999)
4. Chikrii, A.A.: Conflict-Controlled Processes, Mathematics and its Applications, vol. 405.
Kluwer Academic Publishers Group, Dordrecht (1997)
5. Cristiani, E., Falcone, M.: Fully-discrete schemes for the value function of pursuit-evasion
games with state constraints. In: Annals of the International Society of Dynamic Games, vol.
10: Advances in Dynamic Games and Applications, pp. 177–206. Birkhauser, Boston (2009)
6. Grigorenko, N.L.: The problem of pursuit by several objects. In: Differential Games—
Developments in Modelling and Computation (Espoo, 1990), Lecture Notes in Control and
Inform. Sci., vol. 156, pp. 71–80. Springer, Berlin (1991)
7. Hagedorn, P., Breakwell, J.V.: A differential game with two pursuers and one evader. J. Optim.
Theory Appl. 18(2), 15–29 (1976)
8. Isaacs, R.: Differential Games. Wiley, New York (1965)
9. Krasovskii, N.N., Subbotin, A.I.: Positional Differential Games. Nauka, Moscow (1974).
(in Russian)
10. Krasovskii, N.N., Subbotin, A.I.: Game-Theoretical Control Problems. Springer-Verlag,
New York (1988)
11. Levchenkov, A.Y., Pashkov, A.G.: Differential game of optimal approach of two inertial
pursuers to a noninertial evader. J. Optim. Theory Appl. 65, 501–518 (1990)
12. Le Ménec, S.: Linear differential game with two pursuers and one evader. In: Abstracts of
13th International Symposium on Dynamic Games and Applications, pp. 149–151. Wroclaw
University of Technology, Wroclaw (2008)
13. Le Ménec, S.: Linear differential game with two pursuers and one evader. In: Annals of
the International Society of Dynamic Games, vol. 11: Advances in Dynamic Games and
Applications, pp. 209–226. Birkhauser, Boston (2011)
14. Mitchell, I.: Application of level set methods to control and reachability problems in continuous
and hybrid systems. PhD Thesis. Stanford University (2002)
15. Patsko, V.S., Turova, V.L.: Level sets of the value function in differential games with the
homicidal chauffeur dynamics. Int. Game Theory Rev. 3(1), 67–112 (2001)
16. Petrosjan L.A.: Differential Games of Pursuit, Leningrad University, Leningrad (1977).
(in Russian)
17. Pschenichnyi, B.N.: Simple pursuit by several objects. Kibernetika 3, 145–146 (1976).
(in Russian)
18. Shima, T., Shinar, J.: Time varying linear pursuit-evasion game models with bounded controls.
J. Guidance Control Dyn. 25(3), 425–432 (2002)
19. Shinar, J., Shima, T.: Non-orthodox guidance law development approach for the interception
of maneuvering anti-surface missiles. J. Guidance Control Dyn. 25(4), 658–666 (2002)
20. Stipanovic, D.M., Melikyan, A.A., Hovakimyan, N.: Some sufficient conditions for multiplayer
pursuit-evasion games with continuous and discrete observations. In: Bernhard, P., Gaitsgory,
V., Pourtallier, O. (eds.) Annals of the International Society of Dynamic Games, vol. 10:
Advances in Dynamic Games and Applications, pp. 133–145. Springer, Berlin (2009)
21. Taras’ev, A.M., Tokmantsev, T.B., Uspenskii, A.A., Ushakov, V.N.: On procedures for
constructing solutions in differential games on a finite interval of time. J. Math. Sci. 139(5),
6954–6975 (2006)
Chapter 15
Salvo Enhanced No Escape Zone
Stéphane Le Ménec
15.1 Introduction
This research program has focused on the problem of naval-based air defense
systems which must defend against attacks from multiple targets. Modern anti-air
warfare systems, capable of tackling the most sophisticated anti ship missiles are
S. Le Ménec ()
EADS/MBDA, 1 Avenue Réaumur, 92 358 Le Plessis-Robinson Cedex, France
e-mail: [email protected]
based on homing missiles which employ inertial navigation with low frequency or
no command update during their mid course phase before becoming autonomous,
employing an active seeker for the terminal phase. Technology developments in the
field of modular data links may allow the creation of a multi-link communication
network to be established between anti-air missiles and the launch platform. The
future prospect of such ad hoc networks makes it possible to consider cooperative
strategies for missile guidance. Many existing guidance schemes are developed on
the basis of one-on-one engagements which are then optimized for many-on-many
scenarios [6, 8]. A priori allocation rules and natural missile dispersion can allow a
salvo of missiles to engage a swarm of targets; however, this does not always avoid
some targets leaking through the salvo, whilst other targets may experience overkill.
Cooperative guidance combines a number of guidance technology strands, and these have been studied as part of the research program outlined below:
• Prediction of the target behavior;
• Mid-course guidance to place the missile in position to acquire and engage the
target;
• Allocation/re-allocation processes based on estimated target behavior and NEZ;
• Terminal homing guidance to achieve an intercept.
In the terminal phase, guidance has been achieved by handover to the DGL
guidance law [16] based on the differential game theory [7]. Two approaches to
missile allocation have been considered. The first one relates to Earliest Interception Geometry (EIG) concepts [15]. This article focuses on the second one, exploiting the NEZ defined by the linear differential game (DGL) guidance law, which either acts to define an Allocation Before Launch (ABL) plan or refines an earlier plan to produce an In-Flight Allocation (IFA) plan.
A statement of the problem is given in Sect. 15.2, SENEZ Concept. In Sect. 15.6, Matrix Game Target Allocation Algorithm, details of pre-flight and in-flight allocation planning are described. Missile guidance, both mid-course and terminal, is discussed in Sect. 15.7, Guidance Logics. The simulation results from a Simulink 6DoF (six-degree-of-freedom) model are reviewed in Sect. 15.9, SENEZ Results. Sections 15.10, SENEZ Perspective, and 15.11, Conclusion, present the study conclusions and some remarks concerning the exploitation of these cooperative guidance technologies. Finally, Sect. 15.12, Acronyms and Notations, summarizes the meaning of the abbreviations and the variables used in the various mathematical formulas.
There are occasions when the weapon system policy for defending against threats
involves firing two or more missiles at the same target. Without any action taken,
the missiles will naturally disperse en-route to the target, each arriving at the
point of homing with a slightly different geometry. In such a case, there will be
Fig. 15.1 Example of multi-shoot in the SENEZ firing policy; we optimize the management of
missile salvos (target allocation process and the guidance laws) to cover uncertainties about the
evolution of targets; indeed, at the beginning of the flight the target positions are updated at a low
cadence; the missile control systems are provided with target measurements at high data rate only
after target acquisition by their on-board sensor; the missile seekers with limited range are depicted
by blue cones
a significant overlap of the NEZs. A SENEZ was introduced to optimize this type of engagement, with the cooperating missiles increasing their chances of at least one missile intercepting the target (Fig. 15.1).
In the naval or ground application, it is often the case that a number of assets
may be situated in close vicinity to each other. In this situation, it may be difficult
to predict which asset an inbound threat is targeting. In the case of air-to-air
engagements, there are various break manoeuvres which a target aircraft could
execute to avoid an interceptor. These paths can be partitioned into a small number
of bundles determined by the number of missiles in the salvo.
By selecting well chosen geometric paths it should be possible to direct the
defending missiles in such a way that each partition of the possible target trajectory
bundles falls within the NEZ of at least one missile. Consider a naval case
of a two missile salvo, and a threat that is initially heading straight towards
the launch vessel; there is a possibility that the threat may break left or right
at some point. One defending missile can be directed to cover the break right
and straight-on possibilities; the second missile would defend against the break
left and straight-on possibilities. By guiding to bundle partitions prior to the start
of homing, the NEZ of the firing is enhanced. At least one of the missiles will be
able to intercept the target. This SENEZ firing policy differs from the more standard shoot-look-shoot policy, which considers the sequential firing of missiles, where a kill assessment is performed before each new missile launch.
Different approaches have been studied to predict target positions [14]. The results detailed in the following are based on the version implementing the goal-oriented approach, which rests on the hypothesis that the target will guide towards a goal. The target trajectories have been classified into three categories: threats coming from the left (with respect to the objective), from the front, and from the right (Fig. 15.2). We generate these three assumed target trajectories by defining one way-point per trajectory class. We compute the trajectories that lead to the threat object passing through the way-points using Trajectory Shaping Guidance (TSG) [17]. The basic TSG is similar to PN (Proportional Navigation) with an additional constraint on the final Line-Of-Sight (LOS) angle. This means that near impact, the LOS angle λ
equals a desired value λF . A 3D version of this law is applied from the threat’s
initial position to the way-point. When the way-point is reached, a switch is made
from TSG to standard PN to guide on the objective. The LOS final angle of the TSG
law is chosen to bring the threat aiming directly at the objective when it reaches
the way-point. Figure 15.3 illustrates how assumption target trajectories have been
generated.
A set of three way-points per target is defined using polar parameters (angle ψwpt
and radius Rwpt ). All way-points belong to a circle of radius Rwpt centered on the
supposed objective. Way-points are then spread with ψwpt as an angular gap, using
Fig. 15.3 2D target trajectory generation using way-points, TSG and PN as terminal homing
guidance
Fig. 15.5 Evolution of the engagement; the way-points do not move; some way-point trajectories become impossible
It is assumed that these way-points do not change as the engagement evolves. Some
hypotheses will become progressively less likely to be true and others appear to
be a good approximation of reality. In due course, some hypotheses will become
unachievable and will be discarded during the cost computation process (Fig. 15.5).
The SENEZ target allocation algorithm is in charge of evaluating all missile-target-
hypothesis engagements [9, 11]. This means the algorithm must be able to tell for
each case if successful interceptions are possible and to give a cost on a scale that
enables comparisons.
Usage of the following letters is now reserved:
• W is the number of way-points considered;
• N is the number of defensive missiles that can be allocated to a target (i.e. that are not already locked on a target, or destroyed);
• P is the number of active and detected threats.
We will now use the following notation to name engagements (i.e. guidance hypotheses):
Mᵢ Tⱼ Hₖ (15.1)
Table 15.1 Costs over target trajectory alternatives and defending missile beliefs
Tⱼ Hₖ (15.2)
This is used to name what the target does (in this case, target j is following hypothesis k). Based on the assumption that the target and missile may guide in three different ways H₁, H₂ and H₃, a three-by-three matrix leading to nine costs can be presented (Table 15.1). As the number of missiles and targets increases, the size of the matrix will grow accordingly.
The target allocation algorithms developed during this study require the evaluation
of many tentative engagements, considering both various target behavior assump-
tions and different defending missile assignments. The mid course trajectories are
extrapolated using simulation models. However, after seeker acquisition, NEZ are
used for the homing engagement kinematic evaluation. The well known classical
DGL1 model [16] is used, except that time varying control bounds are considered
to account for defending missile drag. Acceleration control bounds have been
computed in accordance with 6DoF simulation runs. The time derivative of the
standard DGL1 NEZ boundary, as a function of the normalized time to go θ is
given by:
dZlimit θ
(θ ) = τP2 aE max ε0 h (θ ) − μ0 h (15.3)
dθ ε0
tgo
θ = (15.4)
τP
tgo = tF − t (15.5)
−α
h(α ) = e +α −1 (15.6)
300 S. Le Ménec
aP max
μ0 = (15.7)
aE max
τE
ε0 = (15.8)
τP
θ
Z(θ ) = y + ẏ τ − ÿP τP2 h(θ ) + ÿE τE2 h (15.9)
ε0
where y is the miss perpendicular to the initial Line Of Sight (LOS) direction and ẏ is its first-order time derivative (perpendicular velocity); ÿP and ÿE are respectively the missile and target components of the acceleration perpendicular to the initial LOS. Z is the ZEM (Zero Effort Miss), a well-known concept in missile guidance [17]. aP max is the maximum missile acceleration and aE max is the maximum target acceleration. τP and τE are respectively the pursuer and the evader time constants, μ₀ is the pursuer-to-evader maneuvering ratio, and ε₀ the ratio of time constants (evader to pursuer). t is the regular forward time and tF is the final time: a fixed terminal time defined by the longitudinal (along the initial LOS) missile-target range being equal to 0 (see [16] for more details).
By integrating Eq. (15.3) in backward time with initial condition Z(θ = 0) = 0, the NEZ limits can be computed, as described by the upper and lower symmetric boundaries of Fig. 15.6. Then, a simple model for the maneuverability has been introduced as a linear function of θ:
μ(θ) = μ₀ + νθ (ν ≤ 0) (15.10)
The meaning of this equation is that μ, the ratio of the maximum pursuer acceleration aP max(θ) over the maximum evader acceleration aE max(θ), increases as the time to go θ decreases (the missile gets nearer to the target; μ₀ is the value of μ at t = tF). Thinking of the vehicles' maneuvering drag, we assume that this phenomenon has more impact on the evader than on the pursuer. After integration of this equation, one obtains the new NEZ limits (upper positive limit; the negative one is symmetric):
Z_limit(θ) = Z_limit(μ₀, ε₀)(θ) + ν τP² aE max ( θ³/3 − θ²/2 − (θ + 1) e^(−θ) + 1 ) (15.11)
The term Z_limit(μ₀, ε₀) is the standard DGL1 bound. The second term is the correction obtained due to the linear variation of μ. When ν < 0, this term actually closes the NEZ at a certain time, as shown in Fig. 15.6.
Other refinements exist for considering non-constant velocity profiles using DGL1 kinematics [12]. When running 3D Simulink simulations, the attainability calculus is performed by considering two orthogonal NEZs, associated with the horizontal and the vertical planes.
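As a numerical illustration of Eqs. (15.3), (15.10) and (15.11), the sketch below integrates the upper boundary from Z(0) = 0; it is our illustration with made-up parameter values and trapezoidal quadrature instead of the closed form:

```python
import numpy as np

def h(a):
    return np.exp(-a) + a - 1.0

def nez_boundary(mu0, eps0, nu, tauP, aEmax, theta_max=10.0, n=2000):
    """Upper NEZ boundary Z_limit(theta): Eq. (15.3) integrated from Z(0) = 0,
    with the time-varying ratio mu(theta) = mu0 + nu * theta of Eq. (15.10)."""
    th = np.linspace(0.0, theta_max, n)
    dZ = tauP**2 * aEmax * ((mu0 + nu * th) * h(th) - eps0 * h(th / eps0))
    Z = np.concatenate([[0.0], np.cumsum(0.5 * (dZ[1:] + dZ[:-1]) * np.diff(th))])
    return th, np.maximum(Z, 0.0)   # with nu < 0 the boundary returns to 0: the NEZ closes

# e.g. th, Z = nez_boundary(mu0=2.0, eps0=0.8, nu=-0.3, tauP=1.0, aEmax=100.0)
```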
tgo = RMT / Vc (15.12)
where RMT is the missile-target distance and Vc the closing velocity. Then, for any time t of the trajectory:
where XYZ_T are the target coordinates in the inertial frame. The PIP is assumed
to have both its velocity and acceleration equal to 0. For every time sample of
the target's trajectory, the PIP coordinates are calculated, then the PN command of the missile; finally, integrating this command generates the missile states at the next sample time. For initial extrapolations, i.e. when missiles are not already in flight, it is assumed that their velocity vector is aimed directly at the way-point of the chosen hypothesis. This is also used in the model when actually
shooting missiles. PN on PIP objective makes use of the assumed knowledge of the
target’s behavior and allows the SENEZ target allocation algorithm to launch several
defending missiles against the same real threat following different mid course paths.
The SENEZ principle is indeed to shoot multiple missiles to anticipate target’s
behavior such as doglegs, and new target detections. Once missile trajectories have
been computed, the costs are evaluated. The NEZ concept is applied as well as
a modeling of the field of view of the missile’s seeker. Two zones are defined;
the first zone determines if a target can be locked by the seeker (information); the
second zone determines if the target can be intercepted (attainability). The cost is
simply the relative time when the target enters the intersection of both zones. If it
never happens, the cost value is infinite. If the threat is already in both zones at the
first sample time, the cost is zero.
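A minimal sketch of this cost logic (our illustration; the predicates lock and reachable are hypothetical stand-ins for the seeker-cone and NEZ tests along the sampled trajectory):

import math

def hypothesis_cost(times, lock, reachable):
    # Cost of a guidance hypothesis: first relative time at which the target
    # is both within the seeker field of view and inside the NEZ.
    # Returns 0 if true at the first sample, math.inf if it never happens.
    for t in times:
        if lock(t) and reachable(t):
            return t - times[0]
    return math.inf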
When guiding on a hypothesis such as M1 T1 H1, it is supposed that the seeker always looks at the predicted position of threat T1 under hypothesis H1. This gives at every sample time the aiming direction of the seeker. This seeker direction is tested against all other hypotheses to check whether a target is within the field of view at this sample time. If so, an interception test using the NEZ evaluates whether interception is possible. As soon as a target enters the field of view and becomes reachable for a hypothesis, the cost is updated to the trajectory's current time. The cost computation ends when all costs, i.e., those of all hypotheses, have been computed, or when the last trajectory sample has been reached.
This cost logic has been chosen because of the following:
• It takes into account what the missile can or cannot lock on (seeker cone).
• It takes into account the missile’s ability to reach the threats (NEZ).
• In most cases, it can be assumed that low costs imply short interception times.
After costs have been computed, the algorithm has to find the best possible allocation plan. This means we need to construct allocation plans and combine costs. The overall criterion for discriminating allocation plans is minimizing the time to intercept all the threats, which is roughly equivalent to maximizing the range between the area to protect and the closest location of threat interception. Consider the following illustrative example. One threat T1 attacks one objective, with three possible hypotheses H1, H2, and H3. Two missiles M1 and M2 are allocated to it.
Here, i and j are target way-point beliefs defining the defending missile strategies (mid-course trajectories), and k is the way-point number defining the threat strategies (trajectories).
The best allocation plan in this simple case is thus M1 T1 H1 − M2 T1 H3 (i* = 1, j* = 3), which means guiding M1 based on hypothesis H1 of T1 and M2 on hypothesis H3 of the same target. By playing this plan, the second hypothesis is covered with a satisfactory cost of 1.8, and no additional missile is needed.
This algorithm could also be used to optimize the number of missiles to be involved: if no satisfactory solution exists, i.e., if the costs are higher than a threshold, the procedure can restart with an additional missile (three missiles in this case).
The same principle applies when there are more than two missiles, and more
than one target (the SENEZ algorithm has been written and evaluated in general
scenarios). The mathematical formulation for the construction and optimization of the allocation plan cost matrix then becomes as follows:

  find (A, B) | C(A, B) = min over (A, B) ( max over (i, j) ( min over k ( C(Mk TA(k) HB(k) | Ti Hj) ) ) ),   (15.15)
where
• k is the missile number (between 1 and N, the maximum number of defending missiles);
• A(k) is the index of the target allocated to missile k;
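For small scenarios, the min-max-min structure of (15.15) can be searched by brute force. The following sketch is ours; the layout of the cost array is an assumption, since the text only specifies the structure of the criterion.

import itertools, math

def best_allocation(cost, n_missiles, n_targets, n_hyps):
    # Brute-force search of Eq. (15.15).
    # cost[k][a][b][i][j]: time for missile k, guided on hypothesis b of
    # target a, to intercept target i when it actually flies hypothesis j
    # (math.inf if interception is impossible).
    best_plan, best_value = None, math.inf
    targets, hyps = range(n_targets), range(n_hyps)
    for A in itertools.product(targets, repeat=n_missiles):
        for B in itertools.product(hyps, repeat=n_missiles):
            # worst case over the threats' actual behaviours (i, j),
            # best defending missile k against each realization
            value = max(
                min(cost[k][A[k]][B[k]][i][j] for k in range(n_missiles))
                for i in targets for j in hyps)
            if value < best_value:
                best_plan, best_value = (A, B), value
    return best_plan, best_value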
The two diagrams, Figs. 15.7 and 15.8, summarize the defending missile guidance phases (mid-course and homing) and explain how the 6DoF Simulink Common Model operates.
Fig. 15.7 During mid-course, the guidance logics block extrapolates the target states and PIP coordinates. It also determines whether the seeker locks on one of the targets
Fig. 15.8 In the homing phase, the guidance logics block sends the true states of the locked target. The seeker block applies noise to compute the measurements. A Kalman filter estimates the target's states. Finally, a DGL1 command is applied
Several scenarios for air defense in ground and naval contexts have been defined. A benchmark target allocation policy, with neither re-allocation nor SENEZ features, has been defined for comparison purposes. Scenario 3 (Fig. 15.9) deals with ground defense, where Air Defense Units (ADUs) are located around the Defended Area (circle) containing the objective to be protected (RADAR, diamond mark in the center of the Defended Area). A threat aircraft launches a single missile and then escapes the radar zone. The aircraft and missile are supersonic.
The benchmark policy consists in launching a defending missile as soon as a threat appears in the radar detection range. The benchmark algorithm starts by launching one missile at the merged target. When the two targets split, a second missile is shot. This second defending missile will intercept the attacking missile. Due to the sharp escape manoeuvres of the aircraft, the first defending missile misses the aircraft. After missing the aircraft, the benchmark algorithm launches a third missile to chase the escaping aircraft. This last missile never reaches its target.
When the aircraft crosses the RADAR range, the SENEZ algorithm launches two defending missiles (Fig. 15.10). In ground scenarios, several ADUs are considered, the algorithm automatically deciding on geometric grounds which ADU to use when launching defending missiles. For simplicity, in naval and ground scenarios only one location is considered as the final target goal (ground objective to protect,
Fig. 15.9 Benchmark trajectories in scenario 3; the line entering from the right, which turns to its right, is the trajectory of an aircraft that launches a missile towards the area to protect (circle); the defense missiles are launched from the same launching base located on the border of the area to protect at the bottom of the figure; the first missile misses the aircraft; the second one intercepts the threat missile, and the last one also fails to reach the aircraft
RADAR diamond mark). Simple way-points are used to generate target trajectory assumptions, even though it is possible to extend the concept to more sophisticated target trajectory assumptions.
Figure 15.10 explains what happens when using the SENEZ algorithm and what the improvements with respect to the benchmark policy are. The defending missiles are M2 (on the left) and M3 (on the right). The aircraft trajectory is T1, turning to the right. T2 is the missile launched by the aircraft. The defending missiles intercept where the threat trajectories switch from solid to dotted lines. The dotted lines describe what happens when using the benchmark policy in place of the SENEZ algorithm. The remaining dotted lines are the target trajectory assumptions, continuously refined during the engagement. A straight-line assumption was also considered by the algorithm; however, the defending missiles assigned to the right and left threats are enough to cover the three way-point assumptions elaborated when the initial threat appears. The SENEZ algorithm intercepts the attacking missile at a longer distance than the benchmark algorithm, around a 1 km improvement. Moreover, SENEZ launches only two defending missiles and also intercepts the launching aircraft, which the benchmark algorithm fails to do. The fact that SENEZ directs missiles to the left and right sides, plus the fact that SENEZ launches earlier than the benchmark, explains the SENEZ performance improvement. The Monte Carlo simulations confirm these explanations.
Fig. 15.10 SENEZ target allocation algorithm on scenario 3; the dashed lines are the target assumption trajectories that the algorithm considers for launching the defense missiles; a third hypothesis, "flying straight" to counter the incoming aircraft, is also taken into account, but two defense missiles are sufficient to cover the three hypotheses; the "straight line" assumption is therefore removed from the figure; the black crosses mark where the SENEZ interceptions occur; the reference trajectories are superimposed for comparison
Both the benchmark strategy and the SENEZ algorithm are able to intercept the threat missile; the interception occurs a little earlier in the SENEZ case. The main difference is that with the SENEZ algorithm the aircraft is intercepted in most of the cases (see Fig. 15.11). The mean interception time is around 20 s with SENEZ, whereas interception never happens before 20 s with the benchmark strategy. Moreover, the aircraft is often missed altogether in the benchmark simulations (80 s is the maximum simulation time).
Monte Carlo runs have been executed for all the scenarios, comparing the interception times obtained with the benchmark model to those obtained with SENEZ. The disturbances for these runs were as follows:
• seeker noise;
• initial position of the targets (disturbance with standard deviation equal to 50 m);
• initial Euler angles of the target (disturbance with standard deviation equal to 2.5°).
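For illustration only, such disturbed initial conditions can be drawn as follows; the function name and state layout are ours, and only the standard deviations come from the text (seeker noise is applied separately inside the seeker model).

import numpy as np

rng = np.random.default_rng(0)

def perturb_initial_conditions(position, euler_angles):
    # One Monte Carlo sample of the disturbed target initial state:
    # position noise sigma = 50 m, Euler angle noise sigma = 2.5 deg.
    pos = np.asarray(position) + rng.normal(0.0, 50.0, size=3)
    ang = np.asarray(euler_angles) + rng.normal(0.0, np.radians(2.5), size=3)
    return pos, ang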
Fig. 15.11 Monte Carlo results on scenario 3; mean (μ) and standard deviation (σ) for the benchmark policy (top) and the SENEZ policy (bottom) when intercepting the target T1 (aircraft threat)
Performance analyses have also been executed on various other scenarios in ground and naval application contexts. Moreover, parametric studies have been conducted on the following aspects:
• way-point placements;
• the drag coefficient of the defensive missiles;
• the radius and range of the seekers;
• some variations of the scenario definition, such as the time of appearance of the second target in scenario 3.
Attention has also been paid to finding way-point placements that would be convenient for all ground-to-air scenarios, or all surface-to-air scenarios. The optimal placement of the way-points depends strongly on the scenario. This suggests there would be an advantage in increasing the number of way-points/missiles, corresponding to an increased number of SENEZ hypotheses.
Potential benefits were first illustrated on all the scenarios considered, against targets performing highly demanding evasive manoeuvres as well as apparent single targets that resolve into two splitting targets. The trajectories obtained gave a better idea of the SENEZ behavior. However, the way target hypotheses are issued proved to be critical. This was demonstrated by the parametric studies, as the placement of the way-points changed the results greatly from one scenario to another. The sensitivity to parameters such as drag and seeker features has also been investigated. The results obtained during these parametric studies suggest that the initial number of way-points/hypotheses per target (chosen as three) might be too low.
SENEZ guidance attempts to embed the future possible target behavior into the
guidance strategy by using goal oriented predictions of partitioned threat trajectories
to drive missile allocation and guidance commands. As such the SENEZ approach
offers an alternative to mid-course guidance schemes which guide the intercepting
missile or missiles towards a weighted track. The general application of SENEZ
would lead to a major change in weapon C2 philosophy for naval applications which
may not be justifiable.
The SENEZ engagement plan requires that a missile be fired at each partitioned set of trajectories. This differs from many existing naval firing policies, which would fire a single missile at the target at long range and would delay firing another missile until later when, if there were sufficient time, a kill assessment would be undertaken before firing a second round. Depending on the evolution of the target's behaviour, current C2 algorithms may fire a second missile before the potential interception by the first missile. Existing systems thus tend to follow a more sequential approach, the naval platform needing to preserve missile stocks so that salvo firings are limited; unlike an air platform, the naval platform cannot withdraw rapidly from an engagement. The proposed engagement plan is purely geometric in formulation, as opposed to current schemes, which use probabilities that the target is making for a particular goal [2]. This latter type of engagement plan will generally result in fewer missiles being launched. In the SENEZ scheme, a missile salvo will be fired more often because the potential target trajectories are all treated as equally likely. For instance, when the target is at long range, its choice of asset to attack is plausibly equiprobable, whereas at the inner range boundary, it is most likely that the target is flying straight towards its intended target.
Despite these potentially negative assessments of the SENEZ concept, there will
be occasions when current C2 algorithms will determine that it is necessary to
launch a salvo against a particular threat. For instance, a particularly high value
asset such as an aircraft carrier may be targeted and a high probability of successful
interception is required. In such circumstances there could be merit in the SENEZ
approach. Essentially, in the naval setting SENEZ may be considered as a possible
enhancement for the salvo firing determined by the engagement planning function
in existing C2 systems.
For air-to-air systems the scope for considering a SENEZ form of guidance may be greater. It is often policy for aircraft to fire two missiles at an opposing aircraft engaged at medium range. With a two-aircraft patrol, where the leader and the wing aircraft each fire a missile at the target, there is an opportunity to shape the guidance so that possible break manoeuvres are covered. With separate platforms firing the missiles, inter-platform communication would be necessary so that each missile could be allocated to a unique trajectory partition.
Finally, many extensions could be addressed. First of all, new mid-course guidance schemes, so-called particle guidance or trade-off mid-course guidance, could be considered to guide on several tracks rather than assuming one unique target [3]. Moreover, by considering allocation plans with two defending missiles on one target NEZ, as computed in [5, 10], it could be possible to involve defending missiles with diminished performance (no up-link during the mid-course, less kinematic performance, a low-cost seeker). Given inter-missile communication capabilities, the current centralized algorithm could be improved with decentralized features. Decentralization would allow the processing to be distributed to the individual missiles rather than concentrating the computation in one unique location, i.e., the frigate to protect [4].
15.11 Conclusion
Acknowledgements This work was funded by the French—UK Materials and Components for
Missiles—Innovation and Technology Partnership (MCM ITP) research programme.
References
1. Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd ed. Classics in Applied Mathematics, CL 23. Society for Industrial and Applied Mathematics, Philadelphia, ISBN 978-0-89871-429-6 (1999)
2. Bessière, P. and the BIBA – INRIA Research Group – Projet CYBERMOVE: Survey –
Probabilistic Methodology and Techniques for Artefact Conception and Development, INRIA
Research Report No. 4730, ISSN 0249-6399 ISRN INRIA/RR-4730-FR+ENG (February
2003)
3. Dionne, D., Michalska, H., Rabbath, C.-A.: Predictive guidance for pursuit-evasion engage-
ments involving multiple decoys. J. Guidance Control Dyn. 30(5), 1277–1286 (2007)
4. Farinelli, A., Rogers, A., Petcu A., Jennings, N.-R.: Decentralised coordination of low-power
embedded devices using the max-sum algorithm. In: Padgham, Parkes, Müller, Parsons (eds.)
Proceedings of the 7th International Conference on Autonomous Agents and Multiagent
Systems, AAMAS-08, May 12–16, 2008, Estoril, Portugal, pp. 639–646 (2008)
5. Ganebny, S.A., Kumkov, S.S., Le Ménec, S., Patsko, V.S.: Numerical study of two-on-one
pursuit-evasion game. In: Preprints of the 18th IFAC World Congress, Milano, Italy, August
28–September 2, 2011, pp. 9326–9333 (2011)
6. Ge, J., Tang, L., Reimann, J., Vachtsevanos, G.: Suboptimal approaches to multiplayer
pursuit-evasion differential games. In: AIAA 2006–676 Guidance, Navigation, and Control
Conference, August 21–24, 2006, Keystone, Colorado (2006)
7. Isaacs, R.: Differential Games, a Mathematical Theory with Applications to Warfare and
Pursuit, Control and Optimization. Wiley, New York (1965)
8. Jang, J.S., Tomlin, C.: Control strategies in multi-player pursuit and evasion games. In:
AIAA 2005-6239 Guidance, Navigation, and Control Conference, August 15–18, 2005, San
Francisco, California (2005)
9. Le Ménec, S.: Cooperative mid course guidance law based on attainability constraints:
invited session: advanced methods for the guidance and control of autonomous vehicles. In:
Proceedings of the European Control Conference, August 23–26, 2009, Hungary, MoA3.5, pp.
127–131, ISBN 978-963-311-369-1 (2009)
10. Le Ménec, S.: Linear differential game with two pursuers and one evader. In: Breton M.,
Szajowski, K. (eds.) Annals of the International Society of Dynamic Games, Advances in
Dynamic Games, Theory, Applications, and Numerical Methods for Differential and Stochastic
Games. Birkhäuser, Springer, New York, ISBN 978-0-8176-8088-6 (2011)
11. Le Ménec, S., Shin, H.-S., Tsourdos, A., White, B., Zbikowski, R., Markham K.: Cooperative
missile guidance strategies for maritime area air defense. In: 1st IFAC Workshop on Distributed
Estimation and Control in Networked Systems (NecSys09), 24–26 September 2009, Venice,
Italy (2009)
12. Shima, T., Shinar, J., Weiss, H.: New interceptor guidance law integrating time-varying and
estimation-delay models. J. Guidance Control Dyn. 26(2), 295–303 (2003)
13. Shin, H.-S., Le Ménec, S., Tsourdos, A., Markham, K., White, B., Zbikowski, R.: Cooperative
guidance for naval area defence. In: 18th IFAC Symposium on Automatic Control in
Aerospace, September 6–10, 2010, Nara, Japan (2010)
14. Shin, H.-S., Piet-Lahanier, H., Tsourdos, A., Le Ménec, S., Markham K., White B.-A.:
Membership set-based mid course guidance: application to manoeuvring target interception.
In: Preprints of the 18th IFAC World Congress, August 28–September 2, 2011 Milano, Italy,
pp. 3903–3908 (2011)
15. Shin, H.-S., Tsourdos, A., Le Ménec, S., Markham, K., White, B.: Cooperative mid course
guidance for area air defence. In: AIAA 2010–8056 Guidance, Navigation and Control, August
2–5, 2010, Toronto, Ontario, Canada (2010)
16. Shinar, J., Shima, T.: Non-orthodox guidance law development approach for the interception
of maneuvering anti-surface missiles. J. Guidance Control Dyn. 25(4), 658–666 (2002)
17. Zarchan, P.: Tactical and Strategic Missile Guidance, Progress in Astronautics and Aeronautics,
vol. 219, 5th revised ed. (2007)
Chapter 16
A Method of Solving Differential Games Under
Integrally Constrained Controls
Abstract This study deals with linear game problems under integral constraints on the controls. The proposed scheme leans upon the ideas of the method of resolving functions [Chikrii, Conflict Controlled Processes. Kluwer, Boston (1997)]. The analog of the Pontryagin condition formulated in the paper makes it feasible to derive sufficient conditions for the finite-time termination of the differential game. The results obtained are illustrated with the typical game situation of "simple motion" and continue the research of [Chikrii and Belousov, Mem. Inst. Math. Mech. Ural Div. Russ. Acad. Sci. 15(4):290–301 (2009) (in Russian); Nikol'sky, Diff. Eq. Minsk. 8(6):964–971 (1972) (in Russian); Subbotin and Chentsov, Optimization of Guarantee in Problems of Control. Nauka, Moscow (1981) (in Russian)].
For v = 0 this inclusion is evidently satisfied. Taking into account (16.3) and the inclusion 0 ∈ U, at v ≠ 0 we have a chain of inclusions:

  π e^(Aτ) C (v/‖v‖) ∈ π e^(Aτ) C V ⊂ λ π e^(Aτ) B U ⊂ √λ π e^(Aτ) B U,
whence, taking into account the finiteness of γ(t, τ, v) (for π e^(At) z⁰ ≠ 0), we conclude that δ(γ(t, τ, v), t, τ, v) = 0, and the upper bound in (16.5) is attained.
Let us consider the level set of the function γ(t, τ, v), Λa = {(t, τ, v) : γ(t, τ, v) < a}. We will show that this set is open and, therefore, Borel for any positive number a. This will imply the Borel measurability of the function γ(t, τ, v).
Let us fix a positive number a and let (t̄, τ̄, v̄) be an arbitrary point of Λa. Consequently, a ∉ Ω(t̄, τ̄, v̄) and δ(a, t̄, τ̄, v̄) > 0. The continuity of the function δ(·) ensures the existence of a neighborhood Δ of the point (t̄, τ̄, v̄) such that the inequality δ(a, t, τ, v) > 0 holds for all (t, τ, v) ∈ Δ, that is, γ(t, τ, v) < a for all (t, τ, v) ∈ Δ. This implies that the set Λa is open, as required.
The mappings on the left- and right-hand sides of this inclusion are Borel measurable in (τ, v) and continuous in u, u ∈ U. By the theorem of Kuratowski and Ryll-Nardzewski [2, 3] there exists a Borel-measurable selection, that is, a Borel-measurable mapping w(τ, v) ∈ U such that

  γ(T, τ, v) π e^(AT) z⁰ + π e^(Aτ) C v = √((1−λ)γ(T, τ, v) + λ‖v‖²) · π e^(Aτ) B w(τ, v)
for all (τ, v) ∈ ℝ₊ × ℝˡ. Also, from this theorem it may be concluded that there exists a Borel-measurable mapping w̃(τ, v) ∈ U such that

  π e^(Aτ) C v = √λ ‖v‖ π e^(Aτ) B w̃(τ, v).
By condition (16.7) of the theorem there exists an instant T* = T*(z⁰, v(·)) such that

  ∫₀^(T*) γ(T, T−τ, v(τ)) dτ = 1.
Then the control of the pursuer on the interval [0, T] is prescribed by the formula

  u(τ) = −√((1−λ)γ(T, T−τ, v(τ)) + λ‖v(τ)‖²) · w(T−τ, v(τ))   for τ ∈ [0, T*],
  u(τ) = −√λ ‖v(τ)‖ · w̃(T−τ, v(τ))                             for τ ∈ (T*, T].   (16.8)
Substituting this control into the dynamics and using the selections w and w̃, we obtain

  π z(T) = π e^(AT) z⁰ − ∫₀^(T*) √((1−λ)γ(T, T−τ, v(τ)) + λ‖v(τ)‖²) π e^(A(T−τ)) B w(T−τ, v(τ)) dτ
           − ∫_(T*)^T √λ ‖v(τ)‖ π e^(A(T−τ)) B w̃(T−τ, v(τ)) dτ + ∫₀^T π e^(A(T−τ)) C v(τ) dτ
         = π e^(AT) z⁰ − ∫₀^(T*) [γ(T, T−τ, v(τ)) π e^(AT) z⁰ + π e^(A(T−τ)) C v(τ)] dτ
           − ∫_(T*)^T π e^(A(T−τ)) C v(τ) dτ + ∫₀^T π e^(A(T−τ)) C v(τ) dτ
         = π e^(AT) z⁰ − ( ∫₀^(T*) γ(T, T−τ, v(τ)) dτ ) · π e^(AT) z⁰ = 0.
This equation proves that a solution of (16.1) is brought to the terminal set, z(T) ∈ M. Let us verify that the control u(τ) constructed in (16.8) meets the integral constraints (16.2):
  ∫₀^T ‖u(τ)‖² dτ = ∫₀^(T*) [(1−λ)γ(T, T−τ, v(τ)) + λ‖v(τ)‖²] ‖w(T−τ, v(τ))‖² dτ
                    + ∫_(T*)^T λ‖v(τ)‖² ‖w̃(T−τ, v(τ))‖² dτ
                  ≤ (1−λ) ∫₀^(T*) γ(T, T−τ, v(τ)) dτ + λ ∫₀^T ‖v(τ)‖² dτ ≤ 1.
The case π eAT z0 = 0 is analyzed in a similar way. In so doing, the pursuer control
on the interval [0, T ] is as follows:
  u(τ) = −√λ ‖v(τ)‖ · w̃(T−τ, v(τ)).   (16.9)
Analogously, it may be shown that in this case, too, the control (16.9) ensures
bringing a solution of (16.1) to the terminal set M at moment T (for any admissible
control v(τ )), and control u(τ ) meets integral constraint (16.2).
Remark 16.1. The theorem is easily transferred to the case of general constraints on the controls:

  ∫₀^∞ uᵀ(τ) G u(τ) dτ ≤ μ²,   ∫₀^∞ vᵀ(τ) H v(τ) dτ ≤ ρ².   (16.10)

A suitable change of variables transforms the differential game (16.1), (16.10) into the original form.
As an example, consider the game of simple motions

  ẋ = u, x(0) = x⁰, x ∈ ℝⁿ, u ∈ ℝⁿ,
  ẏ = v, y(0) = y⁰, y ∈ ℝⁿ, v ∈ ℝⁿ.   (16.11)

In the variable z = x − y, with the normalized controls u = μũ, v = ρṽ, the game takes the form

  ż = μũ − ρṽ, z⁰ = x⁰ − y⁰,   ∫₀^∞ ‖ũ(τ)‖² dτ ≤ 1,   ∫₀^∞ ‖ṽ(τ)‖² dτ ≤ 1.   (16.12)
The terminal set is M = {0}, and the operator π is the identity transformation. It is easy to see that Assumption 16.3 is satisfied for the parameter λ = ρ/μ < 1:

  −ρ D ⊂ λ · μ D,   D = {z ∈ ℝⁿ : ‖z‖² ≤ 1}.
The resolving function γ(·) can be found from the formula for the set-valued mapping

  Ω(t, τ, ṽ) = { γ ∈ ℝ : γ z⁰ − ρṽ ∈ √((1−λ)γ + λ‖ṽ‖²) · μ D }
             = { γ ∈ ℝ : ‖γ z⁰ − ρṽ‖² ≤ ((1−λ)γ + λ‖ṽ‖²) · μ² }
             = { γ : F(γ, ṽ) = ‖z⁰‖² γ² − 2γ ( ρ⟨z⁰, ṽ⟩ + μ(μ−ρ)/2 ) − ρ(μ−ρ)‖ṽ‖² ≤ 0 }.
The function F(γ, ṽ) is a quadratic polynomial in γ with a positive leading coefficient; therefore, γ(ṽ) in (16.5) is the largest root of the quadratic equation F(γ, ṽ) = 0. Note that F(0, v) ≤ 0 for all v ∈ ℝⁿ; therefore, the function γ(v) is defined for all v and γ(v) ≥ 0.
Let us find v* that yields a minimum of the function γ(v). To this end we differentiate the identity F(γ(v), v) = 0 and set the derivative of γ to zero, which gives

  v* = −(γ/(μ−ρ)) · z⁰.
The corresponding value γ(v*) can be found from the quadratic equation F(γ, v*) = 0:

  γ(v*) = (μ−ρ)² / ‖z⁰‖²,

whence

  v* = −((μ−ρ)/‖z⁰‖²) · z⁰.
In view of the fact that the function γ(·) is the largest root of the quadratic equation, it can easily be shown that

  γ(v) ≥ ρ(μ−ρ)‖v‖ / ‖z⁰‖² → ∞  when ‖v‖ → ∞.

From this it follows that the unique extremum of the function γ(v) is its minimum.
By Theorem 16.1 the time of the game termination is defined by the relationships

  ∫₀^T γ(v(τ)) dτ ≥ ∫₀^T γ(v*) dτ = ((μ−ρ)²/‖z⁰‖²) · T = 1,

whence

  T = ‖z⁰‖² / (μ−ρ)².   (16.13)
It should be noted that instant T coincides with the time of first absorption [4] for
the game (16.11), that is, it coincides with the first moment when the attainability
set of the pursuer x absorbs the attainability set of the evader y. This instant T is the
minimal guaranteed time of the game (16.11) termination.
Let us present an explicit form of a countercontrol u(v) of the pursuer on the interval [0, T] that solves the problem of approach. The strategy of the pursuer is defined by the relationships

  γ z⁰ − ρṽ = √((1−λ)γ + λ‖ṽ‖²) μ w,  ‖w‖ ≤ 1,
  ũ(ṽ) = −√((1−λ)γ + λ‖ṽ‖²) w,

whence

  ũ(ṽ) = (−γ z⁰ + ρṽ)/μ.
Then, using the substitution u = μũ, v = ρṽ and the quadratic equation F(γ, ṽ) = 0, we deduce that

  u(v) = v − γ(v) · z⁰,

where

  γ(v) = (1/‖z⁰‖²) { ⟨z⁰, v⟩ + μ(μ−ρ)/2 + √( (⟨z⁰, v⟩ + μ(μ−ρ)/2)² + ((μ−ρ)/ρ) ‖z⁰‖² ‖v‖² ) }.
Thus, this control assures solving the problem (16.11), (16.12) no later than at
time T (16.13).
References

Chapter 17
Anglers' Fishing Problem
A. Karpowicz and K. Szajowski
Abstract The model considered here will be formulated in relation to the “fishing
problem,” even if other applications of it are much more obvious. The angler goes
fishing, using various techniques, and has at most two fishing rods. He buys a
fishing pass for a fixed time. The fish are caught using different methods according
to renewal processes. The fish’s value and the interarrival times are given by the
sequences of independent, identically distributed random variables with known
distribution functions. This forms the marked renewal–reward process. The angler’s
measure of satisfaction is given by the difference between the utility function,
depending on the value of the fish caught, and the cost function connected with
the time of fishing. In this way, the angler’s relative opinion about the methods of
fishing is modeled. The angler’s aim is to derive as much satisfaction as possible, and
additionally he must leave the lake by a fixed time. Therefore, his goal is to find two
optimal stopping times to maximize his satisfaction. At the first moment, he changes
his technique, e.g., by discarding one rod and using the other one exclusively. Next,
he decides when he should end his outing. These stopping times must be shorter than
the fixed time of fishing. Dynamic programming methods are used to find these two
optimal stopping times and to specify the expected satisfaction of the angler at these
times.
A. Karpowicz
Bank Zachodni WBK, Rynek 9/11, 50-950 Wrocław, Poland
K. Szajowski ()
Institute of Mathematics and Computer Science, Wybrzeże
Wyspiańskiego 27, 50-370 Wrocław, Poland
17.1 Introduction
Before we start our analysis of the double optimal stopping problem (cf. the idea
of multiple stopping for stochastic sequences in Haggstrom [8] and Nikolaev [16])
for the marked renewal process related to the angler’s behavior, let us present the
so-called fishing problem. One of the first authors to consider the basic version
of this problem was Starr [19], and further generalizations were done by Starr
and Woodroofe [21], Starr et al. [20], and Kramer and Starr [14]. A detailed
review of papers related to the fishing problem was presented by Ferguson [7].
A simple formulation of the fishing problem, where the angler changes his location
or technique before leaving his fishing spot, was done by Karpowicz [12]. We extend
the problem to a more advanced model by taking into account the various fishing
techniques used at the same time (the parallel renewal–reward processes or the
multivariate renewal–reward process). This is motivated by the natural, more precise
models of the known, real applications of the fishing problem. The typical process of software testing consists of checking subroutines. Initially, many kinds of bugs are
searched for. Consecutive stopping times are moments when the expert stops general
testing of modules and starts checking the most important, dangerous types of errors.
Similarly, in proofreading, it is natural to look for typographic and grammatical
errors at the same time. Next, we look for errors in language use.
Since the various tasks are done by different groups of experts, it is natural that they would compete against each other. If the first-period work involves one group and the second period requires other experts, then they can be treated as players in a game between themselves. In this case, the proposed solution is to find the Nash equilibrium, where the players' strategies are the stopping times.
The applied techniques of modeling and finding the optimal solution are similar
to those used in the formulation and solution of the optimal stopping problem for the
risk process. Both models are based on the methodology explicated by Boshuizen
and Gouweleeuw [1]. The background mathematics for further reading are the
monographs by Brémaud [3], Davis [4], and Shiryaev [18]. The optimal stopping
problems for the risk process are considered in papers by Jensen [10], Ferenstein
and Sierociński [6], and Muciek [15]. A similar problem for a risk process with a
disruption (i.e., when the probability structure of the considered process is changed
at one moment θ ) was analyzed by Ferenstein and Pasternak-Winiarski [5]. The
model of the last paper brings to mind the change in fishing methods considered here. However, in the present model such a change is made by a decision maker; it is not an uncontrolled, automatic consequence of the type of environment.
The following two sections present details of the model. A slight modification
of the background assumption by the adoption of multivariate tools (two rods)
and the possible control of their numbers in use yields a different structure of the
base model (the underlying process, sets of strategies—admissible filtrations and
stopping times). This modified structure allows the introduction of a new kind of
knowledge selection, which consequently leads to a game model of the angler’s
expedition problem in Sects. 17.1.2 and 17.2.2. Following a general formulation
of the problem, a version of the problem for a detailed solution will be chosen. However, the solution is presented as a scalable procedure dependent on parameters that reflect various circumstances. It is not difficult to adapt the solution to a wide range of natural cases.
An angler goes fishing. He buys a fishing pass for a fixed time t0 , which gives him
the right to use at most two rods. The total cost of fishing depends on the real
time of each equipment usage and the number of rods used simultaneously. The
angler starts fishing with two rods up to moment s. The effect on each rod can be
modeled by the renewal processes {Ni (t),t ≥ 0}, where Ni (t) is the number of fish
caught with rod i, i ∈ A := {1, 2}, during time t. Let us combine them together
into a marked renewal process. The usage of the ith rod by time t generates cost c_i : [0, t₀] → ℝ (when a rod is used simultaneously with other rods, this will be denoted by an index depending on the set of rods, e.g., a, c_i^a), and the reward is represented by independent, identically distributed (i.i.d.) random variables X₁^(i), X₂^(i), … (the value of the fish caught with the ith rod) with cumulative distribution function H_i.
The streams of the two kinds of fish are mutually independent and are independent of the sequence of random moments at which the fish have been caught. The two-dimensional process N⃗(t) = (N₁(t), N₂(t)), t ≥ 0, can also be represented by a sequence of random variables Tₙ taking values in [0, ∞] such that

  T₀ = 0,   Tₙ < ∞ ⇒ Tₙ < Tₙ₊₁.   (17.1)
Both the two-variate process N⃗(t) and the double sequence {(Tₙ, zₙ)}_(n=0)^∞ are called a two-variate renewal process. Optimal stopping problems for the compound risk process based on the two-variate renewal process were considered by Szajowski [22].
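To make the construction concrete, the merged marked process {(Tₙ, zₙ, Xₙ)} can be simulated; the sketch below is ours, with exponential interarrival times and fish values as purely illustrative distributional choices.

import numpy as np

rng = np.random.default_rng(1)

def simulate_two_rods(t0, rates=(1.0, 0.5), value_means=(1.0, 2.0)):
    # Simulate the merged marked renewal process of two rods up to the pass
    # expiry t0: each event is (time T_n, rod mark z_n, fish value X_n).
    events = []
    for rod in (0, 1):
        t = rng.exponential(1.0 / rates[rod])
        while t <= t0:
            events.append((t, rod + 1, rng.exponential(value_means[rod])))
            t += rng.exponential(1.0 / rates[rod])
    return sorted(events)  # T_n increasing, z_n = rod that caught the fish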
change in the fishing method took place at time s. The natural filtration related to the double-indexed process Z(s, t) is F_t^s = σ{Z(u, v) : 0 ≤ u ≤ s ≤ v ≤ t}. Suppose the effect of extending the expedition after s is described by g_j^b : ℝ₊² × A × [0, t₀] × ℝ × [0, t₀] → [0, G_j^b], j ∈ B, minus the additional cost of time c_j^b(·), where c_j^b : [0, t₀] → [0, C_j^b] (when card(B) = 1, the index j will be dropped, and c^b = Σ_(j∈B) c_j^b will be used, which is adequate). Then the payoff can be expressed as
  Z(s, t) = g^a(M⃗_t, z_N(t), t) − c^a(t)                                           if t < s ≤ t₀,
  Z(s, t) = g^a(M⃗_s, z_N(s), s) − c^a(s) + g^b(M⃗_s, z_N(s), s, M_t^s, t) − c^b(t−s)  if s ≤ t ≤ t₀,
  Z(s, t) = −C                                                                     if t₀ < t,   (17.4)

where the function c^a(t), g^a(m⃗, i, t) and the constant C can be taken as follows: c^a(t) = Σ_(i=1)² c_i^a(t), g^a(M⃗_s, j, t) = Σ_(i=1)² g_i^a(M⃗_t, j, t), C = C₁^a + C₂^a + C^b. With the notation w^b(m⃗, i, s, m̃, t) = w^a(m⃗, i, s) + g^b(m⃗, i, s, m̃, t) − c^b(t−s) and w^a(m⃗, i, t) = g^a(m⃗, i, t) − c^a(t), formula (17.4) is reduced to the payoffs

  Z^(i)(s, t) = I_{t<s≤t₀} w^a(M⃗_t, i, t) + I_{s≤t≤t₀} w^b(M⃗_s, i, s, M_t^s, t) − I_{t₀<t} C,  i ∈ A.
When the methods of fishing are operated by separate anglers, a stopping random field can be built on the structure of the marked renewal–reward process as a model of the competitive expedition results. One possible definition of the payoff is based on the assumption that each player has his own account related to the exploration of the fishery. The states of the accounts depend on who forces the first stop for changing the technique, under what circumstances, and which techniques the players choose. The first stopping moment, the minimum of the stopping moments chosen by the players, occurs just after the moment Tₙ of an event (catching a fish) with rod zₙ, and the reward functions depend on the type of fishing that produced the most recently caught fish (i.e., j = zₙ). The player's payoff is w_i^a(m⃗, j, t) = g_i^a(m⃗, j, t) − c_i^a(t). The part of the payoff that depends on the second chosen moment, which stops the expedition, is different for the player who forces the change in fishing methods (the leader) and for his opponent. The leader is the one responsible for determining the expedition deadline.
Let us assume for a while that the ith player, i = 1, 2, takes his opponent's rod and gives his own rod to his opponent. This is not a crucial assumption: the method of fishing after the change can be different from both methods available before the considered moment. The treatment of the case without this assumption will be explained later (p. 335), when the behavior of the players in the second part of the expedition is formulated. Define the function
  w̃_i^b(m⃗, j, s, k, m̃, t) = w̃_i^a(m⃗, j, s) + g̃_i^b(m⃗, j, s, k, m̃, t) − c^b(t−s)

for j ∈ A, k ∈ B, where j is the rod with which the fish had been caught just before the moment of the first stop and k is the technique used by the ith player after the change (the denotation −k is used for the complementary rod or player, whichever is appropriate). It describes the case where the player deciding to change the method chooses the prospective technique of fishing first. Presumably he will explore the best methods with improvements, and the second angler will use the rod that is not used by the leader. The payoff of the players, when the ith player is the one who forces the first stop, has the following form:
  Z_i(j, s, t) = I_{t≤s≤t₀} g̃_i^a(M⃗_t, j, t) + I_{s<t≤t₀} w̃_i^b(M⃗_s, i, s, −i, M_t^s, t) − I_{t₀<t} C,   (17.5)
  Z_{−i}(j, s, t) = I_{t≤s≤t₀} g̃_{−i}^a(M⃗_t, j, t) + I_{s<t≤t₀} w̃_{−i}^b(M⃗_s, i, s, i, M_t^s, t) − I_{t₀<t} C.   (17.6)
In the preceding payoffs it is assumed that the final stop can be declared at any moment. Each player declares a change in technique right after an event on his rod (catching a fish with his rod), as long as there is no event on the opponent's rod. The details of the strategy sets and the solution concept are formulated in subsequent parts of this paper.
The extension considered here is motivated by natural, more precise models of known real applications of the fishing problem. The typical process of software testing consists of checking subroutines. Various types of bugs can be discovered in this way. Each problem with a subroutine generates the cost of bug removal and increases the value of the software, depending on the types of bugs found. Preliminary testing requires various types of experts. The stable version of the subroutines can be kept by less advanced computer scientists. The consecutive stopping times are moments when an expert of a certain class stops testing one module and another tester starts checking. The procedure for proofreading is similar.
by T_n^(zₙ) = T^(zₙ)_(N_{zₙ}(Tₙ)). There are also three renewal–reward processes {(T_n^(i), X_n^(i))}_(n=0)^∞, i = 1, 2, 3. By convention let us denote X_n^(zₙ) = X^(zₙ)_(N_{zₙ}(Tₙ)). The σ-fields F_t generated by the history of the A-marked renewal processes are defined by (17.7).
Definition 17.1. Let T be the set of stopping times with respect to the σ-fields {F_t}, t ≥ 0, defined by (17.7). The restricted sets of stopping times

  T_(n,K) = {τ ∈ T : Tₙ ≤ τ ≤ T_K}   (17.8)

for n ∈ ℕ and n < K are subsets of T. The elements of T_(n,K) are denoted τ_(n,K).
The stopping times τ ∈ T have a nice representation that will be helpful in the
solution of the optimal stopping problems for the renewal processes [3]. A crucial
role in our subsequent considerations will be played by such a representation. The
following lemma is for unrestricted stopping times.
Lemma 17.1. If τ ∈ T , then there exist Rn ∈ Mes(Fn ) such that the condition τ ∧
Tn+1 = (Tn + Rn ) ∧ Tn+1 on {τ ≥ Tn } a.s. is fulfilled.
Various restrictions in the class of admissible stopping times will change this rep-
resentation. Some examples of subclasses of T are formulated here (Lemma 17.1).
Only a few of them are used in the optimization problems investigated in this paper
(Corollary 17.1).
Let F_(s,t) = σ(F_s^A, X₀^(3), T₀^(3), …, X^(3)_(N₃((t−s)⁺)), T^(3)_(N₃((t−s)⁺))) be the σ-field generated by all events up to time t if the switch at time s from the two-variate renewal process to another renewal process has taken place. For simplicity of notation we set F_n^(i) := F^(i)_(T_n^(i)), F_n := F_(Tₙ), and F_n^s := F_(s, T_n^(3)).² Let Mes(F_n^(i)) (Mes(F_n)) denote the set of nonnegative F_n^(i)- (F_n-) measurable random variables. Henceforth, T and T^s will stand for the sets of stopping times with respect to the σ-fields {F_t, t ≥ 0} and {F_(s,t), 0 ≤ s ≤ t}, respectively. Furthermore, for n ∈ ℕ and n ≤ K we can define the sets
1. T_(n,K)^(i) = {τ ∈ T : T_n^(i) ≤ τ ≤ T_K^(i)};
2. T_n^(i) = {τ ∈ T : τ ≥ T_n^(i)};
² For the optimization problem there are two epochs: before the first stop, with its payoffs and its model of the stream of events, and after the first stop, with other payoffs and different streams of events. In Sect. 17.3 this will be emphasized by adopting adequate notation.
where A^(−i) := A \ {i} and T_k^(A^(−i)) := min_(j ∈ A^(−i)) T^(j)_(N_j(T_k^(i)));
The stopping times τ ∈ T^(i) and τ ∈ T̄^(i) can also be represented as shown in Lemma 17.1.

Lemma 17.2. Let the index i ∈ A be chosen and fixed.
1. For every τ ∈ T^(i) and n ∈ ℕ there exist R_n^(i) ∈ Mes(F_n^(i)) such that τ ∧ T_(n+1)^(i) = (T_n^(i) + R_n^(i)) ∧ T_(n+1)^(i) on {τ ≥ T_n^(i)} a.s.
2. If τ ∈ T̄^(i) and n ∈ ℕ, then there exist R_n^(i) ∈ Mes(F_n) such that τ ∧ T_(n+1)^(i) = (T_n^(i) + R_n^(i)) ∧ T_(n+1)^(i) on {τ ≥ T_n^(i)} a.s.
Obviously the angler wants to derive as much satisfaction as possible and must leave the lake before the fixed time. Therefore, his goal is to find two optimal stopping times τ^(a*) and τ^(b*) such that the expected gain is maximized:

  E Z(τ^(a*), τ^(b*)) = sup_(τ^a ∈ T) sup_(τ^b ∈ T^(τ^a)) E Z(τ^a, τ^b),   (17.9)

where τ^(a*) corresponds to the moment when he should change to the more effective rod and τ^(b*) to the moment when he should stop fishing.
should appear before the fixed time of fishing t0 . The process Z(s,t) is piecewise-
deterministic and belongs to the class of semi-Markov processes. The optimal
stopping time of similar semi-Markov processes was studied by Boshuizen and
Gouweleeuw [1] and the multivariate point process by Boshuizen [2]. Here the
structure of multivariate processes is revealed and their importance for the model is
shown. We use dynamic programming methods to find these two optimal stopping
times and to specify the expected satisfaction of the angler. The method of the
solution is similar to those used by Karpowicz and Szajowski [13], Karpowicz [12],
and Szajowski [22]. Let us first observe that by the properties of conditional
expectation we have
  E Z(τ^(a*), τ^(b*)) = sup_(τ^a ∈ T) E{ E[ Z(τ^a, τ^(b*)) | F_(τ^a) ] } = sup_(τ^a ∈ T) E J(τ^a),

where

  J(s) = E[ Z(s, τ^(b*)) | F_s ] = ess sup_(τ^b ∈ T^s) E[ Z(s, τ^b) | F_s ].   (17.10)
Therefore, to find τ^(a*) and τ^(b*), we must calculate J(s) first. The process J(s) corresponds to the value of the revenue function in the one-stopping problem when the observation starts at moment s.
The assignment of the leader in the case τ₁ = τ₂ is arbitrary but fixed. The aim is to find a pair (τ₁, τ₂) of stopping times such that for i ∈ {1, 2} the Nash equilibrium conditions hold.
In this section, we will find the solution to one stopping problem defined by (17.10).
We will first solve the problem for a fixed number of fish caught and then consider
the case with an infinite stream of fish caught. In this section we fix s, the moment
when the change took place, and m = Ms , the mass of the fish at time s. Taking into
account various models of fishing after the first stop, it is necessary to admit various
models of event streams. Assume that the moments of successive fish caught after the first stop are T_n^(3), that the times between the events are i.i.d. with continuous cumulative distribution function F(t) and density function f(t), and that the fish's value is represented by i.i.d. random variables with distribution function H (for convenience this part of the expedition is modeled by the renewal process denoted (T_n^(3), X_n^(3))).
In this subsection we are looking for the optimal stopping time τ_K^(b*) := τ_(0,K)^(b*) such that

  E[ Z(s, τ_K^(b*)) | F_s ] = ess sup_(τ_K^b ∈ T_(0,K)^s) E[ Z(s, τ_K^b) | F_s ],   (17.13)
where s ≥ 0 is a fixed time at which the position is changed and K is the maximum number of fish that can be caught. Let us define

  Γ_(n,K)^s = ess sup_(τ_(n,K)^b ∈ T_(n,K)^s) E[ Z(s, τ_(n,K)^b) | F_n^s ] = E[ Z(s, τ_(n,K)^(b*)) | F_n^s ],  n = K, …, 1, 0,   (17.14)

and observe that Γ_(K,K)^s = Z(s, T_K^(3)). In subsequent considerations we will use the representation of stopping times formulated in Lemmas 17.1 and 17.2. The exact form of the stopping strategies is given in the following corollary.
Corollary 17.1. Let i ∈ A. If τ^a ∈ T^(i) and τ^b ∈ T^s, then there exist R_n^a ∈ Mes(F_n^(i)) and R_n^b ∈ Mes(F_n^s), respectively, such that τ^a ∧ T_(n+1)^(i) = (T_n^(i) + R_n^a) ∧ T_(n+1)^(i) on {τ^a ≥ T_n^(i)} a.s. and τ^b ∧ T_(n+1)^(3) = (T_n^(3) + R_n^b) ∧ T_(n+1)^(3) on {τ^b ≥ s ∧ T_n^(3)} a.s.
Now we can derive the dynamic programming equations satisfied by Γ_(n,K)^s. To simplify the notation, we write M_t = M_t^s for t ≤ s, M̃_n^(1) = M_(T_n^(1)), M_n^s = M^s_(T_n^(3)), and F̄_i = 1 − F_i. The payoff functions are simplified here to ĝ^a(m) = g^a(m₁, m₂, i, t) I_{m₁+m₂=m}(m₁, m₂) and ĝ^b(m̃) = g^b(m₁, m₂, i, s, m, t) I_{m̃−m₁−m₂=m}.
Lemma 17.3. Let s ≥ 0 be the moment of changing the fishing place. Then Γ_(K,K)^s = Z(s, T_K^(3)), and for n = K−1, K−2, …, 0,

  Γ_(n,K)^s = ess sup_(R_n^b ∈ Mes(F_n^s)) ϑ_(n,K)(M_s, s, M_n^s, T_n^(3), R_n^b)  a.s.,   (17.15)

where

  ϑ_(n,K)(m, s, m̃, t, r) = I_{t≤t₀} [ F̄(r) ( I_{r≤t₀−t} ŵ^b(m, s, m̃, t+r) − C I_{r>t₀−t} ) + E( I_{S_(n+1)^(3) ≤ r} Γ_(n+1,K)^s | F_n^s ) ] − C I_{t>t₀},
and there exists R_n^(b*) ∈ Mes(F_n^s) such that

  Γ_(n,K)^s = ϑ_(n,K)(M_s, s, M_n^s, T_n^(3), R_n^(b*))  a.s.,   (17.16)

  τ_(n,K)^(b*) = τ_(n+1,K)^(b*)        if R_n^(b*) ≥ S_(n+1)^(3),
  τ_(n,K)^(b*) = T_n^(3) + R_n^(b*)    if R_n^(b*) < S_(n+1)^(3),   (17.17)

τ_(K,K)^(b*) = T_K^(3), and ŵ^b(m, s, m̃, t) = ŵ^a(m, s) + ĝ^b(m̃ − m) − c^b(t − s), where ŵ^a(m, t) = ĝ^a(m) − c^a(t).
Remark 17.2. Let {R_n^(b*)}_(n=1)^K, R_K^(b*) = 0, be a sequence of F_n^s-measurable random variables, n = 1, 2, …, K, and let η_(n,K)^s = K ∧ inf{i ≥ n : R_i^(b*) < S_(i+1)^(3)}. Then Γ_(n,K)^s = E[ Z(s, τ_(n,K)^(b*)) | F_n^s ] for n ≤ K − 1, where τ_(n,K)^(b*) = T^(3)_(η_(n,K)^s) + R^(b*)_(η_(n,K)^s).

Proof of Remark 17.2. It is a consequence of the optimal choice of R_n^b in (17.15).
Proof of Lemma 17.3. The form of Γ_(n,K)^s for the case T_n^(3) > t₀ is obvious from (17.4) and (17.14). Let us assume (17.15) and (17.16) hold for n + 1, n + 2, …, K. For any τ ∈ T_(n,K)^s (i.e., τ ≥ T_n^(3)) we have {τ < T_(n+1)^(3)} = {τ ∧ T_(n+1)^(3) < T_(n+1)^(3)} = {T_n^(3) + R_n^b < T_(n+1)^(3)}. This implies

  {τ < T_(n+1)^(3)} = {S_(n+1)^(3) > R_n^b},   {τ ≥ T_(n+1)^(3)} = {S_(n+1)^(3) ≤ R_n^b}.   (17.18)
Suppose that T_(K−1)^(3) ≤ t₀ and take any τ_(K−1,K)^b ∈ T_(K−1,K)^s. According to (17.18) and the properties of conditional expectation,

  E[ Z(s, τ) | F_n^s ] = E[ I_{S_(n+1)^(3) ≤ R_n^b} E( Z(s, τ ∨ T_(n+1)^(3)) | F_(n+1)^s ) | F_n^s ]
                         + E[ I_{S_(n+1)^(3) > R_n^b} Z(s, τ ∧ T_(n+1)^(3)) | F_n^s ]
                       = I_{R_n^b ≤ t₀ − T_n^(3)} F̄(R_n^b) ŵ^b(M_s, s, M_n^s, T_n^(3) + R_n^b)
                         + E[ I_{S_(n+1)^(3) ≤ R_n^b} E( Z(s, τ ∨ T_(n+1)^(3)) | F_(n+1)^s ) | F_n^s ].

Let σ ∈ T_(n+1)^s. For every τ ∈ T_n^s we have

  τ = σ                  if R_n^b ≥ S_(n+1)^(3),
  τ = T_n^(3) + R_n^b    if R_n^b < S_(n+1)^(3).
We also have

  E[ Z(s, τ) | F_n^s ] = E[ I_{S_(n+1)^(3) ≤ R_n^b} E( Z(s, σ) | F_(n+1)^s ) | F_n^s ]
                         + I_{R_n^b ≤ t₀ − T_n^(3)} F̄(R_n^b) ŵ^b(M_s, s, M_n^s, T_n^(3) + R_n^b)
                       ≤ sup_(R ∈ Mes(F_n^s)) { E[ I_{S_(n+1)^(3) ≤ R} Γ_(n+1,K)^s | F_n^s ]
                         + I_{R ≤ t₀ − T_n^(3)} F̄(R) ŵ^b(M_s, s, M_n^s, T_n^(3) + R) }
                       = E[ Z(s, τ_(n,K)^(b*)) | F_n^s ].

It follows that sup_(τ ∈ T_(n,K)^s) E[ Z(s, τ) | F_n^s ] ≤ E[ Z(s, τ_(n,K)^(b*)) | F_n^s ] ≤ sup_(τ ∈ T_(n,K)^s) E[ Z(s, τ) | F_n^s ], where the last inequality is due to the fact that τ_(n,K)^(b*) ∈ T_(n,K)^s. We apply the induction hypothesis to complete the proof.
Lemma 17.4. For n = K, K−1, …, 0, Γ_(n,K)^s = γ_(K−n)^(s,M_s)(M_n^s, T_n^(3)) a.s., where the functions γ_j^(s,m) are defined by γ₀^(s,m)(m̃, t) = I_{t≤t₀} ŵ^b(m, s, m̃, t) − C I_{t>t₀} and

  γ_j^(s,m)(m̃, t) = I_{t≤t₀} sup_(0≤r≤t₀−t) κ^b_(γ_(j−1)^(s,m))(m, s, m̃, t, r) − C I_{t>t₀},   (17.19)

where

  κ_δ^b(m, s, m̃, t, r) = F̄(r) [ I_{r≤t₀−t} ŵ^b(m, s, m̃, t+r) − C I_{r>t₀−t} ] + ∫₀^r dF(z) ∫₀^∞ δ(m̃ + x, t + z) dH(x).
Proof of Lemma 17.4. Since the case t > t₀ is obvious, let us assume that T_n^(3) ≤ t₀ for n ∈ {0, …, K−1}. According to Lemma 17.3, we obtain Γ_(K,K)^s = γ₀^(s,M_s)(M_K^s, T_K^(3)), and thus the proposition is satisfied for n = K. Let n = K−1; then Lemma 17.3 and the induction hypothesis lead to

  Γ_(K−1,K)^s = ess sup_(R_(K−1)^b ∈ Mes(F_(K−1)^s)) { F̄(R_(K−1)^b) [ I_{R_(K−1)^b ≤ t₀−T_(K−1)^(3)} ŵ^b(M_s, s, M_(K−1)^s, T_(K−1)^(3) + R_(K−1)^b) − C I_{R_(K−1)^b > t₀−T_(K−1)^(3)} ]
                + E( I_{S_K^(3) ≤ R_(K−1)^b} γ₀^(s,M_s)(M_K^s, T_K^(3)) | F_(K−1)^s ) }  a.s.
              = ess sup_(R_(K−1)^b ∈ Mes(F_(K−1)^s)) { F̄(R_(K−1)^b) [ I_{R_(K−1)^b ≤ t₀−T_(K−1)^(3)} ŵ^b(M_s, s, M_(K−1)^s, T_(K−1)^(3) + R_(K−1)^b) − C I_{R_(K−1)^b > t₀−T_(K−1)^(3)} ]
                + ∫₀^(R_(K−1)^b) dF(z) ∫₀^∞ γ₀^(s,M_s)(M_(K−1)^s + x, T_(K−1)^(3) + z) dH(x) }
              = γ₁^(s,M_s)(M_(K−1)^s, T_(K−1)^(3))  a.s.

Let n ∈ {1, …, K−1} and suppose that Γ_(n,K)^s = γ_(K−n)^(s,M_s)(M_n^s, T_n^(3)). As before, we conclude by Lemma 17.3 and the induction hypothesis that

  Γ_(n−1,K)^s = ess sup_(R_(n−1)^b ∈ Mes(F_(n−1)^s)) { F̄(R_(n−1)^b) [ I_{R_(n−1)^b ≤ t₀−T_(n−1)^(3)} ŵ^b(M_s, s, M_(n−1)^s, T_(n−1)^(3) + R_(n−1)^b) − C I_{R_(n−1)^b > t₀−T_(n−1)^(3)} ]
                + ∫₀^(R_(n−1)^b) dF(z) ∫₀^∞ γ_(K−n)^(s,M_s)(M_(n−1)^s + x, T_(n−1)^(3) + z) dH(x) }  a.s.

Therefore, Γ_(n−1,K)^s = γ_(K−(n−1))^(s,M_s)(M_(n−1)^s, T_(n−1)^(3)).
Henceforth we will use α_i to denote the hazard rate of the distribution F_i (i.e., α_i = f_i/F̄_i), and to shorten the notation we set Δ^•(a) = E[ ĝ^•(a + X^(i)) − ĝ^•(a) ], where • can be a or b.
Remark 17.3. The sequence of functions γ_j^(s,m) can be expressed as

  γ_j^(s,m)(m̃, t) = I_{t≤t₀} [ ŵ^b(m, s, m̃, t) + y_j^b(m̃ − m, t − s, t₀ − t) ] − C I_{t>t₀},
  y₀^b(a, b, c) = 0,
  y_j^b(a, b, c) = max_(0≤r≤c) φ^b_(y_(j−1)^b)(a, b, c, r),

where φ_δ^b(a, b, c, r) = ∫₀^r F̄(z) { α(z) [ Δ^b(a) + Eδ(a + X^(3), b + z, c − z) ] − c^b′(b + z) } dz, and F is the c.d.f. of S^(3) [α(t) is the hazard rate of the distribution of S^(3)].
Proof of Remark 17.3. Clearly

  ∫₀^r dF(z) ∫₀^∞ γ_(j−1)^(s,m)(m̃ + x, t + z) dH(x) = E[ I_{S^(3) ≤ r} γ_(j−1)^(s,m)(m̃ + X^(3), t + S^(3)) ],
where X^(3) has the c.d.f. H. Since F is continuous and κ^b_(γ_(j−1)^(s,m))(m, s, m̃, t, r) is bounded and continuous for t ∈ ℝ₊ \ {t₀}, the supremum in (17.19) can be changed into a maximum. Let r > t₀ − t; then

  κ^b_(γ_(j−1)^(s,m))(m, s, m̃, t, r) = E[ I_{S^(3) ≤ t₀−t} γ_(j−1)^(s,m)(m̃ + X^(3), t + S^(3)) ] − C F̄(t₀ − t)
    ≤ E[ I_{S^(3) ≤ t₀−t} γ_(j−1)^(s,m)(m̃ + X^(3), t + S^(3)) ] + F̄(t₀ − t) ŵ^b(m, s, m̃, t₀).

The preceding calculations imply that γ_j^(s,m)(m̃, t) = I_{t≤t₀} max_(0≤r≤t₀−t) ϕ_j(m, s, m̃, t, r) − C I_{t>t₀}, where ϕ_j(m, s, m̃, t, r) = F̄(r) ŵ^b(m, s, m̃, t + r) + E[ I_{S^(3) ≤ r} γ_(j−1)^(s,m)(m̃ + X^(3), t + S^(3)) ]. Obviously for S^(3) ≤ r and r ≤ t₀ − t we have S^(3) ≤ t₀; therefore, we can consider the cases t ≤ t₀ and t > t₀ separately. Let t ≤ t₀; then γ₀^(s,m)(m̃, t) = ŵ^b(m, s, m̃, t), and the hypothesis is true for j = 0. The task is now to calculate γ_(j+1)^(s,m)(m̃, t) given γ_j^(s,m)(·, ·). The induction hypothesis implies that for t ≤ t₀

  ϕ_(j+1)(m, s, m̃, t, r) = F̄(r) ŵ^b(m, s, m̃, t + r) + E[ I_{S^(3) ≤ r} γ_j^(s,m)(m̃ + X^(3), t + S^(3)) ]
    = ĝ^a(m) − c^a(s) + F̄(r) [ ĝ^b(m̃ − m) − c^b(t − s + r) ]
      + ∫₀^r f(z) { Eĝ^b(m̃ − m + X^(3)) − c^b(t − s + z) + Ey_j^b(m̃ − m + X^(3), t − s + z, t₀ − t − z) } dz;

therefore,

  ϕ_(j+1)(m, s, m̃, t, r) = ŵ^b(m, s, m̃, t) + ∫₀^r F̄(z) { α(z) [ Δ^b(m̃ − m) + Ey_j^b(m̃ − m + X^(3), t − s + z, t₀ − t − z) ] − c^b′(t − s + z) } dz,
Let B denote the space of bounded, continuous functions with the norm ‖δ‖ = sup_(a,b,c) |δ(a, b, c)|. It is easy to check that B with the supremum norm is a complete space. The operator Φ^b : B → B is defined by

  (Φ^b δ)(a, b, c) = max_(0≤r≤c) φ_δ^b(a, b, c, r).   (17.20)
Let us observe that y_j^b(a, b, c) = (Φ^b y_(j−1)^b)(a, b, c). Remark 17.3 now implies that there exists a function r_j^(b*)(a, b, c) such that y_j^b(a, b, c) = φ^b_(y_(j−1)^b)(a, b, c, r_j^(b*)(a, b, c)), and this gives

  γ_j^(s,m)(m̃, t) = I_{t≤t₀} [ ŵ^b(m, s, m̃, t) + φ^b_(y_(j−1)^b)(m̃ − m, t − s, t₀ − t, r_j^(b*)(m̃ − m, t − s, t₀ − t)) ] − C I_{t>t₀}.
where 0 ≤ C < 1. Similarly, as before, (Φ^b δ₂)(a, b, c) − (Φ^b δ₁)(a, b, c) ≤ C ‖δ₂ − δ₁‖. Finally, we conclude that ‖Φ^b δ₁ − Φ^b δ₂‖ ≤ C ‖δ₁ − δ₂‖, which completes the proof.
Applying Remark 17.3, Lemma 17.5, and the fixed-point theorem we conclude the following remark.

Remark 17.4. There exists y^b ∈ B such that y^b = Φ^b y^b and lim_(K→∞) ‖y_K^b − y^b‖ = 0.

According to the preceding remark, y^b is the uniform limit of y_K^b as K tends to infinity, which implies that y^b is measurable and γ^(s,m) = lim_(K→∞) γ_K^(s,m) is given by

  γ^(s,m)(m̃, t) = I_{t≤t₀} [ ŵ^b(m, s, m̃, t) + y^b(m̃ − m, t − s, t₀ − t) ] − C I_{t>t₀}.   (17.21)
We can now calculate the optimal strategy and the expected gain after changing locations.

Theorem 17.2. If F(t₀) < 1 and F has the density function f, then:
(i) for n ∈ ℕ the limit τ_n^(b*) = lim_(K→∞) τ_(n,K)^(b*) a.s. exists, and τ_n^(b*) ≤ t₀ is an optimal stopping rule in the set T^s ∩ {τ ≥ T_n^(3)};
(ii) E[ Z(s, τ_n^(b*)) | F_n^s ] = γ^(s,m)(M_n^s, T_n^(3)) a.s.
Proof. (i) Let us first prove the existence of τ_n^(b*). By the definition of Γ_(n,K+1)^s, we have

  Γ_(n,K+1)^s = ess sup_(τ ∈ T_(n,K+1)^s) E[ Z(s, τ) | F_n^s ]
             = ess sup_(τ ∈ T_(n,K)^s) E[ Z(s, τ) | F_n^s ] ∨ ess sup_(τ ∈ T_(K,K+1)^s) E[ Z(s, τ) | F_n^s ]
             = E[ Z(s, τ_(n,K)^(b*)) | F_n^s ] ∨ E[ Z(s, σ*) | F_n^s ],

and thus we observe that τ_(n,K+1)^(b*) is equal to τ_(n,K)^(b*) or σ*, where τ_(n,K)^(b*) ∈ T_(n,K)^s and σ* ∈ T_(K,K+1)^s, respectively. It follows that τ_(n,K+1)^(b*) ≥ τ_(n,K)^(b*), which implies that the sequence τ_(n,K)^(b*) is nondecreasing with respect to K. Moreover, R_i^(b*) ≤ t₀ − T_i^(3) for all i ∈ {0, …, K}; thus τ_(n,K)^(b*) ≤ t₀, and therefore the limit τ_n^(b*) ≤ t₀ exists.
Let us now look at the process ξ^s(t) = (t, M_t^s, V(t)), where s is fixed and V(t) = t − T^(3)_(N₃(t)). ξ^s(t) is a Markov process with the state space [s, t₀] × [m, ∞) × [0, ∞). In the general case, the infinitesimal operator for ξ^s is given by

  A p^(s,m)(t, m̃, v) = ∂p^(s,m)/∂t (t, m̃, v) + ∂p^(s,m)/∂v (t, m̃, v) + α(v) [ ∫₀^∞ p^(s,m)(t, m̃ + x, 0) dH(x) − p^(s,m)(t, m̃, v) ].   (17.23)
From (17.22) we conclude that

  ∫_(T_n^(3))^(τ_(n,K)^(b*)) (A p^(s,m))(ξ^s(z)) dz = [ Eĝ^b(M_n^s + X^(3) − m) − ĝ^b(M_n^s − m) ] ∫_(T_n^(3))^(τ_(n,K)^(b*)) α(z − T_n^(3)) dz − ∫_(T_n^(3))^(τ_(n,K)^(b*)) c^b′(z − s) dz,
where the last two inequalities result from the fact that the functions ĝ^b and c^b are bounded. On account of the preceding observation we can use the dominated convergence theorem, so that

  lim_(K→∞) E[ ∫_(T_n^(3))^(τ_(n,K)^(b*)) (A p^(s,m))(ξ^s(z)) dz | F_n^s ] = E[ ∫_(T_n^(3))^(τ_n^(b*)) (A p^(s,m))(ξ^s(z)) dz | F_n^s ].   (17.24)
Since τ_n^(b*) ≤ t₀, applying Dynkin's formula to the left-hand side of (17.24) we conclude that

  E[ ∫_(T_n^(3))^(τ_n^(b*)) (A p^(s,m))(ξ^s(z)) dz | F_n^s ] = E[ p^(s,m)(ξ^s(τ_n^(b*))) | F_n^s ] − p^(s,m)(ξ^s(T_n^(3)))  a.s.   (17.25)
Proof of Lemma 17.6. The derivative ν̄₊(c) exists because ν̄(c) = max_(0≤r≤c) φ̄^b(c, r), where φ̄^b(c, r) is differentiable with respect to c and r. Fix h ∈ (0, t₀ − c) and define δ̄₁(c) = δ̄(c + h) ∈ B and δ̄₂(c) = δ̄(c) ∈ B. Obviously, ‖Φ^b δ̄₁ − Φ^b δ̄₂‖ ≥ |Φ^b δ̄₁(c) − Φ^b δ̄₂(c)| = |Φ^b δ̄(c + h) − Φ^b δ̄(c)|, and on the other hand, using Taylor's formula for the right-hand derivatives we obtain

  ‖δ̄₁ − δ̄₂‖ = sup_c |δ̄(c + h) − δ̄(c)| ≤ h sup_c |δ̄₊(c)| + |o(h)|.
The significance of Lemma 17.6 is that the function ȳ(t₀ − s) has a bounded left-hand derivative with respect to s for s ∈ (0, t₀]. An important consequence of this fact is shown by the following remark.

Remark 17.5. The function γ^(s,m) can be expressed as γ^(s,m)(m, s) = I_{s≤t₀} u(m, s) − C I_{s>t₀}, where u(m, s) = ĝ^a(m) − c^a(s) + ĝ^b(0) − c^b(0) + ȳ^b(t₀ − s) is continuous, bounded, and measurable with bounded left-hand derivatives with respect to s.
At the end of this section, we determine the conditional value function of the second optimal stopping problem. According to (17.10), Theorem 17.2, and Remark 17.5, we have

  J(s) = E[ Z(s, τ^(b*)) | F_s ] = γ^(s,M_s)(M_s, s)  a.s.   (17.28)
In this section, we formulate the solution of the double stopping problem. In the
first epoch of the expedition, the admissible strategies (stopping times) depend on
the formulation of the problem. For the optimization problem the most natural
strategies are the stopping times from T (see the relevant problem considered in
Szajowski [22]). However, when the bilateral problem is considered, the natural
class of admissible strategies depends on who uses the strategy. It should be T {i}
for the ith player. Here the optimization problem with restriction to the strategies
from T {1} in the first epoch is investigated.
The function u(m, s) has properties similar to those of the function ŵ^b(m, s, m̃, t), and the process J(s) has a structure similar to that of the process Z(s, t). By this observation one can follow the calculations of Sect. 17.3 to obtain J(s). Let us define again Γ_(n,K) = ess sup_(τ^a ∈ T_(n,K)) E[ J(τ^a) | F_n ], n = K, …, 1, 0, which fulfills the following representation.
Lemma 17.7. Γ_(n,K) = γ_(K−n)(M̃_n^(1), T_n^(1)) for n = K, …, 0, where the sequence of functions γ_j can be expressed as

  γ_j(m, s) = I_{s≤t₀} [ u(m, s) + y_j^a(m, s, t₀ − s) ] − C I_{s>t₀},
  y₀^a(a, b, c) = 0,
  y_j^a(a, b, c) = max_(0≤r≤c) φ^a_(y_(j−1)^a)(a, b, c, r),

where

  φ_δ^a(a, b, c, r) = ∫₀^r F̄₁(z) { α₁(z) [ Δ^a(a) + Eδ(a + X^(1), b + z, c − z) ] − ( ȳ^b₋(c − z) + c^a′(b + z) ) } dz.
Lemma 17.7 corresponds to the combination of Lemma 17.4 and Remark 17.3 from Sect. 17.3.1. Let the operator Φ^a : B → B be defined by

  (Φ^a δ)(a, b, c) = max_(0≤r≤c) φ_δ^a(a, b, c, r).   (17.29)

The following results may be proved in much the same way as in Sect. 17.3.

Lemma 17.8. If F₁(t₀) < 1, then the operator Φ^a : B → B defined by (17.29) is a contraction.

Remark 17.6. There exists y^a ∈ B such that y^a = Φ^a y^a and lim_(K→∞) ‖y_K^a − y^a‖ = 0.

The preceding remark implies that γ = lim_(K→∞) γ_K is given by

  γ(m, s) = I_{s≤t₀} [ u(m, s) + y^a(m, s, t₀ − s) ] − C I_{s>t₀}.
17.5 Examples
The form of the solution makes it difficult to calculate the solution analytically. In this section, we present examples of conditions under which the solution can be calculated explicitly.
Remark 17.7. If the process ζ₂(t) = A p^(s,m)(ξ^s(t)) has decreasing paths, then the second optimal stopping time is given by τ_n^(b*) = inf{ t ∈ [T_n^(3), t₀] : A p^(s,m)(ξ^s(t)) ≤ 0 }; on the other hand, if ζ₂(t) has nondecreasing paths, then the second optimal stopping time is equal to t₀.

Similarly, if the process ζ₁(s) = A p(ξ(s)) has decreasing paths, then the first optimal stopping time is given by τ_n^(a*) = inf{ s ∈ [T_n^(1), t₀] : A p(ξ(s)) ≤ 0 }; on the other hand, if ζ₁(s) has nondecreasing paths, then the first optimal stopping time is equal to t₀.
Proof. From (17.25) we obtain $E[Z(s,\tau_n^{b*})|\mathcal{F}_n^s]=Z(s,T_n^{\{3\}})+E\big[\int_{T_n^{\{3\}}}^{\tau_n^{b*}}(Ap^{s,m})(\xi^s(z))\,dz\big]$ a.s., and applying the results of Jensen and Hsu [11] completes the proof.
Corollary 17.2. If $S^{\{3\}}$ has an exponential distribution with constant hazard rate $\alpha$, the function $\hat g_b$ is increasing and concave, the cost function $c_b$ is convex, and $t_{2,n}=T_n^{\{3\}}$, $m_n^s=M_n^s$, then
$$\tau_n^{b*}=\inf\big\{t\in[t_{2,n},t_0]:\alpha\big[E\hat g_b(m_n^s+x^{\{3\}}-m)-\hat g_b(m_n^s-m)\big]\le c_b(t-s)\big\},\qquad(17.31)$$
where $s$ is the moment when the location is changed. Moreover, if $S^{\{1\}}$ has an exponential distribution with constant hazard rate $\alpha_1$, $\hat g_a$ is increasing and concave, $c_a$ is convex, and $t_{1,n}=T_n^{\{1\}}$, $m_n=M_n^{\{1\}}$, then
$$\tau_n^{a*}=\inf\big\{s\in[t_{1,n},t_0]:\alpha_1\big[E\hat g_a(m_n+x^{\{1\}})-\hat g_a(m_n)\big]\le c_a(s)\big\}.$$
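To illustrate the threshold rule (17.31) numerically, the following sketch locates the first time at which the marginal gain from waiting for another catch drops below the cost rate. All ingredients here (square-root reward, quadratic cost, unit-mean exponential catch sizes, and the parameter values) are assumptions made for the example, not part of the chapter's model:

```python
import numpy as np

alpha = 0.8                      # hypothetical hazard rate of catch times
g = np.sqrt                      # hypothetical increasing, concave reward
cb = lambda t: 0.05 * t ** 2     # hypothetical convex cost rate
rng = np.random.default_rng(0)
catches = rng.exponential(1.0, 10_000)   # Monte Carlo sample of catch sizes

def marginal_gain(ms, m):
    """alpha * E[g(ms + x - m) - g(ms - m)], the left side of (17.31)."""
    return alpha * np.mean(g(ms + catches - m) - g(ms - m))

def tau_b(t2n, t0, ms, m, s):
    """First time in [t2n, t0] at which the gain falls below the cost rate."""
    for t in np.linspace(t2n, t0, 500):
        if marginal_gain(ms, m) <= cb(t - s):
            return t
    return t0   # never stop early: wait until the horizon

print(tau_b(t2n=0.0, t0=5.0, ms=3.0, m=1.0, s=0.0))
```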
Proof. The form of $\tau_n^{a*}$ and $\tau_n^{b*}$ is a consequence of Remark 17.7. Let us observe that by our assumptions $\zeta_2(t)=\alpha\Delta^b(M_t^s-m)-c_b(t-s)$ has decreasing paths for $t\in[T_n^{\{3\}},T_{n+1}^{\{3\}})$. It suffices to prove that $\zeta_2(T_n^{\{3\}})-\zeta_2(T_{n-1}^{\{3\}})=\alpha\big[\Delta^b(M_n^s-m)-\Delta^b(M_{n-1}^s-m)\big]<0$ for all $n\in\mathbb{N}$.
It remains to check that $\bar y_-^b(t_0-s)=0$. We can see that $\tau^{b*}=\tau^{b*}(s)$ is deterministic, which is clear from (17.31). If $s\le t_0$, then combining (17.25), (17.26), and (17.28) gives $\gamma^{s,m}(m,s)=E\big[Z(s,\tau^{b*})\,\big|\,\mathcal{F}_s\big]=Z(s,s)+E\big[\int_s^{\tau^{b*}}(Ap^{s,m})(\xi^s(z))\,dz\,\big|\,\mathcal{F}_s\big]$. By Remark 17.5, it follows that
$$\bar y^b(t_0-s)=E\int_s^{\tau^{b*}(s)}(Ap^{s,m})(\xi^s(z))\,dz=\int_s^{\tau^{b*}(s)}\big[\alpha\Delta^b(0)-c_2(z-s)\big]\,dz.$$
17.6 Conclusions
This article presents a solution of the double stopping problem in the "fishing model" for a finite horizon. The analytical properties of the reward function in the one stopping problem played a crucial role in our considerations and allowed us to obtain a solution to the extended double stopping problem. Repeating the considerations of Sect. 17.4, we can easily generalize the model and the solution to the multiple stopping problem, although the notation becomes inconvenient. The construction of the equilibrium in the two-person non-zero-sum problem formulated in Sect. 17.2 can be reduced to the two double optimal stopping problems in the case where the payoff
structure is given by (17.5), (17.6), and (17.11). The key assumptions relate to the properties of the distribution functions. Assuming general distributions and an infinite horizon, one can obtain extensions of the foregoing model.
References
1. Boshuizen, F., Gouweleeuw, J.: General optimal stopping theorems for semi-Markov
processes. Adv. Appl. Probab. 4, 825–846 (1993)
2. Boshuizen, F.A.: A general framework for optimal stopping problems associated with multi-
variate point processes, and applications. Sequential Anal. 13(4), 351–365 (1994)
3. Brémaud, P.: Point Processes and Queues. Martingale Dynamics. Springer, Berlin (1981)
4. Davis, M.H.A.: Markov Models and Optimization. Chapman & Hall, New York (1993)
5. Ferenstein, E., Pasternak-Winiarski, A.: Optimal stopping of a risk process with disruption
and interest rates. In: Breton, M., Szajowski, K. (eds.) Advances in Dynamic Games:
Differential and Stochastic Games: Theory, Application and Numerical Methods, Annals of
the International Society of Dynamic Games, vol. 11, 18 pp. Birkhäuser, Boston (2010)
6. Ferenstein, E., Sierociński, A.: Optimal stopping of a risk process. Appl. Math. 24(3), 335–342
(1997)
7. Ferguson, T.: A Poisson fishing model. In: Pollard, D., Torgersen, E., Yang, G. (eds.) Festschrift
for Lucien Le Cam: Research Papers in Probability and Statistics. Springer, Berlin (1997)
8. Haggstrom, G.: Optimal sequential procedures when more than one stop is required. Ann.
Math. Stat. 38, 1618–1626 (1967)
9. Jacobsen, M.: Point process theory and applications. Marked point and piecewise deterministic
processes. In: Prob. and its Applications, vol. 7. Birkhäuser, Boston (2006)
10. Jensen, U.: An optimal stopping problem in risk theory. Scand. Actuarial J. 2, 149–159 (1997)
11. Jensen, U., Hsu, G.: Optimal stopping by means of point process observations with applications
in reliability. Math. Oper. Res. 18(3), 645–657 (1993)
12. Karpowicz, A.: Double optimal stopping in the fishing problem. J. Appl. Prob. 46(2), 415–428
(2009). DOI 10.1239/jap/1245676097
13. Karpowicz, A., Szajowski, K.: Double optimal stopping of a risk process. Stochastics: Int. J. Prob. Stoch. Process. 79, 155–167 (2007)
14. Kramer, M., Starr, N.: Optimal stopping in a size dependent search. Sequential Anal. 9, 59–80
(1990)
15. Muciek, B.K., Szajowski, K.: Optimal stopping of a risk process when claims are covered
immediately. In: Mathematical Economics, Toru Maruyama (ed.) vol. 1557, pp. 132–139.
Research Institute for Mathematical Sciences, Kyoto University, Kyoto 606-8502 Japan
Kôkyûroku (2007)
16. Nikolaev, M.: Obobshchennye posledovatelnye procedury. Litovskiui Mat. Sb. 19, 35–44
(1979)
17. Rolski, T., Schmidli, H., Schmidt, V., Teugels, J.: Stochastic Processes for Insurance and
Finance. Wiley, Chichester (1998)
18. Shiryaev, A.: Optimal Stopping Rules. Springer, Berlin (1978)
19. Starr, N.: Optimal and adaptive stopping based on capture times. J. Appl. Prob. 11, 294–301
(1974)
20. Starr, N., Wardrop, R., Woodroofe, M.: Estimating a mean from delayed observations. Z. für
Wahr. 35, 103–113 (1976)
21. Starr, N., Woodroofe, M.: Gone fishin’: optimal stopping based on catch times. Report No. 33,
Department of Statistics, University of Michigan, Ann Arbor, MI (1974)
22. Szajowski, K.: Optimal stopping of a 2-vector risk process. In: Stability in Probability,
Jolanta K. Misiewicz (ed.), Banach Center Publ. 90, 179–191. Institute of Mathematics, Polish
Academy of Science, Warsaw (2010), doi:10.4064/bc90-0-12
Chapter 18
A Nonzero-Sum Search Game with Two
Competitive Searchers and a Target
Ryusuke Hohzaki
18.1 Introduction
Search theory has its origins in military affairs. As an early application of game theory to search problems, Morse and Kimball [20] discuss the positioning of a patrol line in straits by an anti-submarine warfare (ASW) airplane to block the passage of submarines. For several decades after that research, investigators focused on one-sided problems of optimal search under the assumption that the stochastic rule governing the behavior of the submarine is known [18].
R. Hohzaki ()
Department of Computer Science, National Defense Academy, 1-10-20 Hashirimizu,
Yokosuka 239-8686, Japan
e-mail: [email protected]
Since then, many researchers have taken up one class of game models, the "hide-and-search game", where a stationary target hides at a position. Norris [24] deals with a two-person zero-sum (TPZS) noncooperative game, where a target first hides in a box and then a searcher sequentially looks into boxes with possible overlooking; the payoff is the expected number of looks until detection of the target. Baston and Bostock [2] and Garnaev [7] study an ASW game, where an ASW airplane drops depth charges to destroy a hidden submarine. They adopt the destruction probability of the target as the payoff. We can cite Nakai [23], Kikuta [17], and Iida et al. [16] as other studies of the hide-and-search game. Their models are still TPZS noncooperative games, but they adopt a variety of payoff functions: detection probability, expected reward, expected time until detection, and others.
The hide-and-search game has been extended to games with a moving target, named "evade-and-search games". Meinardi [19] analyzes an evade-and-search game, where a target moves on a line in a diffusive fashion and a searcher looks at a point on the line sequentially as time elapses. The target is informed of the history of searched points, and the game is then modeled as a multi-stage TPZS game. Washburn [29] and Nakai [22] adopt the multi-stage model with the searcher's moving distance until detection of the target as the payoff. Their models are similar to Meinardi's. Danskin [5] discusses a one-shot game played by a submarine and an ASW airplane: the submarine chooses a point to move to, and the airplane selects a point at which to drop its dipping sonar buoys for detection of the submarine. Eagle and Washburn [6] also study a one-shot game, where a searcher moves in a search space, as in Danskin's model.
Hohzaki [11, 12], Iida et al. [15], and Dambreville and Le Cadre [4] deal with an optimal distribution of searching resources for a searcher and an optimal moving strategy for a target in a one-shot game called the "search allocation game (SAG)". For the one-shot SAG, Washburn [30] and Hohzaki [8, 13, 14] take account of practical constraints such as geographical restrictions or energy limitations on movement. Hohzaki [9] proposes a method to derive an optimal solution for a multi-stage SAG.
As the review above shows, almost all researchers handle the TPZS noncooperative game of a target versus a searcher, although we can list a small number of special game models, such as Baston and Garnaev [3], who study a nonzero-sum game in which a searcher and a protector choose distribution strategies of resources. However, we can also think of cooperative search situations, in which several searchers cooperate for an effective search for the target, or in which a person adrift after a shipwreck takes cooperative action toward a search and rescue (S&R) team, such as firing a distress signal or a flashlight so as to be easily detected. Hohzaki [10] is one of the few studies of a search problem defined as a cooperative game (refer to Owen [26] or Myerson [21]). Using the framework of the SAG, Hohzaki models a search game with some cooperative searchers against a moving target under the criterion of the detection probability of the target. The discussion includes the imputation of the obtained reward among the cooperative searchers of a team or a coalition, which is a common theme for an ordinary coalition game.
There are other types of cooperative search problems. Alpern and Gal [1] have
been studying the so-called rendezvous search problem, where players try to meet each other as soon as possible without knowing the exact position of one another. In the context of graph-theoretic problems, we can enumerate further works. Parsons [27, 28] studies how many searchers are required to find a mobile hider in a graph. O'Rourke [25] derives, theoretically or algorithmically, the minimum number of guards needed to watch over the interior of a polygon-shaped art gallery by computational geometry. Such problems of security by watchmen or guards are named "art gallery problems".
In the Hohzaki’s model [10], he proves that searchers have the incentive to
construct a grand coalition and develop his theory based on the assumption that
only the members of the coalition join the search operation for the target. However
there could be a competition between the coalition’s members and nonmembers. In
the treasure hunting from shipwreck, the nonmembers would be going to outwit the
coalition for the preemptive detection of the treasure. In this paper, we discuss a
three-person nonzero-sum noncooperative search game, where two teams or two
coalitions of searchers compete for the detection of a target and the target tries
to evade the teams, and derive a Nash-equilibrium (NE). The results of this paper
would help us step forward to an other type of cooperative game or coalition game,
where several groups of searchers compete each other, other than the Hohzaki’s
model and discuss the incentive of the groups to a larger group or a grand coalition
beyond their competition.
As a preliminary, we consider a search game for a stationary target as a three-person nonzero-sum noncooperative game model in the next section. In Sect. 18.3, we discuss a game with a moving target. Because it would be difficult to derive a NE in general, we first do so for a small problem and then propose a computational algorithm for the NE of the general game with a moving target. In Sect. 18.4, we analyze the characteristics of the NE by some numerical examples.
18.2 A Search Game with a Stationary Target
We consider a search game where two searchers compete against each other to get the value of a target while the target evades them. The problem is formulated as a three-person nonzero-sum noncooperative game.
(A1) A search space is discrete and it consists of n cells denoted by K =
{1, 2, · · · , n}. A target with value 1 chooses one cell to hide himself.
(A2) Searcher 1 has the amount Φ1 of searching resources in hand and distributes
them in the search space to detect the target while Searcher 2 has the amount
Φ2 of resources.
(A3) If the target is in cell i and the searcher scatters x resources there, the searcher
can detect the target with probability fi (x) = 1 − exp(−αi x), where parameter
αi indicates the effectiveness of unit resource for detection. The event of
detection by one searcher is independent of the detection by the other.
(A4) If only one searcher detects the target, the detector monopolizes the target's value of 1. If both of them detect, Searcher 1 gets a reward δ1 and Searcher 2 gets δ2, where δ1 + δ2 does not necessarily equal 1. The target is given 1 only if he is not detected.
We denote a mixed strategy of the target by p = {p1 , p2 , · · · , pn }, where pi is the
probability of hiding in cell i. Let us denote a pure strategy of Searcher 1 or 2 by
x = {xi , i ∈ K} or y = {yi , i ∈ K}, respectively. xi or yi is the respective amount
of resource distributed in cell i by Searcher 1 or 2. We denote feasible regions of
players' strategies p, x, and y by Π, Ψ1, and Ψ2, which are given by
$$\Pi\equiv\Big\{p\in\mathbb{R}^n\,\Big|\,p_i\ge 0,\ i\in K,\ \sum_{i\in K}p_i=1\Big\},$$
$$\Psi_1\equiv\Big\{x\in\mathbb{R}^n\,\Big|\,x_i\ge 0,\ i\in K,\ \sum_{i\in K}x_i\le\Phi_1\Big\},$$
$$\Psi_2\equiv\Big\{y\in\mathbb{R}^n\,\Big|\,y_i\ge 0,\ i\in K,\ \sum_{i\in K}y_i\le\Phi_2\Big\}.$$
The searchers should obviously use up all their resources because the detection function $f_i(x)$ is monotone increasing in $x$, as stated in (A3). Therefore, we can replace the inequality signs with equality signs in the definitions of Ψ1 and Ψ2.
From the independence of the detection events in Assumption (A3), the three players (the target, Searcher 1, and Searcher 2) have the expected rewards, or payoffs, $Q(p,x,y)$, $R_1(p,x,y)$, and $R_2(p,y,x)$, respectively.
The payoff $R_1(\cdot)$ is strictly concave in the variable $x$, and $R_2(\cdot)$ is likewise strictly concave in $y$. The feasible regions Ψ1 and Ψ2 are closed convex sets. Therefore, if there is a Nash equilibrium (NE), we can find it among pure strategies of the searchers. From here, we consider maximization problems for these expected payoffs and derive the optimal response of each player to the others.
1. Optimal response of the target
We can transform a maximization problem of the target’s payoff Q(p, x, y), which
is the non-detection probability of the target, as follows.
As seen from the transformation from the second expression to the third, an optimal target strategy $p^*\in\Pi$ is given by $p_i^*=0$ for $i\notin I^*$ and arbitrary $p_i^*$ for $i\in I^*$, using the set of cells $I^*\equiv\{i\in K\,|\,\alpha_i(x_i+y_i)=\nu\equiv\min_{s\in K}\alpha_s(x_s+y_s)\}$. In particular,
$$\nu=\frac{1}{\sum_{s\in K}1/\alpha_s}(\Phi_1+\Phi_2)\qquad(18.4)$$
$$x_i+y_i=\frac{1/\alpha_i}{\sum_{s\in K}1/\alpha_s}(\Phi_1+\Phi_2),\quad i\in K.\qquad(18.5)$$
$$\min_{x}\ \sum_{i\in K}p_i\exp(-\alpha_i x_i)\quad\text{s.t.}\quad x_i\ge 0,\ i\in K,\ \sum_{i\in K}x_i\le\Phi.\qquad(18.6)$$
Its solution has the form
$$x_i=\bigg[\frac{1}{\alpha_i}\log\frac{p_i\alpha_i}{\lambda}\bigg]^+;\qquad(18.7)$$
we apply Eq. (18.7) to the original problem with two searchers to derive an optimal response $x$ for Searcher 1 as
$$x_i=\bigg[\frac{1}{\alpha_i}\log\frac{p_i\alpha_i D_{1i}(y)}{\lambda_1}\bigg]^+=\bigg[\frac{1}{\alpha_i}\Big(\log\frac{p_i\alpha_i}{\lambda_1}+\log\big\{(1-\delta_1)\exp(-\alpha_i y_i)+\delta_1\big\}\Big)\bigg]^+,\qquad(18.8)$$
given the other strategies $p$ and $y$. Similarly, we have an optimal response $y$ for Searcher 2, given strategies $p$ and $x$, as follows:
$$y_i=\bigg[\frac{1}{\alpha_i}\Big(\log\frac{p_i\alpha_i}{\lambda_2}+\log\big\{(1-\delta_2)\exp(-\alpha_i x_i)+\delta_2\big\}\Big)\bigg]^+.\qquad(18.9)$$
Optimal Lagrangian multipliers λ1 and λ2 in Eqs. (18.8) and (18.9) are determined
by conditions ∑i xi = Φ1 and ∑i yi = Φ2 , respectively.
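To see how a multiplier is pinned down in practice, here is a minimal sketch (not from the chapter) that evaluates the best response (18.8) and finds $\lambda_1$ by bisection until the resource constraint is met. The function name and the parameters in the usage line are illustrative assumptions; the probabilities $p_i$ are assumed strictly positive:

```python
import numpy as np

def best_response(p, alpha, other, Phi, delta):
    """Searcher's best response (18.8)/(18.9): water-filling over cells.
    `other` is the opponent's allocation; the multiplier lambda is found
    by bisection so that the response uses exactly Phi resources.
    Assumes p_i > 0 for every cell."""
    D = (1 - delta) * np.exp(-alpha * other) + delta  # D_i term of (18.8)

    def alloc(lam):
        return np.maximum(np.log(p * alpha * D / lam) / alpha, 0.0)

    lo, hi = 1e-12, np.max(p * alpha * D)  # alloc(hi) sums to 0, alloc(lo) is huge
    for _ in range(100):                   # bisection on the multiplier
        lam = 0.5 * (lo + hi)
        if alloc(lam).sum() > Phi:
            lo = lam                       # allocation too large: raise lambda
        else:
            hi = lam
    return alloc(lam)

p = np.array([0.5, 0.3, 0.2]); alpha = np.array([1.0, 1.0, 2.0])
x = best_response(p, alpha, other=np.zeros(3), Phi=2.0, delta=0.8)
print(x, x.sum())   # allocation summing to (approximately) Phi = 2
```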
As a result, we organize the necessary and sufficient conditions for optimal $x$, $y$, and $p$ into the following system of equations:
$$x_i=\bigg[\frac{1}{\alpha_i}\Big(\log\frac{p_i\alpha_i}{\lambda_1}+\log\{(1-\delta_1)\exp(-\alpha_i y_i)+\delta_1\}\Big)\bigg]^+,\quad i\in K\qquad(18.10)$$
$$y_i=\bigg[\frac{1}{\alpha_i}\Big(\log\frac{p_i\alpha_i}{\lambda_2}+\log\{(1-\delta_2)\exp(-\alpha_i x_i)+\delta_2\}\Big)\bigg]^+,\quad i\in K\qquad(18.11)$$
$$x_i+y_i=\frac{1/\alpha_i}{\sum_{j\in K}1/\alpha_j}(\Phi_1+\Phi_2),\quad i\in K\qquad(18.12)$$
$$\sum_{i\in K}x_i=\Phi_1\quad\text{or}\quad\sum_{i\in K}y_i=\Phi_2\qquad(18.13)$$
$$\sum_{i\in K}p_i=1.\qquad(18.14)$$
We need only one of equations (18.13) for a full system because we can derive the
other equation of (18.13) from Eq. (18.12). The total number of variables x, y, p,
λ1 and λ2 is 3|K| + 2, which is the same as the number of equations contained in
the system. If all equations of the system are independent, optimal variables are
uniquely determined.
We can show that the following solution satisfies the conditions above:
$$x_i^*=\frac{1/\alpha_i}{\sum_{j\in K}1/\alpha_j}\Phi_1\qquad(18.15)$$
$$y_i^*=\frac{1/\alpha_i}{\sum_{j\in K}1/\alpha_j}\Phi_2\qquad(18.16)$$
$$p_i^*=\frac{1/\alpha_i}{\sum_{j\in K}1/\alpha_j}.\qquad(18.17)$$
We can easily see that the above strategies satisfy the conditions (18.12)∼(18.14).
Noting that αi x∗i , αi y∗i , and αi p∗i do not depend on the cell number i, we can derive
Lagrangian multipliers λ1 and λ2 by substituting Eqs. (18.15)∼(18.17) in (18.10)
and (18.11).
$$\lambda_1^*=\exp\Big(-\frac{\Phi_1}{\sum_j 1/\alpha_j}\Big)\Big[(1-\delta_1)\exp\Big(-\frac{\Phi_2}{\sum_j 1/\alpha_j}\Big)+\delta_1\Big]\frac{1}{\sum_j 1/\alpha_j}\qquad(18.18)$$
$$\lambda_2^*=\exp\Big(-\frac{\Phi_2}{\sum_j 1/\alpha_j}\Big)\Big[(1-\delta_2)\exp\Big(-\frac{\Phi_1}{\sum_j 1/\alpha_j}\Big)+\delta_2\Big]\frac{1}{\sum_j 1/\alpha_j}\qquad(18.19)$$
Let us note that the optimal strategies (18.15), (18.16), and (18.17) are also optimal for the TPZS game in which a searcher with $\Phi_1+\Phi_2$ resources and a target fight against each other with the non-detection probability as the payoff. We can easily verify the optimality of $x_i^*+y_i^*$ and $p_i^*$ for the TPZS game by solving the corresponding minimax or maximin optimization, where the searcher's strategy is $z_i=x_i+y_i$, generated from the original strategies $x_i$ and $y_i$ of the two searchers.
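As a quick numerical illustration, the closed-form equilibrium (18.15)-(18.17) and the multipliers (18.18)-(18.19) can be evaluated directly; the parameter values below are chosen for the sketch, not taken from the chapter. The check at the end confirms that every cell has the same exponent, so the target is indifferent among cells:

```python
import numpy as np

alpha = np.array([0.5, 1.0, 2.0])          # illustrative effectiveness per cell
Phi1, Phi2, d1, d2 = 3.0, 2.0, 0.8, 0.2    # illustrative resources and rewards

w = (1 / alpha) / (1 / alpha).sum()        # proportional weights (1/alpha_i)/sum
x_star, y_star, p_star = Phi1 * w, Phi2 * w, w

S = (1 / alpha).sum()
lam1 = np.exp(-Phi1 / S) * ((1 - d1) * np.exp(-Phi2 / S) + d1) / S   # (18.18)
lam2 = np.exp(-Phi2 / S) * ((1 - d2) * np.exp(-Phi1 / S) + d2) / S   # (18.19)

# Non-detection probability: alpha_i*(x_i + y_i) is constant across cells,
# so the target cannot improve by reshuffling its hiding probabilities.
Q = (p_star * np.exp(-alpha * (x_star + y_star))).sum()
print(x_star, y_star, p_star, lam1, lam2, Q)
```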
18.3 A Search Game with a Moving Target
Here we consider the nonzero-sum game with two searchers and a moving target. The two searchers play in a noncooperative manner for the detection of the target, and the target moves in the search space to avoid detection. The game with a moving target is modeled as follows:
(B1) A search space consists of a discrete cell space K = {1, · · · , K} and a discrete
time space T = {1, · · · , T }.
(B2) A target chooses one among a set of routes Ω to move along. His position on
a route ω ∈ Ω at time t ∈ T is represented by ω (t) ∈ K.
(B3) Two searchers distribute their searching resources to detect the target. Φk (t)
resources are available at each time t for Searcher k = 1, 2. Searchers can start
distributing resource from time τ ∈ T.
(B4) The detection of the target by the distribution of x resources in cell i occurs
with probability 1 − exp(−αi x) only if the target is in the cell i. The parameter
αi indicates the effectiveness of unit resource in the cell i. The events of
detection by two searchers are independent of each other.
If a searcher detects the target, the detector monopolizes the value of the
target 1. If both searchers detect, Searcher 1 and 2 get reward δ1 and δ2 (0 ≤
δ1 , δ2 ≤ 1), respectively. The game is terminated on detection of the target or
at the last time T . The target is given 1 only if he is not detected.
(B5) Players do not know any information about the behavior of other players and
the situation of the search in the process of the game. Therefore, all players
make their plans or strategies in advance of the game.
Let T = {τ, τ+1, ···, T} be the available time period for searching. We denote a distribution plan of Searcher k = 1, 2 by $\varphi_k=\{\varphi_k(i,t),\ i\in K,\ t\in T\}$, where $\varphi_k(i,t)\in\mathbb{R}$ is the amount of searching resource distributed in cell $i$ at time $t$, and a mixed strategy of the target by $\pi=\{\pi(\omega),\ \omega\in\Omega\}$, where $\pi(\omega)$ is the probability of taking path $\omega\in\Omega$.
When the target takes a path $\omega$ and the searchers adopt strategies $\varphi_1$ and $\varphi_2$, the non-detection probability $Q(\omega,\varphi_1,\varphi_2)$ is given by
$$Q(\omega,\varphi_1,\varphi_2)=\exp\Big(-\sum_{t=\tau}^{T}\alpha_{\omega(t)}\big(\varphi_1(\omega(t),t)+\varphi_2(\omega(t),t)\big)\Big).\qquad(18.20)$$
Because the detection events are exclusive at each time, the total expected reward of Searcher $k$, $R_k(\omega,\varphi_k,\varphi_j)$, $(k,j)\in\{(1,2),(2,1)\}$, is given by
$$R_k(\omega,\varphi_k,\varphi_j)=\sum_{t=\tau}^{T}\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\big(\varphi_k(\omega(\zeta),\zeta)+\varphi_j(\omega(\zeta),\zeta)\big)\Big)\times\big(1-\exp(-\alpha_{\omega(t)}\varphi_k(\omega(t),t))\big)\times\Big[\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))+\delta_k\big(1-\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))\big)\Big]\qquad(18.21)$$
for the target path $\omega$. As a result, we have the payoffs of the target and of Searcher $k$, $Q(\pi,\varphi_1,\varphi_2)$ and $R_k(\pi,\varphi_k,\varphi_j)$, obtained by averaging (18.20) and (18.21) over the target's mixed strategy $\pi$.
Now that we have formulated the three-person nonzero-sum game with a target and two searchers, the next step is to obtain a NE, which maximizes $Q(\pi,\varphi_1,\varphi_2)$, $R_1(\pi,\varphi_1,\varphi_2)$, and $R_2(\pi,\varphi_2,\varphi_1)$ with respect to $\pi$, $\varphi_1$, and $\varphi_2$, respectively, yielding the optimality conditions of the NE $(\pi^*,\varphi_1^*,\varphi_2^*)$. All feasible regions are closed convex sets. We can see that the payoff function $Q(\pi,\varphi_1,\varphi_2)$ is linear in $\pi$ and convex in $\varphi_1$ and $\varphi_2$. We are now going to prove the strict concavity of $R_k(\omega,\varphi_k,\varphi_j)$ in $\varphi_k$.
Using the notation
$$\beta_j(\omega,t)\equiv\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\varphi_j(\omega(\zeta),\zeta)\Big)\Big[\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))+\delta_k\big(1-\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))\big)\Big],$$
we can rewrite the reward as
$$R_k(\omega,\varphi_k,\varphi_j)=\sum_{t=\tau}^{T}\beta_j(\omega,t)\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big)\big(1-\exp(-\alpha_{\omega(t)}\varphi_k(\omega(t),t))\big)$$
$$=\sum_{t=\tau}^{T}\beta_j(\omega,t)\Big[\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big)-\exp\Big(-\sum_{\zeta=\tau}^{t}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big)\Big]$$
$$=\beta_j(\omega,\tau)-\sum_{t=\tau}^{T-1}\big(\beta_j(\omega,t)-\beta_j(\omega,t+1)\big)\exp\Big(-\sum_{\zeta=\tau}^{t}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big)-\beta_j(\omega,T)\exp\Big(-\sum_{\zeta=\tau}^{T}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big).\qquad(18.27)$$
Noting that
$$\beta_j(\omega,t+1)=\exp\Big(-\sum_{\zeta=\tau}^{t}\alpha_{\omega(\zeta)}\varphi_j(\omega(\zeta),\zeta)\Big)\Big[(1-\delta_k)\exp(-\alpha_{\omega(t+1)}\varphi_j(\omega(t+1),t+1))+\delta_k\Big]\le\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\varphi_j(\omega(\zeta),\zeta)\Big)\Big[(1-\delta_k)\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))+\delta_k\Big]=\beta_j(\omega,t),$$
the coefficients $\beta_j(\omega,t)-\beta_j(\omega,t+1)$ in (18.27) are nonnegative, which shows the claimed concavity of $R_k$ in $\varphi_k$.
In the small example of Sect. 18.3.1, the derivatives of the searchers' payoffs with respect to their first-period allocations $x$ and $y$ are
$$\frac{\partial R_1}{\partial x}=\big\{\pi(1)\alpha_1\exp(-\alpha_1(x+y))-\pi(2)\alpha_2\exp(-\alpha_2(\Phi_1(1)+\Phi_2(1)-x-y))\big\}\times\big\{(1-\delta_1)-(1-\exp(-\alpha_1\Phi_1(2)))\big(\exp(-\alpha_1\Phi_2(2))+\delta_1(1-\exp(-\alpha_1\Phi_2(2)))\big)\big\}+\delta_1\big\{\pi(1)\alpha_1\exp(-\alpha_1 x)-\pi(2)\alpha_2\exp(-\alpha_2(\Phi_1(1)-x))\big\}\qquad(18.30)$$
$$\frac{\partial R_2}{\partial y}=\big\{\pi(1)\alpha_1\exp(-\alpha_1(x+y))-\pi(2)\alpha_2\exp(-\alpha_2(\Phi_1(1)+\Phi_2(1)-x-y))\big\}\times\big\{(1-\delta_2)-(1-\exp(-\alpha_1\Phi_2(2)))\big(\exp(-\alpha_1\Phi_1(2))+\delta_2(1-\exp(-\alpha_1\Phi_1(2)))\big)\big\}+\delta_2\big\{\pi(1)\alpha_1\exp(-\alpha_1 y)-\pi(2)\alpha_2\exp(-\alpha_2(\Phi_2(1)-y))\big\}.\qquad(18.31)$$
Define the threshold
$$\Phi\equiv\frac{\alpha_2}{\alpha_1+\alpha_2}\big(\Phi_1(1)+\Phi_2(1)\big).$$
Similarly, the value in the braces { } in the second line of Eq. (18.31) changes its sign with a threshold for δ2.
We are going to prove that any optimal $x$ and $y$ must satisfy $x+y=\Phi$, by classifying δ1 and δ2 into four cases. In the process of the proof, we refer to Eqs. (18.29)∼(18.34).
(i) Case of $\delta_1<\delta_1^*$ and $\delta_2<\delta_2^*$: If $x+y<\Phi$, it must be that $\pi(1)=1$ from Eq. (18.32), and then $R_1(\cdot)$ monotonically increases in $x$ because $\partial R_1/\partial x>0$ from Eq. (18.30). $R_2(\cdot)$ is also monotone increasing in $y$. Therefore, $x$ and $y$ are never optimal with $x+y<\Phi$. If $x+y>\Phi$, it must be that $\pi(1)=0$, from which $\partial R_1/\partial x<0$ and $\partial R_2/\partial y<0$ hold, and then smaller $x$ and $y$ are better for the searchers. The condition $x+y>\Phi$ is thus never valid for any optimal $x$ and $y$.
(ii) Case of $\delta_1<\delta_1^*$ and $\delta_2>\delta_2^*$: If $x+y<\Phi$, we have $\pi(1)=1$ and $\partial R_1/\partial x>0$. This implies that a larger $x$ is desirable for Searcher 1. Concerning $\partial R_2/\partial y$ of Eq. (18.31), we have
$$\frac{\partial R_2(\pi,y,x)}{\partial y}\ \ge\ \frac{\partial R_2(\pi,y,x)}{\partial y}\bigg|_{x=0}=\alpha_1\exp(-\alpha_1 y)\cdots,$$
where the index $(k,j)$ is one of $(1,2)$ or $(2,1)$. The zero point $(x^*,y^*)$ of the equations $\partial R_1/\partial x=\partial R_2/\partial y=0$ gives us the NE maximizing both payoffs $R_1$ and $R_2$. To clarify the relation among the optimal $x^*$, $y^*$, and $\pi(1)$, we solve these equations with respect to $\pi(1)$ using $\pi(2)=1-\pi(1)$, and then we obtain Eqs. (18.39) and (18.40).
The functions on the right-hand sides of Eqs. (18.39) and (18.40) are monotone increasing in $x^*$ and $y^*$, and then $x^*+y^*$ is increasing in $\pi(1)$. When we draw the function $x^*+y^*$ and a horizontal line at $\Phi$ against the axis of $\pi(1)$, the crossing point between these two curves gives us the optimal $\pi^*(1)$. Figure 18.2 shows the function $x^*+y^*$ with respect to $\pi(1)$ and the optimal response of the target (18.32)−(18.34), in a general way. We recall that the function $x^*+y^*$ is derived from Eqs. (18.37) and (18.38) under the condition $x+y=\Phi$. In principle, we should have used the function $x+y$ of $\pi(1)$ directly derived from the simultaneous equations $\partial R_1/\partial x=0$ and $\partial R_2/\partial y=0$ using Eqs. (18.30) and (18.31), but this derivation would be difficult; in any case, we obtain the same NE both ways. From Eqs. (18.37) and (18.38), we can verify that $\partial R_1/\partial x=0$ and $\partial R_2/\partial y=0$ hold for variables $\pi(1)$, $\pi(2)$, $x$, and $y$ satisfying $\pi(1)\alpha_1=\pi(2)\alpha_2$, $\alpha_1 x=\alpha_2(\Phi_1(1)-x)$, and $\alpha_1 y=\alpha_2(\Phi_2(1)-y)$. The equation $x+y=\Phi$ is then also valid. Therefore, we have the following conclusion about the NE and the non-detection probability:
$$\pi^*(1)=\frac{\alpha_2}{\alpha_1+\alpha_2},\qquad\pi^*(2)=\frac{\alpha_1}{\alpha_1+\alpha_2}\qquad(18.41)$$
$$x^*=\frac{\alpha_2}{\alpha_1+\alpha_2}\,\Phi_1(1)\qquad(18.42)$$
$$y^*=\frac{\alpha_2}{\alpha_1+\alpha_2}\,\Phi_2(1)\qquad(18.43)$$
$$Q(\pi^*,x^*,y^*)=\exp\Big(-\frac{\alpha_1\alpha_2}{\alpha_1+\alpha_2}\big(\Phi_1(1)+\Phi_2(1)\big)-\alpha_1\big(\Phi_1(2)+\Phi_2(2)\big)\Big).\qquad(18.44)$$
The optimal variables $\pi^*$ and $x^*+y^*$ also give us a NE for the TPZS game with the non-detection probability $Q(\pi,x,y)$ as the payoff, where the two searchers are regarded as one player against the target. Thus we may note the equivalence between the nonzero-sum game with three persons and the zero-sum game with two persons. In the original nonzero-sum game, the two searchers need not cooperate in searching for the target, because the parameters δ1 and δ2 are not necessarily set so that δ1 + δ2 = 1. Nevertheless, they may be motivated to behave cooperatively by the selfish behavior of the target, who aims to minimize the detection probability, as we see in this special case of the moving-target problem. We present another case in Sect. 18.4, where the target can exploit the noncooperative behavior of the two searchers to steer the situation to its advantage with a lower detection probability.
$$P_k(\varphi_j;\pi):\ \max_{\varphi_k,\lambda}\ R_k(\pi,\varphi_k,\varphi_j)$$
$$\text{s.t.}\quad \varphi_k(i,t)\ge 0,\ i\in K,\ t\in T\qquad(18.46)$$
$$g(\omega,\varphi_k+\varphi_j)=\lambda,\quad\omega\in\Omega^+(\pi)\qquad(18.47)$$
$$g(\omega,\varphi_k+\varphi_j)\ge\lambda,\quad\omega\in\Omega\setminus\Omega^+(\pi).\qquad(18.48)$$
Conditions (18.47) and (18.48) are necessary to keep $\pi$ optimal against $\varphi_1$ and $\varphi_2$, as seen from Lemma 18.1. We have the following theorem for the NE.
Theorem 18.1. If a sequence of solutions converges to some $(\varphi_1^*,\varphi_2^*)$ by the repetition of solving Problem $P_k(\varphi_j;\pi)$ with fixed $\pi$ for $(k,j)=(1,2),(2,1)$, then $\pi$, $\varphi_1^*$, and $\varphi_2^*$ constitute a Nash equilibrium. There exists a Nash equilibrium for any target strategy $\pi$ if Problem $P_k(\varphi_j;\pi)$ is well defined for $(k,j)=(1,2),(2,1)$.
Proof. The strategy $\varphi_k^*$ is evidently an optimal response to the other players' strategies $\pi$ and $\varphi_j^*$. It remains to verify the optimality of $\pi$ against $\varphi_1^*$ and $\varphi_2^*$. Let $\lambda^*$ be an optimal multiplier $\lambda$ of Problem $P_k(\varphi_j;\pi)$. The non-detection probability becomes
$$Q(\pi,\varphi_1^*,\varphi_2^*)=\sum_{\omega\in\Omega^+(\pi)}\pi(\omega)\exp(-\lambda^*)=\exp(-\lambda^*).$$
This implies that the target does not have any incentive to change his current strategy $\pi$.
The problem $P_k(\varphi_j;\pi)$ has a unique solution, by strict concavity, whenever the problem is well defined, i.e., its feasible region is not empty. A sequence of the solutions is a mapping to a new point $(\varphi_1',\varphi_2')$ from an old one $(\varphi_1,\varphi_2)$, obtained by solving $\varphi_1'=\arg\max_{\varphi_1}R_1(\pi,\varphi_1,\varphi_2)$ and $\varphi_2'=\arg\max_{\varphi_2}R_2(\pi,\varphi_2,\varphi_1)$. The mapping is closed, by the continuity of the functions $R_1(\cdot)$, $R_2(\cdot)$ and the closed convexity of the feasible region defined by conditions (18.45)−(18.48), and therefore it has a fixed point by Kakutani's fixed-point theorem; that is, a point where $(\varphi_1',\varphi_2')$ coincides with $(\varphi_1,\varphi_2)$. Therefore, there exists a Nash equilibrium for any target strategy $\pi$.
This methodology sometimes fails to find the convergence point because the temporary solutions swing back and forth in the course of the calculation. To avoid such oscillation of the solutions, an objective with a penalty function can be effective. Let us substitute the function
$$\widetilde R_k(\pi,\varphi_k,\varphi_j)\equiv R_k(\pi,\varphi_k,\varphi_j)-\gamma\,\|\varphi_k-\widehat\varphi_k\|^2$$
for the original objective function in Problem $P_k(\varphi_j;\pi)$ ($k=1,2$), and denote the renewed problem by $\widetilde P_k(\varphi_j;\pi)$. Here $\widehat\varphi_k$ is the current solution for Searcher $k$'s strategy and $\gamma$ is a parameter for adjustment. If we find a convergence point as described in Theorem 18.1, that point is the NE, apart from the algorithmic details of the practical computation. We can anticipate from Theorem 18.1 that there may be many NEs. We are therefore going to propose a reasonable target strategy $\pi$, based on which we can derive a convergence point $(\varphi_1^*,\varphi_2^*)$ for the optimal searchers' strategies.
A thoughtful target would consider the worst case, in which the searchers' strategies $\varphi_1$ and $\varphi_2$ are totally against his interest and make the non-detection probability $Q(\pi,\varphi_1,\varphi_2)$ as small as possible. The target has to respond optimally to the worst case, in which the two searchers cooperate in minimizing $Q(\pi,\varphi_1,\varphi_2)$. We can regard this case as a TPZS game with the non-detection probability as the payoff. In the game, the target chooses one path $\omega\in\Omega$ as the maximizer, and a team of the two searchers makes a distribution plan $\varphi(i,t)=\varphi_1(i,t)+\varphi_2(i,t)$ as the minimizer. The non-detection probability, or the payoff, is given by
$$Q(\omega,\varphi)=\exp\Big(-\sum_{t=\tau}^{T}\alpha_{\omega(t)}\varphi(\omega(t),t)\Big),$$
which is modified from Eq. (18.20). Fortunately, we already have research on this kind of TPZS search game, by Hohzaki [10]. It says that we obtain an optimal strategy $\varphi^*$ of the searchers from the following linear programming formulation:
$$P_S:\ w=\max_{\varphi,\eta}\ \eta$$
$$\varphi(i,t)\ge 0,\ i\in K,\ t\in T,$$
and an optimal target strategy $\pi^*$ from the following problem, which is dual to Problem $(P_S)$ above:
$$D_T:\ w=\min_{\nu,\pi}\ \sum_{t\in T}\nu(t)\big(\Phi_1(t)+\Phi_2(t)\big)$$
$$\text{s.t.}\quad \sum_{\omega\in\Omega}\pi(\omega)=1$$
$$\pi(\omega)\ge 0,\ \omega\in\Omega$$
$$\alpha_i\sum_{\omega\in\Omega_{it}}\pi(\omega)\le\nu(t),\ i\in K,\ t\in T,$$
where $\Omega_{it}$ is the set of paths passing through cell $i$ at time $t$, defined by $\Omega_{it}\equiv\{\omega\in\Omega\,|\,\omega(t)=i\}$. The resulting non-detection probability is calculated as $\exp(-w)$ using the optimal value $w$ of the problem above.
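Problem $D_T$ is a standard linear program, so an off-the-shelf solver can produce the target's strategy $\pi^*$ and the value $w$. The sketch below sets up a tiny illustrative instance (two cells, two time steps, all four paths); the instance data are assumptions made for this example, not from the chapter:

```python
import numpy as np
from scipy.optimize import linprog

alpha = np.array([1.0, 0.5])              # effectiveness per cell (illustrative)
Phi = np.array([1.0, 1.0])                # Phi1(t) + Phi2(t) per time step
paths = [(0, 0), (0, 1), (1, 0), (1, 1)]  # omega(t) for t = 0, 1
K, T, W = 2, 2, len(paths)

# Decision vector z = (nu(0), nu(1), pi(0..3)); minimize sum_t nu(t)*Phi(t).
c = np.concatenate([Phi, np.zeros(W)])

# Constraints: alpha_i * sum_{omega: omega(t)=i} pi(omega) - nu(t) <= 0.
A_ub, b_ub = [], []
for t in range(T):
    for i in range(K):
        row = np.zeros(T + W)
        row[t] = -1.0
        for w_idx, om in enumerate(paths):
            if om[t] == i:
                row[T + w_idx] = alpha[i]
        A_ub.append(row)
        b_ub.append(0.0)

A_eq = [np.concatenate([np.zeros(T), np.ones(W)])]   # sum_omega pi(omega) = 1
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * (T + W))
w_opt, pi_opt = res.fun, res.x[T:]
print("non-detection prob:", np.exp(-w_opt), "target strategy:", pi_opt)
```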
At the end of this section, we incorporate the discussion so far into an algorithm to derive a NE for the original three-person nonzero-sum search game.
Algorithm AL2S
(i) Solve Problems $P_S$ and $D_T$ to obtain an initial distribution of resources $\varphi_k^0$, $k=1,2$, and the target strategy $\pi^*$.
(ii) Using $\pi=\pi^*$, repeatedly solve the convex problem $P_k(\varphi_j;\pi)$ for $(k,j)=(1,2)$ and $(2,1)$ in turn. If the solutions $\varphi_1^*$ and $\varphi_2^*$ converge, the obtained $\pi$, $\varphi_1^*$, and $\varphi_2^*$ are a Nash equilibrium. The resulting payoffs of the players are given by $Q(\pi,\varphi_1^*,\varphi_2^*)$ and $R_k(\pi,\varphi_k^*,\varphi_j^*)$, $(k,j)=(1,2),(2,1)$.
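Step (ii), together with the penalty-regularized objective $\widetilde R_k$, can be sketched as a simple alternating loop. Here `best_resp` stands for a solver of the convex problem $\widetilde P_k(\varphi_j;\pi)$ and is assumed to be supplied, so this is only a skeleton of the iteration, not the chapter's code:

```python
import numpy as np

def al2s_step2(best_resp, phi1, phi2, gamma=0.1, tol=1e-8, max_iter=1000):
    """Step (ii) of AL2S: alternate best responses until (phi1, phi2) converge.
    best_resp(k, phi_other, phi_current, gamma) should solve the penalized
    convex problem for Searcher k and return the new distribution plan."""
    for _ in range(max_iter):
        new1 = best_resp(1, phi2, phi1, gamma)   # Searcher 1 responds first
        new2 = best_resp(2, new1, phi2, gamma)   # then Searcher 2 (order k = 1, 2)
        if np.linalg.norm(new1 - phi1) + np.linalg.norm(new2 - phi2) < tol:
            return new1, new2                    # converged: a Nash equilibrium
        phi1, phi2 = new1, new2
    raise RuntimeError("no convergence; try a larger penalty gamma")
```

Reversing the two calls inside the loop reproduces the order k = 2, 1 discussed in Sect. 18.4.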
18.4 Numerical Examples
We took a small problem in Sect. 18.3.1 to derive an analytical form of the NE. Here we take a larger problem to analyze the properties of the NE numerically by applying the algorithm proposed in Sect. 18.3.2.
Table 18.3 Optimal distribution of resources (Case 2: π = π̃ = (0.4, 0.4, 0, 0, 0.2), order k = 1, 2)

              t
Cells         1      2      3      4      5      6      7      8      9      10
Searcher 1
1             0      0.254  0.200  0.200  0.149  0.149  0.150  0.150  0.150  0.150
2             0.255  0      0.200  0.199  0.149  0.149  0.149  0.150  0.150  0.150
3             0.173  0.173  0      0      0.202  0.201  0.201  0.201  0.201  0.200
4             0.072  0.073  0.100  0.101  0      0      0      0      0      0
Φ1(t)         0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5
Searcher 2
1             0.174  0.328  0.250  0.250  0.111  0.111  0.111  0.111  0.111  0.111
2             0.326  0.172  0.250  0.250  0.111  0.111  0.111  0.111  0.111  0.111
3             0      0      0      0      0.277  0.278  0.278  0.278  0.278  0.278
4             0      0      0      0      0      0      0      0      0      0
Φ2(t)         0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5
We apply Algorithm AL2S to three cases, (δ1, δ2) = (1, 1) (Case 1), (0.8, 0.2) (Case 2), and (0, 0) (Case 3), to derive the NEs. The optimal distributions of searching resources are almost the same as in Table 18.1, with some small differences across the three cases. They give the target almost the same non-detection probability, 0.513; the detailed probabilities are 0.51346, 0.51347, and 0.51348 for Cases 1, 2, and 3, respectively, reasonably reflecting the favorableness of simultaneous detection according to the δ-values. At these NEs, Searchers 1 and 2 get the rewards (0.249, 0.249) (Case 1), (0.247, 0.240) (Case 2), and (0.238, 0.238) (Case 3). The reward tends to decrease as the value δk gets smaller. However, we can say that the influence of simultaneous detection by both searchers on the reward is not as large as that of the total detection probability by either searcher. That is why, for every case, the optimal distribution of resources is close to the initial distribution ϕk0, which is derived from Problem PS under the criterion of total detection probability.
We check another target strategy, π = π̃ = (0.4, 0.4, 0, 0, 0.2), different from π*, in Case 2. After applying Algorithm AL2S to this case, we obtain Table 18.3 as the optimal distributions ϕk* of the two searchers (k = 1, 2).
In this case, the resulting non-detection probability is 0.525 and the rewards are 0.214 and 0.261 for Searchers 1 and 2. Searcher 2 can expect more reward than Searcher 1 in spite of δ1 = 0.8 and δ2 = 0.2. The results are advantageous to the target and Searcher 2 but disadvantageous to Searcher 1, compared with the results for the original target strategy π*. The advantage depends on the order of calculation in Step (ii) of Algorithm AL2S. The results above are brought about by the order k = 1, 2. If we change the order to k = 2, 1, we obtain the distribution given by exchanging the two searchers' distribution plans in Table 18.3. The result has the same non-detection probability as above but brings expected rewards of 0.266 and 0.208 to Searchers 1 and 2, respectively. The rewards become advantageous to Searcher 1 but disadvantageous to Searcher 2. These phenomena often appear in
repeated games or games with a leader and a follower, where the leading player, by declaring his intention or strategy, is usually in the favorable position. In Algorithm AL2S with the order k = 1, 2, the initial plan ϕ2^0(i,t) declared by Searcher 2 is used as a fixed parameter in the first calculation; in the case of order k = 2, 1, the first declaration is made by Searcher 1. In any case, these distributions are both NEs, so there are at least two NEs for the target strategy π̃. Both of these NEs are more favorable to the target than the NE for π*. We may thus draw the lesson that the target can lead the game to more advantageous Nash equilibrium points if it lets the searchers hold a belief about the target strategy such as π̃.
18.5 Conclusion
References
1. Alpern, S., Gal, S.: The Theory of Search Games and Rendezvous. Kluwer Academic, Boston
(2003)
2. Baston, V.J., Bostock, F.A.: A one-dimensional helicopter-submarine game. Naval Res.
Logistics 36, 479–490 (1989)
3. Baston, V.J., Garnaev, A.Y.: A search game with a protector. Naval Res. Logistics 47, 85–96
(2000)
4. Dambreville, F., Le Cadre, J.P.: Search game for a moving target with dynamically generated
informations. In: Proceedings of the 5th International Conference on Information Fusion
(FUSION’2002), pp. 243–250 (2002)
5. Danskin, J.M.: A helicopter versus submarine search game. Oper. Res. 16, 509–517 (1968)
6. Eagle, J.N., Washburn, A.R.: Cumulative search-evasion games. Naval Res. Logistics 38,
495–510 (1991)
7. Garnaev, A.Y.: A remark on a helicopter-submarine game. Naval Res. Logistics 40, 745–753
(1993)
8. Hohzaki, R.: Search allocation game. Eur. J. Oper. Res. 172, 101–119 (2006)
9. Hohzaki, R.: A multi-stage search allocation game with the payoff of detection probability.
J. Oper. Res. Soc. Jpn. 50, 178–200 (2007)
10. Hohzaki, R.: A cooperative game in search theory. Naval Res. Logistics 56, 264–278 (2009)
11. Hohzaki, R., Iida, K.: A search game with reward criterion. J. Oper. Res. Soc. Jpn. 41, 629–642
(1998)
12. Hohzaki, R., Iida, K.: A solution for a two-person zero-sum game with a concave payoff
function. In: Takahashi, W., Tanaka, T. (eds.) Nonlinear Analysis and Convex Analysis,
pp. 157–166. World Science Publishing Co., London (1999)
13. Hohzaki, R., Iida, K., Komiya, T.: Discrete search allocation game with energy constraints.
J. Oper. Res. Soc. Jpn. 45, 93–108 (2002)
14. Hohzaki, R., Washburn, A.: An approximation for a continuous datum search game with energy
constraint. J. Oper. Res. Soc. Jpn. 46, 306–318 (2003)
15. Iida, K., Hohzaki, R., Furui, S.: A search game for a mobile target with the conditionally
deterministic motion defined by paths. J. Oper. Res. Soc. Jpn. 39, 501–511 (1996)
16. Iida, K., Hohzaki, R., Sato, K.: Hide-and-search game with the risk criterion. J. Oper. Res. Soc.
Jpn. 37, 287–296 (1994)
17. Kikuta, K.: A search game with traveling cost. J. Oper. Res. Soc. Jpn. 34, 365–382 (1991)
18. Koopman, B.O.: The theory of search. III. The optimum distribution of searching effort. Oper.
Res. 5, 613–626 (1957)
19. Meinardi, J.J.: A sequentially compounded search game. In: Mensch A. (ed.) Theory of Games:
Techniques and Applications, pp. 285–299. The English Universities Press, London (1964)
20. Morse, P.M., Kimball, G.E.: Methods of Operations Research. MIT, Cambridge (1951)
21. Myerson, R.B.: Game Theory: Analysis of Conflict. Harvard University Press, Cambridge
(1991)
22. Nakai, T.: A sequential evasion-search game with a goal. J. Oper. Res. Soc. Jpn. 29, 113–122
(1986)
23. Nakai, T.: Search models with continuous effort under various criteria. J. Oper. Res. Soc. Jpn.
31, 335–351 (1988)
24. Norris, R.C.: Studies in search for a conscious evader. MIT Technical Report, No. 279 (1962)
25. O’Rourke, J.: Art Gallery Theorems and Algorithms. Oxford University Press, New York
(1987)
26. Owen, G.: Game Theory, pp. 212–224. Academic, New York (1995)
27. Parsons, T.D.: Pursuit-Evasion in a Graph. In: Alavi, Y., Lick, D.R. (eds.) Theory and
Applications of Graphs. Springer, Berlin (1976)
28. Parsons, T.D.: The search number of a connected graph. In: Proceedings of the 10th Southeast-
ern Conference on Combinatorics, Graph Theory and Computing, pp. 549–554 (1978)
29. Washburn, A.R.: Search-evasion game in a fixed region. Oper. Res. 28, 1290–1298 (1980)
30. Washburn, A.R., Hohzaki, R.: The diesel submarine flaming datum problem. Milit. Oper. Res.
4, 19–30 (2001)
Chapter 19
Advertising and Price to Sustain The Brand
Value in a Licensing Contract
Alessandra Buratto
Abstract One of the reasons that induce a brand owner to issue a licensing
contract is that of improving the value of his brand. In this paper, we look at a
fashion licensing agreement where the licensee produces and sells a product in a
complementary business. The value of a fashion brand is sustained by both the
advertising efforts of the licensor and the licensee. We assume that demand is
proportional to the brand value and decreases with the price. The licensor wants
to maximize his revenue coming from the royalties and to minimize his advertising
costs. Moreover, he does not want his brand to be devalued at the end of the selling
season. On the other hand, the licensee plans her advertising campaign in order to
invest in the brand value and maximize the sales revenue. The aim of this paper
is to analyze the different strategies the licensor can adopt to sustain his brand. To
this end, we determine the optimal advertising policies by solving a Stackelberg
differential game, where the owner of the brand acts as the leader and the licensee
as the follower. We determine the equilibrium policies of the two players assuming
that advertising varies over time and price is constant. We also determine a minimum
selling price which guarantees brand sustainability without advertising too much.
19.1 Introduction
Let’s consider a licensing contract between the owner of a brand (licensor) and a
manufacturer (licensee) who produces and sells a product with the licensor’s brand.
We focus on a particular type known as complementary business licensing which
A. Buratto ()
Department of Mathematics, Via Trieste 63, I-35131 Padova, Italy
e-mail: [email protected]
is, for instance, the case of an owner of a brand of clothes who licenses his brand to a producer of accessories or perfumes [19]. This type of contract is very common nowadays. In fact, "Brands allow larger firms to diversify from clothing into other markets, outside their core business: perfumes, accessories . . . " [17]. Licensing itself can be considered a brand extension strategy according to Keller's definition [12].
A licensing contract may last for several years, even decades. Nevertheless, at the beginning of every selling season, for each new product, the two agents involved have to come to an agreement in setting the selling price and must plan the advertising campaigns they are respectively going to carry out. Advertising coordination is quite important in any vertical channel, and it becomes crucial in a licensing agreement [18].
The importance of the brand value is well known, especially in the fashion business [1]. Several papers study brand dilution and imitation in the fashion industry; see [5, 6, 9]. The links of brand value to price have been studied since the early fifties, with the analysis of the different effects of price on demand [13]. More recently, such effects have been formalized in the game theory context. For example, in [2, 15] it is stressed that "In fashion goods price increases the brand value." In [5] the authors say that the price of prestige goods should not be too low and that licensing is a mechanism for expanding sales; however, one of its risks is brand dilution. On the other hand, advertising may increase the brand value, and a good advertising campaign can be useful in guaranteeing brand sustainability.
Here we tackle the issue in a game-theoretic context, taking production costs into account too. In this paper we determine the equilibrium policies of the licensor, assuming that advertising varies over time and the price is constant. We conduct a sensitivity analysis with respect to the price parameter in order to see whether some particular prices can guarantee brand sustainability. Imposing on the licensee a given price among these values is one of the licensor's strategies for sustaining the brand.
The aim of this paper is to provide a guideline for the owner of a brand who is concerned about the sustainability of his brand. We analyze different approaches he can follow in order to achieve this task, in view of the different dynamics between the two agents involved in the agreement. We consider the selling price as already fixed, and we tackle the problem of determining the advertising campaigns by solving a Stackelberg game with the licensor as the leader and the licensee as the follower. In fact, the licensor lays down the law in a licensing contract and can sever the agreement whenever the licensee does not meet his target [18]. Each player determines his optimal advertising strategy, maximizing the profits coming from the sale of the product and minimizing the advertising costs. Similar approaches, within differential game theory, are common for determining the advertising strategies in a vertical channel, see [8, 11], and they have been used in the franchising context too (see, e.g., [7, 14]). An attempt was made in [4] for a licensing contract without considering brand sustainability.
In the following we analyze the brand sustainability problem from the licensor's point of view. The licensor can achieve the sustainability of his brand through advertising, either by cooperating with the licensee in maximizing the sum of their profits, in which case both of them have to take into account the production costs and take care of the brand value sustainability, or in a competitive context, namely:
• By sharing with the licensee the necessary increase of the advertising effort.
• By increasing his usual advertising effort, without binding the licensee to do the same.
• By forcing the licensee to increase her advertising effort, considering the sustainability constraint in her advertising plan.
Alternatively,
• He can ask whether there exists a minimum selling price to impose on the licensee such that the brand value is sustained without spending too much on advertising.
We analyze the licensor's brand sustainability problem in Sect. 19.2. In Sect. 19.3 we calculate the advertising strategies of the two players and give an operational rule for the licensor to choose the optimal advertising effort with respect to the selling price. Moreover, we determine a minimum selling price which guarantees brand sustainability. In Sect. 19.4 we consider the fully cooperative solution and compare it to the Stackelberg solution, asking whether the former type of strategy is effectively the best option for the licensor.
19.2 The Model
Let us denote by [0, T] the selling season, during which the two players run their advertising campaigns. In view of the short-term nature of the problem, we do not discount future profits. We model the brand value using the goodwill state variable introduced by Nerlove and Arrow [16]. Let us denote by G(t) the brand value (goodwill) at time t ∈ [0, T] and by G0 > 0 the initial brand value, which we assume to be sufficiently high, because only famous brands are licensed.
We assume that the brand value is increased both by the advertising efforts, $a_L(t)\ge 0$, $a_l(t)\ge 0$, and by the price difference, as follows:
$$\dot G(t)=\gamma_L a_L(t)+\gamma_l a_l(t)-\delta G(t)+\beta(p-p_R),\qquad(19.1)$$
where γL > 0 and γl > 0 are the advertising efficiencies of the two advertising
messages and δ > 0 is the decaying rate. Observe that the index i = L stands for
the licensor, while the index i = l stands for the licensee.
The additive term $\beta(p-p_R)$ represents the snob effect, by which the brand value increases with the selling price $p$, while $p_R$ is a reference price, known as the regular price [15]; it is the price the licensor considers fair for this type of branded product. If the selling price is greater than the reference price, then the brand value is increased. On the other hand, if the selling price is too low, $p<p_R$, then the brand is undervalued. Therefore $\beta\ge 0$ is called the price sensitivity toward the brand value.
380 A. Buratto
From now on, we will refer to brand value sustainability as the request of having
a brand value level greater than or equal to the initial one at the end of the selling
season. Being
G(0) = G0 , (19.2)
G(T ) ≥ G0 . (19.3)
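A minimal simulation of the goodwill dynamics (19.1) with the sustainability check (19.3); the advertising paths and all parameter values below are hypothetical choices made for the sketch:

```python
import numpy as np

# Hypothetical parameters; a_L, a_l are arbitrary nonnegative effort paths.
T, N = 1.0, 1000
gL, gl, delta, beta, p, pR, G0 = 0.75, 0.7, 0.1, 0.5, 10.0, 8.0, 50.0
dt = T / N
t = np.linspace(0, T, N + 1)
aL = 0.5 * np.ones(N + 1)
al = 0.3 * (1 - np.exp(-delta * (T - t)))   # shape of the minimum effort (19.16)

G = np.empty(N + 1)
G[0] = G0
for k in range(N):                           # explicit Euler step of (19.1)
    G[k + 1] = G[k] + dt * (gL * aL[k] + gl * al[k] - delta * G[k]
                            + beta * (p - pR))

print("G(T) =", G[-1], "sustained:", G[-1] >= G0)
```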
For granting the use of the brand, the licensee has to pay royalties to the licensor; they generally consist of a percentage of the sales, where r ∈ (0, 1) is the royalty percentage. Observe that the royalty percentage of the revenues from the sales that the licensee has to pay to the licensor is also exogenous and constant; as a consequence, the only strategic marketing instruments are the advertising efforts of the licensor and the licensee. The licensee keeps the remaining share of the sales revenue after paying the royalties.
Production costs are linked to the demand according to a linear form with unit cost c > 0. We assume
$$(1-r)p\ge c;$$
this means requiring that marginal revenues are greater than marginal costs, and such an assumption is reasonable for any manufacturer.
We consider quadratic advertising costs
$$C_l^{pu}(a_l)=\frac{1}{2}c_l a_l^2,\quad t\in[0,T],\qquad(19.9)$$
with $c_l>0$ the licensee's cost parameter.
The licensee has to solve the following problem:
$$P_l^1:\ \max_{a_l\ge 0}\ J_l^1(a_L,a_l,p)=\int_0^T\Big[\big((1-r)p-c\big)\big(G(t)-\theta p\big)-\frac{c_l}{2}a_l^2(t)\Big]dt,\qquad(19.10)$$
under constraints (19.1)–(19.3).
Observe that we denote by $P_i^1$ the optimization problem for agent $i\in\{L,l\}$ which maximizes his/her profit and considers brand sustainability, that is, $P_i^1=\max J_i$ subject to (19.1)–(19.3). Similarly, we denote by $P_i^0$ the optimization problem for agent $i\in\{L,l\}$ which maximizes his/her profit without considering brand sustainability, that is, $P_i^0=\max J_i$ subject to (19.1) and (19.2) only. Let $J_i^{0/1}$, $i\in\{L,l\}$, be the profits associated with problems $P_i^{0/1}$, respectively.
The licensor has his own advertising costs, supposed quadratic too,
$$C_L^{pu}(a_L)=\frac{1}{2}c_L a_L^2,\quad t\in[0,T],\qquad(19.11)$$
and he obtains from the licensee the royalties given in (19.6). The licensor's problem is
$$P_L^1:\ \max_{a_L\ge 0}\ J_L^1(a_L,a_l,p)=\int_0^T\Big[rp\big(G(t)-\theta p\big)-\frac{c_L}{2}a_L^2(t)\Big]dt\qquad(19.12)$$
$$p\ \ge\ p_s=\frac{G_0+\dfrac{\beta}{\delta}p_R+c\,\dfrac{\gamma_l^2}{c_l}\,\dfrac{1-e^{-\delta T}}{2\delta^2}}{\dfrac{\beta}{\delta}+\Big(\dfrac{\gamma_L^2}{c_L}r+\dfrac{\gamma_l^2}{c_l}(1-r)\Big)\dfrac{1-e^{-\delta T}}{2\delta^2}}.\qquad(19.14)$$
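The threshold (19.14) is straightforward to evaluate numerically. The sketch below uses the parameter values that appear later in Sect. 19.4 and reproduces the value $p_s\approx 8.0959$ reported there:

```python
import numpy as np

# Parameter values taken from the numerical instance in Sect. 19.4.
T, cL, cl, c = 30.0, 0.15, 0.2, 8.1
pR, beta, gL, gl, r, delta, G0 = 8.0, 0.5, 0.75, 0.7, 0.1, 0.1, 50.0

f = (1 - np.exp(-delta * T)) / (2 * delta ** 2)  # factor (1-e^{-dT})/(2d^2)
eL, el = gL ** 2 / cL, gl ** 2 / cl              # advertising effectiveness ratios
ps = ((G0 + beta / delta * pR + c * el * f)
      / (beta / delta + (eL * r + el * (1 - r)) * f))
print(ps)   # approx. 8.0959
```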
$$a_L^{0*}(t)=\frac{\gamma_L}{c_L}\,\frac{rp}{\delta}\Big(1-e^{-\delta(T-t)}\Big)\qquad(19.15)$$
and
$$a_l^{0*}(t)=\frac{\gamma_l}{c_l}\,\frac{(1-r)p-c}{\delta}\Big(1-e^{-\delta(T-t)}\Big).\qquad(19.16)$$
cl δ
sustain the brand value. In order to do so, the licensor can choose among the
following three different options.
• He can share with the licensee the additional advertising effort necessary to
sustain the brand. The optimal advertising efforts are respectively
γL −δ (T −t)
L (t) = aL (t) + ηL
a11∗ ,
0∗
e (19.17)
cL
γl −δ (T −t)
l (t) = al (t) + ηl
a11∗ ,
0∗
e (19.18)
cl
where ηL and ηl satisfy the equation
γL2 γ2
ηL + l ηl = η0 (19.19)
cL cl
The two players join their efforts in advertising in order to sustain the
brand and infinitely many solutions may exist.
• He can increase his advertising adopting the optimal advertising effort
η0 −δ (T −t)
L (t) = aL (t) +
a1∗ ,
0∗
e (19.20)
γL
while the licensee's advertising effort is the minimum one, given in (19.16):
$$a_l^{0*}(t)=\frac{\gamma_l}{c_l}\,\frac{(1-r)p-c}{\delta}\Big(1-e^{-\delta(T-t)}\Big).$$
Observe that in this scenario the licensee does not take brand sustainability into account.
• He can bind the licensee to consider the sustainability constraint in her advertising plan, whereas he neglects it himself. The licensor's optimal advertising effort is the one given in (19.15),
$$a_L^{0*}(t)=\frac{\gamma_L}{c_L}\,\frac{rp}{\delta}\Big(1-e^{-\delta(T-t)}\Big),$$
while the licensee adopts
$$a_l^{1*}(t)=a_l^{0*}(t)+\frac{\eta_0}{\gamma_l}e^{-\delta(T-t)}.\qquad(19.21)$$
If the selling price is at least the threshold $p_s$, the minimum advertising efforts are decreasing and concave in time:
$$\frac{da_L^{0*}(t)}{dt}=-\frac{\gamma_L\,rp}{c_L}\,e^{-\delta(T-t)}<0,\qquad\frac{d^2a_L^{0*}(t)}{dt^2}=\delta\,\frac{da_L^{0*}(t)}{dt}<0,$$
$$\frac{da_l^{0*}(t)}{dt}=-\frac{\big((1-r)p-c\big)\gamma_l}{c_l}\,e^{-\delta(T-t)}<0,\qquad\frac{d^2a_l^{0*}(t)}{dt^2}=\delta\,\frac{da_l^{0*}(t)}{dt}<0.$$
On the other hand, if the selling price is lower than the threshold $p_s$, then at least one of the two actors has to increase his/her advertising effort. In the case in which only one player increases his/her advertising, the derivatives of the advertising effort of the player who considers brand sustainability are, respectively,
$$\frac{\partial a_L^{1*}(t)}{\partial t}=\delta\,e^{-\delta(T-t)}\Big(\frac{\eta_0}{\gamma_L}-\frac{\gamma_L\,rp}{c_L\delta}\Big),\qquad\frac{\partial^2 a_L^{1*}(t)}{\partial t^2}=\delta\,\frac{\partial a_L^{1*}(t)}{\partial t},$$
$$\frac{\partial a_l^{1*}(t)}{\partial t}=\delta\,e^{-\delta(T-t)}\Big(\frac{\eta_0}{\gamma_l}-\frac{\gamma_l\big((1-r)p-c\big)}{c_l\delta}\Big),\qquad\frac{\partial^2 a_l^{1*}(t)}{\partial t^2}=\delta\,\frac{\partial a_l^{1*}(t)}{\partial t}.$$
The advertising efforts turn out to be either increasing and convex, for very low prices, that is, for $p<\hat p_i$, $i\in\{L,l\}$; or decreasing and concave, for intermediate price values, that is, for $p>\hat p_i$; or, finally, constant if $p=\hat p_i$, $i\in\{L,l\}$, where
$$\hat p_L=\frac{G_0+c\,\dfrac{\gamma_l^2}{c_l}\dfrac{1-e^{-\delta T}}{2\delta^2}+\dfrac{\beta}{\delta}p_R}{\dfrac{1-e^{-\delta T}}{2\delta^2}\Big(\dfrac{\gamma_L^2}{c_L}r+\dfrac{\gamma_l^2}{c_l}(1-r)\Big)+\dfrac{\beta}{\delta}+\dfrac{\gamma_L^2 r}{c_L}\dfrac{1+e^{-\delta T}}{2\delta^2}}<p_s$$
and
$$\hat p_l=\frac{G_0+\dfrac{c}{\delta^2}\dfrac{\gamma_l^2}{c_l}+\dfrac{\beta}{\delta}p_R}{\dfrac{1-e^{-\delta T}}{2\delta^2}\Big(\dfrac{\gamma_L^2}{c_L}r+\dfrac{\gamma_l^2}{c_l}(1-r)\Big)+\dfrac{\beta}{\delta}+\dfrac{\gamma_l^2(1-r)}{c_l}\dfrac{1+e^{-\delta T}}{2\delta^2}}.$$
If the regular price is lower than the threshold
$$\bar p_R=\frac{G_0+c\,\dfrac{\gamma_l^2}{c_l}\dfrac{1-e^{-\delta T}}{2\delta^2}}{\Big(\dfrac{\gamma_L^2}{c_L}r+\dfrac{\gamma_l^2}{c_l}(1-r)\Big)\dfrac{1-e^{-\delta T}}{2\delta^2}},$$
then the optimal selling price must necessarily be greater than the regular price $p_R$ itself, whereas with a high regular price, greater than the threshold $\bar p_R$, the selling price can be lower than $p_R$.
Concerning the dependence on the royalty coefficient $r$: if the licensor's advertising effectiveness is greater than the licensee's, that is, if $\gamma_L^2/c_L>\gamma_l^2/c_l$, then the minimum selling price decreases with the royalty coefficient. The opposite happens if the licensor's advertising is less effective than the licensee's. Observe that the asymmetry of the game influences its solution; had we assumed $\gamma_L=\gamma_l$ and $c_L=c_l$, a substantial difference in the equilibria would hold: for example, the minimum price $p_s$ would not depend on the royalty percentage $r$ at all.
Other behaviors are summarized in the following table (a "+" means that $p_s$ is increasing in the parameter, a "−" that it is decreasing):

        G0   c   γL   cL
  p_s   +    +   −    +
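The signs in the table can be checked numerically by perturbing each parameter in (19.14); a small sketch, with the baseline values of Sect. 19.4 as default arguments:

```python
import numpy as np

def ps_value(G0=50.0, c=8.1, gL=0.75, cL=0.15, gl=0.7, cl=0.2,
             r=0.1, delta=0.1, T=30.0, beta=0.5, pR=8.0):
    f = (1 - np.exp(-delta * T)) / (2 * delta ** 2)
    eL, el = gL ** 2 / cL, gl ** 2 / cl
    return ((G0 + beta / delta * pR + c * el * f)
            / (beta / delta + (eL * r + el * (1 - r)) * f))

base = ps_value()
for name, kw in [("G0", dict(G0=51.0)), ("c", dict(c=8.2)),
                 ("gL", dict(gL=0.80)), ("cL", dict(cL=0.16))]:
    print(name, "+" if ps_value(**kw) > base else "-")   # expect: + + - +
```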
19.4 The Cooperative Solution
Here we consider the situation in which the licensor cooperates with the licensee in maximizing the sum of the profits, and both of them take care of the brand value sustainability. They have to solve the following optimal control problem, $P_C^1$, subject to (19.1)–(19.3).
Theorem 19.2 (Cooperative Equilibrium). The coordinated optimal advertising efforts are
$$a_{LC}^*(t)=\frac{\gamma_L}{c_L}\Big[\frac{p-c}{\delta}\big(1-e^{-\delta(T-t)}\big)+\eta_C\,e^{-\delta(T-t)}\Big],\qquad(19.22)$$
$$a_{lC}^*(t)=\frac{\gamma_l}{c_l}\Big[\frac{p-c}{\delta}\big(1-e^{-\delta(T-t)}\big)+\eta_C\,e^{-\delta(T-t)}\Big],\qquad(19.23)$$
with $\eta_C=0$ if and only if
$$p\ \ge\ p_C=\frac{G_0+\dfrac{\beta}{\delta}p_R+\Big(\dfrac{\gamma_L^2}{c_L}+\dfrac{\gamma_l^2}{c_l}\Big)c\,\dfrac{1-e^{-\delta T}}{2\delta^2}}{\dfrac{\beta}{\delta}+\Big(\dfrac{\gamma_L^2}{c_L}+\dfrac{\gamma_l^2}{c_l}\Big)\dfrac{1-e^{-\delta T}}{2\delta^2}},$$
and therefore:
• If $p\ge p_C$, then $\eta_C=0$, and the cooperative advertising efforts $a_{LC}^*(t)$ and $a_{lC}^*(t)$ reduce to their minimum, by their monotonicity in $\eta_C$. Nevertheless, they are greater than the minimum advertising efforts $a_L^{0*}(t)$ and $a_l^{0*}(t)$ obtained without considering brand sustainability.
• If $p<p_C$, then $\eta_C>0$, and the players have to increase their advertising efforts. The cooperative advertising efforts in (19.22) and (19.23) are a fortiori greater than the minimum advertising efforts $a_L^{0*}(t)$ and $a_l^{0*}(t)$. Nonetheless, it is not possible to determine a priori whether they are greater or lower than the increased advertising efforts in (19.17), (19.20), (19.18), and (19.21).
In comparing $p_s$ and $p_C$, many parameters influence their values, and generally not all the presented scenarios are practicable. It is easy to prove that the cases $c/(1-r)<p_s<p_C$ and $p_s<c/(1-r)<p_C$ never occur; in fact, from $p_s<p_C$ it follows that $p_C<c/(1-r)$. Furthermore, let $T=30$, $c_L=0.15$, $c_l=0.2$, $c=8.1$, $p_R=8$, $\beta=0.5$, $\gamma_L=0.75$, $\gamma_l=0.7$, $r=0.1$, and $\delta=0.1$; then $c/(1-r)=9$, and according to the value of $G_0$ we have the following results:
– If $G_0=50$, then $p_C=8.265239040$, $p_s=8.095855766$, and therefore $p_s<p_C<c/(1-r)$.
Observe that the situation $c/(1-r)<p_C<p_s$ occurs for high initial brand values: only the owner of a well-known brand can take into account the free-riding situation of binding the licensee to sustain the brand.
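A sketch reproducing the comparison of the two thresholds: the $G_0=50$ case matches the values quoted above, and a large $G_0$ (chosen here for illustration) exhibits the regime $c/(1-r)<p_C<p_s$:

```python
import numpy as np

T, cL, cl, c = 30.0, 0.15, 0.2, 8.1
pR, beta, gL, gl, r, delta = 8.0, 0.5, 0.75, 0.7, 0.1, 0.1
f = (1 - np.exp(-delta * T)) / (2 * delta ** 2)
eL, el = gL ** 2 / cL, gl ** 2 / cl

for G0 in (50.0, 5000.0):   # 5000 is an illustrative "high brand value" case
    ps = (G0 + beta/delta*pR + c*el*f) / (beta/delta + (eL*r + el*(1-r))*f)
    pc = (G0 + beta/delta*pR + (eL + el)*c*f) / (beta/delta + (eL + el)*f)
    print(G0, round(pc, 6), round(ps, 6), "c/(1-r) =", c / (1 - r))
# G0 = 50 prints pc = 8.265239 and ps = 8.095856, as in the text.
```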
Obviously the leader will adopt the strategy which leads him to the greater profit. To this end, let us denote by $J_L^{*kw}$, with $k,w\in\{0,1\}$, the optimal profit of the licensor if he adopts the advertising strategy $a_L^k$ and the follower adopts the advertising effort $a_l^w$. Analogously, let us denote by $J_{LC}^{*1}$ the optimal profit of the licensor when the players adopt the cooperative advertising strategies $a_{LC}^*$ and $a_{lC}^*$ in problem $P_C^1$.
A possible rational rule for dividing the players' profits in a cooperative context is the Nash bargaining solution, see [3, 11]; in any case it must be $J_{LC}^{*1}\le J_C^{*1}$.
Turning back to the licensor's decision, it is a well-known result that cooperation leads to greater profits, that is, $J_{LC}^{*1}>J_L^{*10}$. It can also be easily proved that each player earns more when the other one takes care of the brand sustainability constraint, just because the goodwill is increased by the effect of the other player's additional advertising effort. This can be formalized as follows: $J_L^{*10}<J_L^{*01}$.
It remains to check whether it turns out to be convenient for the leader to cooperate or to bind the licensee to care about brand sustainability. This cannot be determined a priori, as the comparison depends on the values of the problem's many parameters. Nevertheless, once a particular instance of the problem is considered, the licensor's best choice can easily be found by comparing the optimal profits $J_{LC}^{*1}$ and $J_L^{*01}$ evaluated at the particular parameter values which characterize that instance.
An interesting result is that the cooperative solution does not always lead to a greater profit for the licensor. In fact, let $T=30$, $c_L=0.15$, $c_l=0.2$, $c=8.1$, $p_R=8$, $\beta=0.5$, $\gamma_L=0.75$, $\gamma_l=0.7$, $r=0.1$, $\delta=0.1$, $\theta=0.5$, and $p=10$; for either $G_0=1{,}543$ or $G_0=1{,}545$ we have $9=c/(1-r)<p<p_C<p_s$, and therefore it makes sense to consider the free-riding situation. In the former case $J_C^{*1}-J_L^{*01}=28.48809>0$, while in the latter $J_C^{*1}-J_L^{*01}=-28.50707<0$. Since $J_{LC}^{*1}\le J_C^{*1}$ by definition, we have proved that there exists at least one situation in which the licensor's strategy of binding the licensee to take care of brand sustainability is, for him, more profitable than cooperating, that is, $J_{LC}^{*1}<J_L^{*01}$.
Another interesting, and non-trivial, approach consists in considering the price as a constant decision variable to be determined using the theory of optimal processes with parameters. Such an approach has been used in [10] for an optimal control problem but, to the best of my knowledge, nothing similar has ever been applied to a Stackelberg differential game. Considering the pricing problem requires analyzing the problem from the follower's point of view as well; this can be done by determining the optimal selling price which takes into account production costs and maximizes the licensee's profit.
Appendix. Proofs
The first-order condition for the licensee's Hamiltonian is

$$\frac{\partial H_l(G, a_l, \lambda_l, t)}{\partial a_l} = -c_l a_l + \gamma_l \lambda_l \qquad (19.25)$$

and the stationary point is

$$a_l(t) = \frac{\lambda_l(t)\,\gamma_l}{c_l}, \qquad t \in [0, T]. \qquad (19.26)$$

The co-state equation is

$$\dot{\lambda}_l(t) = -\frac{\partial H_l}{\partial G} = -\big((1-r)p - c\big) + \delta \lambda_l(t), \qquad (19.27)$$

which, solved backwards from the final value $\lambda_l(T)$, gives

$$\lambda_l(t) = \frac{(1-r)p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_l(T)\, e^{-\delta(T-t)}.$$
From condition (19.30) it follows that $\lambda_l(t) > 0$ for all $t \in [0, T]$; therefore the optimal advertising effort for the licensee is

$$a_l^*(t) = \max\left\{\frac{\lambda_l(t)\,\gamma_l}{c_l},\, 0\right\} = \frac{\gamma_l}{c_l}\left[\frac{(1-r)p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_l(T)\, e^{-\delta(T-t)}\right]. \qquad (19.31)$$
If condition (19.30) did not hold, then it would not be convenient for the licensee to produce at all, and hence it would not be convenient to advertise either.
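As a quick numerical illustration of the closed form (19.31), the following Python sketch evaluates the licensee's optimal effort along the horizon; it uses the parameter values of the numerical example above, while the choices $p = 10$ and $\lambda_l(T) = 0$ (a non-binding sustainability constraint) are assumptions made here purely for illustration.

```python
import math

# Illustrative evaluation of the licensee's optimal effort (19.31).
# Parameters follow the chapter's numerical example; p = 10 and
# lambda_l_T = 0 (non-binding constraint, G(T) > G0) are assumed.
T, c_l, gamma_l, r, delta, c, p = 30.0, 0.2, 0.7, 0.1, 0.1, 8.1, 10.0
lambda_l_T = 0.0  # transversality multiplier lambda_l(T)

def a_l_star(t: float) -> float:
    """Licensee's optimal advertising effort from (19.31)."""
    decay = math.exp(-delta * (T - t))
    lam = ((1 - r) * p - c) / delta * (1 - decay) + lambda_l_T * decay
    return max(gamma_l * lam / c_l, 0.0)

for t in (0.0, 15.0, 30.0):
    print(f"a_l*({t:4.1f}) = {a_l_star(t):.4f}")  # effort decays to 0 at t = T
```

Since $(1-r)p - c = 0.9 > 0$ here, the max operator never binds and the effort is strictly positive for $t < T$.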
Let’s substitute the follower optimal strategy into the state equation of the
leader’s problem
The advertising efforts $a_l^*(t)$ and $a_L^*(t)$ given in (19.31) and (19.38), with $\lambda_l(T) \geq 0$ and $\lambda_L(T) \geq 0$ such that $\lambda_l(T)(G(T) - G_0) = 0$ and $\lambda_L(T)(G(T) - G_0) = 0$, constitute a Stackelberg equilibrium, and it can be proved that this equilibrium is time consistent, as it coincides with the Markovian Nash equilibrium.
In order to determine the values of the parameters $\lambda_L(T)$ and $\lambda_l(T)$ from the transversality conditions, let us solve the motion equation with the initial condition:

$$\dot{G}(t) = \gamma_L a_L^*(t) + \gamma_l a_l^*(t) - \delta G(t) + \beta(p - p_R), \quad t \in [0, T], \qquad G(0) = G_0. \qquad (19.39)$$
Substituting the optimal efforts, the motion equation becomes

$$\dot{G}(t) = \frac{\gamma_L^2}{c_L}\left[\frac{rp}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_L(T)\, e^{-\delta(T-t)}\right] + \frac{\gamma_l^2}{c_l}\left[\frac{(1-r)p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_l(T)\, e^{-\delta(T-t)}\right] - \delta G(t) + \beta(p - p_R) = H\, e^{-\delta(T-t)} + K - \delta G(t),$$

where

$$\eta_L = \lambda_L(T), \qquad \eta_l = \lambda_l(T),$$

$$H = \frac{\gamma_L^2}{c_L}\,\eta_L + \frac{\gamma_l^2}{c_l}\,\eta_l - \left(\frac{\gamma_L^2\, rp}{c_L\,\delta} + \frac{\gamma_l^2\,\big[(1-r)p - c\big]}{c_l\,\delta}\right),$$

$$K = \frac{\gamma_L^2\, rp}{c_L\,\delta} + \frac{\gamma_l^2\,\big[(1-r)p - c\big]}{c_l\,\delta} + \beta(p - p_R).$$
Let’s observe that G(T ) is linear in ηL and ηl , furthermore G(T ) ≥ G0 if and only
γ2 γl2
if ( cLL ηL + cl ηl ) ≥ η0 , where
2
1 −δ T γL rp γl2 (1 − r)p − c
η0 = 2(δ G0 − β (p − pR)) − (1 − e ) + .
1 + e− δ T cL δ cl δ
For the cooperative problem, the first-order conditions are

$$\frac{\partial H_C(G, a_L, \lambda_C, t)}{\partial a_L} = -c_L a_L + \gamma_L \lambda_C, \qquad (19.43)$$

$$\frac{\partial H_C(G, a_l, \lambda_C, t)}{\partial a_l} = -c_l a_l + \gamma_l \lambda_C, \qquad (19.44)$$

and the stationary point is

$$(a_L(t), a_l(t)) = \left(\frac{\lambda_C(t)\,\gamma_L}{c_L},\ \frac{\lambda_C(t)\,\gamma_l}{c_l}\right), \qquad t \in [0, T]. \qquad (19.45)$$
Mangasarian's theorem holds for the licensor's solution too, since the Hamiltonian function (19.42) is concave in $(G, a_L, a_l)$.
The co-state equation is

$$\dot{\lambda}_C(t) = -\frac{\partial H_C}{\partial G} = -(p - c) + \lambda_C(t)\,\delta, \qquad (19.46)$$

which, solved, gives

$$\lambda_C(t) = \frac{p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_C(T)\, e^{-\delta(T-t)},$$

so that, in particular,

$$a_{lC}^*(t) = \frac{\gamma_l}{c_l}\left[\frac{p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_C(T)\, e^{-\delta(T-t)}\right]. \qquad (19.50)$$
The motion equation can be rewritten as

$$\dot{G}(t) = M\, e^{-\delta(T-t)} + N - \delta G(t),$$

where

$$\eta_C = \lambda_C(T), \qquad M = \left(\frac{\gamma_L^2}{c_L} + \frac{\gamma_l^2}{c_l}\right)\left(\eta_C - \frac{p - c}{\delta}\right), \qquad N = \left(\frac{\gamma_L^2}{c_L} + \frac{\gamma_l^2}{c_l}\right)\frac{p - c}{\delta} + \beta(p - p_R),$$

and its solution evaluated at the final time is

$$G(T) = G_0\, e^{-\delta T} + \frac{1 - e^{-2\delta T}}{2\delta}\, M + \frac{1 - e^{-\delta T}}{\delta}\, N.$$
Imposing $G(T) = G_0$ in the expression above and solving for $\eta_C$, the transversality condition $\lambda_C(T) \geq 0$, $\lambda_C(T)(G(T) - G_0) = 0$ yields

$$\eta_C = \max\left\{\frac{2\big(\delta G_0 - \beta(p - p_R)\big)}{\left(\frac{\gamma_L^2}{c_L} + \frac{\gamma_l^2}{c_l}\right)\left(1 + e^{-\delta T}\right)} - \frac{p - c}{\delta}\,\frac{1 - e^{-\delta T}}{1 + e^{-\delta T}},\ 0\right\}.$$
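For the parameter set of the profit-comparison example above ($p = 10$, $G_0 = 1{,}543$), the closed form for $\eta_C$ can be evaluated directly; the following Python sketch is illustrative only.

```python
import math

# Evaluate eta_C from the closed form above, using the chapter's example
# parameters; p and G0 are taken from the profit-comparison example.
T, c_L, c_l, c, p_R = 30.0, 0.15, 0.2, 8.1, 8.0
beta, gamma_L, gamma_l, delta = 0.5, 0.75, 0.7, 0.1
p, G0 = 10.0, 1543.0

Gamma = gamma_L**2 / c_L + gamma_l**2 / c_l  # gamma_L^2/c_L + gamma_l^2/c_l
e_T = math.exp(-delta * T)
eta_C = max(
    2 * (delta * G0 - beta * (p - p_R)) / (Gamma * (1 + e_T))
    - (p - c) / delta * (1 - e_T) / (1 + e_T),
    0.0,
)
print(f"eta_C = {eta_C:.4f}")  # positive: the sustainability constraint binds
```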
References
1. Aaker D.A.: Managing Brand Equity. The Free Press, New York (1991)
2. Amaldoss, W., Jain., S.: Pricing of conspicuous goods: a competitive analysis of social effects.
J. Marketing Res. 42(1), 30–42 (2005)
3. Binmore, K., Osborne, M.J., Rubinstein, A.: Noncooperative models of bargaining. In:
Aumann, R.J., Hart, S. (eds.) Handbook of Game Theory with Economic Applications. North-
Holland, Amsterdam (1992)
4. Buratto, A., Zaccour, G.: Coordination of advertising strategies in a fashion licensing contract.
J. Opt. Theory Appl. 142(1), 31–53 (2009)
5. Caulkins, J.P., Feichtinger, G., Kort P., Hartl, R.F.: Brand image and brand dilution in the
fashion industry. Automatica 42, 1363–1370 (2006)
6. Caulkins, J.P., Hartl, R.F., Feichtinger, G.: Explaining fashion cycles: imitators chasing
innovators in product space. J. Econ. Dyn. Control 31, 1535–1556 (2007)
7. Chintagunta, P., Sigué, J.P.: Advertising strategies in a franchise system. Eur. J. Oper. Res. 198,
655–665 (2009)
8. Dockner, E.J., Jørgensen, S., Van Long, N., Sorger, G.: Differential Games in Economics and
Management Science. Cambridge University Press, Cambridge (2000)
9. Jørgensen, S., Di Liddo, A.: Design imitation in the fashion industry. In: Annals of the
International Society of Dynamic Games, vol. 9, pp. 569–586. Birkhauser, Boston (2007)
10. Jørgensen, S., Kort P., Zaccour, G.: Optimal pricing and advertising policies for an entertain-
ment event. J. Econ. Dyn. Control 33, 583–596 (2009)
11. Jørgensen, S., Zaccour, G.: Differential Games in Marketing. Kluwer Academic, Boston (2004)
12. Keller, K.L.: Strategic Brand Management: Building, Measuring, and Managing Brand Equity,
3rd ed. Prentice-Hall, New York (2007)
13. Leibenstein, H.: Bandwagon, snob, and Veblen effects in the theory of consumers' demand. Quart. J. Econ. 64(2), 183–207 (1950)
14. Martín-Herrán, G., Sigué, S.P., Zaccour, G.: Strategic interactions in traditional franchise
systems: Are franchisors always better off? Eur. J. Oper. Res. 213(3), 526–537 (2011)
15. Martín-Herrán, G., Taboubi, S., Zaccour, G.: On myopia in a dynamic marketing channel.
G-2006-37, GERAD (2006)
16. Nerlove, M., Arrow, K.J.: Optimal advertising policy under dynamic conditions. Economica 29(114), 129–142 (1962)
17. Power, D., Hauge, A.: No man’s brand – brands, institutions, and fashion. Growth Change
39(1), 123–143 (2008)
18. Raugust, K.: The Licensing Business Handbook, 8th Ed. EPM Communications, New York
(2012)
19. White, E.P.: Licensing. A Strategy for Profits. KEW Licensing Press, Chapel Hill (1990)
Chapter 20
Cost–Revenue Sharing in a Closed-Loop
Supply Chain
P. De Giovanni ()
Department of Information, Logistics and Innovation, VU University Amsterdam,
de Boelelaan 1105, 3A-31, 1081 HV Amsterdam, The Netherlands
e-mail: [email protected]
G. Zaccour
GERAD, HEC Montréal, 3000, chemin de la Côte-Sainte-Catherine, Montréal,
QC H3T 2A7, Canada
e-mail: [email protected]
20.1 Introduction
A manufacturer can close the loop by itself, or it can involve the other members of the supply chain and share the economic benefits [43]. In the latter case, the manufacturer should design
an adequate contract, provide attractive incentives for collaborating in closing the
loop, and properly share the economic advantages of remanufacturing [8, 13].
This paper contributes to this research area by developing a dynamic CLSC
game where a cost-sharing program for green activities is introduced along with a
reverse-revenue-sharing contract (RRSC). As reported by Geng and Mallik [19], an
RRSC is a good option when the upstream player wants to involve the downstream
player in a specific activity. For instance, Savaskan et al. [43] show that, when a
retailer is involved in the product-return process, the CLSC performs better. We
confine our interest to a single-manufacturer-single-retailer case and characterize
and contrast the equilibrium strategies and outcomes in two scenarios. In the first
scenario, referred to as Benchmark scenario, the two firms choose non-cooperatively
and simultaneously their strategies. In the second scenario, referred to as CRS, the
players share the manufacturer’s sales revenues and the cost of the green activities.1
In both cases, the manufacturer controls the rate of green activities and the retailer
controls the price. By contrasting the results of the two scenarios, we will be able
to assess the impact of implementing an active approach to increasing consumers’
environmental awareness, and, by the same token, the return rate of used products.
When the retailer contributes to the manufacturer's activities, the game is played à la Stackelberg. This game structure is common in the literature on marketing channels (see, e.g., the books [31, 35]), operations (e.g., [29]), as well as environmental management (e.g., [43]).
There is a growing game-theoretic literature that deals with CLSCs, see, e.g.,
[1, 3, 10, 14, 21, 24, 39, 43]. While these contributions investigate CLSCs in static or
two-period games, here we seek to evaluate the CLSC in a dynamic setting. Guide
et al. [27] emphasize the importance of time in managing product returns, which are
subject to time-value decay. Ray et al. [42] evaluate profits and pricing policy under
time-dependent and time-independent scenarios—namely, age-dependent and age-
independent differentiation—and show that the attractiveness of remanufacturing
changes substantially. Finally, Savaskan et al. [43] advise researchers that the CLSC
should be investigated as a dynamic phenomenon, as the influence of dynamic
returns changes channel decisions. Our paper takes up this challenge and proposes
a differential game to analyze equilibrium returns and pricing strategies in the two
scenarios described above.
Our main results can be summarized as follows:
A1. A CRS alleviates the double-marginalization problem in the supply chain. The
consumer pays a lower retail price and demands more product.
A2. The investment in green activities and the return rate of used products are
higher in the CRS scenario than in the benchmark game. The environment
also benefits from the implementation of a CRS contract.
1 What we have in mind here is similar to cooperative advertising programs, where typically, a
manufacturer pays part of the cost of promotion and advertising activities conducted locally by its
retailers. Cooperative advertising programs have been studied in the marketing literature, in a static
setting (e.g., [4, 5, 36]), as well as in a dynamic context (e.g., [32–34]).
A3. The retailer always prefers the CRS scenario to the benchmark scenario. The
manufacturer does the same, under certain conditions involving the revenue-
sharing parameter, the return rate, and the level of cost reduction attributable to
remanufacturing. The conclusion is that a CRS is not always Pareto improving.
The paper is organized as follows. In Sect. 20.2 we state the differential game
model and in Sect. 20.3 we characterize the equilibria in the two scenarios.
Section 20.4 compares strategies and outcomes. Section 20.5 briefly concludes.
Consider a supply chain made up of one manufacturer, player M, and one retailer,
player R. Let time t be continuous and assume an infinite planning horizon.2 The
manufacturer can produce its single good using new materials or old materials
extracted from returned past-sold products. This second option is common practice
in many industries.3 Managing returns effectively is one of the key capabilities required for a CLSC to succeed. Guide et al. [24] present two streams of practices
for managing returns: a passive and an active approach. The passive approach to
returns (waste-stream approach) consists of waiting and hoping that the customers
return their products. An active (market-driven) approach instead implies that CLSC
participants are continuously involved in controlling and influencing the return rate
by setting appropriate strategies. An active approach makes it possible to manage
the forward activities as a further source of economic benefits [24].
The literature in CLSC can be divided into three streams in terms of modeling
the return rate of used products. The first stream adopted a passive approach and
assumed the return rate to be exogenous (see, e.g., [3, 14, 16, 21, 27]). The second
stream also adopted a passive approach, but modelled the return rate as a random
variable, e.g., an independent Poisson (see, e.g., [1]). The third group of studies
considered an active approach, with the return rate being a function of a player’s
strategy (see, e.g., [43, 44]). We follow this last view. More precisely, we suppose
that the manufacturer can increase the rate of return for previously sold products by
investing in a “green” activity program (GAP). Examples of such activities include
advertising and communications campaigns about the firm's recycling policies. The cost of the green activities $A(t)$ is assumed to be quadratic:

$$C(A(t)) = \frac{u_M\,(A(t))^2}{2}, \qquad (20.1)$$
where $u_M > 0$ is a scaling parameter.4 We suppose that the return rate $r(t)$ depends
on the whole history, and not only on the current level of green activities. A
common hypothesis in such a context is to assume that r (t) corresponds to a
continuous and weighted average of past green activities with an exponentially
decaying weighting function. This assumption is intuitive because the return rate
is related to environmental awareness, which is a “mental state” that consumers
acquire over time, not overnight. This process is captured by defining r (t) as a state
variable whose evolution is governed by the linear differential equation

$$\dot{r}(t) = A(t) - \delta r(t), \qquad r(0) = r_0, \qquad (20.2)$$

where $\delta > 0$ is the decay rate and $r_0$ is the initial rate of return.
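The weighted-average reading can be made explicit: integrating (20.2) by variation of constants gives

$$r(t) = e^{-\delta t}\, r_0 + \int_0^t e^{-\delta (t - s)}\, A(s)\, ds,$$

so that the return rate accumulates past green-activity efforts $A(s)$ with exponentially decaying weights $e^{-\delta(t-s)}$.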
The main economic benefit of the CLSC for the manufacturer is given by the
saved cost (see, e.g., [39]). Following [43], we adopt the unit-production cost
function:
$$C(r(t)) = c_n(1 - r(t)) + c_u\, r(t), \qquad (20.3)$$
where cn > 0 is the cost of producing one unit with new raw materials, and cu > 0 is
the cost to produce one unit with used material from returned products, with cu < cn .
The above equation can be rewritten as

$$C(r(t)) = c_n - (c_n - c_u)\, r(t),$$

and, therefore, the difference $c_n - c_u$ is the marginal remanufacturing efficiency (cost
saving) of returned products. The manufacturer incurs the highest unit cost cn when
r(t) = 0, and the lowest unit cost cu is achieved when all previously purchased
products are returned, i.e., for r(t) = 1. In (20.2)–(20.3), we implicitly assume that
products may be returned independently of their condition, and that a good can be
remanufactured an infinite number of times. In practice, this clearly does not hold
true. For instance, Kodak’s camera frame, metering system, and flash circuit are
designed to be used up to six times [37] and any additional use compromises the
product’s reliability. Therefore, our functional forms in (20.2)–(20.3) are meant to
be rough approximations of return dynamics and cost savings. In the conclusion we
discuss some (necessarily much more complicated) avenues that are worth exploring
in future investigations.
Denote by $p(t)$ the retail price controlled by the retailer. We suppose that the demand for the manufacturer's product is given by

$$D(p(t)) = \alpha - \beta\, p(t), \qquad (20.4)$$
where α > 0 is the market potential and β > 0 represents the marginal effect of
pricing on current sales. To have nonnegative demand, we assume that p (t) ≤ α /β .
Two comments are in order regarding this demand function. First, following a
long tradition in economics, we have chosen a linear form. In addition to being
tractable, this choice is typically justified by the fact that such a demand form is
derivable from the consumer-utility function. Second, we follow [43] and suppose
that D (·) is independent of the return rate. Put differently, we are assuming here
that the CLSC’s main purpose is as a cost-saving rather than a demand-enhancing
mechanism. Denote by ω the constant wholesale price charged by the manufacturer,
with cn < ω < p (t) ≤ α /β . The lower bound ensures that the manufacturer’s margin
is positive even when there is no recycling. The second inequality ensures that the
retailer’s margin is nonnegative.
Up to now, our formulation states that the manufacturer is taking care of the
CLSC’s operational features, and that the marketing decisions (represented by
pricing) are left to the retailer, who is not at all involved in recycling. Although the
players follow an individual profit-maximization objective, they still may attempt
to link their activities to achieve higher economic benefits for both of them. For
instance, IBM and Xerox coordinate their recovery activities with their suppliers
in order to increase their profitability [18, 24]. IBM gives the responsibility for
managing all product returns worldwide to a dedicated business unit called Global
Asset Recovery Services, that collects, inspects, and assigns a particular recovery
option (resale or remanufacturing), and that maximizes the chain’s efficiency by
coordinating its activities with IBM refurbishment centres worldwide [18].
Here we explore a setting where: (a) the retailer financially supports the
manufacturer’s GAP; and, (b) the manufacturer designs an incentive mechanism to
compensate the retailer for this participation, and to better coordinate the CLSC.
Denote by B (t) , 0 ≤ B (t) ≤ 1, the support rate, to be chosen by the retailer,
in the total cost of the GAP. Consequently, the retailer pays B (t)C (A(t)) and
the manufacturer contributes the remaining portion, i.e., (1 − B (t))C (A(t)). The
rationale for the retailer to participate in the manufacturer’s GAP is the premise that
the combined efforts of the two players would lead to a higher return rate for used
products, and consequently, to a lower production cost and wholesale price.
Denote by I (r (t)) the state-dependent incentive provided by the manufacturer
to the retailer. The incentive assumes the traditional form as presented by [7, 9, 20],
where the manufacturer transfers a share of his revenues to the retailer in order to
modify her strategies. This way of modeling the incentive differs from the traditional
scheme elaborated in the literature of CLSC. Typically, the incentive schemes
assume the form of payment, where the manufacturer pays a certain per-unit amount
when another player returns a product [43, 44]. Alternatively, rebates on new sales
can also coordinate a CLSC [21]. Other valid alternative contract schemes link the
incentive to some operational features. For instance, Guide et al. [24] characterized a quality-dependent price incentive for used products; Guide et al. [27] suggest integrating return management with inventory management (VMI) and resource management (employees). Our way of modeling the incentive is analogous to the incentive implemented by ReCellular Inc. and presented in [24], where the manufacturer offers a two-part incentive formed of a fixed (direct) per-unit component as well as of a variable (indirect) component that depends on the operational (collection) costs.
Similarly, our incentive consists of a share of the manufacturer's revenues that is transferred to the retailer and that is formed of a fixed, state-independent part as well as of a variable, state-dependent component. In this sense, instead of focusing only on its main strengths, namely the reduction of the double-marginalization effect, a lower price, and a higher demand (see, e.g., [7, 9, 20]), a two-parameter contract implemented in a CLSC also enhances collaboration in product-return management. In [7, 9, 20] the incentive depends only on the sharing parameters, the wholesale price, and the production cost, while in our model it is also a function of the remanufacturing cost and the return rate.
Assuming profit-maximization behavior, the players' objective functionals are then given by

$$J_M = \int_0^{\infty} e^{-\rho t}\left[(\alpha - \beta p(t))\big(\omega - C(r(t)) - I(r(t))\big) - (1 - B(t))\,\frac{u_M}{2}\,A(t)^2\right]dt, \qquad (20.5)$$

$$J_R = \int_0^{\infty} e^{-\rho t}\left[(\alpha - \beta p(t))\big(p(t) - \omega + I(r(t))\big) - B(t)\,\frac{u_M}{2}\,A(t)^2\right]dt. \qquad (20.6)$$
We shall characterize and compare equilibrium strategies and outcomes for two
scenarios. In both of them, the assumption is that the players use Markovian
strategies, i.e., strategies that are functions of the state variable. Further, we restrict
ourselves to stationary strategies, that is, strategies that only depend on the current
value of the state variable, and not explicitly on time.
Benchmark Scenario: The retailer does not participate in the green activities
program of the manufacturer, and the latter does not offer any incentive to
coordinate the CLSC, i.e., B (t) ≡ 0 and I (r(t)) ≡ 0, ∀t ∈ [0, ∞). The game is played
noncooperatively and a feedback-Nash equilibrium is sought. Equilibrium strategies
and outcomes will be superscripted with N (for Nash).
Cost-Revenue Sharing Scenario: We assume that the retailer is the leader and
announces its support rate for the green activities conducted by the manufacturer,
who acts as the follower. The right (subgame-perfect) equilibrium concept in
such a setting is the feedback-Stackelberg equilibrium. Equilibrium strategies and
outcomes will be superscripted with S (for Sharing or Stackelberg). Denote by φ the
percentage of revenues transferred from the manufacturer to the retailer to stimulate
green investments and coordinate the CLSC. Under CRS, we have

$$I(r(t)) = \phi\big(\omega - C(r(t))\big) = \phi(\omega - c_n) + \phi(c_n - c_u)\, r(t). \qquad (20.7)$$

As a consequence of this transfer, the manufacturer's margin ($m_M^S(r)$) and the retailer's margin ($m_R^S(r)$) become

$$m_M^S(r) = (1 - \phi)\big(\omega - C(r(t))\big), \qquad m_R^S(r) = p(t) - \omega + \phi\big(\omega - C(r(t))\big).$$
The incentive scheme in (20.7) is made of two parts, with one being independent of
the return rate (φ (ω − cn )), and the other being a positive and increasing function
in the return rate (φ (cn − cu ) r (t)). This shows that the retailer has a vested interest
in contributing to a higher return rate.
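To see the two-part structure of (20.7) at work, here is a small numeric illustration; the parameter values ($\phi$, $\omega$, $c_n$, $c_u$) are hypothetical and chosen only to make the split visible.

```python
# Illustration of the two-part incentive (20.7); all values are hypothetical.
phi, omega, c_n, c_u = 0.3, 5.0, 3.0, 1.0

def incentive(r: float) -> float:
    """I(r) = phi*(omega - C(r)) = phi*(omega - c_n) + phi*(c_n - c_u)*r."""
    return phi * (omega - c_n) + phi * (c_n - c_u) * r

for r in (0.0, 0.5, 1.0):
    print(f"r = {r:.1f}: I(r) = {incentive(r):.2f}")
# Fixed part: phi*(omega - c_n) = 0.60; slope: phi*(c_n - c_u) = 0.60,
# so the incentive grows with the return rate, as noted above.
```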
From now on, we will omit the time argument when no ambiguity may arise.
20.3 Equilibria
In the following two subsections, we characterize the equilibria in the two scenarios
described above.
Recall that in this scenario the players choose their strategies simultaneously and independently to maximize their individual profits, with $B(t) \equiv 0$ and $I(r(t)) \equiv 0$, $\forall t \in [0, \infty)$.
Proposition 20.1. The equilibrium GAP and price strategies are given by

$$A^N = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2\, u_M\, (\rho + \delta)} > 0, \qquad (20.8)$$

$$p^N = \frac{\alpha + \beta\omega}{2\beta} > 0. \qquad (20.9)$$
The retailer's value function is constant and given by

$$V_R^N(r) = \frac{(\alpha - \beta\omega)^2}{4\beta\rho}. \qquad (20.11)$$

Indeed,

$$V_R^N(r) \equiv V_R^N = \int_0^{\infty} e^{-\rho t}\,\frac{(\alpha - \beta\omega)^2}{4\beta}\, dt,$$

while the manufacturer's value function (given in the Appendix) is linear and increasing in the return rate:

$$\frac{\partial V_M^N}{\partial r} = \frac{(\alpha - \beta\omega)(c_n - c_u)}{2(\rho + \delta)} > 0.$$
This result provides the rationale for the next scenario. Indeed, as it is in the best
interest of the manufacturer to increase the level of used-product returns, it is
tempting to provide an incentive to the retailer to induce a greater contribution to the
green-activity program. It remains to be seen under which conditions this incentive
is profitable for the manufacturer.
Substituting for green expenditures in the state dynamics (20.2) and solving gives the following trajectory for the return rate:

$$r^N(t) = \frac{1 - e^{-\delta t}}{\delta}\, A^N + e^{-\delta t}\, r_0 > 0.$$

The steady-state value is strictly positive and given by

$$r_{\infty}^N = \frac{A^N}{\delta} = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2\, u_M\, (\rho + \delta)\,\delta} > 0. \qquad (20.12)$$
From now on we assume (and check for in the numerical simulations) that the
parameters are such that rN (t) ≤ 1, ∀t ∈ [0, ∞).
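A minimal Python sketch of Proposition 20.1 and of the trajectory above follows; all parameter values are hypothetical and were chosen so that the nonnegativity requirements and $r^N(t) \leq 1$ hold.

```python
import math

# Benchmark (Nash) equilibrium of Proposition 20.1 with assumed parameters
# (they satisfy c_u < c_n < omega and keep the return rate in [0, 1]).
alpha, beta, omega = 10.0, 1.0, 5.0
c_n, c_u, u_M = 3.0, 1.0, 40.0
rho, delta, r0 = 0.1, 0.5, 0.0

A_N = (c_n - c_u) * (alpha - beta * omega) / (2 * u_M * (rho + delta))  # (20.8)
p_N = (alpha + beta * omega) / (2 * beta)                               # (20.9)
r_inf = A_N / delta                                                     # (20.12)

def r_N(t: float) -> float:
    """Return-rate trajectory under the constant GAP effort A^N."""
    return (1 - math.exp(-delta * t)) / delta * A_N + math.exp(-delta * t) * r0

print(f"A^N = {A_N:.4f}, p^N = {p_N:.4f}, r_inf^N = {r_inf:.4f}")
print(f"r^N(5) = {r_N(5.0):.4f}")
```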
The RRSC was recently introduced by [19] to coordinate a supply chain in which the
upstream players have an economic incentive to coordinate. The traditional revenue
sharing contract (RSC) fits with the implementation of a coordination strategy that
is mainly driven by the retailer [7, 12]. The RSC, in fact, mitigates the double-
marginalization effect and creates efficiency along the chain. While the retailer
transfers a share of its revenues to the manufacturer, it also buys at a lower wholesale
price. Consequently, price decreases and demand increases.
In the RRSC, however, the retailer receives a share of the manufacturer’s net
revenues. The manufacturer wishes to influence the retailer’s strategies by offering
an attractive economic incentive. This type of contract fits adequately with the
CLSC’s targets where the manufacturer has the highest incentive to close the loop.
Further, in the marketing literature dealing with cooperative advertising programs,
the context is one of a manufacturer helping his retailer by paying part of the cost
of the local advertising or promotional efforts conducted by the retailer. Here, the
situation is reversed and it is the retailer who is contributing to the manufacturer’s
GAP. Therefore, the retailer plays the role of leader and the manufacturer, the role
of follower. The following proposition characterizes the equilibrium strategies.
Proposition 20.2. Assuming an interior solution, the feedback-Stackelberg equilibrium price, green-activity, and participation-rate strategies are given by

$$p^S = \frac{\alpha + \beta\big(\omega(1 - \phi) + c_n(1 - r)\phi + c_u\phi r\big)}{2\beta}, \qquad (20.13)$$

$$A^S = \frac{(2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2}{2\, u_M}, \qquad (20.14)$$

$$B^S = \frac{(2\mu_1 - \varphi_1)\, r + 2\mu_2 - \varphi_2}{(2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2}. \qquad (20.15)$$
Since the value functions cannot be obtained analytically (the six Riccati equations are highly coupled; see the Appendix), we shall verify in the numerical simulations that the GAP strategy is nonnegative and that the support rate $B^S$ and the return rate $r$ are between 0 and 1. Unlike in the previous scenario, the strategies are now state-dependent, with the price being a decreasing function of the return rate. This is intuitive because a higher rate leads to a lower production cost. Further, the higher the percentage $\phi$ of revenues transferred from M to R, the lower the retail price. Indeed, we have

$$\frac{\partial p^S}{\partial \phi} = -\frac{\omega - c_n + (c_n - c_u)\, r}{2} < 0.$$

Therefore, as in the literature on revenue-sharing contracts (see, e.g., [7]), this parameter also lessens the double-marginalization problem in RRSCs.
Table 20.1 provides the results of a sensitivity analysis of the strategies and the state
variable with respect to the model's main parameters. A positive (negative) sign
indicates that the value taken by a variable increases (decreases) when we increase
the value of the parameter. A “0” indicates that the variable is independent of that
parameter, and n.a. means not applicable. The reported results for the benchmark
game are analytical and hold true for all admissible parameter values, not only
for those shown in the table. In the S scenario, when we vary the value of a
parameter, the values of all other parameters remain at their base-case levels. Note
that the selected parameters’ values satisfy nonnegativity conditions for price, green
activities, and demand in both scenarios. They also satisfy the requirement that the
support rate and the return rate be bounded between zero and one. The results allow
for the following intuitive comments:
A1. Varying α and β yields the same qualitative impact in both scenarios for all
variables. Regarding the effect on the support rate provided by the retailer to
the manufacturer’s GAP, we obtain that a larger demand (through a higher
market potential α or a lower consumer-price sensitivity β ) induces the retailer
to increase its support.
A2. A higher uM means an upward shift in the cost of the green-activity pro-
gram. Consequently, the manufacturer reduces its effort. Although the retailer
increases its support rate to compensate for the manufacturer’s lower effort,
the final outcome is a lower return rate in the steady state. In short, the
higher the cost of green activities, the lower the environmental and economic
performance of the CLSC. The same qualitative results are obtained when
the remanufacturing cost, cu , is increased. Under such circumstances, the
manufacturer is less interested in closing the loop since the savings from
producing with used parts are lower.
A3. The higher the production cost, cn , the higher the interest of the manufacturer
in introducing used parts into production. Hence, the positive relationship
between cn and the investment in green activities. Consequently the return rate
is increasing in cn . The retailer benefits from the cost reduction and reduces the
retail price, which in turn, feeds the demand and the returns of used products.
A high cn is therefore an incentive to implement an environmental policy. In
the N scenario, the price is constant, as the production cost does not influence
the retailer’s strategy. In the S scenario, the support rate decreases in cn . The
economic incentive decreases with the production cost; and thus, the retailer’s
willingness to implement a coop program decreases accordingly.
A4. A higher wholesale price leads to a higher retail price and a lower demand.
In turn, the pool of used products is smaller and green activities become less
attractive. Consequently, the rate of return decreases.
A5. To interpret the results regarding φ , the revenue-sharing parameter in the S
scenario, we recall the margins of the two players:
mSM (r) = (1 − φ ) (ω − C (r (t))) ,
mSR (r) = p (t) + φ (ω − C (r (t))) .
Therefore, a higher φ means a higher margin for the retailer and a lower one
for the manufacturer. This incentive is achieving its goal, that is, the retailer
increases its support with φ , and consequently the manufacturer invests more
in GAP, which leads to a higher return rate.
A6. A higher decay rate leads to lower investments in GAP, and consequently, to a
lower return rate. Also, increasing ρ , which amounts to giving more weight to
short-term profits, leads to a lower investment in GAP.
20.4.2 Comparison
We turn to the analysis of the players’ strategies and outcomes. As most of the
comparisons need to be carried out numerically, we have to limit the number of
parameters that we will let vary. It seems quite reasonable to focus on the most
important parameters in our model, namely, the incentive parameter φ and the
reduction in marginal cost due to manufacturing with used parts, i.e., cn − cu . All
other parameter values are kept at their benchmark levels.5
Retail-Price Strategies: Recall that the Nash and Stackelberg equilibrium prices are given by

$$p^N = \frac{\alpha + \beta\omega}{2\beta}, \qquad p^S = \frac{\alpha + \beta\big(\omega(1 - \phi) + c_n(1 - r)\phi + c_u r\phi\big)}{2\beta}.$$

Without resorting to numerical simulations, we can make two observations. First, the Stackelberg price is decreasing in the return rate:

$$\frac{\partial p^S}{\partial r} = -\frac{(c_n - c_u)\,\phi}{2};$$

and, second, we have $p^S < p^N$ for all parameter values. To see this, we note that the two equilibrium prices are related as follows:

$$p^S(r) = p^N - \frac{I^S(r)}{2\beta}.$$
By the nonnegativity of the incentive $I^S(r)$, we then have $p^S < p^N$. Similarly to [24] and [10], the higher the remanufacturing efficiency, the higher the incentive provided by the manufacturer, the lower the retail price, and consequently, the higher the demand. Therefore, implementing a CRS contract alleviates the double-marginalization problem and is beneficial to the consumer.
GAP Strategies: The investments in green activities in the two scenarios are given by

$$A^N = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2\, u_M\, (\rho + \delta)}, \qquad A^S(r) = \frac{(2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2}{2\, u_M}.$$
5 We ran other simulations without noticing any significant qualitative changes in the results.
As for the retail price, the GAP strategy is constant in the Nash scenario, and it is
linear in the return rate in the S scenario. Figures 20.1 and 20.2 display the green-
activity strategy for different values of cn −cu and φ . The following observations can
be made: First, the higher the return rate, the higher the manufacturer’s investment
in GAP. This result partly contrasts with that of [3], who advise OEMs not to invest in increasing the return rate if it is already high, but to focus on other
activities, e.g., the collection system’s efficiency. This would, in spirit, include the
manufacturer’s GAPs, which focus on the marketing and operational aspects of the
return process. One interpretation of our result is that, when a CLSC achieves a high
return rate, GAP investments are also required to keep up in terms of operations
(e.g., logistics network, remanufacturing process, quality-control activities) and
marketing (informing, promoting and advertising to a larger customer base). This
is in line with [21] who suggest decreasing the investment in remanufacturing
activities (e.g., product durability) when the return rate is low. Second, for any given
return rate, the manufacturer invests more in GAP in the S scenario than in the
Nash equilibrium. This result has also been reached in the cooperative-advertising
literature cited previously. The fact that the manufacturer is sharing the cost of the
green activities is in itself an incentive to do more. The implication is that the steady-
state value of the return rate in the S scenario is higher than its Nash counterpart.
Therefore, from an environmental point of view, as in [43], coordination in a CLSC
is preferable to the benchmark game. Third, for any admissible value of r, shifting
up φ leads to a higher investment in green activities.
Fig. 20.2 Green-activity strategy for different values of the sharing parameter
Support-Rate Strategy: Recall that the retailer's support rate (20.15) is

$$B^S = \frac{(2\mu_1 - \varphi_1)\, r + 2\mu_2 - \varphi_2}{(2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2}.$$
Figure 20.3 shows that $B^S$ is decreasing in the return rate. When we combine this result with the previous one, namely, that the GAP is increasing in $r$, it is appealing to conjecture that the manufacturer's and retailer's control variables are strategic substitutes. This can be seen by rewriting the support rate as

$$B^S = \frac{2(\mu_1 r + \mu_2)}{u_M\, A^S} - 1.$$

Therefore, a higher $A^S$ leads the retailer to lower its support rate (but not necessarily the total amount of the subsidy given to the manufacturer). Figure 20.4 reveals
that the support rate increases with φ . This result is somehow expected, given that
the incentive provided by the manufacturer to the retailer is precisely to (hopefully)
drive up the retailer’s participation in the green-activity program [19]. Still, it is
interesting to mention the very significant effect that φ has on the support rate.
Indeed, increasing φ , for instance by less than 15 % (i.e., from 0.35 to 0.40) more
than triples the support provided by the retailer. Note that the positive impact of φ
on the investment in GAP is much more limited (see Fig. 20.2). Further, the higher
the production-cost saving resulting from recycling (higher cn − cu), the steeper the
decline in the rate of support provided by the retailer to the manufacturer. Actually,
when the manufacturer is highly efficient and the return rate of used products is also
high, the retailer’s support is simply less important. Note that the impact of varying
cn − cu on the support rate is proportionally much less visible than the impact of φ .
However, the level of GAP is very sensitive to the value of cn − cu .
Fig. 20.5 Regions where manufacturer (left) and retailer (right) S profits are higher than Nash
Figure 20.5 suggests that the CRS scenario dominates the benchmark for both players when: (a) the return rate is “sufficiently high;” (b) the cost reduction resulting from recycling is “sufficiently high;” and, (c) the incentive parameter φ is “not too high.” If these conditions
are met, then a CRS contract is Pareto payoff-improving. As the cost savings and
the return rate are expected to vary significantly across firms and industries, it is
difficult to make a general statement about the feasibility of Pareto-optimality in
practice. Based on the following data, it seems reasonable to believe that this result
is achievable. Indeed, regarding the return rate, Guide [23] reports that, if firms
collect the products themselves, or provide incentives to other CLSC participants
and adopt a market-driven approach, then the return rate can be as high as 82 %.
If this example is representative of what can be realized, then we are in the zone
of a “sufficiently high return rate.” Concerning the cost savings, [18] report that
remanufacturing costs at IBM are much lower than those for buying new parts,
sometimes 80 % lower. Similarly, Xerox saves 40–65 % of its manufacturing costs
by reusing parts and materials from returned products [43]. For these firms, the
cost reduction due to remanufacturing is clearly “sufficiently high.” However, these
examples are (good) business exceptions, and no one would expect these levels
of cost reduction to be very common. According to [13], most firms do not adopt
closed-loop practices because of the small savings and inefficient remanufacturing.
Nevertheless, other strategic motivations could still lead those firms to close the
loop. For instance, to avoid and reduce remanufacturing competition, Lexmark has
introduced the Prebate program, whereby customers who return an empty printer
cartridge obtain a discount on a new cartridge; Lexmark does not remanufacture
these used cartridges because the cost savings are low, but recycling them instead
allows it to reduce competition (www.atlex.com, 2003).
The last determinant of Pareto-optimality is the revenue-sharing parameter φ.
The literature on contracting and coordination has already established the appro-
priateness of a revenue-sharing contract in a CLSC, and highlighted the critical
role played by the sharing parameter [8]. Its actual value depends on the players’
bargaining power, and therefore, no general statement can be made.
20.5 Conclusion
To the best of our knowledge, this study is the first attempt to assess, in a dynamic setting, the impact of a CRS contract on a closed-loop supply chain. Our starting
point is that firms can influence the return rate of used products by carrying out green
activities, and that this return rate is an inherently dynamic process. The optimality
of a CRS can be assessed from the consumer’s, the environmental, and the firms’
points of view. We wrap up our main results in the following series of claims on
these different points of view:
Claim 1 Compared to the benchmark scenario, a CRS contract leads to a lower
retail price and higher demand.
The conclusion is that the consumer and the retailer will vote in favor of such a CRS contract, and that the environment is always better off with one. For the manufacturer, the results are not clear-cut.
As future research directions, the following extensions are worth considering:
A1. An analysis of the same game, but assuming a finite horizon. Indeed, our assumption of an infinite horizon is a strong one and was made mainly for tractability, i.e., to solve a system of algebraic Riccati equations instead of having to deal with a highly coupled system of differential equations. A first step could be to analyze a two-stage game where, in the second period, the manufacturer produces with used parts recycled from first-period sales.
A2. An analysis of a multi-retailer situation, in which a manufacturer cooperates
with different retailers while the retailers compete in the same market. This
type of multi-agent configuration has been shown to be extremely important
when evaluating a contract in supply-chain management. For instance, [7]
evaluate a RRSC in a one-manufacturer–one-retailer chain configuration, and
demonstrate its effectiveness for mitigating the double-marginalization effect
and for making players better off. Later, they model a multi-retailer situation,
and show that the positive effects of a two-parameter contract vanish whenever
retailer competition occurs.
A3. A competitive setting where a manufacturer and an original equipment man-
ufacturer (OEM) compete in the collection process. In this context, the
manufacturer has more reasons to collect the end-of-use products, where the
reverse flows need to be managed not only to appropriate some of the returns’
residual value, but also to deter new entrants into the industry [13]. This context
has also been described by [3] with real applications (e.g., Bosch). They report
that remanufacturing can be really effective in a competitive context because
remanufactured products may cannibalize competitors’ products. However,
the literature has overlooked competition in dynamic-CLSC settings, where
players compete while adopting an active return policy.
A4. An evaluation of the impact of a green brand image on remanufacturing. In
a CLSC, marketing and operations interface to ensure high remanufacturing
efficiency while goodwill not only plays the traditional role of increasing
sales (marketing role) but it also increases product returns (operational role).
The main assumption here is that customer returns depend on the stock of
(green) goodwill, which acts as a sustainable lever. Several companies, such
as Coca-Cola, HP, and Panasonic, are modifying their brands, changing the
colors and style to increase customers’ green consciousness. Firms know that
customers are concerned about the environment and are willing to buy and
return green products, and that an appropriate brand strategy may provide
superior performance. CLSCs seek to use goodwill not only to increase sales
but also to induce customers to adopt sustainable behavior. By returning end-
of-use products, customers contribute to conserving landfill space, reducing air
pollution and preserving the environment. Therefore, green goodwill acts as a
sustainable lever with the dual purpose of increasing both sales and returns.
A5. The integration of some quality features in our assumptions. Product remanufacturability, in fact, decreases over time, thereby also reducing its attractiveness. The quality of a return governs the disassembly, recovery, and disposal operations to be carried out after closing the loop [38]. When a return is in good condition, it possesses high residual value, and remanufacturing turns out to be an extremely appropriate operational strategy. Consequently, firms in the CLSC are committed to reducing the residence time (the time a product stays with customers) while increasing product remanufacturability (the number of times a return may be used in a remanufacturing process). One of our paper's main assumptions is that a return can be remanufactured an infinite number of times. Despite the obvious limitations, applications do exist in several industries (e.g., the glass industry). Research in CLSCs has investigated remanufacturability in terms of product durability [3, 21], highlighting the trade-off this implies. High product durability maximizes cost savings, but it considerably extends product life, thereby lowering demand [10]. Moreover, since durability is a quality feature, it directly impacts production costs, reducing the players' unit-profit margins. Product durability has also been investigated as a dynamic phenomenon [41], where the stock of durability decreases over time, influencing both operational decisions and sales in future periods. While incorporating durability substantially increases the complexity of the model, addressing this trade-off determines the CLSC's success and improves the decision-making process.
Acknowledgements We wish to thank the anonymous reviewer for his/her very helpful com-
ments. Research supported by NSERC, Canada.
Appendix
We need to establish the existence of a pair of value functions $V_M(r)$, $V_R(r)$ such that there exists a unique solution $r$ to (20.2) and such that the following Hamilton–Jacobi–Bellman (HJB) equations are satisfied:
$$\rho V_M(r) = (\alpha - \beta p)\big(\omega - c_n - r(c_u - c_n)\big) - \frac{u_M}{2}A^2 + V_M'(r)(A - \delta r), \qquad (20.16)$$

$$\rho V_R(r) = (\alpha - \beta p)(p - \omega). \qquad (20.17)$$

Maximization of the right-hand sides yields

$$A = \frac{V_M'(r)}{u_M}, \qquad (20.18)$$

$$p = \frac{\alpha + \beta\omega}{2\beta}. \qquad (20.19)$$

Substituting back into the HJB equations gives

$$\rho V_M = \frac{\alpha - \beta\omega}{2}\big(\omega - c_n - r(c_u - c_n)\big) + V_M'(r)\left(\frac{V_M'(r)}{2 u_M} - \delta r\right), \qquad (20.20)$$

$$\rho V_R = \frac{(\alpha - \beta\omega)^2}{4\beta}. \qquad (20.21)$$
We show that a linear value function satisfies (20.20) and (20.21). We define $V_M = \varsigma_1 r + \varsigma_2$, where $\varsigma_1$ and $\varsigma_2$ are constants. Substituting $V_M$ and its derivative into (20.20), we obtain:

$$\rho(\varsigma_1 r + \varsigma_2) = \frac{\alpha - \beta\omega}{2}\big(\omega - c_n - r(c_u - c_n)\big) + \varsigma_1\left(\frac{\varsigma_1}{2 u_M} - \delta r\right). \qquad (20.22)$$

By identification, we obtain

$$\varsigma_1 = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2(\rho + \delta)}, \qquad \varsigma_2 = \frac{1}{2\rho}\,\frac{u_M(\alpha - \beta\omega)(\omega - c_n) + \varsigma_1^2}{u_M},$$

so that

$$V_M = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2(\rho + \delta)}\, r + \frac{1}{2\rho}\left[(\alpha - \beta\omega)(\omega - c_n) + \frac{(c_n - c_u)^2(\alpha - \beta\omega)^2}{4 u_M (\rho + \delta)^2}\right].$$
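As a spot-check of the identification, one can verify numerically that $V_M = \varsigma_1 r + \varsigma_2$ with the coefficients above satisfies (20.20) exactly; the sketch below uses assumed parameter values, and the expression coded for $\varsigma_2$ is the one obtained by matching the constant terms.

```python
# Spot-check that V_M(r) = s1*r + s2 solves the HJB equation (20.20);
# parameter values are assumptions chosen for illustration.
alpha, beta, omega = 10.0, 1.0, 5.0
c_n, c_u, u_M, rho, delta = 3.0, 1.0, 40.0, 0.1, 0.5

s1 = (c_n - c_u) * (alpha - beta * omega) / (2 * (rho + delta))
s2 = ((alpha - beta * omega) * (omega - c_n) + s1**2 / u_M) / (2 * rho)

for r in (0.0, 0.3, 0.8):
    lhs = rho * (s1 * r + s2)
    rhs = (alpha - beta * omega) / 2 * (omega - c_n - r * (c_u - c_n)) \
        + s1 * (s1 / (2 * u_M) - delta * r)
    assert abs(lhs - rhs) < 1e-9, (r, lhs, rhs)
print("V_M = s1*r + s2 satisfies (20.20) at all sampled points.")
```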
In the CRS scenario, the manufacturer's maximization yields

$$A = \frac{V_M'(r)}{u_M(1 - B)}. \qquad (20.24)$$

Substituting into the retailer's HJB equation gives

$$\rho V_R(r) = \max_{p,\, B}\left\{(\alpha - \beta p)\big(p - \omega + I(r)\big) - \frac{B\,(V_M'(r))^2}{2 u_M (1 - B)^2} + V_R'(r)\left(\frac{V_M'(r)}{u_M(1 - B)} - \delta r\right)\right\}, \qquad (20.25)$$

whose maximization yields

$$p = \frac{\alpha + \beta\big[\omega - \phi(\omega - c_n + r(c_n - c_u))\big]}{2\beta}, \qquad (20.26)$$

$$B = \frac{2 V_R'(r) - V_M'(r)}{2 V_R'(r) + V_M'(r)}. \qquad (20.27)$$
Substituting (20.26) and (20.27) into the two HJB equations, we obtain

$$\rho V_M(r) = (1 - \phi)\,\frac{\alpha - \beta\big(\omega - \phi(\omega - c_n + r(c_n - c_u))\big)}{2}\,\big(\omega - c_n + r(c_n - c_u)\big) + V_M'(r)\left(\frac{2 V_R'(r) + V_M'(r)}{4 u_M} - \delta r\right), \qquad (20.28)$$

$$\rho V_R(r) = \frac{\big(\alpha - \beta\big[\omega - \phi(\omega - c_n + r(c_n - c_u))\big]\big)^2}{4\beta} + \frac{(V_M'(r))^2}{8 u_M} + V_R'(r)\left(\frac{V_R'(r) + V_M'(r)}{2 u_M} - \delta r\right). \qquad (20.29)$$
We conjecture quadratic value functions:

$$V_M(r) = \frac{\varphi_1}{2}\, r^2 + \varphi_2\, r + \varphi_3, \qquad V_R(r) = \frac{\mu_1}{2}\, r^2 + \mu_2\, r + \mu_3,$$

where $\varphi_1$, $\varphi_2$, $\varphi_3$, $\mu_1$, $\mu_2$ and $\mu_3$ are parameters to be determined. Let
$$a_1 = 2 u_M (1 - \phi)\,\beta\phi\,(c_u - c_n)^2, \qquad a_2 = 2 u_M (\rho + 2\delta),$$
$$a_3 = u_M (1 - \phi)(c_n - c_u)\big(\alpha - \beta(\omega - 2\phi(\omega - c_n))\big), \qquad a_4 = 4 u_M (\delta + \rho),$$
$$a_5 = 2 u_M (1 - \phi)(\omega - c_n)\big[\alpha - \beta(\omega - \phi(\omega - c_n))\big], \qquad a_6 = 4 u_M \rho,$$
$$a_7 = 2 u_M \phi^2 \beta\,(c_u - c_n)^2, \qquad a_8 = 2 u_M \phi (c_n - c_u)\big(\alpha - \beta(\omega - \phi(\omega - c_n))\big),$$
$$a_9 = 2 u_M \big(\alpha - \beta\big[\omega - \phi(\omega - c_n)\big]\big)^2.$$
Inserting $V_M$ and $V_R$ and their derivatives in (20.28) and (20.29), we obtain the following six algebraic Riccati equations:

$$a_1 + \varphi_1(2\mu_1 + \varphi_1) - a_2\varphi_1 = 0, \qquad (20.30)$$
$$a_3 + \varphi_1\mu_2 + \varphi_2(\varphi_1 + \mu_1 - a_4) = 0, \qquad (20.31)$$
$$a_5 + \varphi_2(2\mu_2 + \varphi_2) - a_6\varphi_3 = 0, \qquad (20.32)$$
$$a_7 + 2(2\mu_1 - a_2)\mu_1 + (\varphi_1 + 4\mu_1)\varphi_1 = 0, \qquad (20.33)$$
$$a_8 + (\varphi_1 + 2\mu_1)\varphi_2 + 2(2\mu_1 + \varphi_1 - a_4)\mu_2 = 0, \qquad (20.34)$$
$$a_9 + \beta\varphi_2^2 + 4\beta\mu_2(\mu_2 + \varphi_2) - 2 a_6\beta\mu_3 = 0, \qquad (20.35)$$

with the first three equations corresponding to the manufacturer and the next three to the retailer.
We briefly describe the procedure used to reduce the solution of that system to the solution of one nonlinear equation, solved numerically using Maple 10. From (20.30), we can obtain $\mu_1$ as a function of $\varphi_1$:

$$\mu_1 = f_1(\varphi_1) = \Omega_1 = \frac{(a_2 - \varphi_1)\varphi_1 - a_1}{2\varphi_1}. \qquad (20.36)$$

Substituting (20.36) into (20.31) and (20.34), we can obtain both $\varphi_2$ and $\mu_2$ as functions of $\varphi_1$:

$$\varphi_2 = f_2(\varphi_1) = -\frac{a_3\Omega_3 + \varphi_1\Omega_2}{(\varphi_1 + \Omega_1 - a_4)\,\Omega_3} = \Omega_4, \qquad (20.37)$$

$$\mu_2 = f_3(\varphi_1) = \frac{\Omega_2}{\Omega_3} = \Omega_5, \qquad (20.38)$$

where $\Omega_2$ and $\Omega_3$ denote expressions depending only on $\varphi_1$. Then

$$\varphi_3 = f_4(\varphi_1) = \frac{a_5 + (2\Omega_5 + \Omega_4)\Omega_4}{a_6}, \qquad (20.39)$$

$$\mu_3 = f_5(\varphi_1) = \frac{a_9 + \beta\Omega_4^2 + 4\beta(\Omega_5 + \Omega_4)\Omega_5}{2 a_6\beta}. \qquad (20.40)$$
Finally, replacing (20.36) into (20.33) gives a nonlinear equation in $\varphi_1$ that unfortunately cannot be solved analytically. We use Maple's "fsolve" function, which uses numerical-approximation techniques to find a decimal approximation to a solution of an equation or a system of equations.
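A shortcut, sketched below in Python, is to hand the six equations (20.30)–(20.35) directly to a standard root-finder (scipy.optimize.fsolve) instead of performing the reduction; the parameter values are assumptions made for illustration, and the initial guess may need tuning to land on the economically meaningful root, i.e., the one with $\varphi_1, \varphi_2, \mu_1, \mu_2 > 0$.

```python
import numpy as np
from scipy.optimize import fsolve

# Solve the six algebraic Riccati equations (20.30)-(20.35) numerically.
# All parameter values are hypothetical; the initial guess may need tuning.
alpha, beta, omega = 10.0, 1.0, 5.0
c_n, c_u, u_M, rho, delta, phi = 3.0, 1.0, 40.0, 0.1, 0.5, 0.3

a1 = 2 * u_M * (1 - phi) * beta * phi * (c_u - c_n) ** 2
a2 = 2 * u_M * (rho + 2 * delta)
a3 = u_M * (1 - phi) * (c_n - c_u) * (alpha - beta * (omega - 2 * phi * (omega - c_n)))
a4 = 4 * u_M * (delta + rho)
a5 = 2 * u_M * (1 - phi) * (omega - c_n) * (alpha - beta * (omega - phi * (omega - c_n)))
a6 = 4 * u_M * rho
a7 = 2 * u_M * phi**2 * beta * (c_u - c_n) ** 2
a8 = 2 * u_M * phi * (c_n - c_u) * (alpha - beta * (omega - phi * (omega - c_n)))
a9 = 2 * u_M * (alpha - beta * (omega - phi * (omega - c_n))) ** 2

def riccati(x):
    f1, f2, f3, m1, m2, m3 = x  # (phi_1, phi_2, phi_3, mu_1, mu_2, mu_3)
    return [
        a1 + f1 * (2 * m1 + f1) - a2 * f1,                              # (20.30)
        a3 + f1 * m2 + f2 * (f1 + m1 - a4),                             # (20.31)
        a5 + f2 * (2 * m2 + f2) - a6 * f3,                              # (20.32)
        a7 + 2 * (2 * m1 - a2) * m1 + (f1 + 4 * m1) * f1,               # (20.33)
        a8 + (f1 + 2 * m1) * f2 + 2 * (2 * m1 + f1 - a4) * m2,          # (20.34)
        a9 + beta * f2**2 + 4 * beta * m2 * (m2 + f2) - 2 * a6 * beta * m3,  # (20.35)
    ]

sol = fsolve(riccati, x0=np.ones(6))
print("phi_1..phi_3, mu_1..mu_3 =", np.round(sol, 5))
print("max |residual| =", max(abs(v) for v in riccati(sol)))
```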
From the positivity of $A^S$ (see Fig. 20.2) and the expression of the support rate, we conclude that

$$2\mu_1 + \varphi_1 > 0 \quad \text{and} \quad 2\mu_2 + \varphi_2 > 0, \qquad (20.41)$$
$$2\mu_1 - \varphi_1 > 0 \quad \text{and} \quad 2\mu_2 - \varphi_2 > 0. \qquad (20.42)$$
Combining (20.41) and (20.42), we conclude that μ1 and μ2 are positive. The fact
that the support rate is decreasing in the return rate leads to
$$\frac{\partial B^S}{\partial r} = \frac{4(\mu_1\varphi_2 - \mu_2\varphi_1)}{\big((2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2\big)^2} < 0 \ \Rightarrow\ \mu_1\varphi_2 - \mu_2\varphi_1 < 0. \qquad (20.43)$$
Further, the positivity of $A^S$ and $B^S(0) \leq 1$ imply $\varphi_2 > 0$. From the positivity of $\varphi_2$, $\mu_1$ and $\mu_2$, and the condition in (20.43), we conclude that $\varphi_1 > 0$.
Finally, it suffices to note that, since all other parameters involved in equa-
tions (20.32) and (20.35) are positive, a necessary condition for these equations
to hold is to have ϕ3 and μ3 positive.
References
1. Aras, N., Boyaci, T., Verter, V.: The effect of categorizing returned products in remanufactur-
ing. IIE Trans. 36(4), 319–331 (2004)
2. Atasu, A., Guide, V.D.R., Van Wassenhove, L.N.: Product reuse economics in closed-loop
supply chain research. Prod. Oper. Manage. 17(5), 483–497 (2008)
3. Atasu, A., Sarvary, M., Van Wassenhove, L.N.: Remanufacturing as a marketing strategy.
Manage. Sci. 54(10), 1731–1746 (2008)
4. Berger, M.: Vertical cooperative advertising ventures. J. Marketing Res. 9, 309–312 (1972)
5. Bergen, M., John, G.: Understanding cooperative advertising participation rates in conventional channels. J. Marketing Res. 34, 357–369 (1997)
6. Bhattacharya, S., Guide, V.D.R., Van Wassenhove, L.N.: Optimal order quantities with
remanufacturing across new product generations. Prod. Oper. Manage. J. 15(3), 421–431
(2006)
7. Cachon, G.P., Lariviere, M.A.: Supply chain coordination with revenue-sharing contracts: strengths and limitations. Manage. Sci. 51, 30–44 (2005)
8. Corbett, C.J., Savaskan, R.C.: Contracting and coordination in closed-loop supply chains. In:
Daniel, V., Guide, R., Van Wassenhove, L.N. (eds.) Business Aspects of Closed-Loop Supply
Chains: Exploring the Issues. Carnegie Mellon University Press, Pittsburgh (2003)
9. Dana, Jr., J.D., Spier, K.E.: Revenue sharing and vertical control in the video rental industry. J.
Ind. Econ. 49(3), 223–245 (2001)
10. Debo, L.G., Toktay, L.B., Van Wassenhove, L.N.: Market segmentation and product technology
selection for remanufacturable products. Manage. Sci. 51, 1193–1205 (2005)
11. Dekker, R., Fleischmann, M., Van Wassenhove, L.N. (eds.) Reverse Logistics: Quantitative
Models for Closed-Loop Supply Chains. Springer, Berlin (2004)
12. El Ouardighi, F., Jørgensen, S., Pasin, F.: A dynamic game of operations and marketing
management in a supply chain. Int. J. Game Theory Rev. 34, 59–77 (2008)
13. Ferguson, M.E., Toktay, L.B.: The effect of competition on recovery strategies. Prod. Oper.
Manage. 15(3), 351–368 (2006)
14. Ferrer, G., Swaminathan, J.M.: Managing new and remanufactured products. Manage. Sci.
52(1), 15–26 (2006)
15. Ferrer, G., Whybark, C.: Material planning for a remanufacturing facility. Prod. Oper. Manage.
10, 112–124 (2001)
16. Fleischmann, M., Beullens, P., Bloemhof-Ruwaard, J., Van Wassenhove, L.N.: The impact of
product recovery on logistics network design. Prod. Oper. Manage. 10(2), 156–173 (2001)
17. Fleischmann, M., Bloemhof-Ruwaard, J., Dekker, R., van der Laan, E., Van Wassenhove, L.N.:
Quantitative models for reverse logistics: a review. Eur. J. Oper. Res. 103, 1–17 (1997)
18. Fleischmann, M., van Nunen, J., Grave, B.: Integrating closed-loop supply chain and spare
parts management at IBM. ERIM Report Series Research in Management, ERS-2002-107-LIS
(2002)
19. Geng, Q., Mallik, S.: Inventory competition and allocation in a multi-channel distribution
system. Eur. J. Oper. Res. 182(2), 704–729 (2007)
20. Gerchak, Y., Wang, Y.: Revenue-sharing vs. wholesale-price contracts in assembly systems with random demand. Prod. Oper. Manage. 13(1), 23–33 (2004)
21. Geyer, R., Van Wassenhove, L.N., Atasu, A.: The economics of remanufacturing under limited
component durability and finite product life cycles. Manage. Sci. 53(1), 88–100 (2007)
22. Ginsburg, J.: Once is not enough. Business Week, April 16 (2001)
23. Guide, Jr., V.D.R.: Production planning and control for remanufacturing: industry practice and
research needs. J. Oper. Manage. 18, 467–483 (2000)
24. Guide, Jr., V.D.R., Jayaraman, V., Linton, J.D.: Building contingency planning for closed-loop
supply chains with product recovery. J. Oper. Manage. 21, 259–279 (2003)
25. Guide, Jr., V.D.R., Van Wassenhove, L.N.: The evolution of closed-loop supply chain research.
Oper. Res. 57(1), 10–18 (2009)
26. Guide, Jr., V.D.R., Van Wassenhove, L.N.: Managing product return for remanufacturing. Prod.
Oper. Manage. 10(2), 142–155 (2001)
27. Guide, Jr., V.D.R., Souza, G.C., Van Wassenhove, L.N., Blackburn, J.D.: Time of value of
commercial product returns. Manage. Sci. 52(8), 1200–1214 (2006)
28. Hauser, W.M., Lund, R.T.: The Remanufacturing Industry: Anatomy of a Giant. Boston Univer-
sity, Boston (2003)
29. He, X., Prasad, A., Sethi, S.: Cooperative advertising and pricing in a dynamic stochastic
supply chain: feedback Stackelberg strategies. Prod. Oper. Manage. 18, 78–94 (2009)
30. Hussain, S.S.: Green consumerism and ecolabelling: a strategic behavioural model. J. Agric.
Econ. 51(1), 77–89 (2000)
31. Ingene, C.A., Parry, M.E.: Mathematical Models of Distribution Channels. Kluwer Academic,
Dordrecht (2004)
32. Jørgensen, S., Sigué, S.P., Zaccour, G.: Dynamic cooperative advertising in a channel. J. Retail.
76(1), 71–92 (2000)
33. Jørgensen, S., Sigué, S.P., Zaccour, G.: Stackelberg leadership in a marketing channel. Int.
Game Theory Rev. 3(1), 13–26 (2001).
34. Jørgensen, S., Taboubi, S., Zaccour, G.: Retail promotions with negative brand image effects:
is cooperation possible? Eur. J. Oper. Res. 150, 395–405 (2003)
35. Jørgensen, S., Zaccour, G.: Differential Games in Marketing. International Series in Quantita-
tive Marketing. Kluwer Academic, Boston, MA (2004)
36. Karray, S., Zaccour, G.: Could co-op advertising be a manufacturer’s counterstrategy to store
brands? J. Bus. Res. 59, 1008–1015 (2006)
37. Kodak: Corporate Environmental Annual Report. The Kodak Corporation, Rochester (1999)
38. Krikke, H., Le Blanc, I., van de Velde, S.: Product modularity and the design of closed-loop
supply chains. Calif. Manage. Rev. 46(2), 23–39 (2004)
39. Majumder, P., Groenevelt, H.: Competition in remanufacturing. Prod. Oper. Manage. 10,
125–141 (2001)
40. Mantovalli, J.: The producer pays. Environ. Mag. 8(3), 36–42 (1997)
41. Muller, E., Peles, Y.C.: Optimal dynamic durability. J. Econ. Dyn. Control 14(3–4), 709–719
(1990)
42. Ray, S., Boyaci, T., Aras, N.: Optimal prices and trade-in rebates for durable, remanufacturable
products. Manuf. Serv. Oper. Manage. 7(3), 208–228 (2005)
43. Savaskan, R.C., Bhattacharya, S., Van Wassenhove, L.N.: Closed loop supply chain models
with product remanufacturing. Manage. Sci. 50, 239–252 (2004)
44. Savaskan, R.C., Van Wassenhove, L.N.: Reverse channel design: the case of competing
retailers. Manage. Sci. 52(1), 1–14 (2006)
45. Talbot, S., Lefebvre, E., Lefebvre, L.A.: Closed-loop supply chain activities and derived
benefits in manufacturing SMEs. J. Manuf. Technol. Manage. 18(6), 627–658 (2007)