Annals of the International Society of Dynamic Games
Pierre Cardaliaguet, Ross Cressman (eds.): Advances in Dynamic Games: Theory, Applications, and Numerical Methods for Differential and Stochastic Games
Volume 12
Series Editor
Tamer Başar
Mathematics Subject Classification (2010): 91A15, 91A22, 91A23, 91A24, 91A25, 91A80
The Annals of the International Society of Dynamic Games have a strong track
record of reporting recent advances in dynamic games by selecting articles primarily
from papers based on presentations at an international symposium of the Society
of Dynamic Games. This edition, Volume 12, continues the tradition, with most
contributions connected to the 14th International Symposium on Dynamic Games
and Applications held in Banff, Alberta, Canada in June 2010. The symposium was
cosponsored by St. Francis Xavier University, Antigonish, Nova Scotia, Canada; by
the Group for Research in Decision Analysis (GERAD); and by the Chair in Game
Theory and Management, HEC Montréal, Canada.
The volume contains 20 chapters that have been peer-reviewed according to
the standards of the international journals in game theory and its applications.
The chapters are organized into four parts: evolutionary game theory (Part I),
theoretical developments in dynamic and differential games (Part II), pursuit-
evasion games and search games (Part III), and applications of dynamic games
(Part IV). Beginning with its first volume in 2011, the journal Dynamic Games and
Applications has provided another important venue for the dissemination of related
research. Combined with the Annals, this development points to a bright future for
the theory of dynamic games as it continues to evolve.
Part I is devoted to evolutionary game theory and applications. It contains four
chapters.
David Ramsey examines age-structured game-theoretic models of mating behav-
ior in biological species, a topic of early interest when evolutionary game theory
began in the 1970s. Ramsey extends recent progress in this area first by allowing
the individual mating rate to depend on the proportion searching for mates and then
by incorporating asymmetries in the newborn sex ratio or the time for which males
and females are fertile. An iterative best-response procedure is used to generate the
equilibrium age distribution of fertile individuals.
Mike Mesterton-Gibbons and Tom Sherratt consider the evolutionary conse-
quences of signaling and of dominance in conflicts between two individuals. In
particular, it is shown that, when dominance over the opponent is sufficiently
advantageous, the evolutionarily stable strategy (ESS) is for only winners of the
conflict to signal in long contests and for neither winners nor losers to signal in
short contests.
Quanyan Zhu, Hamidou Tembine, and Tamer Başar formulate a multiple-access
control game and show that there is a convex set of pure strategy Nash equilibria.
The paper also addresses how to select one equilibrium from this set through game-
theoretic solutions such as ESS as well as through the long-run behavior of standard
evolutionary dynamics applied to this game that has a continuum of pure strategies.
Andrei Akhmetzhanov, Frédéric Grognard, Ludovic Mailleret, and Pierre Bern-
hard study the evolution of a consumer–resource system assuming that the repro-
duction rate of the resource population is constant. The consumers’ optimal behavior
is found over one season when they all act for the common good. The authors then
show that selfish mutants can successfully invade this system but are eventually as
vulnerable to invasion as the initially cooperative residents.
Part II contains eight chapters on theoretical developments of dynamic and
differential games.
Sergey Chistyakov and Leon Petrosyan analyze coalition issues in m-person
differential games with prescribed duration and integral payoffs. They show that
components of the Shapley value are absolutely continuous and thus differentiable
functions along any admissible trajectory.
Yurii Averboukh studies two-player, non-zero-sum differential games and charac-
terizes the set of Nash equilibrium payoffs in terms of nonsmooth analysis. He also
obtains sufficient conditions for a pair of continuous payoff functions to generate a
Nash equilibrium.
Anne Souquière studies two-player, non-zero-sum differential games played in
mixed strategies and characterizes the set of Nash equilibrium payoffs in this
framework. She shows in particular that the set of publicly correlated equilibrium
payoffs is the same as the set of Nash equilibrium payoffs using mixed strategies.
Dean Carlson and George Leitmann explain how to solve non-zero-sum differ-
ential games with equality constraints by using a penalty method approach. Under
the assumption that the penalized problem has an open-loop Nash equilibrium,
they show that this open-loop Nash equilibrium converges to an open-loop Nash
equilibrium for the constrained problem.
Paul Frihauf, Miroslav Krstic, and Tamer Başar investigate how to approximate
the stable Nash equilibria of a game by solving a differential equation in which the
players only need to measure their own payoff values. The approximation method
is based on the so-called extremum-seeking approach.
Miquel Oliu-Barton and Guillaume Vigeral obtain Tauberian-type results in
(continuous-time) optimal control problems: they show an equivalence between the
long-time average and the convergence of the discounted problem as the discount
rate tends to 0.
Lucia Pusillo and Stef Tijs propose a new type of equilibrium for multicriteria
noncooperative games. This “E-equilibrium” is based on improvement sets and
captures the idea of approximate and exact solutions.
Olivier Guéant studies a particular class of mean field games, with linear-
quadratic payoffs (mean field games are obtained as the limit of stochastic
differential games when the number of interacting agents tends to infinity). The
author shows that the system of equations associated with these games can be
transformed into a simple system of coupled partial differential equations, for
which he provides a monotonic scheme to build solutions.
Part III is devoted to pursuit-evasion games and search games and contains six
contributions.
Sourabh Bhattacharya and Tamer Başar investigate the effect of an aerial
jamming attack on the communication network of a team of unmanned aerial
vehicles (UAVs) flying in a formation. They analyze the problem in the framework
of differential game theory and provide analytical and approximate techniques to
compute nonsingular motion strategies of UAVs.
Serguei A. Ganebny, Serguei S. Kumkov, Stéphane Le Menec, and Valerii S.
Patsko study a pursuit-evasion game with two pursuers and one evader having linear
dynamics. They perform a numerical construction of the level sets of the value
function and explain how to produce feedback-optimal control.
Stéphane Le Menec presents a centralized algorithm to design cooperative
allocation strategies and guidance laws for air defense applications. One of its main
features is a capability to generate and counter alternative target assumptions based
on concurrent beliefs of future target behaviors, i.e., a Salvo Enhanced No Escape
Zone (SENEZ) algorithm.
Alexander Belousov, Alexander Chentsov, and Arkadii Chikrii study pursuit-
evasion games with integral constraints on the controls. They derive sufficient
conditions for the game to terminate in finite time.
Anna Karpowicz and Krzysztof Szajowski study the angler’s fishing problem, in
which an angler has at most two fishing rods. Using dynamic programming methods,
the authors explain how to find the optimal times to start fishing with only one rod
and then to stop fishing altogether to maximize the angler’s satisfaction.
Ryusuke Hohzaki deals with a non-zero-sum three-person noncooperative search
game, where two searchers compete for the detection of a target and the target tries
to evade the searchers. He shows that, in some cases, there is cooperation between
two searchers against the target and that the game can then be reduced to a zero-sum
one.
Part IV contains two papers dedicated to the applications of dynamic games to
economics and management science.
Alessandra Buratto formalizes a fashion licensing agreement where the licensee
produces and sells a product in a complementary business. Solving a Stackelberg
differential game, she analyzes the different strategies the licensor can adopt to
sustain his brand.
Pietro De Giovanni and Georges Zaccour consider a closed-loop supply chain
with a single manufacturer and a single retailer. They characterize and compare the
feedback equilibrium results in two scenarios. In the first scenario, the manufacturer
invests in green activities to increase the product-return rate while the retailer
controls the price. In the second scenario, the players implement a cost revenue
sharing contract in which the manufacturer transfers part of its sales revenues and
the retailer pays part of the cost of the manufacturer’s green activities program that
aims at increasing the return rate of used products.
Acknowledgements
The selection of contributions to this volume started during the 14th International
Symposium on Dynamic Games and Applications held in Banff. Our warmest
thanks go to all the referees of the papers. Without their invaluable efforts this
volume would not have been possible. Finally, our thanks go to the editorial staff at
Birkhäuser, and especially Tom Grasso, for their assistance throughout the editing
process. It has been an honor to serve as editors.
Chapter 1
Mate Choice with Age Preferences

David M. Ramsey
Abstract This paper considers some generalizations of the large population game
theoretic model of mate choice based on age preferences introduced by Alpern et
al. [Alpern et al., Partnership formation with age-dependent preferences. Eur. J.
Oper. Res. (2012)]. They presented a symmetric (with respect to sex) model with
continuous time in which the only difference between members of the same sex is
their age. The rate at which young males enter the adult population (at age 0) is
equal to the rate at which young females enter the population. All adults are fertile
for one period of time and mate only once. Mutual acceptance is required for mating
to occur. On mating or becoming infertile, individuals leave the pool of searchers.
It follows that the proportion of fertile adults searching and the distribution of
their ages (age profile) depend on the strategies that are used in the population
as a whole (called the strategy profile). They look for a symmetric equilibrium
strategy profile and corresponding age profile satisfying the following condition:
any individual accepts a prospective mate if and only if the reward obtained from
such a pairing is greater than the individual’s expected reward from future search.
It is assumed that individuals find prospective mates at a fixed rate. The following
three generalizations of this model are considered: (1) the introduction of a uniform
mortality rate, (2) allowing the rate at which prospective mates are found to depend
on the proportion of individuals who are searching, (3) asymmetric models in which
the rate at which males and females enter the population and/or the time for which
they are fertile differ.
1.1 Introduction
Many models of mate choice have been based on common preferences. According
to such preferences, individuals prefer attractive partners and each individual of a
given sex agrees on the attractiveness of a member of the opposite sex. Some work
has been carried out on models in which preferences are homotypic, i.e. individuals
prefer partners who are similar (e.g. in character) to themselves in some way. In
such models the attractiveness and character of an individual are assumed to be
fixed. One obvious characteristic upon which mate choice might be based is the age
of a prospective partner (and the searcher himself/herself). By definition, the age of
an individual must change over time. Very little theoretical work has been carried
out on such problems. This article extends a model considered by Alpern et al. [4].
Janetos [8] was the first to present a model of mate choice with common
preferences. He assumed that only females are choosy and the value of a male to
a female comes from a distribution known to the females. There is a fixed cost for
observing each prospective mate, but there is no limit on the number of males a
female can observe. Real [19] developed these ideas.
In many species both sexes are choosy and such problems are game theoretic.
Parker [17] presents a model in which both sexes prefer mates of high value.
He concludes that assortative mating should occur with individuals being divided
into classes. Class i males are paired with class i females and there may be one
class of males or females who do not mate. Unlike the models of Janetos [8] and
Real [19], Parker’s model did not assume that individuals observe a sequence of
prospective mates.
In the mathematics and economics literature such problems are often formulated
as marriage problems or job search problems. McNamara and Collins [12] consider
a job search game in which job seekers observe a sequence of job offers and,
correspondingly, employers observe a sequence of candidates. Both groups have
a fixed cost of observing a candidate or employer, as appropriate. Their conclusions
are similar to those of Parker [17]. Real [20] developed these ideas within the
framework of mate choice problems. For similar problems in the economics
literature see e.g. Shimer and Smith [21] and Smith [22].
In the above models it is assumed that the distribution of the value of prospective
partners has reached a steady state. There may be a mating season and as it
progresses the distribution of the value of available partners changes. Collins and
McNamara [6] were the first to formulate such a model as a one-sided job search
problem with continuous time. Ramsey [18] considers a similar problem with
discrete time. Johnstone [9] presents numerical results for a discrete time, two-
sided mate choice problem with a finite horizon. Alpern and Reyniers [3] use a
more analytic approach to similar mate choice problems. These models are further
developed and analyzed in Alpern and Katrantzi [1] and Mazalov and Falko [11].
Burdett and Coles [5] consider a dynamic model in which the outflow resulting
from partnership formation is balanced by job seekers and employers coming into
the employment market. Alpern and Reyniers [2] consider a similar model in which
individuals have homotypic preferences.
This paper is an extension of the work by Alpern et al. [4]. They consider a
problem of mutual mate choice in which all individuals of a sex are identical except
for their age. They first consider a problem with discrete time in which males are
fertile for m periods and females are fertile for n periods. Without loss of generality,
we may assume that m ≥ n. At each moment of time a number a1 (b1 ) of young
males (females) of age 1 enter the adult population. All the other adult individuals
age by 1 unit. If a male (female) reaches the age m + 1 (n + 1) without having
mated, then he (she) is removed from the population. The ratio R = a1 /b1 is called
the incoming sex ratio (ISR). The ratio of the number of adult males searching for
a mate to the number of females searching for a mate is called the operational sex
ratio (OSR) and is denoted by r.
At each moment an individual of the least common sex in the mating pool is
matched with a member of the opposite sex with probability ε . The age of the
prospective partner is chosen at random from the distribution of the age of members
of the appropriate sex. Suppose males are at least as common as females (i.e. r ≥ 1).
It follows that a male is matched with a female with probability ε /r. Given a male
is matched with a female, her age is chosen at random from the age of females.
Similarly, if females are at least as common as males (i.e. r ≤ 1), then in each period
a searching female is matched with a male with probability ε r. The age of such a
male is chosen at random from the distribution of male age. When two individuals
are matched, they must decide whether to accept or reject their prospective partner.
Mating only occurs by mutual consent. On mating two individuals are removed from
the population of searchers. It follows that the steady state distributions of the ages
of males and females in the population of searchers depend on the strategies used
within the population as a whole (the strategy profile).
The reward obtained by a pair on mating is taken to be the expected number of
offspring produced over the period of time for which both individuals are fertile. It is
assumed that offspring are produced at a rate depending on the ages of the partners
in such a way that the reward obtained by an individual on mating is non-increasing
in the age of the prospective partner. In this case, the equilibrium strategies of
males and females are threshold strategies in which each individual defines the
maximum acceptable age of a prospective partner as a function of the individual’s
age. This maximum acceptable age is non-decreasing in the age of the individual.
One example of such a reward function is the simple fertility model, according to
which the payoff of a pair on mating is simply the number of periods for which
both partners remain fertile. Equilibrium strategy profiles and the corresponding
age profiles are derived for a selection of problems of this form.
In addition, they define a continuous time model of a symmetric mate choice
problem in which both males and females enter the adult population at the same rate
and are fertile for one unit of time. It is assumed that when both males and females
use the same strategy (and thus r = 1), individuals meet prospective partners as a
Poisson process of rate λ (called the interaction rate). Hence, an individual expects
to meet λ prospective partners during their fertile period. The payoff obtained by a
pair on mating is equal to the length of time for which both remain fertile. A policy
iteration algorithm is defined to approximate a symmetric equilibrium of the game.
It should be noted that when the strategy used by females differs from the strategy
used by males, then the OSR may well differ from one. Since a matching of a
male with a female must correspond exactly to one matching of a female with a
male, it follows that the interaction rate depends on the OSR. Given the assumption
regarding the interaction rate at a symmetric equilibrium, females should meet males at rate 2λr/(r + 1), while males should meet females at rate 2λ/(r + 1). On the other hand,
the policy iteration algorithm assumes that this interaction rate is always λ . This
assumption affects the dynamics of the evolution of the threshold rule. However,
since at a symmetric equilibrium the OSR will be equal to 1, a fixed point of such
a regime will also be a fixed point of a suitably adapted algorithm in which the
interaction rate varies according to the OSR.
The paper presented here considers three extensions of this continuous time
model. Section 1.2 outlines the original model. Section 1.3 adapts this model to
include a fixed mortality rate for fertile individuals. Section 1.4 considers a model
in which the interaction rate depends on the proportion of fertile members of the
opposite sex who are searching for a mate. Section 1.5 considers a model of an
asymmetric game in which males are fertile for longer than females and/or the
ISR differs from 1. For convenience and ease of exposition, these adaptations are
considered separately, but they can be combined relatively easily. Section 1.6 gives
some numerical results, while Sect. 1.7 gives a brief conclusion and some directions
for future research.
We consider a symmetric (with respect to sex) model in which the rate at which new
males enter the adult population equals the rate at which females enter, i.e. R = 1.
Suppose individuals are fertile for one unit of time, there is no mortality over this
period and the rate at which they meet prospective partners is λ .
Each prospective partner is chosen at random from the set of members of the
opposite sex that are searching for a mate. When two prospective partners meet, they
decide whether to accept or reject the other on the basis of his/her age. Acceptance
must be mutual, in order to form a breeding pair. If acceptance is not mutual, both
individuals continue searching. No recall of previously encountered prospective
partners is possible.
The strategy of an individual defines the set of ages of acceptable prospective
mates at each age. We look for a symmetric equilibrium of such a game in which
males and females use the same strategy. It is clear that at such an equilibrium the
OSR is also equal to 1.
Suppose the rate at which a male of age x and a female of age y produce offspring is γ(x, y), where γ(x, y) ≥ 0. The reward of both partners when a male of age x pairs with a female of age y is the expected number of offspring produced while both remain fertile, i.e.

u(x, y) = ∫_0^{1−max{x,y}} γ(x + s, y + s) ds.

Suppose the rate at which fertile partners produce offspring is independent of their ages. We may assume γ(x, y) = 1. In this case u(x, y) = 1 − max{x, y}. This is simply the period of time for which both of the partners remain fertile. In the following analysis, we assume the reward is of this form.
The equilibrium condition is as follows: each individual should accept a prospec-
tive mate if and only if the reward gained from such a mating is greater than the
expected reward from future search. An equilibrium can be described by a strategy
pair. It is assumed that all males follow the first strategy in this pair and females
follow the second. At a symmetric equilibrium males and females use the same
strategy.
Note that the reward of an individual of age x from mating with a prospective
partner of age y, 1 − max{x, y}, is non-increasing in y. Hence, if an individual of
age x should accept a prospective partner of age y, then he/she should accept a
prospective partner of age ≤ y. Thus at a symmetric equilibrium each individual
uses a threshold rule such that an individual of age x accepts any prospective partner
of age ≤ f (x). The function f will be referred to as the threshold profile.
The future expected reward at age 1 is 0. Hence, an individual of age 1 will
accept any prospective mate, i.e. f (1) = 1. Suppose an individual of age x meets a
prospective mate of age ≤ x. By mating with such a prospective mate, the individual
obtains a payoff of 1 − x, which for x < 1 is greater than the payoff obtained from
continued search. Hence, f(x) ≥ x with equality if and only if x = 1. In addition, f′(x) ≥ 0, i.e. f is non-decreasing, since at equilibrium an individual of age x can ensure himself/herself the same reward as an individual of age x + δ by rejecting all prospective partners until age x + δ and then following the threshold profile f. It should be noted that an
individual of age ≤ f (0) will be acceptable to any member of the opposite sex.
Define a(x) to be the steady state proportion of individuals of age x that are
still searching for a mate. It should be noted that this proportion depends on the
acceptance rule being used in the population [i.e. on f (x)]. The proportion of fertile
individuals that have not mated is a, where a = ∫_0^1 a(x) dx. It follows that the density
function of the age of available, fertile individuals is given by â(x) = a(x)/a. The
function a will be referred to as the age profile.
We now derive a differential equation which the equilibrium threshold profile
must satisfy. Consider a male of age x. The probability of encountering an unmated female in a small interval of time of length δ is λδ. We consider two cases:

1. x < f(0). In this case the male is acceptable to females of any age y. The female is acceptable if y ≤ f(x). The probability that the female is acceptable is given by

(1/a) ∫_0^{f(x)} a(u) du.
Given a male is still searching at age x, the probability he mates between age x and age x + δ is given by

(λδ/a) ∫_0^{f(x)} a(u) du + O(δ²).

Hence,

a(x + δ) = a(x) [1 − (λδ/a) ∫_0^{f(x)} a(u) du] + O(δ²),

so that

[a(x + δ) − a(x)]/δ = −(λ a(x)/a) ∫_0^{f(x)} a(u) du + O(δ).

Letting δ → 0, we obtain

a′(x) = −(λ a(x)/a) ∫_0^{f(x)} a(u) du.  (1.1)
2. x ≥ f(0). In this case, the male must also be acceptable to the female, i.e. x ≤ f(y). Since f is an increasing function, it follows that f^{-1}(x) ≤ y. Hence, acceptance is mutual if f^{-1}(x) ≤ y ≤ f(x). Given a male is still searching at age x, the probability he mates between age x and age x + δ is given by

(λδ/a) ∫_{f^{-1}(x)}^{f(x)} a(u) du + O(δ²).
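To make the fixed-point character of these equations concrete, here is a minimal numerical sketch in Python (the computations reported in Sect. 1.6 used MATLAB; the grid size, tolerance and the threshold profile below are illustrative assumptions, not choices made in the paper). It integrates the age profile for the non-equilibrium threshold f(x) = min{1, x + 0.5}, iterating on the whole profile because the right-hand sides of the two cases above involve a(·) itself.

```python
import numpy as np

# Age profile for an illustrative threshold f(x) = min(1, x + 0.5), whose
# inverse is f^{-1}(x) = max(0, x - 0.5).  The acceptance integral uses the
# previous iterate of a(.), so we sweep forwards and repeat until it settles.
lam = 5.0
x = np.linspace(0.0, 1.0, 1001)
h = x[1] - x[0]
f = np.minimum(1.0, x + 0.5)
finv = np.maximum(0.0, x - 0.5)

a_old = np.ones_like(x)                   # initial guess for the age profile
for _ in range(200):
    w = (a_old[1:] + a_old[:-1]) * h / 2  # trapezium-rule panel areas
    abar = float(np.sum(w))               # a = int_0^1 a(u) du
    A = np.concatenate(([0.0], np.cumsum(w)))          # A(x) = int_0^x a(u) du
    acc = np.interp(f, x, A) - np.interp(finv, x, A)   # int_{f^{-1}(x)}^{f(x)} a(u) du
    a = np.empty_like(x)
    a[0] = 1.0                            # everyone is single on entering the pool
    for k in range(len(x) - 1):           # Euler step for a'(x) = -(lam*a(x)/a)*acc(x)
        a[k + 1] = a[k] * (1.0 - h * lam * acc[k] / abar)
    if np.max(np.abs(a - a_old)) < 1e-10:
        break
    a_old = a

# Sanity check: replacing f by f(x) = 1 (accept everyone) gives acc = abar, so
# the sweep reproduces a(x) = exp(-lam*x) up to the O(h) Euler error.
```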
At first glance, it appears that the following procedure might work: choose an
arbitrary male strategy f1 ; determine an optimal female response strategy f2 ;
determine an optimal male response f3 to f2 , and so on, hoping that the sequence
converges to a limit. If there is a limit, this defines a symmetric equilibrium of the
game considered. In a true two person game, such a procedure is at least feasible in
principle. However, it will not work in the present setting. To see this, suppose the
females know the male strategy f1 and consider the problem faced by a female of
age y. In order to know which males to accept, she needs to know her optimal future
return from searching. For this she needs to know (a) the rate at which she will be
matched (this is assumed to be known), (b) which males will accept her at her future
ages (this is known from f1 ), and (c) the age profile, call it a1 , of the males she will
be matched with. However, in the scheme we proposed above, she will not know
this, as it is not determined solely by the male strategy f1 , but also depends on what
the females are doing. In theory, we could determine f2 based on f1 and a previous
female strategy, say f0 , and determine a1 as the age profile corresponding to f1
and f0 . However, as we showed in Sect. 1.2.1, the determination of a1 is difficult.
So we use a different iterative procedure, described below.
Suppose we begin by positing an initial male strategy, denoted f_1 (where f_1 is a non-decreasing function), and any non-increasing initial male age profile, denoted a_1. We first compute the optimal response of a female to males who use the threshold profile f_1 and have age profile a_1; we denote this computation as

f_2 = H_1(f_1, a_1).
To continue the process, we then need to compute the function a2 defining the
probability that an individual female using f2 is still searching for a mate at age
x when the male age profile is a1 and the males adopt strategy f1 . Note that this is
not by definition the age profile of females when all males use f1 and all females
use f2 . We denote this computation (derived in Sect. 1.2.3) as
a2 = H2 ( f1 , a1 ) .
Define
( f2 , a2 ) = H ( f1 , a1 ) = (H1 ( f1 , a1 ) , H2 ( f1 , a1 )) .
Since the game is symmetric with respect to sex, we may define the optimal response of an individual, f_{i+1}, and the probability that an individual using this strategy is still searching at age x, a_{i+1}(x), when the members of the opposite sex use the threshold profile f_i and have age profile a_i, as follows:

(f_{i+1}, a_{i+1}) = H(f_i, a_i).
Theorem 1.1 (From Alpern et al. [4]). Suppose that for some initial strategy-age
profile pair ( f1 , a1 ) , the iterates ( fi+1 , ai+1 ) = H ( fi , ai ) converge to a limit ( f , a).
Then
• The strategy pair ( f , f ) is a symmetric equilibrium and
• Both sexes have the invariant age profile a = a (x) .
Proof. In the limit we will have
( f , a) = H ( f , a) .
It follows from the definition of H1 that f is the best response function of females
when males adopt f and their age profile is a. Similarly, f is the best response
function of males when females adopt f and have age profile a. Hence, it suffices to
show that when all individuals use the threshold strategy f , then the age profile in
both sexes is a. The second part of the iteration indicates that the probability that an
individual using f is still searching at age x is given by a(x). This individual is using
the same strategy as the rest of the population, thus a(x) is simply the proportion of
individuals of age x who are still in the mating pool, as required.
In order to simplify the notation used, we define f −1 (x) = 0 for x ≤ f (0), otherwise
f −1 (x) is the standard inverse function. First, we consider the best response fi+1
to the pair of profiles (f_i, a_i). Denote a_i = ∫_0^1 a_i(x) dx. Suppose an individual is
still searching at age x. The optimal expected future reward from search is equal
to the reward obtained by accepting the oldest acceptable prospective partner, i.e.
1 − fi+1 (x). Suppose the next encounter with an available mate occurs at age W .
The probability density function of W is given by p(w|W > x) = λ e^{−λ(w−x)}, for
x ≤ w < 1. It should be noted that there is an atom of probability at w = 1 of mass
equal to the probability that an individual does not meet another prospective partner
given that he/she is still searching at age x. In this case, the reward from search is
defined to be 0. Suppose the age of the prospective mate is y. The pairing is mutually acceptable if y ∈ [f_i^{-1}(w), f_{i+1}(w)]. If y ∈ [f_i^{-1}(w), w], then the searcher obtains a reward of 1 − w. If y ∈ (w, f_{i+1}(w)], then the searcher obtains a reward of 1 − y. In all other cases, the future expected reward from search is 1 − f_{i+1}(w). Conditioning on the age of the prospective mate and taking the expected value, it follows that

1 − f_{i+1}(x) = (λ e^{λx}/a_i) ∫_x^1 e^{−λw} { [1 − f_{i+1}(w)] ∫_0^{f_i^{-1}(w)} a_i(y) dy + ∫_{f_i^{-1}(w)}^{w} (1 − w) a_i(y) dy + ∫_w^{f_{i+1}(w)} (1 − y) a_i(y) dy + ∫_{f_{i+1}(w)}^{1} [1 − f_{i+1}(w)] a_i(y) dy } dw.  (1.4)
Equation (1.4) can be solved numerically, using the boundary condition f (1) = 1
and estimating fi+1 (x) sequentially at x = 1 − h, 1 − 2h, . . ., 0.
Once fi+1 has been estimated, we can estimate the corresponding age profile. The
calculations are analogous to the calculations carried out in the previous section.
We have

a′_{i+1}(x) = −(λ a_{i+1}(x)/a_i) ∫_{f_i^{-1}(x)}^{f_{i+1}(x)} a_i(y) dy.  (1.5)
Equation (1.5) can be solved numerically, using the boundary condition a(0) = 1
and estimating a(x) sequentially at x = h, 2h, . . . , 1.
A proof that the iteration procedure is well defined can be found in Alpern et al.
[4]. This proof may be adapted to show that the procedures proposed for the three
extensions considered below are also well defined.
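Below is a minimal sketch of one possible implementation of the whole iteration in Python (the author's programme, described in Sect. 1.6, was written in MATLAB; all numerical choices here are illustrative). For the backward sweep it uses the differential form of Eq. (1.4), obtained by dividing by e^{λx} and differentiating (the same manipulation that yields Eq. (1.12) in Sect. 1.5):

f′_{i+1}(x) = (λ/a_i) { [f_{i+1}(x) − x] ∫_{f_i^{-1}(x)}^{x} a_i(y) dy + ∫_x^{f_{i+1}(x)} [f_{i+1}(x) − y] a_i(y) dy }.

The forward sweep then integrates Eq. (1.5).

```python
import numpy as np

def trap(y, x):
    """Trapezium rule over the whole grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2))

def cum(y, x):
    """Running integral A[k] = int_0^{x_k} y(u) du on the grid."""
    return np.concatenate(([0.0], np.cumsum((y[1:] + y[:-1]) * np.diff(x) / 2)))

def H(f, a, x, lam):
    """One step (f_{i+1}, a_{i+1}) = H(f_i, a_i) of the policy iteration."""
    h, abar, A = x[1] - x[0], trap(a, x), cum(a, x)
    # f_i^{-1} by linear interpolation (ties broken so abscissae increase);
    # f_i^{-1}(x) = 0 for x <= f_i(0).
    finv = np.interp(x, f + 1e-12 * np.arange(len(x)), x, left=0.0)
    # Backward sweep for f_{i+1}, boundary condition f_{i+1}(1) = 1.
    fn = np.empty_like(x)
    fn[-1] = 1.0
    for k in range(len(x) - 1, 0, -1):
        fx = min(fn[k], 1.0)
        j = min(len(x) - 1, int(round(fx / h)))   # grid index closest to f(x_k)
        inner = trap((fx - x[k:j + 1]) * a[k:j + 1], x[k:j + 1]) if j > k else 0.0
        slope = lam / abar * ((fx - x[k]) * (A[k] - np.interp(finv[k], x, A)) + inner)
        fn[k - 1] = fx - h * slope
    fn = np.clip(fn, x, 1.0)                      # enforce x <= f_{i+1}(x) <= 1
    # Forward sweep for a_{i+1} via Eq. (1.5), boundary condition a_{i+1}(0) = 1.
    acc = np.interp(fn, x, A) - np.interp(finv, x, A)
    an = np.empty_like(x)
    an[0] = 1.0
    for k in range(len(x) - 1):
        an[k + 1] = an[k] * (1.0 - h * lam * acc[k] / abar)
    return fn, an

lam = 5.0
x = np.linspace(0.0, 1.0, 2001)
f, a = x.copy(), 1.0 - 0.5 * x                    # illustrative initial profiles
for _ in range(500):
    fn, an = H(f, a, x, lam)
    done = max(np.max(np.abs(fn - f)), np.max(np.abs(an - a))) < 1e-8
    f, a = fn, an
    if done:
        break
print(f"initial threshold f(0) = {f[0]:.3f}, expected reward = {1 - f[0]:.3f}")
```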
We now adapt the model presented above by assuming that mortality affects fertile
individuals at a constant rate of μ (independently of sex and status, single or mated).
We first derive the expected reward obtained by a pair composed of a male of age x
and a female of age y, denoted u(x, y). This is given by the expected time for which
both partners survive and are fertile. Note that the death of the first of the partners occurs at rate 2μ; denote the time until this death by Z. Suppose x ≤ y. If both partners survive for a period of 1 − y (i.e. until the female becomes infertile), then they both receive a payoff of 1 − y. Otherwise, the reward obtained is Z. It follows that

u(x, y) = (1 − y) ∫_{1−y}^{∞} 2μ e^{−2μz} dz + ∫_0^{1−y} 2μ z e^{−2μz} dz = [1 − e^{−2μ(1−y)}]/(2μ).  (1.6)

Similarly, for y ≤ x,

u(x, y) = [1 − e^{−2μ(1−x)}]/(2μ),

so that in general u(x, y) = [1 − e^{−2μ(1−max{x,y})}]/(2μ).
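As a quick plausibility check on Eq. (1.6): the reward of the pair is min{Z, 1 − y} with Z exponentially distributed with rate 2μ, so its mean can be compared with the closed form by simulation (a sketch; the parameter values and seed are arbitrary).

```python
import numpy as np

# Monte Carlo check of Eq. (1.6): E[min(Z, 1 - y)] with Z ~ Exp(2*mu).
rng = np.random.default_rng(0)
mu, y = 0.5, 0.3
z = rng.exponential(scale=1.0 / (2 * mu), size=1_000_000)
mc = np.minimum(z, 1.0 - y).mean()
closed = (1.0 - np.exp(-2.0 * mu * (1.0 - y))) / (2.0 * mu)
print(mc, closed)  # the two agree to Monte Carlo error (~1e-3)
```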
Let r_{i+1}(x) denote the optimal expected reward from future search of a male still searching at age x, given that females use the threshold profile f_i and have age profile a_i. Since the reward from mating is decreasing in the age of the partner, the threshold f_{i+1}(x) is defined by indifference between accepting the oldest acceptable prospective partner and continuing to search, i.e.

[1 − e^{−2μ(1−f_{i+1}(x))}]/(2μ) = r_{i+1}(x).  (1.7)

We now derive a differential equation for r_{i+1}(x) by conditioning on the time of the next event, where the death of the male and meeting a prospective partner are defined to be events. Events occur at rate λ + μ; thus, given the male is still searching at age x, the time at which the next event occurs, W, has density function p(w|W > x) = (λ + μ) e^{−(λ+μ)(w−x)} for x ≤ w < 1. Note that W has an atom of probability at w = 1 of mass equal to the probability that no event occurs before the male becomes infertile. Given that an event occurs before the male becomes infertile, this event is his death with probability μ/(λ + μ). If no event occurs before he reaches age 1 or the first event is his death, the reward of the male is 0. Considering the time of the next event, the type of this event and the age of the prospective partner, we obtain
r_{i+1}(x) = (λ/a_i) ∫_x^1 exp[−(λ + μ)(w − x)] { ∫_0^{f_i^{-1}(w)} r_{i+1}(w) a_i(y) dy + ∫_{f_i^{-1}(w)}^{w} [(1 − e^{−2μ(1−w)})/(2μ)] a_i(y) dy + ∫_w^{f_{i+1}(w)} [(1 − e^{−2μ(1−y)})/(2μ)] a_i(y) dy + ∫_{f_{i+1}(w)}^{1} r_{i+1}(w) a_i(y) dy } dw.  (1.8)
The functions fi+1 and ri+1 can be calculated numerically from Eqs. (1.7) and (1.8)
using the boundary conditions fi+1 (1) = 1 and ri+1 (1) = 0. Given ri+1 (x) and
fi+1 (x) for a sequence of values x ∈ {x0 , x0 + h, x0 + 2h, . . . , 1}, we can evaluate
ri+1 (x0 − h) using a numerical procedure to solve Eq. (1.8). We can then evaluate
fi+1 (x0 − h) directly from Eq. (1.7).
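Assuming the indifference condition (1.7), the threshold can be recovered from the value function in closed form, which is how the direct evaluation mentioned above can be implemented (a sketch):

```python
import numpy as np

# Inverting Eq. (1.7): (1 - exp(-2*mu*(1 - f)))/(2*mu) = r  gives
# f = 1 + log(1 - 2*mu*r)/(2*mu).  The argument of the logarithm is positive,
# since r never exceeds u(x, x) <= (1 - exp(-2*mu))/(2*mu).
def threshold_from_value(r, mu):
    return 1.0 + np.log(1.0 - 2.0 * mu * r) / (2.0 * mu)
```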
Having calculated fi+1 , we can then estimate ai+1 (x), the probability that a male
using the threshold profile fi+1 is still searching at age x given the threshold profile
and age profile of females, ( fi , ai ). A male of age x will leave the population of
searchers in the time interval [x, x + δ ] if he either finds a mate or dies in that
time interval. Analogous calculations to the ones used to obtain Eq. (1.1) lead to
the differential equation
fi+1 (x)
λ
ai+1 (x) = −ai+1 (x) μ + ai (y) dy . (1.9)
ai f i−1 (x)
Equation (1.9) can be solved numerically, using the boundary condition a_{i+1}(0) = 1 and evaluating a_{i+1}(x) sequentially at x = h, 2h, . . . , 1.
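A sketch of the corresponding forward sweep: compared with Eq. (1.5), mortality simply adds μ to the per-unit-time rate at which a searcher leaves the pool. The acceptance mass is assumed to have been precomputed on the grid (names are illustrative).

```python
import numpy as np

def age_profile_with_mortality(x, acc_mass, abar, lam, mu):
    """Euler sweep for Eq. (1.9); acc_mass[k] approximates
    int_{f_i^{-1}(x_k)}^{f_{i+1}(x_k)} a_i(y) dy and abar = int_0^1 a_i(x) dx."""
    h = x[1] - x[0]
    a_new = np.empty_like(x)
    a_new[0] = 1.0                      # boundary condition a_{i+1}(0) = 1
    for k in range(len(x) - 1):
        a_new[k + 1] = a_new[k] * (1.0 - h * (mu + lam * acc_mass[k] / abar))
    return a_new
```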
It should be noted that this model can be relatively easily modified to allow
the mortality rate of individuals to depend on their status (either single or paired),
but not on sex. In order to generalize this model to one in which the mortality
rate can depend on sex, we have to generalize the model considered above to
allow asymmetries between the sexes. Asymmetric problems will be considered in
Sect. 1.5.
The model presented in Sect. 1.2 assumes that as long as the OSR is equal to
1 individuals meet prospective mates at a constant rate regardless of the strategy
profile used (i.e. independently of the proportion of adult individuals who are
searching for a partner). One might think of this model as describing a population
in which all the singles are concentrated in a particular area (i.e. a type of “singles
bar” model). We might consider a model under which the adult population mixes
randomly. In this case, we assume that when a male meets a female the probability
of her being single is equal to the proportion of females searching for a mate. In
reality, it seems likely that prospective mates would be found at a rate that increases as the proportion of adults searching increases. However, it would
be realistic to assume that singles can concentrate their search in such a way that the
probability of an encounter being with another single is greater than the proportion
of the opposite sex who are single. Hence, the two models described above define
the two extremes of a spectrum for modelling encounters between searchers.
For ease of presentation, we only consider the “randomly mixing” model under
which the rate of meeting prospective mates is proportional to the fraction of
individuals of the opposite sex searching for a mate. As in Sect. 1.2, we only
consider symmetric equilibria of symmetric games of this form. We define an
iterative procedure ( fi+1 , ai+1 ) = H( fi , ai ), where fi+1 defines the best response of
an individual (without loss of generality we may assume a male) when the threshold
and age profiles of females are given by f_i and a_i, respectively. As before, define a_i = ∫_0^1 a_i(x) dx. It is assumed that the rate at which individuals meet prospective partners is λ a_i.
Firstly, we define the best response, fi+1 . Suppose a male is still searching at age
x. The optimal expected future reward from search is equal to the reward obtained by
accepting the presently oldest acceptable female, i.e. 1 − fi+1 (x). Suppose the next
encounter with a single female occurs at age W . The probability density function
of W is given by p(w|W > x) = λ a_i e^{−λ a_i (w−x)}, for x ≤ w < 1. As before, there is
an atom of probability at w = 1 of mass equal to the probability that the male does
not meet another available female given that he/she is still searching at age x. In this
case, the male’s reward from search is defined to be 0. Suppose the age of the female
is y. The pairing is mutually acceptable if y ∈ [f_i^{-1}(w), f_{i+1}(w)]. If y ∈ [f_i^{-1}(w), w], then they obtain a reward of 1 − w. If y ∈ (w, f_{i+1}(w)], then they obtain a reward of 1 − y. In all other cases, the future expected reward of the male from search is 1 − f_{i+1}(w). Conditioning on the age of the female and taking the expected value, it follows that

1 − f_{i+1}(x) = λ e^{λ a_i x} ∫_x^1 e^{−λ a_i w} { [1 − f_{i+1}(w)] ∫_0^{f_i^{-1}(w)} a_i(y) dy + ∫_{f_i^{-1}(w)}^{w} (1 − w) a_i(y) dy + ∫_w^{f_{i+1}(w)} (1 − y) a_i(y) dy + ∫_{f_{i+1}(w)}^{1} [1 − f_{i+1}(w)] a_i(y) dy } dw.  (1.10)
Using the boundary condition fi+1 (1) = 1, we can estimate fi+1 (x) for x = 1 − h, 1 −
2h, . . . , 0 by solving Eq. (1.10) numerically.
Having calculated fi+1 , we now estimate ai+1 , where ai+1 (x) is the probability
that a male using the optimal response is still searching at age x. This male finds
prospective mates at rate λ a_i. Given an optimally responding male of age x meets a female of age y, such a pairing is mutually acceptable if and only if f_i^{-1}(x) ≤ y ≤ f_{i+1}(x). Analogous calculations to the ones used to obtain Eq. (1.1) lead to

a′_{i+1}(x) = −λ a_{i+1}(x) ∫_{f_i^{-1}(x)}^{f_{i+1}(x)} a_i(y) dy.  (1.11)
We can estimate ai+1 (x) for x = h, 2h, . . . , 1 using the boundary condition ai+1 (0)=1
and solving Eq. (1.11) numerically.
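For comparison with Eq. (1.5), here is a sketch of the forward sweep for Eq. (1.11). Since the meeting rate is λ a_i while the probability of acceptability carries a factor 1/a_i, the two cancel and no 1/a_i appears in the update.

```python
import numpy as np

def age_profile_random_mixing(x, acc_mass, lam):
    """Euler sweep for Eq. (1.11); acc_mass[k] approximates
    int_{f_i^{-1}(x_k)}^{f_{i+1}(x_k)} a_i(y) dy."""
    h = x[1] - x[0]
    a_new = np.empty_like(x)
    a_new[0] = 1.0                      # boundary condition a_{i+1}(0) = 1
    for k in range(len(x) - 1):
        a_new[k + 1] = a_new[k] * (1.0 - h * lam * acc_mass[k])
    return a_new
```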
In this section we assume that males are fertile for a period of t units, while females
are fertile for 1 unit of time. Also, young males enter the adult population at a rate
R times the rate at which young females enter the adult population. Without loss of
generality, it may be assumed that t ≥ 1. The OSR r depends on the strategy profile
used. It is assumed that there is no mortality.
As stated earlier, there is an intrinsic problem with the formulation of the original
model. Although the ISR is one, when the strategy used depends on sex, the OSR
may differ from 1 (see Alpern et al. [4]). Suppose r ≠ 1 and individual males meet females at the same rate at which individual females meet males. It would follow that the ratio of the total number of times a male meets a female to the total number of times a female meets a male differs from 1. This is clearly a contradiction, since each such meeting involves exactly one male and one female.
In order to generalize the model, we assume that the rate at which singles meet other singles (of either sex) is λ_0. It follows that the rate at which single females meet prospective mates is λ_f, where λ_f = λ_0 r/(1 + r). Similarly, the rate at which single males meet prospective mates is λ_m, where λ_m = λ_0/(1 + r). This satisfies the Fisher condition (see Houston and McNamara [7]) that the ratio of the number of times a male meets a female to the number of times a female meets a male must be equal to 1. For example, if r = 3, then λ_f = 3λ_0/4 and λ_m = λ_0/4: each of the three-times-more-numerous males meets females a third as often as each female meets males.
In the case of the symmetric problem, this problem is sidestepped by the assumption that the equilibrium is symmetric with respect to sex, so that at such an equilibrium the OSR is equal to 1.

Consider first the best response of a female, defined by a threshold profile g_{i+1}, when males use the threshold profile f_i and have age profile a_i. Such a female is assumed to meet prospective mates at rate λ_i^f, defined from the OSR assumed at this step in the same way as λ_i^m below; her optimal expected reward from future search at age y is t − g_{i+1}(y), the reward from accepting the oldest acceptable male. As before, if acceptance is not mutual, both individuals continue searching. By conditioning on the age of the female at the next encounter with a male and his age, we obtain

t − g_{i+1}(y) = (λ_i^f/a_i) ∫_y^1 e^{−λ_i^f (w−y)} { ∫_0^{f_i^{-1}(w)} [t − g_{i+1}(w)] a_i(x) dx + ∫_{f_i^{-1}(w)}^{t−1+w} [1 − w] a_i(x) dx + ∫_{t−1+w}^{g_{i+1}(w)} [t − x] a_i(x) dx + ∫_{g_{i+1}(w)}^{t} [t − g_{i+1}(w)] a_i(x) dx } dw.
Dividing by e^{λ_i^f y} and differentiating with respect to y, after some simplification we obtain

g′_{i+1}(y) = (λ_i^f/a_i) { [g_{i+1}(y) + 1 − y − t] ∫_{f_i^{-1}(y)}^{t−1+y} a_i(x) dx + ∫_{t−1+y}^{g_{i+1}(y)} [g_{i+1}(y) − x] a_i(x) dx }.  (1.12)
Using Eq. (1.12) and the boundary condition gi+1 (1) = t, we can numerically
calculate gi+1 (y) for y ∈ {1 − h, 1 − 2h, . . ., 0}.
Now we consider the probability that a female using this optimal response will still be searching at age y. This is denoted by b_{i+1}(y). Such a female meets prospective partners at rate λ_i^f. If she meets a male of age x when she is y years old, mating occurs if and only if f_i^{-1}(y) ≤ x ≤ g_{i+1}(y). Using an argument analogous to the one used to derive Eq. (1.1), it follows that

b′_{i+1}(y) = −(λ_i^f b_{i+1}(y)/a_i) ∫_{f_i^{-1}(y)}^{g_{i+1}(y)} a_i(x) dx.  (1.13)
Using Eq. (1.13) and the boundary condition bi+1 (0) = 1, we can numerically
calculate bi+1 (y) for y ∈ {h, 2h, . . . , 1}.
We then calculate the optimal response of a male given that the female threshold and age profiles are g_{i+1} and b_{i+1}, respectively, and the OSR is assumed to be given by r_i^m = a_i/b_{i+1}. The rate at which males find females is thus λ_i^m = λ_0/(1 + r_i^m). It should be noted that this is not by definition the OSR when males use the threshold profile f_i and females use the profile g_{i+1}.
Suppose a male is still searching at age x. His optimal reward from future search is given by the length of time for which the presently oldest acceptable female remains fertile, i.e. 1 − f_{i+1}(x). Suppose the next prospective mate is of age y and appears when the male is of age w. From the definition of the inverse of the threshold function used here, the youngest female who will accept such a male is of age g_{i+1}^{-1}(w). A female should accept a male who will be fertile for a longer period than her. It follows that, for w ≤ t − 1, g_{i+1}^{-1}(w) = 0. Also, for w > t − 1, we have 1 − g_{i+1}^{-1}(w) ≥ t − w. Hence, g_{i+1}^{-1}(w) ≤ 1 + w − t. It follows that g_{i+1}^{-1}(w) ≤ max{0, 1 + w − t}. When g_{i+1}^{-1}(w) ≤ y ≤ 1 + w − t, then a pair is formed and the male obtains a reward of t − w, since in this case he becomes infertile before the female. If max{0, 1 + w − t} < y ≤ f_{i+1}(w), then a pair is formed and the male obtains a reward of 1 − y. In all other cases he continues searching, and his future expected reward is 1 − f_{i+1}(w). Arguing as in the derivation of Eq. (1.4), we obtain

1 − f_{i+1}(x) = (λ_i^m e^{λ_i^m x}/b_i) ∫_x^t e^{−λ_i^m w} { ∫_0^{g_{i+1}^{-1}(w)} [1 − f_{i+1}(w)] b_i(y) dy + ∫_{g_{i+1}^{-1}(w)}^{max{0, 1+w−t}} [t − w] b_i(y) dy + ∫_{max{0, 1+w−t}}^{f_{i+1}(w)} [1 − y] b_i(y) dy + ∫_{f_{i+1}(w)}^{1} [1 − f_{i+1}(w)] b_i(y) dy } dw,  (1.14)

where b_i = ∫_0^1 b_i(y) dy. Dividing by e^{λ_i^m x} and differentiating with respect to x, after some simplification we obtain

f′_{i+1}(x) = (λ_i^m/b_i) { [f_{i+1}(x) + t − x − 1] ∫_{g_{i+1}^{-1}(x)}^{max{0, 1+x−t}} b_i(y) dy + ∫_{max{0, 1+x−t}}^{f_{i+1}(x)} [f_{i+1}(x) − y] b_i(y) dy }.  (1.15)
Using Eqs. (1.14) and (1.15), together with the boundary condition fi+1 (t) = 1
and the continuity of fi+1 , we can numerically estimate fi+1 (x) for x ∈ {t − h,t −
2h, . . . , 0}.
We define a_{i+1}(x) to be the probability that a male using this best response is still searching at age x. Using an argument analogous to the one used in deriving Eq. (1.1), we obtain

a′_{i+1}(x) = −(λ_i^m a_{i+1}(x)/b_i) ∫_{g_{i+1}^{-1}(x)}^{f_{i+1}(x)} b_i(y) dy.  (1.16)
Using Eq. (1.16) and the boundary condition ai+1 (0) = R, we can numerically
calculate ai+1 (x) for x ∈ {h, 2h, . . .,t}.
We have thus updated each of the four profiles and defined the mapping H.
Suppose the mapping H has a fixed point ( f , a, g, b). In this case, the best response
of females to the male threshold and age profiles ( f , a) is to use the threshold profile
g. The probability that such an optimally behaving female is still searching at age y
is given by b(y). Since this female is using the same strategy as the other females,
b gives the age profile of the females. Using a similar argument, f is the optimal
response of a male to (g, b) and a gives the age profile of the males. It follows
that the OSR defined by the iterative procedure is equal to the actual OSR given
the quartet of profiles ( f , a, g, b). Hence, any fixed point of the mapping H is an
equilibrium of this asymmetric game.
It should be noted that the algorithm described above must be used when looking for an asymmetric equilibrium of a symmetric problem. The algorithm described in
Sect. 1.2 generally does not work in this case, since the OSR at such an equilibrium
may well differ from 1 and so the rate at which prospective mates are found is sex
dependent.
Also, the model presented above can easily be modified to introduce constant
mortality rates and encounter rates which are dependent on the proportion of adult
individuals who are searching for a mate (as described in Sects. 1.3 and 1.4,
respectively).
A MATLAB programme was written to estimate the equilibrium threshold rule and
age profiles at points 0, h, 2h, . . . , 1 based on the appropriate difference equations,
using the trapezium rule to calculate the required integrals and double precision
arithmetic. The inverse to a threshold rule was estimated at the same points using
linear interpolation. Comparison of different step sizes suggested that using a step
size of h = 10−4 allowed estimation of the threshold and age profile to at least
three decimal places for λ ≤ 50. The maximum value of the second derivative of
the threshold profile is increasing in λ and for larger values of λ a more accurate
procedure would be necessary to achieve the same accuracy.
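The kind of step-size check described above can be illustrated on the closed-form special case a(x) = e^{−λx} (the all-accepting strategy): successive reductions of h shrink the maximum Euler error proportionally, which indicates how many decimal places of the computed profiles can be trusted. A sketch:

```python
import numpy as np

# Euler error for a'(x) = -lam*a(x) against the exact solution exp(-lam*x);
# the error scales like O(h), so comparing runs at successive step sizes
# shows the attainable accuracy for a given interaction rate lam.
lam = 50.0
for n in (10**3, 10**4, 10**5):
    x = np.linspace(0.0, 1.0, n + 1)
    h = x[1] - x[0]
    a = np.empty(n + 1)
    a[0] = 1.0
    for k in range(n):
        a[k + 1] = a[k] * (1.0 - h * lam)
    print(h, np.max(np.abs(a - np.exp(-lam * x))))
```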
Table 1.1 gives the expected reward and initial threshold (in brackets) at equilibrium
for various mortality rates and interaction rates. The case μ = 0 corresponds to the
original model. Figure 1.2 illustrates the threshold rules evolved for λ = 20 and
various mortality rates. Figure 1.3 illustrates the corresponding age profiles. When
the mortality rate increases, we expect that individuals become less choosy. It would
thus seem that the threshold used would be increasing in the mortality rate. This
seems to be the case when the mortality rate is relatively low. However, the threshold
profile on its own does not tell us how choosy individuals are. Figure 1.2 shows that
Table 1.1 Expected rewards and initial threshold (in round brackets) at equilibrium for various mortality rates, μ, and interaction rates, λ

        μ = 2          μ = 1          μ = 0.5        μ = 0.2        μ = 0.1        μ = 0
λ = 2   0.109 (0.858)  0.205 (0.737)  0.292 (0.655)  0.365 (0.605)  0.395 (0.589)  0.426 (0.574)
λ = 5   0.168 (0.721)  0.311 (0.514)  0.445 (0.411)  0.563 (0.362)  0.611 (0.348)  0.665 (0.335)
λ = 10  0.201 (0.592)  0.366 (0.343)  0.527 (0.250)  0.675 (0.213)  0.736 (0.204)  0.805 (0.195)
λ = 20  0.221 (0.462)  0.395 (0.218)  0.574 (0.148)  0.740 (0.123)  0.810 (0.116)  0.895 (0.105)
Fig. 1.2 Effect of mortality rate on the equilibrium threshold profile (λ = 20)
Fig. 1.3 Effect of mortality rate on the equilibrium age profile (λ = 20)
Table 1.2 Expected rewards at equilibrium under the singles bar model and the model of a randomly mixing population

λ     Original (singles bar) model   Randomly mixing
2     0.4264                         0.3082
5     0.6645                         0.4636
10    0.8054                         0.5775
20    0.8954                         0.6774
for ages between 0.2 and 0.4 the threshold used at equilibrium when μ = 2, λ = 20
is lower than the threshold used in the case where μ = 0.5, λ = 20 (i.e. it seems that
increasing mortality increases the choosiness of individuals at some ages). However,
in the case μ = 2, λ = 20 (i.e. relatively high interaction and mortality rates), there
are virtually no individuals of age greater than 0.3 in the mating pool. Individuals
always accept a prospective mate of age below 0.462 and hence at equilibrium the
probability of an individual rejecting a prospective partner is virtually zero.
When the mortality rate is high, the age profile of the mating pool is very highly
concentrated on young ages. As long as a young individual survives, he/she will
almost certainly mate with the next prospective partner, and the expected payoff
obtained is much more dependent on the mortality rate than the maximum length
of time for which an individual can remain fertile. Thus young individuals will
increase their threshold only very slowly. This remains true until an individual
attains the age at which the youngest individuals begin rejecting him/her. At this
point the probability of rejection increases very rapidly due to the shape of the age
profile. It follows that for high mortality rates the equilibrium threshold profile is
similar to a step function. However, at equilibrium the probability of meeting an
unacceptable partner is virtually zero. Thus an individual who always mates with
the first prospective partner would have virtually the same reward at equilibrium as
an individual using the equilibrium threshold. Hence, at equilibrium there would be
very low selection pressure on the threshold used.
Table 1.2 gives the expected reward from search at equilibrium for the original
(singles bar) model and the model of a randomly mixing population for various
interaction rates. Since it is assumed that there is no mortality of fertile individuals,
the initial threshold is simply one minus the expected reward.
It should also be noted that if a proportion a of the adult population are searching
for a mate at equilibrium, such an equilibrium is also stable in a game where the
interaction rate is fixed to be λ a when the sex ratio is one. However, the dynamics
of the policy iteration procedure corresponding to these two problems are different.
Table 1.3 Expected reward of females, males (in round brackets) and OSR (in square brackets) when λ_0 = 4

        R = 0.5                 R = 1                   R = 2
T = 1   0.253 (0.506) [0.293]   0.427 (0.427) [1.000]   0.506 (0.253) [3.413]
T = 2   0.356 (0.714) [0.602]   0.585 (0.586) [2.412]   0.653 (0.327) [8.656]
T = 5   0.412 (0.826) [1.590]   0.658 (0.659) [7.148]   0.717 (0.359) [26.004]
Table 1.3 gives the expected reward of females, males (in round brackets) and OSR
[in square brackets] when λ0 = 4. It should be noted that the case T = R = 1
corresponds to the original symmetric model with λ = 2. Various values of λ0 were
used for this parameter set and the equilibrium found was always symmetric. Also,
the problem with T = 1 and R = 0.5 is equivalent to the problem with T = 1 and
R = 2 with the roles of the sexes reversed.
The sum of the rewards of females is by definition equal to the sum of the rewards
of males. It follows that the ratio of the expected reward of a female to the expected
reward of a male is equal to the ratio of the number of males entering the adult
population to the number of females entering the adult population (i.e. R). The minor
deviations from this rule are due to numerical errors in the iterative procedure.
1.7 Conclusion
This paper has generalized a model of mate choice with age based preferences
introduced by Alpern et al. [4] by (a) introducing a uniform mortality rate, (b)
allowing the rate at which prospective mates are found to depend on the proportion
of individuals searching, (c) considering models which are asymmetric with respect
to sex.
It may well be interesting to generalize the model to allow variable mortality
rates. It seems reasonable to assume that the mortality rate increases with age and in
this case it is expected that the equilibrium strategy will be of the same form, i.e. a
threshold strategy according to which younger mates are always preferred. However,
it is possible that the mortality rate is higher for young adults than for middle-aged
adults. In this case, the equilibrium strategy may well be of a more complex form,
since a middle-aged mate may be preferable to a young mate.
It would also be interesting to look at the interplay between resource holding
potential (RHP) (see Parker [16]) and age. For example, as a human ages his/her
RHP (i.e. qualifications, earnings, wealth, social position) increases. Hence, from
this point of view the attractiveness of an individual may well increase over time.
However, from the point of view of the model considered here, older individuals are
less attractive as mates as they will be fertile for a shorter period.
Mauck et al. [10] note that the average number of surviving offspring per brood
in a population of storm petrels is increasing in the age of the partners. There are
various explanations for this (e.g. fitter individuals may live longer, individuals (or
pairs) may become increasingly efficient at rearing offspring, or simply that older
pairs invest more in reproduction than in survival). However, this may mean that
age preferences may be to some degree homotypic. This is due to the fact that an
old individual may well prefer an old partner, since it is more important for them to
maximize their present reproduction rate. Young individuals may well prefer young partners, since such pairs have time to adapt to each other and perfect their method of
rearing. It would be interesting to see how the form of the equilibrium function
depends on these factors (by considering a wider range of payoff functions).
Also, our model assumes that the ages of prospective partners are independent. It
may well be that individuals concentrate their search on prospective partners of a
“suitable age”.
According to our model, individuals only mate once during their life, whereas
in reality pairs can divorce or an individual can remate after the death of a partner.
Since individuals may be of different qualities, it might be optimal for an individual
to divorce a partner who turns out not to be as good as expected (see McNamara and
Forslund [13], McNamara et al. [14]). It is intended that future work will extend the
model to allow individuals to remate after a partner dies or becomes infertile.
Of course, mate choice may depend on other factors, such as attractiveness
(common preferences) and compatibility (homotypic preferences). It would be
interesting to see how these factors might interact with age. Due to the necessarily
complex nature of such models, it would seem that simulations based on replicator
dynamics would be a sensible approach to such problems (see Nowak [15]).
Acknowledgements I would like to thank Prof. Steve Alpern for the conversations, advice and
encouragement that have aided my work on this paper.
References
1. Alpern, S., Katrantzi, I.: Equilibria of two-sided matching games with common preferences.
Eur. J. Oper. Res. 196(3), 1214–1222 (2009)
2. Alpern, S., Reyniers, D.: Strategic mating with homotypic preferences. J. Theor. Biol. 198,
71–88 (1999)
3. Alpern, S., Reyniers, D.: Strategic mating with common preferences. J. Theor. Biol. 237,
337–354 (2005)
4. Alpern, S., Katrantzi, I., Ramsey, D.: Partnership formation with age-dependent preferences.
Eur. J. Oper. Res. (2012)
5. Burdett, K., Coles, M.G.: Long-term partnership formation: marriage and employment. Econ.
J. 109, 307–334 (1999)
6. Collins, E.J., McNamara, J.M.: The job-search problem with competition: an evolutionarily
stable strategy. Adv. Appl. Prob. 25, 314–333 (1993)
7. Houston, A.I., McNamara, J.M.: A self-consistent approach to paternity and parental effort.
Phil. Trans. R. Soc. Lond. B 357, 351–362 (2002)
8. Janetos, A.C.: Strategies of female mate choice: a theoretical analysis. Behav. Ecol. Sociobiol.
7, 107–112 (1980)
9. Johnstone, R.A.: The tactics of mutual mate choice and competitive search. Behav. Ecol.
Sociobiol. 40, 51–59 (1997)
10. Mauck, R.A., Huntington, C.E., Grubb, T.C.: Age-specific reproductive success: evidence for
the selection hypothesis. Evolution 58(4), 880–885 (2004)
11. Mazalov, V., Falko, A.: Nash equilibrium in two-sided mate choice problem. Int. Game Theory
Rev. 10(4), 421–435 (2008)
12. McNamara, J.M., Collins, E.J.: The job search problem as an employer-candidate game.
J. Appl. Prob. 28, 815–827 (1990)
13. McNamara, J.M., Forslund, P.: Divorce rates in birds: prediction from an optimization model.
Am. Nat. 147, 609–640 (1996)
14. McNamara, J.M., Forslund, P., Lang, A.: An ESS model for divorce strategies in birds. Philos.
Trans. R. Soc. Lond. B Biol. Sci. 354, 223–236 (1999)
15. Nowak, M.: Evolutionary Dynamics: Exploring the Equations of Life. Belknap Press,
Cambridge, Massachusetts (2006)
16. Parker, G.A.: Assessment strategy and the evolution of animal conflicts. J. Theor. Biol. 47,
223–243 (1974)
17. Parker, G.A.: Mate quality and mating decisions. In: Bateson, P. (ed.) Mate Choice,
pp. 227–256. Cambridge University Press, Cambridge, UK (1983)
18. Ramsey, D.M.: A large population job search game with discrete time. Eur. J. Oper. Res. 188,
586–602 (2008)
19. Real, L.A.: Search theory and mate choice. I. Models of single-sex discrimination. Am. Nat.
136, 376–404 (1990)
20. Real, L.A.: Search theory and mate choice. II. Mutual interaction, assortative mating, and
equilibrium variation in male and female fitness. Am. Nat. 138, 901–917 (1991)
21. Shimer, R., Smith, L.: Assortative matching and search. Econometrica 68, 343–370 (2000)
22. Smith, L.: The marriage model with search frictions. J. Pol. Econ. 114, 1124–1144 (2006)
Chapter 2
Signalling Victory to Ensure Dominance:
A Continuous Model
M. Mesterton-Gibbons
Department of Mathematics, Florida State University, 1017 Academic Way,
Tallahassee, FL 32306-4510, USA
e-mail: [email protected]
T.N. Sherratt
Department of Biology, Carleton University, 1125 Colonel By Drive,
Ottawa, ON K1S 5B6, Canada
e-mail: [email protected]
2.1 Introduction
Bower [4] defines a victory display as a display performed by the winner of a contest
but not by the loser. He offers in essence two possible adaptive explanations of their
function: that they are an attempt to advertise victory to other members of a social
group that do not pay attention to contests, or cannot otherwise identify the winner,
and thus alter their behavior (“function within the network”), or that they are an
attempt to decrease the probability that the loser of a contest will initiate a new
contest with the same individual (“function within the dyad”). In an earlier paper
[20], we called the first rationale advertising, and the second one browbeating; and
we used game-theoretic models to explore the logic of both rationales. These models
showed that both rationales are logically sound; moreover, all other things being
equal, the intensity of victory displays will be highest through advertising in groups
where the reproductive advantage of dominating an opponent is low, and highest
through browbeating in groups where the reproductive advantage of dominance is
high.
Here we further consider the browbeating rationale, leaving the case of an
advertising rationale for future work. By the browbeating rationale, a victory display
is an attempt to decrease the probability that the loser of a contest will initiate a new
contest with the same individual. As long as there is a chance that the loser will
challenge the winner to another fight in the future, the winner has won a battle for
dominance, but not the war. If, on the other hand, the victory ensures that the loser
will never challenge, then victory is tantamount to dominance. Thus browbeating is
an attempt to ensure that victory equals dominance, and the essence of modelling
this phenomenon is to observe a distinction between losing and subordination.
Although we have previously demonstrated that browbeating is a plausible mech-
anism for victory displays, our earlier model assumed—as opposed to predicted—
that a loser does not display, and hence dodged the question of why victory displays
should be respected. Moreover, our original model assumed that all contests were
of equal length, which leaves open the question as to whether individuals should be
more or less likely to signal their dominance after long (close) fights than after short
(one-sided) fights.
Accordingly, our purpose here is twofold. First, it is to relax the assumption
that the loser does not display. Second, our purpose is also to address the context-
dependent nature of the display, which lies outside the scope of our original
browbeating model. Specifically, in a recent study investigating fighting behavior
in the spring field cricket, Gryllus veletis, Bertram et al. [3] found that the intensity
of post-conflict signals (aggressive song rate and body jerk rate) was dependent
on whether the individual was a winner or loser (with winners signalling more
intensely than losers) and on the duration of the contest (with short fights producing
less intense signals). Ting et al. (unpublished) came to similar conclusions after
analysing the outcomes of fights in the fall field cricket, Gryllus pennsylvanicus.
Likewise, post-conflict displays in the black-capped chickadee, Poecile atricapil-
lus—albeit more common among losers than among winners—were more likely
to occur after highly aggressive contests [15]. Collectively, these recent studies
suggest that context dependency might be a general feature of post-conflict displays.
Clearly, if mathematical models are to be of value in understanding victory displays,
then they should help explain not only the display, but also who displays, and with
what intensity. Here we present a simple model that addresses both phenomena.
Multiplying the above payoffs by 1/2 and adding, we find that the reward to a u-strategist in a population of v-strategists is
\[
f(u, v) \;=\; \tfrac{1}{2}\bigl\{(1-b)\,q(u_1, v_2) \;-\; b\,q(v_1, u_2) \;-\; c_w(u_1) \;-\; c_l(u_2)\bigr\} \;+\; b. \tag{2.2}
\]
We need to place conditions on the functions cw, cl and q. First, for cw and cl, it seems reasonable to suppose that cw(0) = 0, cw′(s) > 0, cw″(s) ≥ 0 (as in [20]) and cl(0) = 0, cl′(s) > 0, cl″(s) ≥ 0. For the sake of simplicity, we satisfy these conditions by taking
\[
c_w(s) \;=\; \gamma_w \theta s, \qquad c_l(s) \;=\; \gamma_l \theta s \tag{2.3}
\]
with
\[
\gamma_w \;<\; \gamma_l \tag{2.4}
\]
throughout, where θ (> 0) has the dimensions of INTENSITY⁻¹, so that γw (> 0) and γl (> 0) are dimensionless measures of the marginal cost of displaying for a winner and a loser, respectively.
Second, for q, the following seem reasonable: q(∞, l) = 1 for any finite l, and
q(w, l) = δ for all w ≤ l where δ is the base probability that winning will lead
to dominance—a winner cannot increase its chance of converting its win into
dominance unless it is displaying with at least as strong an intensity as the loser. The
shorter the contest, the more likely it is that the loser will feel heavily outgunned and
concede dominance; hence δ is a decreasing function of contest length T . For the
sake of simplicity, we take
δ = e−T /μ , (2.5)
where μ is a scaling factor (the length of a contest that would reduce the probability
of achieving dominance without a display from 1 to approximately 37 %). We also
require ∂q/∂w > 0 and ∂q/∂l < 0 for all w > l. Again for the sake of simplicity, we satisfy all conditions on q by taking
\[
q(w, l) \;=\;
\begin{cases}
\delta + (1-\delta)\bigl\{1 - e^{-\theta(w-l)}\bigr\} & \text{if } w \ge l, \\
\delta & \text{if } w < l,
\end{cases} \tag{2.6}
\]
throughout. Note the asymmetry here: a display by the loser is not a second chance
to win the fight. On the contrary, it is merely an attempt to reduce the probability
that losing implies subordination.
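For concreteness, the model components (2.2)–(2.6) can be evaluated directly. The following Python sketch is ours, not part of the original model, and the parameter values are purely illustrative:

```python
import math

def q(w, l, delta, theta):
    """Probability that victory converts into dominance, eq. (2.6)."""
    if w >= l:
        return delta + (1 - delta) * (1 - math.exp(-theta * (w - l)))
    return delta

def reward(u, v, b, delta, theta, gamma_w, gamma_l):
    """Reward f(u, v) of eq. (2.2) to a u-strategist among v-strategists,
    with the linear costs c_w(s) = gamma_w*theta*s and c_l(s) = gamma_l*theta*s
    of eq. (2.3)."""
    u1, u2 = u
    v1, v2 = v
    return 0.5 * ((1 - b) * q(u1, v2, delta, theta)
                  - b * q(v1, u2, delta, theta)
                  - gamma_w * theta * u1 - gamma_l * theta * u2) + b

# Illustrative values only: a contest of length T = 2*mu, so delta = e^{-2}.
delta = math.exp(-2.0)                      # eq. (2.5) with T/mu = 2
print(reward((0.5, 0.0), (0.5, 0.0), b=0.4,
             delta=delta, theta=1.0, gamma_w=0.1, gamma_l=0.3))
```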
From Appendix A, if the marginal cost of displaying is so high for a winner that
γw ≥ 1 − b, then v1 = 0 (i.e., not displaying) is a winner’s best reply to any v2 ; and
likewise, if the marginal cost of displaying is so high for a loser that γl ≥ b, then
v2 = 0 is a loser’s best reply to any v1 . These are not interesting cases. Accordingly,
we assume henceforward that γw < 1 − b and γl < b invariably hold. That is, we
assume min(ρ, ζ) > 1, where
\[
\rho \;=\; \frac{1-b}{\gamma_w}, \qquad \zeta \;=\; \frac{b}{\gamma_l}. \tag{2.7}
\]
Then, from Appendix A, and in particular from the discussion following (A8), the game defined by (2.2)–(2.6) has a unique ESS if
\[
\frac{T}{\mu} \;<\; \max\Bigl\{\ln\frac{\rho}{\rho-1},\; \ln\frac{\zeta}{\zeta-1}\Bigr\}, \tag{2.8}
\]
although it has no ESS if the above inequality is reversed. Subject to (2.8), if also
\[
\frac{T}{\mu} \;<\; \min\Bigl\{\ln\frac{\rho}{\rho-1},\; \ln\frac{\zeta}{\zeta-1}\Bigr\}, \tag{2.9}
\]
then from (2.5) and (A5) the ESS is v = (0, 0): neither a winner nor a loser displays. If, on the other hand, (2.8) holds with (2.9) reversed, then one of two cases arises. If
\[
\ln\frac{\zeta}{\zeta-1} \;<\; \frac{T}{\mu} \;<\; \ln\frac{\rho}{\rho-1}, \tag{2.10}
\]
then it follows from (A6) that the ESS is again v = (0, 0). If
\[
\ln\frac{\rho}{\rho-1} \;<\; \frac{T}{\mu} \;<\; \ln\frac{\zeta}{\zeta-1}, \tag{2.11}
\]
however, then it follows from (A7) and (A8) that the ESS is given by θv = (λ, 0), where
\[
\lambda \;=\; \ln\bigl\{\rho\,\bigl(1 - e^{-T/\mu}\bigr)\bigr\}. \tag{2.12}
\]
Thus the relative magnitudes of ρ and ζ determine the ESS. For ρ > ζ, or
\[
\frac{\gamma_l}{\gamma_w} \;>\; \frac{b}{1-b}, \tag{2.13}
\]
there is no ESS if T > μ ln(ζ/(ζ−1)); but if T < μ ln(ζ/(ζ−1)), then the unique ESS is given by v = (0, 0) for T < μ ln(ρ/(ρ−1)) and by θv = (λ, 0) for T > μ ln(ρ/(ρ−1)), with λ defined by (2.12). If (2.13) is reversed, or ρ < ζ, then the unique ESS for T < μ ln(ρ/(ρ−1)) is v = (0, 0); and for T > μ ln(ρ/(ρ−1)) there is no ESS.
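The case analysis above reduces to comparing T/μ with the two critical values; the following Python helper is our own summary of (2.8)–(2.13), with hypothetical inputs:

```python
import math

def ess_regime(rho, zeta, T_over_mu):
    """Classify the ESS via (2.8)-(2.13); requires min(rho, zeta) > 1."""
    assert min(rho, zeta) > 1
    t_w = math.log(rho / (rho - 1))    # critical length for the winner
    t_l = math.log(zeta / (zeta - 1))  # critical length for the loser
    if T_over_mu < min(t_w, t_l):
        return "ESS v = (0, 0): neither displays"
    if T_over_mu > max(t_w, t_l):
        return "no ESS (arms race)"
    if t_w < t_l:                      # rho > zeta: only the winner displays
        lam = math.log(rho * (1 - math.exp(-T_over_mu)))   # eq. (2.12)
        return f"ESS theta*v = ({lam:.3f}, 0)"
    return "ESS v = (0, 0): neither displays"   # rho < zeta, middle band

print(ess_regime(rho=1.5, zeta=1.2, T_over_mu=1.2))
```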
2.4 Discussion
Fighting behaviors and their associated signals have been the subject of extensive
empirical and theoretical study (see [10, 12]). However, much of this work has
focused on the behaviors that occur before and during aggressive interactions,
and relatively little is known about behaviors that occur after the outcomes have
been decided [4]. Here we have developed and explored a game-theoretic model
of post-conflict signalling, seeking to identify who should tend to signal following
termination of conflict, with what intensity, and the factors that shape this intensity.
We have focused on the hypothesis that post-conflict signalling by the victor serves
to reinforce dominance, reducing the chances that the loser will try it on again,
although there may be other complementary adaptive explanations for such displays,
including advertising of victory to bystanders, and non-adaptive explanations such
as emotional release [4].
Post-conflict victory displays [4] have been reported in a range of organisms,
including humans [22] and birds [9], but they have been most intensively researched
in crickets (Orthoptera). Crickets often perform aggressive songs and body jerks
both during and after an agonistic conflict [1, 3, 13]. In a study of the field cricket,
Teleogryllus oceanicus, Bailey and Stoddart [2] proposed that if the display of a
victorious male is sufficiently intense, then it may indicate to the loser that the
fight is unlikely to be reversed by further combat, enabling the victor to divert
its time and energy to other activities such as mating. Conversely, low signalling
intensity of the winner may suggest to the loser that re-engagement could potentially
produce a reversal, hence some future reward to the loser. This is precisely the
situation we have attempted to model here. Indeed, Bailey and Stoddart [2] went
further and argued that the winner’s post-conflict display could be used as an
indication of the winner’s position in a broader dominance hierarchy, showing that
hierarchies constructed using an index based on post-conflict signalling correlated
well with those produced by more classical methods. Intriguingly, Logue et al. [16]
recently reported that contests between male field crickets Teleogryllus oceanicus
that were unable to sing were more aggressive than interactions between males that
were free to signal, supporting the view that signalling can serve to mitigate the
costs of fighting in these species.
As predicted by our current model, there is now a considerable amount of
evidence from the cricket literature that eventual winners tend to signal far more
frequently than losers after fighting [1, 3]. One factor driving this basic result in
our model (and most likely in the experiments) is our assumption that the marginal
cost of signalling is lower for the winner than the loser, i.e., γw < γl ; see (2.4). We
consider this an entirely realistic condition given that the victor is likely to have
“more left in the tank” than the vanquished (see [4], pp. 121–122 for a similar
argument). Indeed, in these cases costly signals may serve as an honest indicator
of how much the victor has in reserve, and thereby intimidate the opponent into
submission. The “Ali shuffle” [8] is potentially one such example of an honest
demonstration of a fighter’s superiority. Analogous behaviors, which may have
[Figure 2.2 appears here: curves of the winner's scaled display intensity θv₁ (vertical axis, "intensity of display") against scaled contest length T/μ (horizontal axis, "contest length") for ρ = 1.1, 1.3, 1.5 and 1.7.]
Fig. 2.2 Scaled intensity of winner's victory display as a function of scaled contest length for various values of the parameter ρ = (1 − b)/γw (assumed to exceed 1) in the limit as ζ → 1 from above, where ζ = b/γl
In general, however, both critical values are finite. For contest lengths below the
first critical value, neither the winner nor the loser displays at the ESS. For contest
lengths between the two critical values, only the winner displays, with intensity that
increases with T . For contest lengths greater than the second critical value, the ESS
breaks down as described at the end of the appendix, and in such a way that a loser’s
optimal response will sometimes be to match the winner’s display. Thus, according
to our model, a loser should be expected to display only if the contest is so long
that its length exceeds the second critical value. Those unusual biological examples
in which only the loser displays (e.g., [15]) may potentially be explained by some
sort of subservient signal to assure dominance to the victor, thereby reducing future
conflict [4].
There is an intriguing parallel between one of our results on victory displays
and a result concerning winner effects that Mesterton-Gibbons [19] found, several
years before victory displays were first reviewed by Bower [4]. A winner effect is
an increased probability of victory in a later contest following victory in an earlier
contest [21], which in Mesterton-Gibbons [19] is mediated through increased self-
perception of strength. The greater the likelihood of a later victory, the more likely
it is that the earlier victory will eventually lead to dominance over the opponent.
Thus a winner effect may also be regarded as an attempt to convert victory into
dominance, even though there is no display. The result discovered by Mesterton-
Gibbons [19] is that there can be no winner effect unless b < 1/2, where b has exactly
the same interpretation as in our current model, i.e., an inverse measure of the
reproductive advantage of dominance. Thus, to the extent that victory displays and
winner effects can both be regarded as factors favoring dominance, such factors are
most operant when b < 1/2.
Finally, for the sake of tractability, we did not explicitly model the variation
of strength that supports any variation of contest length observed in nature. On
the contrary, we assumed that T is fixed for a theoretical population; and we
obtained an evolutionarily stable response to that T , which is likewise fixed for
the theoretical population. Over many such theoretical populations, each with a
different T , however, there will be many different ESS responses; and in effect
we have implicitly assumed that the variation of ESS with T thus engendered
will reasonably approximate the variation of signal intensity with contest length
observed within a single real population. Essentially this assumption—phrased more
generally, that ESS variation over many theoretical populations each characterized
by a different parameter value will reasonably approximate variation of behavior
with respect to that parameter within a single real population—is widely adopted
in the literature, although rarely made explicit, as here. Indeed essentially this
assumption is made whenever a game-theoretic model predicts the dependence of
an ESS on a parameter that varies within a real population, but whose variance is
not accounted for by the model.
Acknowledgements We are grateful to Lauren Fitzsimmons and two anonymous reviewers for
constructive feedback on earlier versions of the manuscript.
Appendix A

Strategy v is a strong, global evolutionarily stable strategy or ESS in the sense of [17] if (and only if) it is uniquely the best reply to itself, in the sense that f(v, v) > f(u, v) for all u ≠ v; or, equivalently for our model, if v1 is a winner's best reply to a loser's v2 and v2 is a loser's best reply to a winner's v1.¹
From (2.2) and (2.6) we have
\[
\frac{\partial f}{\partial u_1} \;=\;
\begin{cases}
-\tfrac{1}{2}\gamma_w \theta & \text{if } u_1 < v_2, \\[2pt]
\tfrac{1}{2}\bigl\{(1-b)(1-\delta)\, e^{-\theta(u_1 - v_2)} - \gamma_w\bigr\}\theta & \text{if } u_1 > v_2,
\end{cases} \tag{A1}
\]
with ∂²f/∂u₁² = −½θ²(1−b)(1−δ)e^{−θ(u₁−v₂)} < 0 for u1 > v2 but ∂²f/∂u₁² = 0 for u1 < v2. So, with respect to u1, f decreases from u1 = 0 to u1 = v2. What happens next depends on the limit of ∂f/∂u1 as u1 → v2 from above, which is ½{(1 − b)(1 − δ) − γw}θ. If this quantity is not positive, then f continues to decrease, and so the maximum of f with respect to u1 occurs at u1 = 0. So a winner's best reply is u1 = 0 whenever δ > 1 − γw/(1 − b) (which is true in particular if γw > 1 − b). If, on the other hand, δ < 1 − γw/(1 − b), then there is a local maximum for u1 > v2 where ∂f/∂u1 = 0 or
\[
\theta u_1 \;=\; \theta v_2 + \ln\frac{(1-b)(1-\delta)}{\gamma_w}. \tag{A2}
\]
This local maximum exceeds the value of f at u1 = 0 precisely when
\[
\theta v_2 \;<\; \frac{(1-b)(1-\delta)}{\gamma_w} - 1 - \ln\frac{(1-b)(1-\delta)}{\gamma_w}. \tag{A3}
\]
Note that the right-hand side of (A3) is always positive (because x − 1 − ln(x) > 0 for all x > 1). In sum, a winner's best reply is u1 = 0 unless δ < 1 − γw/(1 − b) and (A3) holds, in which case, the best reply is given by (A2). In particular, zero is always a winner's best reply if γw > 1 − b.
Similarly,
\[
\frac{\partial f}{\partial u_2} \;=\;
\begin{cases}
\tfrac{1}{2}\bigl\{b(1-\delta)\, e^{\theta(u_2 - v_1)} - \gamma_l\bigr\}\theta & \text{if } u_2 < v_1, \\[2pt]
-\tfrac{1}{2}\gamma_l \theta & \text{if } u_2 > v_1,
\end{cases} \tag{A4}
\]
with ∂²f/∂u₂² = ½θ²b(1−δ)e^{θ(u₂−v₁)} > 0 for u2 < v1 but ∂²f/∂u₂² = 0 for u2 > v1. Note that the limit of ∂f/∂u2 as u2 → v1 from below is ½{b(1 − δ) − γl}θ. Because
1 In general, strategy v is an ESS if it does not pay a potential mutant to switch from v to any other strategy, and v need not satisfy the strong condition f(v, v) > f(u, v) for all u ≠ v. If there is at least one alternative best reply u such that f(u, v) = f(v, v) but v is a better reply than u to all such u (f(v, u) > f(u, u)), then v is called a weak ESS. For our model, however, any ESS is a strong ESS, as is typical of continuous games ([18], p. 408).
∂²f/∂u₂² > 0, if the limit is negative, i.e., if δ > 1 − γl/b, then f decreases with respect to u2 and has its maximum where u2 = 0, so that a loser should not display. If, on the other hand, the limit is positive, i.e., δ < 1 − γl/b, then f at least partly increases with respect to u2 for u2 < v1; and so the maximum of f with respect to u2 occurs either at u2 = 0 or u2 = v1, depending on which has the higher value of f. Let xc denote the unique positive root of the equation (1 − δ)(1 − e^{−x}) = γl x/b. Then straightforward algebra reveals that the maximum is at 0 if θv1 > xc but at v1 if θv1 < xc. In sum, a loser's best reply is u2 = 0 unless δ < 1 − γl/b and θv1 < xc, in which case, the best reply is v1. Clearly, θv1 < xc holds for v1 = 0, so that u2 = 0 is in particular the best reply to v1 = 0; however, this result follows more readily directly from (A4). Also, note that zero is always a loser's best reply if γl > b.
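The threshold xc has no closed form, but it is easily computed; here is a minimal Python sketch of ours, assuming δ < 1 − γl/b so that the positive root exists:

```python
import math

def x_c(delta, gamma_l, b, tol=1e-12):
    """Unique positive root of (1 - delta)*(1 - exp(-x)) = gamma_l*x/b,
    found by bisection; assumes delta < 1 - gamma_l/b."""
    assert delta < 1 - gamma_l / b, "no positive root in this regime"
    g = lambda x: (1 - delta) * (1 - math.exp(-x)) - gamma_l * x / b
    lo, hi = 1e-12, 1.0
    while g(hi) > 0:           # expand until the linear term dominates
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

print(x_c(delta=0.2, gamma_l=0.3, b=0.5))   # illustrative values
```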
For v = (v1 , v2 ) to be an ESS it must be a best reply to itself, i.e., we require v1
to be a winner’s best reply to the loser’s v2 at the same time as v2 is a loser’s best
reply to the winner’s v1 . If
γw γl
δ > max 1 − ,1 − (A5)
1−b b
then the unique ESS is v = (0, 0), because it follows from the discussion after (A1)
that v1 = 0 is the best reply to any v2 , and hence to v2 = 0; and from the discussion
after (A4) that v2 = 0 is the best reply to any v1 , and hence to v1 = 0. If
γw γl
1− < δ < 1− (A6)
1−b b
then v = (0, 0) is still the ESS by the above discussion and the remark at the end of
the preceding paragraph. If instead
γl γw
1− < δ < 1− (A7)
b 1−b
then an ESS does not exist. Consider a population in which a winner displays with
small positive intensity v1 . Then θ v1 < xc ; and, from the discussion following (A4),
a loser’s best reply is to match the display. From (A2), a winner’s best reply is
now to increase the intensity of its display, because (A3) invariably holds; and a
loser’s best reply in turn is again to match the display. Continuing in this manner,
we observe an “arms race” of increasing display intensity, until either (A3) or
θ v1 < xc is violated. If the former, then a winner’s best reply is not to display,
which a loser matches, so that it pays for a winner to display at higher intensity;
if the latter, then a loser’s best reply becomes no display, but now a winner’s best
reply is to display with intensity λ /θ . Either way, the unstable cycle continues ad
infinitum. The study of victory displays is still in its infancy, and researchers are still trying to characterize when they occur and with what frequency. Therefore, there is as yet no study of their temporal dynamics (within or between generations). Furthermore, a full analysis of the evolutionary dynamics when no ESS exists is beyond the scope of this paper. Nevertheless, we broach this issue in Appendix B.
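To illustrate the cycle just described, the following Python sketch of ours iterates the best replies of Appendix A from a small initial display; the parameter values are illustrative and chosen inside the no-ESS region:

```python
import math

def best_reply_cycle(b, gamma_w, gamma_l, delta, steps=8):
    """Iterate the best replies of Appendix A from a small winner display;
    in the no-ESS regime delta < min(1 - gamma_w/(1-b), 1 - gamma_l/b)
    the scaled intensities cycle rather than settle."""
    K = (1 - b) * (1 - delta) / gamma_w
    # x_c via fixed-point iteration on x = (b/gamma_l)(1-delta)(1-exp(-x)).
    xc = 1.0
    for _ in range(200):
        xc = b * (1 - delta) / gamma_l * (1 - math.exp(-xc))
    v1, v2 = 0.0, 0.01               # theta*v1 (winner), theta*v2 (loser)
    for _ in range(steps):
        # Winner: display via (A2) while condition (A3) holds, else stop.
        v1 = v2 + math.log(K) if v2 < K - 1 - math.log(K) else 0.0
        # Loser: match the winner while theta*v1 < x_c, else stop.
        v2 = v1 if v1 < xc else 0.0
        print(f"theta*v1 = {v1:5.2f},  theta*v2 = {v2:5.2f}")

best_reply_cycle(b=0.5, gamma_w=0.1, gamma_l=0.2, delta=0.2)
```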
Appendix B

In this appendix we remark on why, when no ESS exists, the evolutionary dynamics require a more sophisticated approach than the one we have taken in this paper and cannot readily be addressed by the standard framework of discrete evolutionary games with replicator dynamics (e.g., [6, 11]). To make our point as expeditiously as possible, we explore circumstances in which b < 1/2 (and hence, by (2.4), ρ > ζ) but T/μ exceeds the second critical value ln(ζ/(ζ−1)) of Sect. 2.3 (corresponding to the shaded triangle within the no-ESS region of Fig. 2.1).
Accordingly, consider a mixture of three strategies that appear to evoke the
discussion towards the end of Appendix A, namely, a non-signalling strategy,
denoted by N or Strategy 1; the ESS signalling strategy for the dark shaded rectangle
of Fig. 2.1, denoted by S or Strategy 2; and a matching strategy, denoted by M or Strategy 3, which displays with the ESS intensity corresponding to ln(ρ/(ρ−1)) < T/μ < ln(ζ/(ζ−1)) after winning, but matches the winner's display after losing. From
the viewpoint of a focal u-strategist against a v-strategist, these three strategies are
defined, respectively, by u = (0, 0) for N; u = (λ /θ , 0) for S; and u = (λ /θ , v1 )
for M. Let the proportions of N, S and M be x1 , x2 and x3 , respectively (so that
x1 + x2 + x3 = 1); and let ai j be the reward to strategy i against strategy j (for
1 ≤ i, j ≤ 3). Let α := (1 − δ)⁻¹ = (1 − e^{−T/μ})⁻¹, so that λ = ln(ρ/α) by (2.12). Then from (2.2), (2.6) and (2.12) we have a11 = ½{(1 − 2b)q(0, 0) − cw(0) − cl(0)} + b = ½δ + (1 − δ)b, a12 = ½{(1 − b)q(0, 0) − bq(λ/θ, 0) − cw(0) − cl(0)} + b = ½{(1 − b)δ + b(1 + 1/ρ)}, and so on, yielding the reward matrix
\[
A \;=\;
\begin{pmatrix}
\tfrac{1}{2}\delta + (1-\delta)b & \dfrac{\rho(1-b)\delta + (\rho+1)b}{2\rho} & a_{12} \\[8pt]
\dfrac{\rho-1+b+(1-\delta)\rho b}{2\rho} - \tfrac{1}{2}\gamma_w \ln(\rho/\alpha) & \dfrac{\rho-1+2b}{2\rho} - \tfrac{1}{2}\gamma_w \ln(\rho/\alpha) & a_{12} - \tfrac{1}{2}\gamma_w \ln(\rho/\alpha) \\[8pt]
a_{21} & a_{21} - \tfrac{1}{2}\gamma_l \ln(\rho/\alpha) & a_{11} - \tfrac{1}{2}(\gamma_w + \gamma_l) \ln(\rho/\alpha)
\end{pmatrix} \tag{B1}
\]
Thus a11 < a21 , and Strategy 1 is not an ESS. However, because a11 − a21 + a22 −
a12 = 0, it also follows that a22 > a12 . So if Strategy 2 is also not an ESS, then it
must be Strategy 3 that invades. But a22 − a32 = ½γl{(1/ρ − 1/α)ζ + ln(ρ/α)} may
have either sign, and in particular will always be positive for sufficiently small ζ ,
that is, for ζ sufficiently close to α (ζ > α having been assumed). Furthermore, if
we suppose that the point (ρ , ζ ) in Fig. 2.1 has migrated from the signalling ESS
region (dark shaded rectangle) into the no-ESS region (shaded triangle immediately
above) because environmental pressures have increased the value of ζ
(by decreasing γl ), then it is precisely such sufficiently small values of ζ that are
relevant. Thus S will often be an ESS of the discrete game defined by the matrix A
even though it is no longer an ESS of the continuous game described in the main
body of our paper.
The upshot is that replicator dynamics cannot readily be used to describe what
happens when S is not an ESS of our continuous game; the dynamics described
verbally towards the end of Appendix A are not adequately reflected by a mix of M,
N and S. They require a much more sophisticated approach, and we leave the matter
open for future work.
Nevertheless, let us suppose that ζ is indeed large enough for M to invade S,
that is,
\[
\zeta \;>\; \frac{\rho\alpha}{\rho - \alpha}\, \ln(\rho/\alpha). \tag{B2}
\]
Then a22 − a32 < 0, and because a33 − a23 + a22 − a32 = 0, we must have a33 − a23 > 0. That is, from (B1), a33 > a13 − ½γw ln(ρ/α). Thus a33 >
max(a13 , a23 ) will hold for sufficiently small γw , making M the unique ESS
of the discrete game defined by A. Otherwise (i.e., for larger values of γw ),
a32 > a22 , a21 = a31 > a11 and a13 > a33 will hold simultaneously, so that M
can invade S, S or M can invade N and N can invade M. In these circumstances,
the population will eventually reach a polymorphism of N and M at which
x1 = (a13 − a33 )/(a13 − a33 + a31 − a11 ), x2 = 0 and x3 = 1 − x1 . All of these
results have been verified by numerical integration, for relevant parameter values,
of the replicator equations ẋi = xi {(Ax)i − x · Ax}, i = 1, 2, 3, where x = (x1 , x2 , x3 )
and an overdot denotes differentiation with respect to time (see, e.g., [11], p. 68).
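As a sketch of the kind of numerical check described here, one can assemble A from (B1) and integrate the replicator equations directly; the parameter values below are illustrative choices of ours, not those used in the paper:

```python
import math
import numpy as np
from scipy.integrate import solve_ivp

def reward_matrix(rho, zeta, b, delta):
    """Reward matrix (B1) for the strategies (N, S, M); gamma_w, gamma_l
    are recovered from rho = (1-b)/gamma_w and zeta = b/gamma_l, and
    lam = log(rho*(1-delta)) = log(rho/alpha) as in (2.12)."""
    gw, gl = (1 - b) / rho, b / zeta
    lam = math.log(rho * (1 - delta))
    a11 = 0.5 * delta + (1 - delta) * b
    a12 = (rho * (1 - b) * delta + (rho + 1) * b) / (2 * rho)
    a21 = (rho - 1 + b + (1 - delta) * rho * b) / (2 * rho) - 0.5 * gw * lam
    a22 = (rho - 1 + 2 * b) / (2 * rho) - 0.5 * gw * lam
    return np.array([[a11, a12, a12],
                     [a21, a22, a12 - 0.5 * gw * lam],
                     [a21, a21 - 0.5 * gl * lam, a11 - 0.5 * (gw + gl) * lam]])

def replicator(t, x, A):
    """Replicator dynamics x_i' = x_i((Ax)_i - x.Ax)."""
    Ax = A @ x
    return x * (Ax - x @ Ax)

# Illustrative point in the no-ESS triangle: b < 1/2, rho > zeta, T/mu = 2.
A = reward_matrix(rho=4.0, zeta=2.5, b=0.4, delta=math.exp(-2.0))
sol = solve_ivp(replicator, (0.0, 500.0), [0.4, 0.3, 0.3], args=(A,), rtol=1e-9)
print(sol.y[:, -1])   # long-run frequencies of N, S and M
```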
References
5. Caro, T.M.: The functions of stotting in Thomson’s gazelles: some tests of the predictions.
Anim. Behav. 34, 663–684 (1986)
6. Cressman, R.: Evolutionary Dynamics and Extensive Form Games. MIT Press, Cambridge
(2003)
7. Cresswell, W.: Song as a pursuit-deterrent signal, and its occurrence relative to other
anti-predation behaviours of skylark (Alauda arvensis) on attack by merlins (Falco
columbarius). Behav. Ecol. Sociobiol. 34, 217–223 (1994)
8. Golus, C.: Muhammad Ali. Twenty-First Century Books, Minneapolis (2006)
9. Grafe, T., Bitz, J.H.: An acoustic postconflict display in the duetting tropical boubou (Laniarius
aethiopicus): a signal of victory? BMC Ecol. 4(1) (2004). http://www.biomedcentral.com/
1472-6785/4/1
10. Hardy, I.C.W., Briffa, M. (eds.): Animal Contests. Cambridge University Press, Cambridge
(2013)
11. Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics. Cambridge
University Press, Cambridge (1998)
12. Huntingford, F., Turner, A.K.: Animal Conflict. Chapman & Hall, London (1987)
13. Jang, Y., Gerhardt, H.C., Choe, J.C.: A comparative study of aggressiveness in eastern North
American field cricket species (genus Gryllus). Behav. Ecol. Sociobiol. 62, 1397–1407 (2008)
14. Leal, M., Rodríguez-Robles, J.A.: Signalling displays during predator-prey interactions in a Puerto Rican anole, Anolis cristatellus. Anim. Behav. 54, 1147–1154 (1997)
15. Lippold, S., Fitzsimmons, L.P., Foote, J.R., Ratcliffe, L.M., Mennill, D.J.: Post-contest
behaviour in black-capped chickadees (Poecile atricapillus): loser displays, not victory
displays, follow asymmetrical countersinging exchanges. Acta Ethol. 11, 67–72 (2008)
16. Logue, D.M., Abiola, I.O., Rains, D., Bailey, N.W., Zuk, M., Cade, W.H.: Does signalling
mitigate the cost of agonistic interactions? A test in a cricket that has lost its song. Proc. R. Soc.
Lond. B 277, 2571–2575 (2010)
17. Maynard Smith, J.: Evolution and the Theory of Games. Cambridge University Press, Cambridge
(1982)
18. McGill, B.J., Brown, J.S.: Evolutionary game theory and adaptive dynamics of continuous
traits. Annu. Rev. Ecol. Syst. 38, 403–435 (2007)
19. Mesterton-Gibbons, M.: On the evolution of pure winner and loser effects: a game-theoretic
model. Bull. Math. Biol. 61, 1151–1186 (1999)
20. Mesterton-Gibbons, M., Sherratt, T.N.: Victory displays: a game-theoretic analysis. Behav.
Ecol. 17, 597–605 (2006)
21. Rutte, C., Taborsky, M., Brinkhof, M.W.G.: What sets the odds of winning and losing? Trends
Ecol. Evol. 21, 16–21 (2006)
22. Tracy, J.L., Matsumoto, D.: The spontaneous expression of pride and shame: evidence for
biologically innate nonverbal displays. Proc. Natl. Acad. Sci. U.S.A. 105, 11655–11660 (2008)
Chapter 3
Evolutionary Games for Multiple Access Control
† This material is based upon work supported in part by the U.S. Air Force Office of Scientific
Research (AFOSR) under grant number FA9550-09-1-0249, and in part by the AFOSR MURI
Grant FA9550-10-1-0573.
The material in this paper was partially presented in [9] and [10].
Q. Zhu () • T. Başar
Coordinated Science Laboratory and Department of Electrical and Computer Engineering,
University of Illinois at Urbana-Champaign, 1308 W. Main Street, Urbana, IL 61801, USA
e-mail: [email protected]; [email protected]
H. Tembine
Department of Telecommunications, École Supérieure d’Electricité (SUPELEC),
3 rue Joliot-Curie, 91192 Gif-Sur-Yvette Cedex, France
e-mail: [email protected]
G-function dynamics and Smith dynamics on rate control and channel selection,
respectively. We show that the evolving game has an equilibrium and illustrate these
dynamics with numerical examples.
3.1 Introduction
Recently much interest has been devoted to understanding the behavior of multiple
access controls under constraints. A considerable amount of work has been carried
out on the problem of how users can obtain an acceptable throughput by choosing
rates independently. Motivated by an interest in studying a large population of users
playing a game over time, evolutionary game theory was found to be an appropriate
framework for communication networks. It has been applied to problems such as
power control in wireless networks and mobile interference control [1, 5, 6, 11].
The game-theoretical models considered in previous studies on user behavior in
code division multiple access (CDMA) [4, 33] are static one-shot noncooperative
games in which users are assumed to be rational and optimize their payoffs
independently. Evolutionary game theory, on the other hand, studies games that are
played repeatedly and focuses on the strategies that persist over time, yielding the
best fitness of a user in a noncooperative environment on a large time scale.
In [19], an additive white Gaussian noise (AWGN) multiple-access-channel
problem was modeled as a noncooperative game with pairwise interactions, in which
users were modeled as rational entities whose only interests were to maximize their
own communication rates. The authors obtained the Nash equilibria (NEs) of the two-user game and introduced a two-player evolutionary game model with pairwise interactions based on replicator dynamics.
However, the case where interactions are not pairwise arises frequently in communi-
cation networks, such as the CDMA or the orthogonal frequency-division multiple
access (OFDMA) in a Worldwide Interoperability for Microwave Access (WiMAX)
environment [11].
In this work, we extend the study of [19] to wireless communication systems with
an arbitrary number of users corresponding to each receiver. We formulate a static
noncooperative game with m users subject to rate capacity constraints and extend
the constrained game to a dynamic evolutionary game with a large number of users
whose strategies evolve over time. Unlike evolutionary games with discrete and
finite numbers of actions, our model is based on a class of continuous games, known
as continuous-trait games. Evolutionary games with continuum action spaces are
encountered in a wide variety of applications in evolutionary ecology, such as
evolution of phenology, germination, nutrient foraging in plants, and predator–prey
foraging [7, 23].
3.1.1 Contribution
The main contributions of this work can be summarized as follows. We first intro-
duce a game-theoretic framework for local interactions between many users and a
single receiver. We show that the static continuous-kernel rate allocation game with
coupled rate constraints has a convex set of pure NEs, coinciding with the maximal
face of the polyhedral capacity region. All the pure equilibria are Pareto optimal
and are also strong equilibria, resilient to simultaneous deviation by coalitions of any
size. We show that the pure NEs in the rate allocation problem are 100 % efficient in
terms of price of anarchy (PoA) and constrained strong price of anarchy (CSPoA).
We study the stability of strong equilibria, normalized equilibria, and evolutionarily
stable strategies (ESSs) using evolutionary game dynamics such as Brown–von
Neumann–Nash dynamics, generalized Smith dynamics, and replicator dynamics.
We further investigate the correlated equilibrium of the multiple-access game where
the receiver can send signals to the users to mediate the behaviors of the transmitters.
Based on the single-receiver model, we then propose an evolutionary game-
theoretic framework for the hybrid additive white Gaussian noise multiple-access
channel. We consider a communication system of multiple users and multiple
receivers, where each user chooses a rate and splits it over the receivers. Users have
coupled constraints determined by the capacity regions. We characterize the NE of
the static game and show the existence of the equilibrium under general conditions.
Building upon the static game, we formulate a system of hybrid evolutionary game
dynamics using G-function dynamics and Smith dynamics on rate control and
channel selection, respectively. We show that the evolving game has an equilibrium
and illustrate these dynamics with numerical examples.
The rest of the paper is structured as follows. We present in Sect. 3.2.1 the evolu-
tionary game model of rate allocation in additive white Gaussian multiple-access
wireless networks and analyze its equilibria and Pareto optimality in Sect. 3.2.2. In
Sect. 3.2.3, we present strong equilibria and the PoA of the game. In Sect. 3.2.4,
we discuss how to select one specific equilibrium such as normalized equilibrium
and ESSs. Section 3.2.5 studies the stability of equilibria and the evolution of
strategies using game dynamics. Section 3.2.6 analyzes the correlated equilibrium
of the multiple-access game.
In Sect. 3.3.1, we present the hybrid rate control model where users can choose
the rates and the probability of the channels to use. In Sect. 3.3.2, we characterize
the NE of the constrained hybrid rate control game model, pointing out the existence
of the NE of the hybrid model and methods to find it. In Sect. 3.3.3, we apply
evolutionary dynamics to both rates and channel selection probabilities. We use
simulations to demonstrate the validity of these proposed dynamics and illustrate
the evolution of the overall evolutionary dynamics of the hybrid model. Section 3.4
concludes the paper. For the reader’s convenience, we summarize the notations in
Table 3.1 and the acronyms in Table 3.2.
Two fundamental concepts of evolutionary game theory are (a) the evolutionarily stable state [25], which is a
refinement of equilibrium, and (b) evolutionary game dynamics, such as replicator
dynamics [29], which describes the evolution of strategies or frequencies of use of
strategies in time [7, 21].
The single population evolutionary rate allocation game is described as follows:
there is one population of senders (users) and several receivers. The number of
senders is large. At each time, there are many one-shot finite games called local
interactions, which models the interactions among a finite number of users in
the population. Each sender of the population chooses from his or her set of
strategies Ai , which is a nonempty, convex, and compact subset of R. Without
loss of generality, we can suppose that user i chooses his or her rate in the interval
Ai = [0,C{i} ], where C{i} is the rate upper bound for user i (to be made precise
subsequently) as outside of the capacity region the payoff (to be defined later) will
be zero. Let Δ (Ai ) be the set of probability distributions over the pure strategy
set Ai . The set Δ (Ai ) can be interpreted as the set of mixed strategies for the N-
person game at the local interaction. In the case where the N-person local interaction
is identical at all local interactions in the population, the set Δ (Ai ) can also
be interpreted as the set of distributions of strategies among the population. Let
λi ∈ Δ(Ai) and E be a λi-measurable subset of R; then λi(E) represents the fraction of users choosing a strategy from E at time t. A distribution λi ∈ Δ(Ai) is sometimes called the "state" of the population. We denote by B(Ai) the Borel σ-algebra on Ai and by d(λ, λ′) the distance between two states measured with respect to the weak topology. An example of such a distance could be the classical Wasserstein distance or the Monge–Kantorovich distance between two measures.
Each user’s payoff depends on the opponents’ behavior through the distribution
of the opponents’ choices and of their strategies. The payoff of user i in a local
interaction with (N − 1) other users is given as a function ui : RN −→ R. The rate
profile α ∈ R₊ᴺ must belong to a common capacity region C ⊂ Rᴺ defined by 2ᴺ − 1
linear inequalities. The expected payoff of a sender i transmitting at a rate a when
the state of the population is μ ∈ Δ (Ai ) is given by Fi (a, μ ). The expected payoff
for user i is
\[
F_i(\lambda_i, \mu) \;:=\; \int_{\alpha \in C} u_i(\alpha)\, \lambda_i(d\alpha_i) \prod_{j \ne i} \mu(d\alpha_j).
\]
Local interaction refers to the problem setting of one receiver and its uplink
additive white Gaussian noise (AWGN) multiple-access channel with N senders
with coupled constraints (or actions). The signal at the receiver is given by Y =
ξ + ∑Ni=1 Xi , where Xi is a transmitted signal of user i and ξ is a zero-mean Gaussian
noise with variance σ₀². Each user has an individual power constraint E(Xᵢ²) ≤ Pᵢ and
channel gain hi . The optimal power allocation scheme is to transmit at the maximum
power available, i.e., Pi , for each user. Hence, we consider the case in which the
maximum power is attained. The decisions of the users, then, consist of choosing
their communication rates, and the receiver’s role is to decode, if possible. The
capacity region is the set of all vectors α ∈ RN+ such that users i ∈ N := {1, 2, . . . , N}
can reliably communicate at rate αi , i ∈ N. The capacity region C for this channel
is the set
Pj h j
C = α ∈ R+ ∑ αi ≤ log 1 + ∑ 2 .∀ 0/ Ω ⊆ N .
N
i∈Ω j∈Ω σ0
Example 3.1 (Example of capacity region with three users). In this example, we
illustrate the capacity region with three users. Let α1 , α2 , α3 be the rates of the users
and Pi = P, hi = h, ∀i ∈ {1, 2, 3}. Based on (3.1), we obtain a set of inequalities
\[
\begin{cases}
\alpha_1 \ge 0,\ \alpha_2 \ge 0,\ \alpha_3 \ge 0, & \\[2pt]
\alpha_i \le \log\bigl(1 + \tfrac{Ph}{\sigma_0^2}\bigr), & i = 1, 2, 3, \\[2pt]
\alpha_i + \alpha_j \le \log\bigl(1 + \tfrac{2Ph}{\sigma_0^2}\bigr), & i \ne j,\ i, j = 1, 2, 3, \\[2pt]
\alpha_1 + \alpha_2 + \alpha_3 \le \log\bigl(1 + \tfrac{3Ph}{\sigma_0^2}\bigr). &
\end{cases}
\]
Note that M₃ is a totally unimodular matrix. By letting Ph = 25, σ₀² = 0.1, we sketch
in Fig. 3.2 the capacity region with three users.
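As a quick aid (our own helper, not part of the chapter), membership in this symmetric region can be verified by enumerating the 2ᴺ − 1 subset constraints of (3.1); natural logarithms are assumed below, but the check works for any consistently used base:

```python
import math
from itertools import combinations

def in_capacity_region(alpha, Ph=25.0, sigma2=0.1):
    """Check the 2^N - 1 subset constraints of the symmetric region (3.1)."""
    n = len(alpha)
    if any(a < 0 for a in alpha):
        return False
    for k in range(1, n + 1):
        for omega in combinations(range(n), k):
            if sum(alpha[i] for i in omega) > math.log(1 + k * Ph / sigma2):
                return False
    return True

print(in_capacity_region([1.5, 1.5, 1.5]))  # True: 4.5 <= log(751) ~ 6.62
print(in_capacity_region([3.0, 3.0, 3.0]))  # False: 9.0 > 6.62
```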
The capacity region reveals the competitive nature of interactions among senders:
if a user i wants to communicate at a higher rate, then one of the other users must
lower his or her rate; otherwise, the capacity constraint is violated. We let
\[
r_{i,\Omega} \;:=\; \log\Bigl(1 + \frac{P_i h_i}{\sigma_0^2 + \sum_{i' \in \Omega,\, i' \ne i} P_{i'} h_{i'}}\Bigr), \qquad i \in N,\ \Omega \subseteq N,
\]
denote the bound on the rate of a user when the signals of the |Ω | − 1 other users
are treated as noise.
Due to the noncooperative nature of the rate allocation, we can formulate the
one-shot game
\[
\Xi \;=\; \bigl\langle N, (A_i)_{i \in N}, (u_i)_{i \in N} \bigr\rangle,
\]
where the set of users N is the set of players, Ai , i ∈ N, is the set of actions, and
ui , i ∈ N, are the payoff functions.
3.2.2 Payoffs

The payoff of user i is taken to be
\[
u_i(\alpha_i, \alpha_{-i}) \;=\; g_i(\alpha_i)\, \mathbf{1}_C(\alpha_i, \alpha_{-i}),
\]
where 1_C is the indicator function of the capacity region C, α₋ᵢ is a vector consisting of other players' rates, i.e., α₋ᵢ = [α₁, . . . , αᵢ₋₁, αᵢ₊₁, . . . , α_N], and gᵢ is a positive and strictly increasing function for each fixed α₋ᵢ. Since the game is subject to coupled constraints, the action set Aᵢ is coupled and dependent on other players' actions. Given the strategy profile α₋ᵢ of other players, the constrained action set Aᵢ is given by
\[
A_i(\alpha_{-i}) \;=\; \Bigl\{\alpha_i \ge 0 \;\Bigm|\; \alpha_i \le C_\Omega - \sum_{j \in \Omega,\, j \ne i} \alpha_j,\ \forall\, \Omega \subseteq N \text{ with } i \in \Omega \Bigr\},
\]
where C_Ω denotes the right-hand side of the corresponding constraint in (3.1).
We then have an asymmetric game. The minimum rate that user i can guarantee in
the feasible region is r_{i,N}, which in general differs from r_{j,N}.
Each user i maximizes ui (αi , α−i ) over the coupled constraint set. Owing to the
monotonicity of the function gi and the inequalities that define the capacity region,
we obtain the following lemma.
Lemma 3.1. Let BRᵢ(α₋ᵢ) be the best reply to the strategy α₋ᵢ, defined by
\[
\mathrm{BR}_i(\alpha_{-i}) \;=\; \min_{\Omega \in \Gamma_i}\Bigl(C_\Omega - \sum_{j \in \Omega,\, j \ne i} \alpha_j\Bigr),
\]
where Γᵢ = {Ω ∈ 2ᴺ, i ∈ Ω}.

Proposition 3.1. The set of NEs is
\[
\Bigl\{(\alpha_i, \alpha_{-i}) \;\Bigm|\; \alpha_i \ge r_{i,N},\ \sum_{i \in N} \alpha_i = C_N\Bigr\}.
\]
If β ∈ {(αᵢ, α₋ᵢ) | αᵢ ≥ r_{i,N}, ∑ᵢ₌₁ᴺ αᵢ = C_N}, then, from Lemma 3.1, BRᵢ(β₋ᵢ) = {βᵢ}. Hence, β is a strict equilibrium. Moreover, this strategy β is Pareto optimal
because the rate of each user is maximized under the capacity constraint. These
strategies are social welfare optimal if the total utility ∑Ni=1 ui (αi , α−i ) = ∑Ni=1 gi (αi )
is maximized subject to constraints.
Note that the set of pure NEs is a convex subset of the capacity region.
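Concretely, in the symmetric setting of Example 3.1, Proposition 3.1 can be checked numerically; the sketch below is ours, with illustrative values, and tests the lower bounds r_{i,N} together with the sum condition:

```python
import math

def is_pure_ne(alpha, Ph=25.0, sigma2=0.1):
    """Pure NE test from Proposition 3.1 in the symmetric case:
    alpha_i >= r_{i,N} for all i and sum(alpha) == C_N."""
    n = len(alpha)
    C_N = math.log(1 + n * Ph / sigma2)                # full-coalition capacity
    r_iN = math.log(1 + Ph / (sigma2 + (n - 1) * Ph))  # others treated as noise
    return (all(a >= r_iN for a in alpha)
            and math.isclose(sum(alpha), C_N, rel_tol=1e-9))

C_N = math.log(1 + 3 * 25.0 / 0.1)
print(is_pure_ne([C_N / 3] * 3))    # True: the symmetric equilibrium
print(is_pure_ne([C_N, 0.0, 0.0]))  # False: below r_{i,N} for users 2 and 3
```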
Suppose first that the members of a deviating coalition Dev choose rates (α′ᵢ)_{i∈Dev} with
\[
\sum_{i \in \mathrm{Dev}} \alpha'_i \;<\; C_N - \sum_{i \notin \mathrm{Dev}} \alpha_i.
\]
Then there exists i such that αᵢ > α′ᵢ. Since gᵢ is strictly increasing, this implies that gᵢ(αᵢ) > gᵢ(α′ᵢ): a user i who is a member of coalition Dev does not improve his or her payoff. If the rates of some of the deviants are increased, then the rates of some other users from the coalition must decrease. If (α′ᵢ)_{i∈Dev} satisfies
\[
\sum_{i \in \mathrm{Dev}} \alpha'_i \;=\; C_N - \sum_{i \notin \mathrm{Dev}} \alpha_i,
\]
then some users in coalition Dev have increased their rates compared with (αᵢ)_{i∈Dev} while others in Dev have decreased their rates of transmission (because the total rate is the constant C_N − ∑_{i∉Dev} αᵢ). The users in Dev with a lower rate α′ᵢ ≤ αᵢ do not benefit by being a member of the coalition (the Shapley criterion of membership of a coalition does not hold). And this holds for any ∅ ⊂ Dev ⊂ N. This completes the proof.

1 Note that the set of constrained strong equilibria is a subset of the set of NEs (by taking coalitions of size one) and any constrained strong equilibrium is Pareto optimal (by taking the coalition of full size).
Corollary 3.1. In the constrained rate allocation game, NEs and strong equilibria
in pure strategies coincide.
Define
\[
W(\alpha) \;=\; \mathbf{1}_C(\alpha) \sum_{i=1}^{N} g_i(\alpha_i).
\]
Then
\[
\frac{\partial}{\partial \alpha_i} W(\alpha) \;=\; g'_i(\alpha_i) \;=\; \frac{\partial}{\partial \alpha_i} u_i
\]
in the interior of the capacity region C, and W is a constrained potential function [3] in pure strategies.
Corollary 3.2. The local maximizers of W in C are pure NEs. Global maximizers
of W in C are both constrained strong equilibria and social optima for the local
interaction.
Throughout this subsection, we assume that the functions gi are the identity
function, i.e., gi (x) = id(x) := x. One metric used to measure how much the
performance of decentralized systems is affected by the selfish behavior of its
components is the price of anarchy (PoA). We present a similar price for strong
equilibria under the coupled rate constraints. This notion of PoA can be seen as
an efficiency metric that measures the price of selfishness or decentralization and
has been extensively used in the context of congestion games or routing games
where typically users have to minimize a cost function [37, 38]. In the context of
rate allocation in the multiple-access channel, we define an equivalent measure of
PoA for rate maximization problems. One of the advantages of a strong equilibrium
is that it has the potential to reduce the distance between the optimal solution and
the solution obtained as an outcome of selfish behavior, typically in cases where the
capacity constraint is violated at each time. Since the constrained rate allocation
game has strong equilibria, we can define the strong price of anarchy (SPoA),
introduced in [12], as the ratio between the payoff of the worst constrained strong
equilibrium and the social optimum value, which is C_N.
Theorem 3.2. The SPoA of the constrained rate allocation game is 1 for gi (x) = x.
Note that for gᵢ ≠ id, the constrained SPoA (CSPoA) can be less than one. However,
the optimistic PoA of the best constrained equilibrium, also called the price of
stability [13], is one for any function gi , i.e., the efficiency of the “best” equilibria is
100 %.
3.2.4 Equilibrium Selection

We showed in the previous sections that our rate allocation game has a continuum
of pure NEs and strong equilibria. We address now the problem of selecting
one equilibrium that has a certain desirable property: the normalized pure NE,
introduced in [26]; see also [20, 22, 28]. We introduce the problem of constrained
maximization faced by each user when the other rates are at the maximal face of
polytope C:
\[
\max_{\alpha}\ u_i(\alpha) \tag{3.5}
\]
\[
\text{s.t.}\quad \alpha_1 + \cdots + \alpha_N = C_N. \tag{3.6}
\]
For a fixed vector ζ with identical entries, define the normal form game Γ (ζ )
with N users, where actions are taken as rates and the payoffs given by L(α , ζ ).
A normalized equilibrium is an equilibrium of the game Γ (ζ ∗ ), where ζ ∗ is
normalized into the form ζᵢ* = c/τᵢ for some c > 0, τᵢ > 0. We now have the following result due to Goodman [20], which implies Rosen's condition on uniqueness for strictly concave games.
Theorem 3.3. Let ui be a smooth and strictly concave function in αi , with each ui
convex in α−i , and let there exist some ζ such that the weighted nonnegative sum
of the payoffs ∑Ni=1 ζi ui (α ) is concave in α . Then, the matrix G(α , ζ ) + GT (α , ζ )
is negative definite (which implies uniqueness), where G(α , ζ ) is the Jacobian with
respect to α of the pseudogradient [ζᵢ ∇_{αᵢ} uᵢ(α)]_{i∈N}.
3.2.5 Stability and Evolutionary Game Dynamics

In this subsection, we study the stability of equilibria and several classes of evolu-
tionary game dynamics under a symmetric case, i.e., Pi = P, hi = h, gi = g, Ai = A,
∀i ∈ N. We will drop subscript index i where appropriate. We show that the
associated evolutionary game has a unique pure constrained ESS.
Proposition 3.2. The collection of rates α = (C_N/N, . . . , C_N/N), i.e., the Dirac distribution concentrated on the rate C_N/N, is the unique symmetric pure NE.

Proof. Since the constrained rate allocation game is symmetric, there exists a symmetric (pure or mixed) NE. If such an equilibrium exists in pure strategies, each user transmits at the same rate r*. It follows from Proposition 3.1 of Sect. 3.2.2 and the bound r_{i,N} ≤ C_N/N that r* satisfies Nr* = C_N and r* is feasible.
Since the set of feasible actions is convex, we can define a convex combination of rates in the set of feasible rates. For example, εα + (1 − ε)α′ is a feasible rate if α and α′ are feasible. The symmetric rate profile (r, r, . . . , r) is feasible if and only if 0 ≤ r ≤ r* = C_N/N. We say that rate r is a constrained ESS if it is feasible and for every mutant strategy mut ≠ r there exists ε_mut > 0 such that
\[
r_\varepsilon := \varepsilon\,\mathrm{mut} + (1 - \varepsilon)r \in C \qquad \forall\, \varepsilon \in (0, \varepsilon_{\mathrm{mut}}),
\]
\[
u(r, r_\varepsilon, \ldots, r_\varepsilon) > u(\mathrm{mut}, r_\varepsilon, \ldots, r_\varepsilon) \qquad \forall\, \varepsilon \in (0, \varepsilon_{\mathrm{mut}}).
\]
Define the mixed capacity region M(C) as the set of measure profiles (μ₁, μ₂, . . . , μ_N) such that
\[
\sum_{i \in \Omega} \int_{\mathbb{R}_+} \alpha_i\, \mu_i(d\alpha_i) \;\le\; C_\Omega, \qquad \forall\, \emptyset \ne \Omega \subseteq N,
\]
where ν_k = ⊗ₗ₌₁ᵏ μ is the k-fold product measure on [0, ∞)ᵏ. The constraint set becomes the set of probability measures on R₊ such that
\[
0 \;\le\; E(\mu) := \int_{\mathbb{R}_+} \alpha\, \mu(d\alpha) \;\le\; \frac{C_N}{N} \;<\; C_{\{1\}}.
\]
N
(b2 , . . . , bN ) ∈ Da , where Da = {(b2 , . . . , bN ) ∈ RN−1 =1 bi ≤ DΩ , Ω ⊆ 2 }.
+ , ∑i∈Ω ,i
a
Thus, we have
F(a, μ ) = u(a, b2 , . . . , bN ) νN−1 (db)
RN−1
+
= g(a) νN−1 (db)
b∈RN−1
+ , (a,b)∈C
= [0,CN −(N−1)E(μ )] g(a) × νN−1 (db).
b∈Da
As we can see, β_{xa} is independent of a. Thus, in the unconstrained case, the first double integral becomes
\[
\int_{a \in A} \int_{x \in E} \beta_{xa}(\lambda_t)\, \mu(dx)\, \mu(da) \;=\; \int_{x \in E} \beta_{xa}(\lambda_t)\, \mu(dx),
\]
where
\[
V(x, \lambda_t) \;=\; K\Bigl[\int_{a \in A} \beta_{xa}(\lambda_t)\, \lambda_t(da) - \int_{a \in A} \beta_{ax}(\lambda_t)\, \lambda_t(da)\Bigr].
\]
We obtain
\[
V(x, \lambda_t) \;=\; K \int_{a \in A} \bigl[F(x, \lambda_t) - F(a, \lambda_t)\bigr]\lambda_t(da) \;=\; K\Bigl[F(x, \lambda_t) - \int_{a \in A} F(a, \lambda_t)\, \lambda_t(da)\Bigr],
\]
that is, up to the constant K, the difference between the payoff at x and the average payoff.
A common property that applies to all these dynamics is that the set of NEs is
a subset of rest points (stationary points) of the evolutionary game dynamics. Here
we extend the concepts of these dynamics to evolutionary games with a continuum
action space and coupled constraints, and interactions with more than two users.
The counterparts of these results in discrete action space can be found in [21, 27].
Theorem 3.5. Any NE of a game is a rest point of the following evolutionary game
dynamics: constrained Brown–von Neumann–Nash, generalized Smith dynamics,
and replicator dynamics. Furthermore, the ESS set is a subset of the rest points of
these constrained evolutionary game dynamics.
Proof. It is clear for pure equilibria using the revision protocols β of these
dynamics. Let λ be an equilibrium. For any rate a in the support of λ, β_{xa} = 0 if F(x, λ) ≤ F(a, λ). Thus, if λ is an equilibrium, then the difference between the microscopic inflow and outflow is V(a, λ) = 0 for every a in the support of the measure λ. The last assertion follows from the fact that the ESS (if it exists) is an equilibrium and hence a rest point of these dynamics.
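For intuition, here is a discretized sketch of ours (not a construction from the chapter) that runs Brown–von Neumann–Nash dynamics on a finite grid of rates for a symmetric two-user interaction with g = id, Ph = 25 and σ₀² = 0.1 (natural logarithms, so C_{1} ≈ 5.52 and C_N ≈ 6.21):

```python
import numpy as np

def bnn_step(x, payoffs, dt=0.002):
    """One step of discretized Brown-von Neumann-Nash dynamics:
    growth by positive excess payoff, uniform outflow by its total."""
    avg = x @ payoffs
    excess = np.maximum(0.0, payoffs - avg)
    return x + dt * (excess - x * excess.sum())

grid = np.linspace(0.0, 5.52, 56)       # candidate rates in [0, C_{1}]
C_N = 6.21                              # two-user sum capacity (illustrative)
x = np.ones_like(grid) / len(grid)      # population state over the grid
for _ in range(50000):
    # g = id: a rate a earns a if a + E(mu) <= C_N, else 0.
    payoffs = np.where(grid + grid @ x <= C_N, grid, 0.0)
    x = bnn_step(x, payoffs)
print(grid @ x)   # mean rate; the symmetric NE of Prop. 3.2 has E = C_N/2
```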
Let λ be a finite Borel measure on [0,C{i} ] with full support. Suppose g is
continuous on [0,C{i} ]. Then λ is a rest point of the Brown–von Neumann–Nash
dynamics if and only if λ is a symmetric NE. Note that the choice of topology is
an important issue when defining the convergence of dynamics and the stability
of the dynamics. The most used topology in this area is the topology of the
weak convergence, which measures the closeness of two states of the system.

Define a rule of assignment of user i as a map r̄ᵢ : βᵢ ↦ αᵢ from his signals to his action set. A CCE is then characterized by
\[
\int d\mu^*(\beta)\,\bigl[u_i(\alpha_i, \alpha_{-i} \mid \beta_i) - u_i(\bar r_i(\beta_i), \alpha_{-i})\bigr] \;\ge\; 0 \qquad \forall\, i \in N.
\]
Theorem 3.6. The set of constrained pure NEs of the MISO game is given by
\[
\text{max-face}(C) \;=\; \Bigl\{(\alpha_1, \ldots, \alpha_N) \;\Bigm|\; \alpha_i \ge 0,\ \sum_{k \in N} \alpha_k = C_N\Bigr\},
\]
and any element of max-face(C) is a CCE. Moreover, any probability distribution over the maximal face of the capacity region max-face(C) is a CCE distribution.
3.3 Multiple-Receiver Model

In this section, we extend the single-receiver case to one with multiple receivers. The
multi-input and multioutput (MIMO) channel access game has been studied in the
context of power allocation and control. For instance, the authors in [6] formulate
a two-player zero-sum game where the first player is the group of transmitters and
the second one is the set of MIMO subchannels. In [5], the authors formulate an N-
person noncooperative power allocation game and study its equilibrium under two
different decoding schemes.
3.3.1 Hybrid Rate Control Model

In this subsection, we establish a model for multiple users and multiple receivers.
Each user needs to decide the rate at which to transmit and the channel to pick.
We formulate a game Ξ = ⟨N, (Aᵢ)_{i∈N}, (Uⁱ)_{i∈N}⟩, in which the decision variable
is (αi , pi ), and pi = [pi j ] j∈J is a J-dimensional vector, where pi j is the probability
that user i ∈ N will choose channel j ∈ J and pi j needs to satisfy the probability
measure constraints
∑ pi j = 1, pi j ≥ 0, ∀i ∈ N. (3.11)
j∈J
The game Ξ is asymmetric in the sense that the sets of strategies of the users are
different and the payoffs are not symmetric.
Let C_{j,Ω} := log(1 + ∑_{i∈Ω} P_{ij}h_{ij}/σ₀²) be the capacity for a subset Ω ⊆ N of users at receiver j ∈ J and
\[
r_{ij,\Omega} \;:=\; \log\Bigl(1 + \frac{P_{ij} h_{ij}}{\sigma_0^2 + \sum_{i' \in \Omega,\, i' \ne i} P_{i'j} h_{i'j}}\Bigr)
\]
the bound on the rate of a user i when the signals of the |Ω| − 1 other users are treated as noise at receiver j. Each receiver j has a capacity region C(j) given by
\[
C(j) \;=\; \Bigl\{(\alpha, p_j) \in \mathbb{R}_+^N \times [0,1]^N \;\Bigm|\; \sum_{i \in \Omega_j} \alpha_i p_{ij} \;\le\; C_{j,\Omega_j},\ \forall\, \emptyset \ne \Omega_j \subseteq N\Bigr\}, \qquad j \in J. \tag{3.12}
\]
where α = (αi , α−i ) ∈ RN+ and P = (pi , p−i ) ∈ [0, 1]N×J , with pi ∈ [0, 1]J , p−i ∈
[0, 1](N−1)×J . Assume that the utility ui j of a user i transmitting to receiver j is only
dependent on the user himself and is described by a positive and strictly increasing
function gi : R+ → R+ , i.e., ui j = gi , ∀ j ∈ J, when capacity constraints are satisfied.
With the presence of coupled constraints (3.12) from each receiver and the probability measure constraint (3.11), each sender has his own individual optimization problem (IOP) given as follows:
\[
\max_{(\alpha_i,\, p_i)} \; U^i(\alpha, P)
\]
s.t. ∑_{j∈J} p_{ij} = 1, ∀i ∈ N; p_{ij} ≥ 0, ∀i ∈ N, j ∈ J; (α, p_j) ∈ C(j), ∀j ∈ J.
3.3.1.1 An Example
Suppose we have three users and three receivers, that is, N = {1, 2, 3} and
J = {1, 2, 3}. The capacity region at receiver 1 is given by
\[
C(1) \;=\; \left\{ (\alpha, p_1) \;\middle|\;
\begin{array}{l}
\alpha_i \ge 0, \quad i \in \{1, 2, 3\}, \\[2pt]
p_{11}\alpha_1 \le \log\bigl(1 + \tfrac{P_1 h_1}{\sigma_0^2}\bigr), \quad
p_{21}\alpha_2 \le \log\bigl(1 + \tfrac{P_2 h_2}{\sigma_0^2}\bigr), \quad
p_{31}\alpha_3 \le \log\bigl(1 + \tfrac{P_3 h_3}{\sigma_0^2}\bigr), \\[2pt]
p_{11}\alpha_1 + p_{21}\alpha_2 \le \log\bigl(1 + \tfrac{P_1 h_1 + P_2 h_2}{\sigma_0^2}\bigr), \quad
p_{11}\alpha_1 + p_{31}\alpha_3 \le \log\bigl(1 + \tfrac{P_1 h_1 + P_3 h_3}{\sigma_0^2}\bigr), \\[2pt]
p_{21}\alpha_2 + p_{31}\alpha_3 \le \log\bigl(1 + \tfrac{P_2 h_2 + P_3 h_3}{\sigma_0^2}\bigr), \\[2pt]
p_{11}\alpha_1 + p_{21}\alpha_2 + p_{31}\alpha_3 \le \log\bigl(1 + \tfrac{P_1 h_1 + P_2 h_2 + P_3 h_3}{\sigma_0^2}\bigr), \\[2pt]
0 \le p_{i1} \le 1, \quad i \in \{1, 2, 3\}
\end{array}
\right\}.
\]
3.3.2 Characterization of the NE

In this subsection, we characterize the NEs of the defined game Ξ under the given
capacity constraint. We use the following theorem to prove the existence of an NE
for the case where the rates are predetermined; this result is then used to solve the
game for the case where both the rates and the connection probabilities are (joint)
decision variables.
Theorem 3.7 (Başar and Olsder [34]). Let A = A₁ × A₂ × · · · × A_N be a closed, bounded, and convex subset of Rᴺ, and for each i ∈ N let the payoff functional Uⁱ : A → R be jointly continuous in A and concave in aᵢ for every aⱼ ∈ Aⱼ, j ∈ N, j ≠ i. Then, the associated N-person non-zero-sum game admits an NE in pure strategies.
Applying Theorem 3.7, we have the following results immediately.
Proposition 3.3. Suppose αi , i ∈ N, are predetermined feasible rates. Let feasible
set F be closed, bounded, and convex. If gi in the IOP are continuous on F and
concave in pi (without the assumption of their being positive and strictly increasing)
and the expected payoff functions U i : RN+ × [0, 1]N×J → R are concave in pi and
continuous on F , then the static game admits an NE.
The existence result in Proposition 3.3 only captures the case where the rates αᵢ are predetermined, and it relies on the convexity requirement of the utility functions. More generally, consider the constrained optimization problem (COP)
\[
\max_{\alpha, P}\ \Psi(\alpha, P), \qquad \text{s.t. } (\alpha, P) \in \mathcal{F}.
\]
Using the result in [3], we can conclude that if there exists a solution to the COP,
then there exists an NE to the game Ξ. Since F is compact and nonempty, and the objective function is continuous, there exists a solution to the COP and thus an NE to the game.
The foregoing problem is generally not convex, and the uniqueness of the NE
may not be guaranteed. However, we can still further characterize the NE through
the following propositions.
Proposition 3.5. Let β_{ij} := αᵢ p_{ij}. Without predetermining α, suppose that (p₋ᵢ, α₋ᵢ) is feasible. A best response strategy at receiver j ∈ J for user i must satisfy
\[
0 \;\le\; p_{ij}\alpha_i \;\le\; C_{j,\Omega_j} - \sum_{k \in \Omega_j,\, k \ne i} \alpha_k p_{kj}, \qquad \forall\, \Omega_j, \tag{3.18}
\]
and is given by
\[
p_{ij}\alpha_i \;=\; \min_{\Omega_j \ni i}\Bigl(C_{j,\Omega_j} - \sum_{k \in \Omega_j,\, k \ne i} \alpha_k p_{kj}\Bigr), \tag{3.19}
\]
where r_{ij,N} is the bound on the rate of user i when the signals of the |N| − 1 other users are treated as noise.

Proof. The proof is immediate by observing that the rate of user i at receiver j must satisfy (3.18) due to the coupled constraints. Thus, the maximum rate that user i can use to transmit to receiver j without violating the constraints is clearly the minimum of C_{j,Ω_j} − ∑_{i′≠i} α_{i′} p_{i′j} over all Ω_j. Since the payoff is a nondecreasing function, the best response for i at receiver j is given by (3.19).
Proposition 3.6. Let Kᵢ* = arg max_{j∈J} g_{ij}(β_{ij}). If Kᵢ* = {k*} is a singleton, then the best response for user i is to choose
\[
p_{ij} = 1 \ \text{if } j = k^*, \qquad p_{ij} = 0 \ \text{otherwise},
\]
and we can determine αᵢ by αᵢ = β_{ik*}/p_{ik*}. If |Kᵢ*| ≥ 2, then the best response correspondence is
\[
p_i \in \Delta(K_i^*), \qquad p_{ij} = 0 \ \text{for } j \notin K_i^*.
\]
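The two propositions suggest a simple best-response computation. The following Python sketch is ours; it assumes symmetric channels P_{ij}h_{ij} = Ph and g = id, checks only the subsets containing user i, caps the rate in the spirit of (3.19), and then concentrates the weight on the best receiver (the singleton case of Proposition 3.6):

```python
import math
from itertools import combinations

def best_response(i, alpha, P, Ph, sigma2, g=lambda x: x):
    """Best-response sketch for user i in the spirit of Props. 3.5-3.6
    (symmetric channels assumed, so C_{j,Omega} depends only on |Omega|)."""
    N, J = len(alpha), len(P[0])
    others = [k for k in range(N) if k != i]
    beta = []
    for j in range(J):
        caps = []
        for size in range(len(others) + 1):
            for omega in combinations(others, size):
                c = math.log(1 + (size + 1) * Ph / sigma2)   # C_{j,Omega}
                caps.append(c - sum(alpha[k] * P[k][j] for k in omega))
        beta.append(max(0.0, min(caps)))    # rate cap as in (3.19)
    best = max(range(J), key=lambda j: g(beta[j]))  # K_i^*, singleton case
    return beta[best], [1.0 if j == best else 0.0 for j in range(J)]

# Illustrative call: 2 users, 3 receivers, user 0 responds to user 1.
print(best_response(0, [0.0, 5.0], [[1/3] * 3, [0.5, 0.3, 0.2]], 25.0, 0.1))
```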
3.3.3 Evolutionary Dynamics

Interactions among users are dynamic, and users can update their rates and channel
selection with respect to their payoffs and the known coupled constraints. Such a
dynamic process can generally be modeled by an evolutionary process, a learning
process, or a trial-and-error updating process. In classical game theory, the focus
is on strategies that optimize payoffs to the players, whereas in evolutionary game
theory, the focus is on strategies that will persist through time. In this subsection,
we formulate evolutionary game dynamics based on the static game discussed
in Sect. 3.3.1. We use generalized Smith dynamics for channel selection and G-
function-based dynamics for rates. Combining them, we set up a framework of
hybrid dynamics for the overall system.
The action of each user has two components (αi , pi ) ∈ R+ × [0, 1]J . We use
pi as strategies that determine the fitness of user i’s rate αi to receiver j. The
rate selection evolves according to the channel selection strategy P. We may view
channel selection as an inner game that involves a process on a short time scale, but
rate selection is an outer game that represents the dynamical link via fitness on a
longer time scale [7, 8].
Let α be a fixed rate profile in the capacity region. We assume that user i occasionally tests the weights p_{ij} with alternative receivers, keeping the new strategy if and only if it leads to a strict increase in payoff. If the choice of receivers' weights of some users decreases the payoff or violates the constraints due to a strategy change by another user, then the user starts a random search for a new strategy, eventually settling on one with a probability that increases monotonically with its realized payoff. For the foregoing generating-function-based dynamics, the weight of switching from receiver j to receiver j′ is given by
\[
\eta^i_{jj'}(\alpha, P) \;=\; \bigl[\max\bigl(0,\ u_{ij'}(\alpha, P) - u_{ij}(\alpha, P)\bigr)\bigr]^{\theta}, \qquad \theta \ge 1,
\]
if the payoff obtained at receiver j′ is greater than the payoff obtained at receiver j and the constraints are satisfied; otherwise, η^i_{jj'}(α, P) = 0. The frequency of use of each receiver is then seen as the selection strategy over receivers.

The expected change at each receiver is the difference between the incoming and the outgoing flows. The dynamics, also called generalized Smith dynamics [2], is given by
\[
\dot p_{ij}(t) \;=\; \sum_{j' \in J} p_{ij'}(t)\, \eta^i_{j'j}(\alpha, P(t)) \;-\; p_{ij}(t) \sum_{j' \in J} \eta^i_{jj'}(\alpha, P(t)). \tag{3.20}
\]
Write χ_{ij} for the right-hand side of (3.20). Then the induced change in user i's expected payoff satisfies
\[
d \;:=\; \sum_{j \in J} \dot p_{ij}\, u_{ij}(\alpha, P) \;=\; \sum_{j \in J} \chi_{ij}\, u_{ij}(\alpha, P)
\;=\; \sum_{j, j' \in J} p_{ij'}\bigl[u_{ij}(\alpha, P) - u_{ij'}(\alpha, P)\bigr]\eta^i_{j'j}
\;=\; \sum_{j, j' \in J} p_{ij'}\, \max\bigl(0,\ u_{ij}(\alpha, P) - u_{ij'}(\alpha, P)\bigr)\eta^i_{j'j} \;\ge\; 0,
\]
if the pair (α, P) is in the hybrid capacity region. Notice that the term C_{j,N} − ∑_{i′≠i} p_{i′j}β_{i′j}(t) is the maximum rate of i using channel j at time t. Hence, the G-function-based dynamics is given by
\[
\dot\beta_{ij} \;=\; -\bar\mu\,\Bigl[p_{ij}\beta_{ij} - C_{j,N} + \sum_{i' \ne i} p_{i'j}\beta_{i'j}\Bigr]\, p_{ij}\beta_{ij}, \tag{3.21}
\]
with initial conditions β_{ij}(0) ≤ C_{j,{i}}, where β = [β_{ij}] is defined in Proposition 3.5 and is of the same dimension as α, and αᵢ(t) = ∑_{j∈J} β_{ij}(t); μ̄ is an appropriate parameter chosen for the rate of convergence.
We now combine the two evolutionary game dynamics described in the previous subsections. The variables $\alpha$ and $P$ both evolve in time. The overall dynamics are given by
$$\begin{cases}
\dot p_{ij}(t) = \sum_{j'\in J} p_{ij'}(t)\,\eta^i_{j'j}(\alpha(t), P(t)) - p_{ij}(t)\sum_{j'\in J}\eta^i_{jj'}(\alpha(t), P(t)),\\[4pt]
\dot\beta_{ij}(t) = -\bar\mu\bigl(p_{ij}(t)\beta_{ij}(t) - C_{j,N} + \sum_{i'\neq i} p_{i'j}(t)\beta_{i'j}(t)\bigr)\,p_{ij}(t)\beta_{ij}(t),\\[4pt]
\alpha_i(t) = \sum_{j\in J}\beta_{ij}(t), \qquad \beta_{ij}(0) \le C_{j,\{i\}}, \quad \forall j\in J,\ i\in N.
\end{cases} \tag{3.22}$$
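As an illustration of how (3.22) can be simulated, the following sketch combines one Euler step of each component. Here `payoff` is a user-supplied function returning the matrix $u_{ij}(\alpha, P)$ (an assumption, since its concrete form depends on the receiver model), `C` holds the capacities $C_{j,N}$, and `smith_step` is the routine sketched above. Note that the bracket in the $\beta$-equation collapses to $\sum_i p_{ij}\beta_{ij} - C_{j,N}$, the excess load on channel $j$.

```python
def hybrid_step(P, beta, C, payoff, mu_bar=0.9, dt=0.01, theta=1.0):
    """One Euler step of the hybrid dynamics (3.22) (sketch)."""
    alpha = beta.sum(axis=1)                    # alpha_i(t) = sum_j beta_{ij}(t)
    P_next = smith_step(P, payoff(alpha, P), theta=theta, dt=dt)
    pb = P * beta                               # admitted rates p_{ij} beta_{ij}
    excess = pb.sum(axis=0) - C                 # load on channel j minus C_{j,N}
    beta_next = beta - dt * mu_bar * excess[None, :] * pb
    return P_next, beta_next
```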
All the equilibria of the hybrid evolutionary rate control and channel selection are rest points of the preceding hybrid dynamics. The following result can be obtained directly from Proposition 3.7 and (3.21).

Proposition 3.9. Let $(\beta^*, P^*)$ be an interior rest point of the hybrid dynamics, i.e., $\beta^*_{ij} > 0$, $p^*_{ij} > 0$ and $\chi(\alpha^*, P^*) = 0$. Then for all $j$,
$$\sum_{i=1}^{N} p^*_{ij}\,\beta^*_{ij} = C_{j,N}; \qquad \chi\Bigl(\sum_{j\in J}\beta^*_{ij},\, P^*\Bigr) = 0.$$
(Figures: trajectories of the channel-selection weights $p_1$ and $p_2$ over time.)
In the second experiment, we assume that the probability matrix $P$ has been optimally chosen by the users using (3.20). Figures 3.7 and 3.8 show that the $\beta$ values converge to an equilibrium from which we can find the optimal value of $\alpha$. Since these dynamics are much slower than the Smith dynamics on $P$, our assumption that the optimal $P$ is known for a slowly varying $\alpha$ becomes valid.
(Figures 3.7 and 3.8: convergence of the $\beta$ values over time.)
In the next experiment, we simulate the hybrid dynamics in (3.22). Let the probability $p_{ij}$ of user $i$ choosing transmitter $j$ and the transmission rates be initialized as follows:
$$P(0) = \begin{pmatrix} 0.2 & 0.3 & 0.5 \\ 0.25 & 0.5 & 0.25 \end{pmatrix}, \qquad \alpha(0) = \begin{pmatrix} 0.2 \\ 0.1 \end{pmatrix}.$$
We let the parameter $\bar\mu = 0.9$. Figure 3.9 shows the evolution of the weights of user 1 on each of the receivers. The weights converge to $p_{1j} = 1/3$ for all $j$ within 2 s, leading to an unbiased choice among receivers. In Fig. 3.10, we show the evolution of the weights of the second user on each receiver. At equilibrium, $p_2 = [0.3484, 0.4847, 0.1669]^T$. It appears that user 2 favors the second transmitter over the other ones. Since the utilities $u_{ij}$ are of the same form, the optimal response set $K_i^*$ is naturally nonempty and contains all the receivers. As shown in Proposition 3.6, the probability of choosing a receiver at equilibrium is randomized among the three receivers and can be determined by the rates $\alpha$ chosen by the users.
The $\beta$-dynamics determine the evolution of $\alpha$ in (3.22). In Fig. 3.11, we see that the evolutionary dynamics yield $\alpha = [15.87, 23.19]^T$ at equilibrium. It is easy to verify that these rates satisfy the capacity constraints outlined in Sect. 3.2. The convergence takes about 5 s and is thus much slower than in Figs. 3.9 and 3.10. Hence the $P$-dynamics may be viewed as the inner-loop dynamics, whereas the $\beta$-dynamics act as outer-loop evolutionary dynamics; they evolve on two different time scales. In addition, thanks to Proposition 3.8, finding the rest points of the preceding dynamics ensures that we find the equilibrium.
Smith dynamics, and the replicator dynamics to study the stability of equilibria in the long run. In addition, we introduced a hybrid multiple-access game model and its corresponding evolutionary game-theoretic framework. We analyzed the NE of the static game and proposed an evolutionary-game-dynamics-based method to find it. It was found that the Smith dynamics for channel selection are considerably faster than the $\beta$-dynamics, and that the combined dynamics yield a rest point that corresponds to the NE. An interesting extension that we leave for future research is
References
1. Tembine, H., Altman, E., ElAzouzi, R., Hayel, Y.: Evolutionary games in wireless networks.
IEEE Trans. Syst. Man Cybern. B Cybern. 40(3), 634–646 (2010)
2. Tembine, H., Altman, E., El-Azouzi, R., Sandholm, W.H.: Evolutionary game dynamics with
migration for hybrid power control game in wireless communications. In: Proceedings of 47th
IEEE Conference on Decision and Control (CDC), Cancun, Mexico, pp. 4479–4484 (2008)
3. Zhu, Q.: A Lagrangian approach to constrained potential games: theory and example.
In: Proceedings of 47th IEEE Conference on Decision and Control (CDC), Cancun, Mexico,
pp. 2420–2425 (2008)
4. Saraydar, C.U., Mandayam, N.B., Goodman, D.J.: Efficient power control via pricing in
wireless data networks. IEEE Trans. Commun. 50(2), 291–303 (2002)
5. Belmega, E.V., Lasaulce, S., Debbah, M., Jungers, M., Dumont, J.: Power allocation games
in wireless networks of multi-antenna terminals. Springer Telecomm. Syst. J. 44(5–6),
1018–4864 (2010)
6. Palomar, D.P., Cioffi, J.M., Lagunas, M.A.: Uniform power allocation in MIMO channels: a
game theoretic approach. IEEE Trans. Inform. Theory 49(7), 1707–1727 (2003)
7. Vincent, T.L., Vincent, T.L.S.: Evolution and control system design. IEEE Control Syst. Mag.
20(5), 20–35 (2000)
8. Vincent, T.L., Brown, J.S.: Evolutionary Game Theory, Natural Selection, and Darwinian
Dynamics. Cambridge University Press, Cambridge (2005)
9. Zhu, Q., Tembine, H., Başar, T.: A constrained evolutionary Gaussian multiple access channel
game. In: Proceedings of the International Conference on Game Theory for Networks
(GameNets), Istanbul, Turkey, 13–15 May 2009
10. Zhu, Q., Tembine, H., Başar, T.: Evolutionary games for hybrid additive white Gaussian noise
multiple access control. In: Proceedings of GLOBECOM (2009)
11. Altman, E., El-Azouzi, R., Hayel, Y., Tembine, H.: Evolutionary power control games
in wireless networks. In: NETWORKING 2008 Ad Hoc and Sensor Networks, Wireless
Networks, Next Generation Internet, pp. 930–942. Springer, New York (2008)
12. Andelman, N., Feldman, M., Mansour, Y.: Strong price of anarchy. Games Econ. Behav. 65(2), 289–317 (2009)
13. Anshelevich, E., Dasgupta, A., Kleinberg, J., Tardos, E., Wexler, T., Roughgarden, T.:
The price of stability for network design with fair cost allocation. In: Proceedings of the FOCS,
pp. 59–73 (2004)
14. Tembine, H., Altman, E., El-Azouzi, R., Hayel, Y.: Multiple access game in ad-hoc networks.
In: Proceedings of 1st International Workshop on Game Theory in Communication Networks
(GameComm) (2007)
15. Forges, F.: Can sunspots replace a mediator? J. Math. Econ. 17, 347–368 (1988)
16. Forges, F.: Sunspot equilibrium as a game-theoretical solution concept. In: Barnett, W.A.,
Cornet, B., Aspremont, C., Gabszewicz, J.J., Mas-Colell, A. (eds.) Equilibrium Theory and
Applications: Proceedings of the 6th International Symposium in Economic Theory and
Econometrics, pp. 135–159. Cambridge University Press, Cambridge (1991)
17. Forges, F., Peck, J.: Correlated equilibrium and sunspot equilibrium. Econ. Theory 5, 33–50
(1995)
18. Aumann, R.J.: Acceptable points in general cooperative n-person games. In: Contributions to the Theory of Games IV, Annals of Mathematics Study 40, pp. 287–324. Princeton University Press, Princeton (1959)
19. Gajic, V., Rimoldi, B.: Game theoretic considerations for the Gaussian multiple access channel.
In: Proceedings of the IEEE International Symposium on Information Theory (ISIT) (2008)
20. Goodman, J.C.: A note on existence and uniqueness of equilibrium points for concave N-person
games. Econometrica 48(1), 251 (1980)
21. Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics. Cambridge
University Press, Cambridge (1998)
22. Ponstein, J.: Existence of equilibrium points in non-product spaces. SIAM J. Appl. Math. 14(1),
181–190 (1966)
23. McGill, B.J., Brown, J.S.: Evolutionary game theory and adaptive dynamics of continuous
traits. Annu. Rev. Ecol. Evol. Syst. 38, 403–435 (2007)
24. Shaiju, A.J., Bernhard, P.: Evolutionarily robust strategies: two nontrivial examples and
a theorem. In: Proceedings of 13-th International Symposium on Dynamic Games and
Applications (ISDG) (2006)
25. Maynard Smith, J., Price, G.R.: The logic of animal conflict. Nature 246, 15–18 (1973)
26. Rosen, J.B.: Existence and uniqueness of equilibrium points for concave N-person games.
Econometrica 33, 520–534 (1965)
27. Sandholm, W.H.: Population Games and Evolutionary Dynamics. MIT Press, Cambridge
(2010)
28. Takashi, U.: Correlated equilibrium and concave games. Int. J. Game Theory 37(1), 1–13
(2008)
29. Taylor, P.D., Jonker, L.: Evolutionarily stable strategies and game dynamics. Math. Biosci. 40,
145–156 (1978)
30. Tembine, H., Altman, E., El-Azouzi, R., Hayel, Y.: Evolutionary games with random number
of interacting players applied to access control. In: Proceedings of IEEE/ACM International
Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks
(WiOpt), March 2008.
31. Tembine, H., Altman, E., El-Azouzi, R.: Delayed evolutionary game dynamics applied to the
medium access control. In: Proceedings of the 4th IEEE International Conference on Mobile
Ad-hoc and Sensor Systems (MASS) (2007)
32. Alpcan, T., Başar, T.: A hybrid noncooperative game model for wireless communications. In:
Proceedings of 11th International Symposium on Dynamic Games and Applications, Tucson,
AZ, December 2004
33. Alpcan, T., Başar, T., Srikant, R., Altman, E.: CDMA uplink power control as a noncooperative
game. Wireless Networks 8, 659–670 (2002)
34. Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, SIAM Series in Classics in Applied Mathematics, 2nd edn. SIAM, Philadelphia (1999)
35. Dodis, Y., Halevi, S., Rabin, T.: A cryptographic solution to a game theoretic problem. In: Annual International Cryptology Conference, Santa Barbara, CA, USA. Lecture Notes in Computer Science, vol. 1880, pp. 112–130. Springer, London (2000)
36. Alpcan, T., Başar, T.: A hybrid system model for power control in multicell wireless data
networks. Perform. Eval. 57, 477–495 (2004)
37. Başar, T., Zhu, Q.: Prices of anarchy, information, and cooperation in differential games.
J. Dynamic Games Appl. 1(1), 50–73 (2011)
38. Roughgarden, T.: Selfish Routing and the Price of Anarchy. MIT Press, Cambridge (2005)
Chapter 4
Join Forces or Cheat: Evolutionary Analysis
of a Consumer–Resource System
4.1 Introduction
Among the many ecosystems found on Earth, one can easily identify examples of resource–consumer systems, such as the plant–grazer, prey–predator, or host–parasitoid systems known in biology [13]. Usually, individuals involved in such systems (bacteria, plants, insects, animals, etc.) have conflicting interests, and models describing such interactions are based on principles of game theory [2, 7, 8, 16]. Hence, the investigation of such models is of interest to both game theoreticians and behavioral and evolutionary biologists.
One of the main topics of evolutionary theory is addressing whether individuals
should behave rationally throughout their lifetime. Darwin’s statement of the
survival of the fittest indicates that evolution selects the best reproducers, so that
the evolutionary process should result in selecting organisms which appear to
behave rationally, even though they may know little about rationality. Evolutionary
processes may thus result in organisms which actually maximize their number
of descendants [19]; this is true in systems in which density dependence can be
neglected, or in which the relation between the organisms and their environment is
fairly simple [14]. Otherwise, such a rule may not apply and evolution is expected to
yield a population which employs an evolutionarily stable strategy; such a strategy
will not allow them to get the maximum possible number of descendants, but cannot
be beaten by any strategy a deviant organism may choose to follow [10, 21]. In the
following, since we will be concerned with populations in which some organisms
may deviate from the others, we will use the terminology from Adaptive Dynamics
[6] and designate by “mutants” the organisms adopting a strategy different from the
one of the main population, which will be referred to as the resident population.
In this work we study the fate of mutants based on an example of a seasonal
consumer–resource system with optimal consumers as introduced by [1] using a
semi-discrete approach [9]. In such a system, consumer and resource individuals
are active during seasons of fixed length T separated by winter periods. To give
an idea of what such a system could represent, the resource population could be
annual plants and the consumer population some univoltine phytophagous insect
species. All consumers and resources die at the end of the season and the size of
the next generation is determined by the number of offspring produced during the
previous season (i.e. offspring are made of seeds or eggs which mature into active
stages at the beginning of the season). We assume that consumers have to share their
time between foraging for resources, which increases their reproductive abilities, and reproducing. The reproduction of the resource population is assumed to occur at a
constant rate.
In nature several patterns of life-history can be singled out, but they frequently
contain two main phases: growth phase and reproduction phase. The transition
between these two phases is said to be strict when the consumers only feed at the beginning of their life and only reproduce at the end; alternatively, there can exist an intermediate phase between them where growth and reproduction occur simultaneously. Such types of behavior are called determinate and indeterminate growth patterns, respectively [17]. Time-sharing between laying eggs and feeding will be modeled by the variable $u$: $u = 1$ means feeding, while $u = 0$ means reproducing. Intermediate values $u \in (0, 1)$ describe a situation where, for some part of the time, the individual is feeding and, for the other part of the time, it is reproducing.
Firstly, we consider a population of consumers maximizing their common fitness,
all consumers being individuals having the same goal function and acting for the
common good; these will be the residents. We then suppose that a small fraction
of the consumer population starts to behave differently from the main population,
and accordingly will call them mutants. The aim of this paper is to investigate
how mutants will behave in the environment shaped by the residents, and what
consequences can be expected for multi-season consumer–resource systems.
Let us first consider a system of two populations: resources and consumers without
any mutant. The consumer population is modeled with two state variables: the
average energy of one individual p and the number of consumers c present in
the system, while the resource population is described solely by its density n. We
suppose that both populations are structured in mature (adult insects/plants) and
immature stages (eggs/seeds). During the season, mature consumers and resources
interact and reproduce themselves. Between seasons (during winter periods) all
mature individuals die and immature individuals become mature in the next season.
We suppose that no consumers have any energy (p = 0) at the beginning of the
season. The efficiency of reproduction is assumed to be proportional to the value
of p; it is thus intuitive that consumers should feed on the resource at the beginning
and reproduce at the end once they have gathered enough energy. The consumers
thus face a trade-off between investing their time in feeding (u = 1) or laying eggs
(u = 0). According to [1], the within-season dynamics are given by the system (4.1), in which we assume that neither population suffers from intrinsic mortality; $\kappa$, $\eta$ and $\delta$ are constants. After rescaling the time and state variables, the constants $\kappa$ and $\eta$ can be eliminated and the system of Eq. (4.1) can be rewritten in the simpler form (cf. the mutant–resident system (4.9) below with $\varepsilon = 0$):
$$\dot p = -p + n u, \qquad \dot n = -c\,n\,u. \tag{4.2}$$
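A few lines of code suffice to integrate the rescaled season dynamics for a given control profile; the sketch below uses the reading of (4.2) given above and an explicit Euler scheme, with all numerical values merely illustrative.

```python
import numpy as np

def season(u_of_t, c, T, n0=1.0, steps=4000):
    """Integrate p' = -p + n*u, n' = -c*n*u with p(0) = 0 over one season."""
    dt = T / steps
    p, n, eggs = 0.0, n0, 0.0
    for k in range(steps):
        u = u_of_t(k * dt)
        eggs += c * (1.0 - u) * p * dt      # accumulates the integrand of J in (4.3)
        p, n = p + dt * (-p + n * u), n + dt * (-c * n * u)
    return p, n, eggs

# example: bang-bang profile that feeds first and reproduces after t = T - tau1
profile = lambda t, T=2.0, tau1=0.8: 1.0 if t < T - tau1 else 0.0
print(season(profile, c=3.0, T=2.0))
```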
Fig. 4.1 Optimal collective behavior of the residents illustrated in the $(\tau, x)$ plane [see Eq. (4.4)], where $\tau$ is reverse time. In the figure, solutions are initiated at $(T, p(0)/n(0))$, where $T$ is the length of the season.
The amount of immature offspring produced during the season depends on the sizes of the populations:
$$J = \theta \int_0^T c\,(1 - u(t))\,p(t)\,\mathrm{d}t, \qquad J_n = \gamma \int_0^T n(t)\,\mathrm{d}t. \tag{4.3}$$
These solutions are not restricted to the case where consumers have no energy at the initial time. The region with $u = 1$ is separated from the region with $u = 0$ by a switching curve $S$ and a singular arc $S_\sigma$ such that
$$S:\; x = 1 - e^{-\tau}, \tag{4.5}$$
$$S_\sigma:\; \tau = -\log x + \frac{2}{xc} - \frac{4}{c}, \tag{4.6}$$
where $\tau = T - t$. They are shown in Fig. 4.1 by thick curves. Along the singular arc $S_\sigma$ the consumer uses the intermediate control $u = \hat u$:
$$\hat u = \frac{2x}{2 + xc}. \tag{4.7}$$
When $p(0) = 0$, one might identify a bang-bang control pattern for short seasons $T \le T_1$ and a bang-singular-bang pattern for long seasons $T > T_1$. The value $T_1$ is computed as
$$T_1 = \frac{\log(c + 1) + (c - 2)\log 2}{c - 1}. \tag{4.8}$$
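The threshold (4.8) is straightforward to evaluate numerically; the snippet below computes $T_1$ together with the singular control (4.7), for the illustrative value $c = 3$ used later in the chapter.

```python
import numpy as np

def T1(c):
    """Season length separating bang-bang from bang-singular-bang, Eq. (4.8)."""
    return (np.log(c + 1.0) + (c - 2.0) * np.log(2.0)) / (c - 1.0)

def u_hat(x, c):
    """Intermediate control on the singular arc, Eq. (4.7)."""
    return 2.0 * x / (2.0 + x * c)

print(T1(3.0))          # about 1.04 for c = 3
print(u_hat(0.5, 3.0))  # control at the junction point x = 1/2
```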
Suppose that there is a subpopulation of consumers that deviate from the residents’
behavior. Let us assume that these are selfish and maximize their own fitness, and
not the fitness of the whole population, taking into account that the main resident
population acts as if the mutants were kin (i.e. residents do not understand that
mutants are selfish). This means that the residents adjust their strategy by changing the control whenever its level is intermediate. Such an adjustment is possible only when certain conditions are satisfied and the mutant subpopulation is small enough (see Sect. 4.3.2).
Denote the proportion of mutants in the whole population of consumers by $\varepsilon$, and denote the variables describing the state of the mutant and resident populations by symbols with subscripts "m" and "r", respectively. Then the numbers of mutants and residents are $c_m = \varepsilon c$ and $c_r = (1 - \varepsilon)c$, and the dynamics of the system can be written as
$$\dot p_r = -p_r + n u_r, \qquad \dot p_m = -p_m + n u_m, \qquad \dot n = -n c\,[(1 - \varepsilon)u_r + \varepsilon u_m], \tag{4.9}$$
similarly to (4.2). The variable $u_m \in [0, 1]$ defines the decision pattern of the mutants. The control $u_r \in [0, 1]$ is the decision pattern of the residents and is defined by the solution of the optimal control problem (4.2)–(4.3).
The number of offspring in the next season is defined similarly to (4.3):
$$J_r = \theta\int_0^T (1 - u_r(t))\,c_r\,p_r(t)\,\mathrm{d}t, \qquad J_m = \theta\int_0^T (1 - u_m(t))\,c_m\,p_m(t)\,\mathrm{d}t, \qquad J_n = \gamma\int_0^T n(t)\,\mathrm{d}t, \tag{4.10}$$
where the mutant chooses its control um striving to maximize its fitness Jm .
We can see that the problem under consideration is described in terms of a two-
step optimal control problem (or a hierarchical differential game): in the first step
we define the optimal behavior of the residents (see Sect. 4.2.1), in the second step
we identify the optimal response of the mutants to this strategy.
Since $\theta$ and $\gamma$ are constants, they can be omitted from the description of the optimization problem $J_m \to \max_{u_m}$. In this case the functional $J_m/(\theta c_m)$ can be used instead of the functional $J_m$.
Let us introduce the Bellman function $\tilde U_m$ for the mutant population. It satisfies the Hamilton–Jacobi–Bellman (HJB) equation
$$\frac{\partial \tilde U_m}{\partial t} + \max_{u_m}\left[\frac{\partial \tilde U_m}{\partial p_r}(-p_r + n u_r) + \frac{\partial \tilde U_m}{\partial p_m}(-p_m + n u_m) - \frac{\partial \tilde U_m}{\partial n}\,n c\bigl((1-\varepsilon)u_r + \varepsilon u_m\bigr) + p_m(1 - u_m)\right] = 0. \tag{4.11}$$
where the components of the gradient of the Bellman function are denoted by $\partial U_m/\partial x_r = \lambda_r$, $\partial U_m/\partial x_m = \lambda_m$ and $\partial U_m/\partial \tau = \nu$, and the variable $\tau$ denotes backward time, $\tau = T - t$. The optimal control can be defined as $u_m = \mathrm{Heav}(A_m)$, where
We get the following equations for the state and conjugate variables and for the Bellman function: $x_r = x_r(T)\,e^{\tau}$, $x_m = x_m(T)\,e^{\tau}$, $\lambda_r = 0$, $\lambda_m = 1 - e^{-\tau}$, $U_m = x_m(1 - e^{-\tau})$.
From this solution we can see that there may exist a switching surface $S_m$:
$$S_m:\; x_m = 1 - e^{-(T-t)}, \tag{4.14}$$
such that $A_m = 0$ on it, where the mutant changes its control. Equation (4.14) is similar to (4.5). However, we should take into account the fact that there is also a hypersurface $S_r$, where the resident changes its control from $u_r = 0$ to $u_r = 1$ independently of the decision of the mutant. Hence it is important to determine which surface, $S_r$ or $S_m$, the characteristic intersects first; see Fig. 4.2. Suppose that this is the surface $S_r$. Since the control $u_r$ changes its value on $S_r$, the HJB equation (4.12) also changes and, as a consequence, the conjugate variables $\nu$, $\lambda_r$ and $\lambda_m$ could be discontinuous. Let us denote the incoming characteristic field (in backward time) by "$-$" and the outgoing field by "$+$". Consider a point of intersection of the characteristic and the surface $S_r$ with coordinates $(x_{r1}, x_{m1}, \tau_1)$. Thus $x_{r1} = 1 - e^{-\tau_1}$, and the normal vector $\vartheta$ to the switching surface can be written in the form $\vartheta = (1,\, 0,\, x_{r1} - 1)$.
Fig. 4.2 A family of optimal trajectories emanating from the terminal surface
From the incoming field we have the following information about the co-state: $\lambda_r^- = 0$, $\lambda_m^- = x_{r1}$, $\nu^- = x_{m1}(1 - x_{r1})$. Since the Bellman function is continuous on the surface $S_r$, we have $U_m^+ = U_m^- = U_m = x_{m1}x_{r1}$. The gradient $\nabla U_m$ has a jump in the direction of the normal vector $\vartheta$: $\nabla U_m^+ = \nabla U_m^- + k\vartheta$, where $k$ is an unknown scalar. Thus
$$A_m^+ = \lambda_r^+ x_{r1}\varepsilon c + \lambda_m^+(x_{m1}\varepsilon c + 1) - \varepsilon c\,U_m - x_{m1} = (x_{r1} - x_{m1})\,\frac{(1-\varepsilon)x_{r1}c + (1 - x_{r1})}{x_{r1}c + (1 - x_{r1})},$$
which is positive when xr1 > xm1 . In Fig. 4.2 this corresponds to the points of the
surface Sr which are below the line l1 : xr = xm = 1 − e−τ . For the optimal trajectories
which go through such points: ur (τ1 + δ ) = um (τ1 + δ ) = 1, where δ is arbitrarily
small. One can show that there will be no more switches of the control. However,
if we consider a trajectory going from a point above l1 , then ur (τ1 + δ ) = 1 and
um (τ1 + δ ) = 0; a switch of the control um from zero to one then takes place later
(in backward time). After that, there will be no more switches.
(Fig. 4.3: the surfaces $S_r$, $S_r^\sigma$, $S_1^\sigma$, $S_m$ and the line $l_1$ in $(x_r, x_m, \tau)$ space.)

Now consider a trajectory emitted from the terminal surface which first intersects the surface $S_m$ rather than the surface $S_r$. In this case the situation depicted in Fig. 4.3 takes place: one might expect the appearance of a singular arc $S_1^\sigma$ there. The following are necessary conditions for its existence:
$$H = 0 = H_0 + A_m u_m, \qquad H_0 = -\nu - \lambda_r x_r - \lambda_m x_m + x_m, \tag{4.17}$$
$$A_m = 0 = \lambda_r x_r\varepsilon c + \lambda_m(x_m\varepsilon c + 1) - \varepsilon c\,U_m - x_m, \tag{4.18}$$
$$\dot A_m = \{A_m\, H_0\} = 0 = A_{m1}, \tag{4.19}$$
where the curly brackets denote the Poisson (Jacobi) brackets. If $\xi$ is the vector of state variables and $\psi$ the vector of conjugate ones (in our case $\xi = (x_r, x_m, \tau)$ and $\psi = (\lambda_r, \lambda_m, \nu)$), then the Poisson bracket of two functions $F = F(\xi, \psi, U_m)$ and $G = G(\xi, \psi, U_m)$ is given by the formula
$$\{F\,G\} = \langle F_\xi + \psi F_{U_m},\; G_\psi\rangle - \langle F_\psi,\; G_\xi + \psi G_{U_m}\rangle.$$
Here $\langle\cdot,\cdot\rangle$ denotes the scalar product and, e.g., $F_\psi = \partial F/\partial\psi$.
After some algebra, (4.19) takes the form (4.20). We can derive the variable $\nu$ from (4.17) and substitute it into (4.20); we get $A_{m1} = x_m - 1 + \lambda_m = 0$. This leads to $\lambda_m = 1 - x_m$ and
$$\lambda_r = \frac{x_m + \varepsilon c\,U_m - (1 - x_m)(x_m\varepsilon c + 1)}{x_r\varepsilon c},$$
According to the computations done in Sect. 4.2.1, resident consumers must adopt a behavior $u_r$ which keeps the surface $S_r^\sigma$ invariant (see Fig. 4.3). In a mutant-free population, this is done by playing the singular control (4.7), but if mutants are present in the population, the dynamics of the system are modified and the mutant-free singular control (4.7) does not make $S_r^\sigma$ invariant any more. However, residents may still make $S_r^\sigma$ invariant by adopting a different behavior, denoted $\hat u_r$, as long as the mutants' influence, i.e. $\varepsilon$, is not too large. To compute $\hat u_r$, we notice that it should make $x_r$ follow the dynamics depicted in Fig. 4.1, i.e. $\dot x_r = -x_r(1 - c u_r) + u_r$ with $u_r = \hat u$ defined in Eq. (4.7). We get that $\hat u_r$ should be computed from
$$\dot x_r = -\frac{x_r^2 c}{2 + x_r c} = x_r\bigl(1 - c((1-\varepsilon)\hat u_r + \varepsilon u_m)\bigr) - \hat u_r,$$
so that
$$\hat u_r = \frac{2x_r(1 + x_r c)}{(1 + (1-\varepsilon)x_r c)(2 + x_r c)} - \frac{x_r\varepsilon c\,u_m}{1 + (1-\varepsilon)x_r c}. \tag{4.23}$$
Thus, the residents will be able to keep Srσ invariant provided ûr ∈ [0, 1] for all points
belonging to Srσ and for all possible values of um ∈ [0, 1].
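Since $\hat u_r$ in (4.23) is affine in $u_m$, checking its range over $u_m \in \{0, 1\}$ on a grid of the singular arc gives a quick numerical picture of when the residents can hold $S_r^\sigma$. The parameter values below are arbitrary illustrations, not the precise condition (4.24).

```python
import numpy as np

def u_hat_r(x, um, c, eps):
    """Resident feedback (4.23) keeping the singular surface invariant."""
    d = 1.0 + (1.0 - eps) * x * c
    return 2.0 * x * (1.0 + x * c) / (d * (2.0 + x * c)) - x * eps * c * um / d

c, eps = 3.0, 0.1
xs = np.linspace(1e-3, 0.5, 200)            # on the arc, x_r stays below 1/2
in_range = all(np.all((u_hat_r(xs, um, c, eps) >= 0.0) &
                      (u_hat_r(xs, um, c, eps) <= 1.0)) for um in (0.0, 1.0))
print(in_range)   # True here; a larger eps can push u_hat_r below zero
```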
To identify for which parameters of the model this is possible, we may notice that $\hat u_r$ is a linear and decreasing function of $u_m$. Moreover,
$$\hat u_r(u_m = 0) = \frac{2x_r(1 + x_r c)}{(1 + (1-\varepsilon)x_r c)(2 + x_r c)} \le 2x_r\,\frac{1 + x_r c}{2 + x_r c} \le 1,$$
It is interesting to notice that ûr (um = 0) is larger than the original û in (4.7), since
the residents must compensate for the non-eating mutants. Conversely, when um = 1,
ûr < û. The tipping point takes place when um = û, which ensures ûr = um ; mutants
behaving like the original residents allow the residents to behave as such.
In this paper we consider only the values of ε satisfying (4.24), i.e. such that
the residents are able to adopt their optimal behavior, in spite of the presence of
mutants. Otherwise, the influence of the mutants on the system may be too large,
and the residents would not have the possibility to stick to their fitness maximization
program.
The control $\hat u_r = \hat u_r(x_r, x_m, \tau, u_m)$ is defined in feedback form, i.e., it depends on the time and on the state of the system. The corresponding Hamiltonian (4.12) needs to be modified accordingly, which yields (4.25) and (4.26). This expression allows us to compute the optimal behavior of the mutants on the surface $S_r^\sigma$, but the calculations are quite involved. To make things simpler, let us first consider the particular case of vanishingly small values of $\varepsilon$ and study the optimal behavioral pattern.
If $\varepsilon \cong 0$, the mutants' influence on the system is negligible and, to make $S_r^\sigma$ invariant, the resident should apply the mutant-free singular behavior computed in (4.7): $\hat u_r = 2x_r/(2 + x_r c)$. In addition, Eqs. (4.25) and (4.26) take the following form:
$$\hat H = -\nu + \lambda_r\frac{x_r^2 c}{2 + x_r c} + \lambda_m\Bigl(-x_m\frac{2 - x_r c}{2 + x_r c} + u_m\Bigr) - \frac{2x_r c}{2 + x_r c}\,U_m + x_m(1 - u_m), \tag{4.27}$$
$$\hat A_m = \lambda_m - x_m. \tag{4.28}$$
If the trajectory originates (in backward time) from some point belonging to $S_r^\sigma$ such that $x_m^\sigma = x_m(\tau = \log 2) > 1/2$, then $u_m(\tau = \log 2) = 0$ and the system of characteristics for the Hamiltonian (4.27) is
$$\dot x_r = -\frac{x_r^2 c}{2 + x_r c}, \qquad \dot x_m = x_m\frac{2 - x_r c}{2 + x_r c}, \qquad \dot\lambda_m = -\lambda_m + 1, \qquad \dot U_m = -U_m\frac{2x_r c}{2 + x_r c} + x_m, \tag{4.29}$$
with boundary conditions $\tau = \log 2$, $x_r = 1/2$, $x_m = x_m^\sigma$, $\lambda_m = 1/2$, $U_m = x_m^2/2$. Thus $\lambda_m = 1 - e^{-\tau}$, and there exists a switching curve $\hat S$ defined by $x_m = 1 - e^{-\tau}$ together with $\tau = -\log x_r + 2/(x_r c) - 4/c$. Thus $\hat S = S_m \cap S_r^\sigma$.
The switching curve Ŝ ends at the point with coordinates (xr2 , xm2 , τ2 ) where the
characteristics become tangent to it and the singular arc Ŝσ appears (see Fig. 4.4).
Before determining the coordinates of this point, let us define the singular arc, denoted $\hat S^\sigma$. From (4.27)–(4.28) we get
$$\nu = \lambda_r\frac{x_r^2 c}{2 + x_r c} - \lambda_m x_m\frac{2 - x_r c}{2 + x_r c} - \frac{2x_r c}{2 + x_r c}\,U_m + x_m, \qquad \lambda_m = x_m \tag{4.30}$$
along the singular arc. Substitution of (4.30) into the equation $\dot{\hat A}_m = 0$ gives $x_m = (2 + x_r c)/4$.
In addition, the intermediate control $\hat u_m$ can be derived from $\hat A_m = 0$ and is equal to
$$\hat u_m = \frac{1}{2 + x_r c},$$
which is positive and belongs to the interval between zero and one.
We see that the coordinates $x_{r2}$, $x_{m2}$ and $\tau_2$ can be defined by the following equations:
$$x_{m2} = \frac{2 + x_{r2}c}{4} = 1 - e^{-\tau_2}, \qquad \tau_2 = -\log x_{r2} + \frac{2}{x_{r2}c} - \frac{4}{c},$$
which comes from the fact that the point $(x_{r2}, x_{m2}, \tau_2)$ belongs to $\hat S^\sigma$ and is located at the intersection of the curves $\hat S^\sigma$ and $\hat S$. This result is illustrated in Fig. 4.4.

(Fig. 4.4: the switching curve $\hat S$, the singular arcs $S_r^\sigma$, $\hat S^\sigma$, $S_1^\sigma$ and the point $(x_{r2}, x_{m2}, \tau_2)$ in $(x_r, x_m, \tau)$ space.)
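The pair of conditions above determines $(x_{r2}, x_{m2}, \tau_2)$ numerically by a one-dimensional root search; the sketch below does this with a standard bracketing solver for the illustrative value $c = 3$.

```python
import numpy as np
from scipy.optimize import brentq

def junction(c):
    """Solve (2 + x c)/4 = 1 - exp(-tau(x)) with tau from the singular arc (4.6)."""
    tau = lambda x: -np.log(x) + 2.0 / (x * c) - 4.0 / c
    g = lambda x: (2.0 + x * c) / 4.0 - (1.0 - np.exp(-tau(x)))
    x_r2 = brentq(g, 1e-6, 0.5)             # a sign change exists on (0, 1/2]
    return x_r2, (2.0 + x_r2 * c) / 4.0, tau(x_r2)

print(junction(3.0))    # (x_r2, x_m2, tau_2)
```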
If the state is outside the surface Srσ , things are a little easier since at least the
behavior of the residents, ur , is constant and equal to 0 or 1, depending on the
respective value of τ and xr .
We can actually show that the surface $S_1^\sigma$ (where $u_r = 0$) can be extended further by considering the situation in Fig. 4.3. Indeed, the following conditions are fulfilled in this region:
$$H\big|_{u_r = 0} = -\nu - \lambda_r x_r - \lambda_m x_m + x_m = 0, \qquad A_m = \lambda_m - x_m = 0, \qquad \dot A_m = 0.$$
(Fig. 4.5: structure of the solution, showing the surfaces $S_r^\sigma$, $\hat S^\sigma$, $S_r$, $S_1^\sigma$ and $S_m$ in $(x_r, x_m, \tau)$ space.)
Similarly, in the region where $u_r = 1$,
$$H\big|_{u_r = 1} = -\nu - \lambda_r\bigl(x_r(1 - c) - 1\bigr) - \lambda_m x_m(1 - c) - \varepsilon c\,U_m + x_m = 0, \tag{4.31}$$
$$A_m = \lambda_m - x_m = 0, \qquad \dot A_m = 0, \tag{4.32}$$
which give a possible candidate for a singular arc S2σ : xm = 1/(2 − c). We see that
its appearance is possible only for c < 1, since xm must belong to Sm . For c > 1
the structure of the solution in the domain below the surface Srσ is actually simpler
and consists only of the switching surface Sm , see Fig. 4.5. Notice that in the case
xr (0) = xm (0) = 0 investigated below, the existence of the singular arc S2σ is not
relevant, since it cannot be reached from such initial conditions.
Following [1], we assume that at the beginning of the season the energy of
consumers is zero: xr (0) = xm (0) = 0. Therefore, we should take into account only
the trajectories coming from these initial conditions. The phase space is reduced
in this case to the one shown in Fig. 4.6. One can see that there are three different
regions depending on the length of the season $T$. If it is short enough, i.e., $T \le T_1$ (see Eq. (4.8)), then the behavior of the mutant coincides with the behavior of the residents and the main population cannot be invaded. If the length of the season is between $T_1$ and $T_2$, there is a period in the lifetime of a resident when it applies an intermediate strategy and spares some amount of the resource for its future use. Mutants are able to exploit this fact, and there exists a strategy that guarantees them a better outcome.

Fig. 4.6 The reduced optimal pattern for trajectories satisfying the initial conditions $x_r(0) = x_m(0) = 0$ with $c = 3$
Let us introduce the analogue of the value function $\tilde U_m$ for the resident and denote it by $\tilde U_r$:
$$\tilde U_r(p_r, p_m, n, t) = \int_{T-t}^{T} p_r(s)\,(1 - u_r(s))\,\mathrm{d}s.$$
The value $\tilde U_r(0, 0, n(0), T)$ represents the amount of eggs laid by the resident during a season of length $T$. Its value depends on the state of the system, and the following transformation can be made: $\tilde U_r(p_r, p_m, n, t) = n\,U_r(x_r, x_m, t)$. In the following, we omit some arguments and write the value function in the simplified form $U_r(T) := U_r(0, 0, T)$, where the initial conditions $x_r(0) = x_m(0) = 0$ have been taken into account.
In the region A (see Fig. 4.6) the value functions of the two populations (mutants and residents) are equal to each other: $U_m(T) = U_r(T) = x_1 e^{-c(T - \tau_1)}$. Here the value $\tau_1$ can be defined from the intersection of the trajectory with the switching curve $S_r \cap S_m$:
$$1 - e^{-\tau_1} = \frac{e^{(c-1)(T - \tau_1)} - 1}{c - 1}.$$
To obtain the value functions in the regions B and C, one must solve the system of characteristics (4.29) in the case when the characteristics move along the surface $S_r^\sigma$ and $u_m = 1$. This leads to the following characteristic equations for the Hamiltonian (4.27):
$$\dot x_r = -\frac{x_r^2 c}{2 + x_r c}, \qquad \dot x_m = x_m\frac{2 - x_r c}{2 + x_r c} - 1, \qquad \dot U_m = -U_m\frac{2 x_r c}{2 + x_r c},$$
and consequently
$$x_m = C_1 x_r^2 e^{\tau} + x_r c + 1, \qquad U_m = C_2 x_r^2, \tag{4.33}$$
where $C_1$ and $C_2$ are defined from the boundary conditions, while Eq. (4.6) is also fulfilled.
Along the singular arc $\hat S^\sigma$ the mutant uses the intermediate strategy (4.21). In this case,
$$\dot U_m = -U_m c\,\hat u_r + x_m(1 - \hat u_m) = -U_m\frac{2 x_r c}{2 + x_r c} + \frac{1 + x_r c}{4}.$$
Since $\dot x_r = -\frac{x_r^2 c}{2 + x_r c}$, we have
$$\frac{\mathrm{d}U_m}{\mathrm{d}x_r} = \frac{2U_m}{x_r} - \frac{(1 + x_r c)(2 + x_r c)}{4 x_r^2 c}.$$
Thus
$$U_m = C_3 x_r^2 + \frac{4 + 3x_r c\,(3 + 2x_r c)}{24\,x_r c}, \qquad C_3 = \mathrm{const}. \tag{4.34}$$
We now undertake to compute the limiting season length $T_2$ that separates the region B from the region C. The coordinates of the point B were obtained in the previous section. To define the coordinates of the point $(x_{r2}^\sigma, x_{m2}^\sigma, \tau_2^\sigma)$ of intersection of the optimal trajectory with the curve AD, we use the dynamics of motion along the surface $S_r^\sigma$ with $u_r = \hat u_r$ and $u_m = 1$ (4.33): $x_m = C_1 x_r^2 e^\tau + x_r c + 1$, where the constant $C_1$ should be chosen such that $x_{m2} = C_1 x_{r2}^2 e^{\tau_2} + x_{r2}c + 1$, $x_{m2} = \frac{2 + x_{r2}c}{4} = 1 - e^{-\tau_2}$. Therefore $C_1 = \frac{(x_{r2}c - 2)(3x_{r2}c + 2)}{16\,x_{r2}^2}$. After that, the coordinates $x_{r2}^\sigma$, $x_{m2}^\sigma$ and $\tau_2^\sigma$ can be defined from the following conditions:
$$x_{m2}^\sigma = C_1\,(x_{r2}^\sigma)^2\,e^{\tau_2^\sigma} + x_{r2}^\sigma c + 1, \qquad \tau_2^\sigma = -\log x_{r2}^\sigma + \frac{2}{x_{r2}^\sigma c} - \frac{4}{c}. \tag{4.35}$$
The boundary value $T_2$ can then be obtained from $T_2 = \tau_2^\sigma + \log(x_{r2}^\sigma(c - 1) + 1)/(c - 1)$.
Now we compute the value functions $U_r(T)$ and $U_m(T)$ for the region B ($T_1 < T \le T_2$), where only the mutant uses a bang-bang control. For the resident population we have
$$U_r(T) = U_{r2}\,e^{-c(T - \tau_2)}, \qquad U_{r2} = x_{r2}(1 - x_{r2}) + \frac{1 - 2x_{r2}}{c}, \tag{4.36}$$
where the point with coordinates $(x_{r2}, x_{r2}, \tau_2)$ defines the intersection of the trajectory with the surface $S_r^\sigma$:
$$\tau_2 = -\log x_{r2} + \frac{2}{x_{r2}c} - \frac{4}{c}, \qquad x_{r2} = \frac{e^{(c-1)(T - \tau_2)} - 1}{c - 1}. \tag{4.37}$$
For the mutant population, the value function $U_m$ in the region with $u = \hat u$ and $u_m = 1$ satisfies the equation resulting from (4.33):
$$U_m^{(\hat u, 1)} = x_{m1}^2\,(x_r/x_{r1})^2, \tag{4.38}$$
where $(x_{r1}, x_{m1}, \tau_1)$ is the point of intersection of the trajectory with the curve AB (see Fig. 4.6). Using (4.38) and the notation from (4.37), we can write $U_m(T) = U_{m2}\,e^{-c(T - \tau_2)}$, $U_{m2} = x_{m1}^2\,(x_{r2}/x_{r1})^2$, which is analogous to (4.36).
In the region C the value function for the resident has the same form as in (4.36), but it has a different form for the mutant. Suppose that the optimal trajectory intersects the surface $\hat S^\sigma$ at the point with coordinates $(\tilde x_{r2}, \tilde x_{m2}, \tilde\tau_2)$. Then the Bellman function at this point is given by
$$\tilde U_{m2} = \tilde x_{r2}^2\left(\frac{c^2}{16} - \frac{4 + 3\tilde x_{r2}c}{24\,\tilde x_{r2}^3\,c}\right) + \frac{3\tilde x_{r2}c\,(2\tilde x_{r2}c + 3) + 4}{24\,\tilde x_{r2}\,c},$$
which is obtained from (4.34) with the constant $C_3$ determined by the given boundary conditions.
When the optimal trajectory moving along the surface $\hat S^\sigma$ intersects the curve AD at some point with coordinates $(\tilde x_{r2}^\sigma, \tilde x_{m2}^\sigma, \tilde\tau_2^\sigma)$ (see Fig. 4.6), the Bellman function can be expressed as follows: $\tilde U_{m2}^\sigma = \tilde U_{m2}\,\tilde x_{r2}^\sigma/\tilde x_{r2}$. Thus $U_m(T) = \tilde U_{m2}^\sigma\,e^{-c(T - \tilde\tau_2^\sigma)}$.
The difference in the value functions (number of offspring per mature individual) of the mutant and the optimally behaving resident is presented in Fig. 4.7. It shows that as soon as the season length is longer than $T_1$, residents may be out-competed by selfish "free-riding" mutants (see also Fig. 4.8). Otherwise the payoff functions of the mutants and residents are the same. Therefore, if the season length is shorter than $T_1$, the optimal strategy of the resident is evolutionarily stable in the sense that it cannot be beaten by any other strategy [11]. Thus, in the present example, collective optimal strategies of the bang-bang type are also evolutionarily stable, while those of the bang-singular-bang type may always be out-competed by alternative strategies. Whether such properties also hold in more general settings is an important topic for future research.
Fig. 4.7 Difference in the value functions of the resident and the mutant ($c = 3$)
(Fig. 4.8: trajectories $x_r$, $x_m$ and the control regimes $u_r$, $u_m$ over a season of length $T$, in forward time $t = T - \tau$.)
In this section we consider the case of non-zero ε such that the condition (4.24)
remains fulfilled. This means that the trajectory intersecting the singular surface
Srσ does not cross it, but moves along it due to the residents who make it invariant
through the behavior ûr (4.23).
In this case, the phase space can also be divided into two regions, according to whether $x_r$ is smaller or larger than on $S_r^\sigma$. In both of these regions the structure of the solution has properties similar to the case considered above, when $\varepsilon$ is arbitrarily small. On the surface $S_r^\sigma$ the optimal behavior is also similar to that of the previous case.
In the region with larger xr values than the ones on the surface Srσ , there is a
part of the switching surface Sm and a singular arc S1σ where the mutant uses an
intermediate strategy. The surface S1σ can be defined using the expression (4.22). In
the other region, we also have a part of Sm and a singular arc S2σ which is different
from S1σ and may not exist for some values of the parameters c and ε .
To identify the values for which the surface $S_2^\sigma$ is a part of the solution, let us write the necessary conditions as in (4.31)–(4.32): $H\big|_{u_r=1} = 0$, $A_m = 0$, $\dot A_m = \{A_m\,H\} = 0$. Using these equations, we are able to obtain the values of $\lambda_r$, $\lambda_m$ and $\nu$ on the surface $S_2^\sigma$ and substitute them into the second derivative $\ddot A_m = \{\{A_m\,H\}\,H\} = 0$ to derive the expression (4.39) for the singular control applied by the mutant on this surface.
There are several conditions which must be satisfied. First of all, the control (4.39) should be between zero and one; in addition,
$$\frac{\partial}{\partial u_m}\,\frac{\mathrm{d}^2}{\mathrm{d}t^2}\,\frac{\partial H}{\partial u_m} = \{A_m\,\{A_m\,H\}\} \le 0 \tag{4.40}$$
must hold, together with
$$2 - (1 - \varepsilon) + x_m\varepsilon c \ge 0. \tag{4.41}$$
Fig. 4.9 Structure of the optimal behavioral pattern for $c = 1.25$ and $\varepsilon = 0.35$
Model (4.2) was introduced in [1] as the intra-seasonal part of a more complex
multi-seasonal model of population dynamics in which consumers and resources
live for one season only. It was assumed that the (immature) offspring produced
by the consumers and resources in season i and defined by the system of Eq. (4.3),
mature during the inter-season to form the initial consumer and resource populations
of season $(i+1)$, up to some overwintering mortality. The consumer and resource population densities at the beginning of season $i+1$ are thus $c_{i+1} = \mu_c J_i$ and $n_{i+1}(t = 0) = \mu_n J_{n,i}$, with $J_i$ and $J_{n,i}$ defined in (4.3) ($\mu_n, \mu_c < 1$ allow for overwintering mortality).
In the presence of a mutant invasion, things differ slightly, as the total consumer population is structured into $c_{r_i} = (1 - \varepsilon_i)c_i$ residents and $c_{m_i} = \varepsilon_i c_i$ mutants that have different reproduction strategies. Assuming that reproduction is asexual and an offspring simply inherits the strategy of its parent, the inter-seasonal dynamics are as follows: $c_{r_{i+1}} = \alpha\tilde U_r(c_i, \varepsilon_i, n_i, T) = (1 - \varepsilon_{i+1})c_{i+1}$, $c_{m_{i+1}} = \alpha\tilde U_m(c_i, \varepsilon_i, n_i, T) = \varepsilon_{i+1}c_{i+1}$ and $n_{i+1} = \beta\tilde V(c_i, \varepsilon_i, n_i, T)$, where $\alpha = \mu_c\theta$, $\beta = \mu_n\gamma$, and the functions
$$\tilde U_r = (1 - \varepsilon_i)c_i\int_0^T (1 - u_r(t))\,p_r(t)\,\mathrm{d}t, \quad \tilde U_m = \varepsilon_i c_i\int_0^T (1 - u_m(t))\,p_m(t)\,\mathrm{d}t, \quad \tilde V = \int_0^T n(t)\,\mathrm{d}t$$
can be computed from the solution of the optimal control problem (4.10) with the dynamics given by (4.9). As stated earlier, the energies of both the mutants and the residents are zero at the beginning of each season ($p_r(0) = p_m(0) = 0$). For the particular case $\varepsilon = 0$, the values $\tilde U_r$ and $\tilde U_m$ were derived analytically in Sect. 4.3.4, but these are not useful in a multi-season study where the frequency of mutants is bound to evolve. In the following, we therefore resort to a numerical investigation in order to decipher the long-term fate of the mutants' invasion.
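The inter-season bookkeeping itself takes only a few lines; in the sketch below the within-season optimal-control solution is abstracted into a hypothetical callable `season_outputs` returning $(\tilde U_r, \tilde U_m, \tilde V)$, which in practice must be obtained by solving (4.9)–(4.10).

```python
def simulate_seasons(c0, eps0, n0, T, alpha, beta, season_outputs, n_seasons=250):
    """Iterate the season-to-season map of this section (sketch).

    season_outputs(c, eps, n, T) -> (U_r, U_m, V): within-season offspring
    totals for residents, mutants and the resource (assumed to be supplied).
    """
    c, eps, n, history = c0, eps0, n0, []
    for _ in range(n_seasons):
        U_r, U_m, V = season_outputs(c, eps, n, T)
        c_next = alpha * (U_r + U_m)         # c_{i+1} is the total consumer count
        eps = alpha * U_m / c_next           # mutant frequency eps_{i+1}
        n = beta * V                         # resource density n_{i+1}
        c = c_next
        history.append((c, eps, n))
    return history
```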
(Fig. 4.10: season-to-season densities of the resource $n$, the total consumer population $c$, the mutants $c_m = \varepsilon c$ and the residents $c_r = (1 - \varepsilon)c$.)
Here, we follow an adaptive dynamics type approach and assume that, among
all possible behaviors [1], the resident consumer and the resource population are at
a (globally stable) equilibrium. We investigate what happens when a small fraction
of mutants appear in the resident consumer population. We actually assume that
resident consumers are “naive” in the sense that even if the mutant population
becomes large through the season-to-season reproduction process, the resident
consumers keep their collective optimal strategy and treat mutants as cooperators,
even if they do not cooperate.
The case that we investigated is characterized by $\alpha = 2$, $\beta = 0.5$ and $T = 4$. Initially, the system is near the all-residents long-term stable equilibrium point $c = 0.9055$ and $n = 1.0848$. At the beginning of some season, a mutant population of small size $c_m = 0.001$ then appears ($\varepsilon \approx 1.1 \times 10^{-3} < 1/c$). We see in Fig. 4.10 that the mutant population increases its frequency within the consumer population and modifies the dynamics of the system. Despite this drastic increase, it is noteworthy that $c_i < 1$ in all seasons, so that $\varepsilon < 1/c_i$ holds and the analysis presented in this paper is valid for all seasons.
The naive behavior of the consumers is detrimental to their progeny: as the
seasons pass, mutant consumers progressively take the place of the collectively
optimal residents and even replace them in the long run (Fig. 4.10), making the
mutation successful. We should however point out that the mutants’ strategy, as
described in (4.10), is also a kind of “collective” optimum: in some sense, it is
assumed that mutants cooperate with other mutants. If the course of evolution
drives the resident population to 0 and only mutants survive in the long run, this
means that the former mutants become the new residents, with exactly the same
strategy as the one of the former residents they replaced. Hence, they are also prone
to being invaded by non-cooperating mutants. The evolutionary dynamics of this
Acknowledgements This research has been supported by grants from the Agropolis Foundation
and RNSC (project ModPEA, covenant support number 0902-013), and from INRA (call for
proposal “Gestion durable des résistances aux bio-agresseurs”, project Metacarpe, contract number
394576). A.R.A. was supported by a Post-Doctoral grant from INRIA Sophia Antipolis –
Méditerranée and by the grant of the Russian Federal Agency on Education, program 1.2.1,
contract P938.
References
1. Akhmetzhanov, A.R., Grognard, F., Mailleret, L.: Optimal life-history strategies in seasonal
consumer–resource dynamics. Evolution 65(11), 3113–3125 (2011). doi:10.1111/j.1558-
5646.2011.01381.x
2. Auger, P., Kooi, B.W., de la Parra, R.B., Poggiale, J.-C.: Bifurcation analysis of a predator–prey model with predators using hawk and dove tactics. J. Theor. Biol. 238(3), 597–607 (2006). doi:10.1016/j.jtbi.2005.06.012
3. Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)
4. Carathéodory, C.: Calculus of Variations and Partial Differential Equations of the First Order. Holden-Day, San Francisco (1965)
5. Carroll, L.: Through the Looking-Glass, and What Alice Found There. MacMillan and Co.,
London (1871)
6. Dercole, F., Rinaldi, S.: Analysis of Evolutionary Processes: The Adaptive Dynamics
Approach and its Applications. Princeton University Press, Princeton (2008)
7. Hamelin, F., Bernhard, P., Wajnberg, É.: Superparasitism as a differential game. Theor. Popul.
Biol. 72(3), 366–378 (2007). doi:10.1016/j.tpb.2007.07.005
8. Houston, A., Székely, T., McNamara, J.: Conflict between parents over care. Trends Ecol. Evol.
20(1), 33–38 (2005). doi:10.1016/j.tree.2004.10.008
9. Mailleret, L., Lemesle, V.: A note on semi-discrete modelling in life sciences. Phil. Trans.
R. Soc. A 367(1908), 4779–4799 (2009). doi:10.1098/rsta.2009.0153
10. Maynard Smith, J.: Evolution and the Theory of Games. Cambridge University Press,
Cambridge (1982)
11. Maynard Smith, J., Price, G.R.: The logic of animal conflict. Nature 246(5427), 15–18 (1973).
doi:10.1038/246015a0
12. Melikyan, A.A.: Generalized Characteristics of First Order PDEs: Applications in Optimal
Control and Differential Games. Birkhäuser, Boston (1998)
13. Murray, J.D.: Mathematical Biology. Springer, Berlin (1989)
14. Mylius, S.D., Diekmann, O.: On evolutionarily stable life histories, optimization and the need
to be specific about density dependence. Oikos 74(2), 218–224 (1995)
15. Noether, E.: Invariante Variationsprobleme. Nachrichten der Königlichen Gesellschaft der
Wissenschaften zu Göttingen. Math.-phys. Klasse 235–257 (1918)
16. Perrin, N., Mazalov, V.: Local competition, inbreeding, and the evolution of sex-biased
dispersal. Am. Nat. 155(1), 116–127 (2000). doi:10.1086/303296
17. Perrin, N., Sibly, R.M.: Dynamic-models of energy allocation and investment. Annu. Rev. Ecol.
Syst. 24, 379–410 (1993)
18. Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V., Mishchenko, E.F.: The Mathematical
Theory of Optimal Processes. Wiley, New York (1962)
19. Schaffer, W.M.: The application of optimal control theory to the general life history problem.
Am. Nat. 121, 418–431 (1983)
20. Van Valen, L.: A new evolutionary law. Evol. Theory 1, 1–30 (1973)
21. Vincent, T.L., Brown, J.S.: Evolutionary Game Theory, Natural Selection and Darwinian
Dynamics. Cambridge University Press, Cambridge (2005)
Chapter 5
Strong Strategic Support of Cooperative
Solutions in Differential Games
5.1 Introduction
Like the analysis in [9], in this paper the problem of strategic support of cooperation
in a differential m-person game with prescribed duration T and independent motions
is considered. Based on the initial differential game, a new associated differential
game (CD game) is designed. In addition to the initial game, it models players’
actions in connection with the transition from a strategic form of the game to a
cooperative one with the principle of optimality chosen in advance. The model
allows any coalition of players to refuse cooperation at any time instant $t$. As the cooperative principle of optimality, the Shapley value operator is
considered. Under certain assumptions, it is shown that the components of the
Shapley value along any admissible trajectory are absolutely continuous functions
of time. In the foundation of the CD-game construction lies the so-called imputation
distribution procedure described in [9] (see also [1]). The theorem established
by the authors states that if at each time instant along the conditionally optimal
(cooperative) trajectory future payments to each coalition of players according to
the imputation distribution procedure exceed the maximal guaranteed value that this
coalition can achieve in the CD game, then there exists a strong Nash equilibrium
in the class of recursive strategies first introduced in [2]. In other words, the
aforementioned equilibrium exists if in any subgame along the conditionally optimal
trajectory the Shapley value belongs to its core. The proof of this theorem uses
results and methods published in [2, 3]. The proved theorem is also true for other
value operators possessing the property of absolute continuity along admissible
trajectories of the differential game under consideration. The motions of players
in the game are independent. Thus the motion equations have the form
$$\frac{\mathrm{d}x^{(i)}}{\mathrm{d}t} = f^{(i)}(t, x^{(i)}, u^{(i)}), \qquad i \in I = [1:m], \tag{5.1}$$
$$x^{(i)} \in \mathbb{R}^{n(i)}, \qquad u^{(i)} \in P^{(i)} \in \mathrm{Comp}\,\mathbb{R}^{k(i)},$$
$$x^{(i)}(t_0) = x_0^{(i)}, \qquad i \in I. \tag{5.2}$$
Here $u(\cdot) = (u^{(1)}(\cdot), \ldots, u^{(m)}(\cdot))$ is a given $m$-vector of open-loop controls, and
$$x(t, t_0, x_0, u(\cdot)) = \bigl(x^{(1)}(t, t_0, x_0, u^{(1)}(\cdot)), \ldots, x^{(m)}(t, t_0, x_0, u^{(m)}(\cdot))\bigr),$$
where $x^{(i)}(\cdot) = x(\cdot, t_0, x_0^{(i)}, u^{(i)}(\cdot))$ is the solution of the Cauchy problem for the $i$th subsystem of (5.1) with the corresponding initial condition (5.2) and admissible open-loop control $u^{(i)}(\cdot)$ of player $i$.
Admissible open-loop controls of players $i \in I$ are Lebesgue measurable open-loop controls $u^{(i)}(\cdot): t \mapsto u^{(i)}(t) \in \mathbb{R}^{k(i)}$ such that $u^{(i)}(t) \in P^{(i)}$ for all $t \in [t_0, T]$.
It is supposed that all of the functions $f^{(i)}(t, x^{(i)}, u^{(i)})$ are continuous, locally Lipschitz with respect to $x^{(i)}$, and satisfy the following condition: there exists $\lambda^{(i)} > 0$ such that
$$\|f^{(i)}(t, x^{(i)}, u^{(i)})\| \le \lambda^{(i)}\bigl(1 + \|x^{(i)}\|\bigr) \qquad \forall x^{(i)} \in \mathbb{R}^{n(i)},\ \forall u^{(i)} \in P^{(i)}.$$
Each of the payoff integrands $h^{(i)}$ (see below) is also continuous.
It is supposed that at each time instant t ∈ [t0 , T ], the players have information
about the trajectory (solution) x(i) (τ ) = x(τ ,t0 , x0 , u(i) (·)) of the system (5.1), (5.2)
on the time interval [t0 ,t] and use recursive strategies [1, 2].
Recursive strategies were first introduced in [1] to justify the dynamic programming
approach in zero-sum differential games, known as the method of open-loop
iterations in nonregular differential games with a nonsmooth value function. The
ε -optimal strategies constructed with the use of this method are universal in the
sense that they remain ε -optimal in any subgame of the previously defined differen-
tial game (for every ε > 0). Exploiting this property it became possible to prove the
existence of ε -equilibrium (Nash equilibrium) in non-zero-sum differential games
(for every ε > 0) using the so-called “punishment strategies” [4].
The basic idea is that when one of the players deviates from the conditionally
optimal trajectory, other players after some small time delay start to play against the
deviating player. As a result, the deviating player is not able to obtain much more
than he could have gotten using the conditionally optimal trajectory. Punishment of
the deviating player at each time instant using the same strategy is possible because
of the universal character of ε -optimal strategies in zero-sum differential games.
In this paper the same approach is used to verify the stability of cooperative
agreements in the game Γ (t0 , x0 ) and, as in the aforementioned case, the principal
argument is the universal character of ε -optimal recursive strategies in specially
defined zero-sum games ΓS (t0 , x0 ), S ⊂ I, associated with the non-zero-sum game
Γ (t0 , x0 ).
The recursive strategies lie somewhere in between piecewise open-loop strategies
[6] and ε -strategies introduced by Pshenichny [10]. The difference from piecewise
open-loop strategies consists in the fact that, as in the case of Pshenichny’s
ε -strategies, the moments of correction of open-loop controls are not prescribed
from the beginning of the game but are defined during the course of the game. At the
same time, they differ from Pshenichny’s ε -strategies in the fact that the formation
of open-loop controls occurs in a finite number of steps.
The recursive strategy $U_i^{(n)}$ of player $i$ with the maximal number $n$ of control corrections is a procedure for the formation of an admissible open-loop control by player $i$ in the game $\Gamma(t_0, x_0)$, $(t_0, x_0) \in D$.
At the beginning of the game $\Gamma(t_0, x_0)$, player $i$, using the recursive strategy $U_i^{(n)}$, defines the first correction instant $t_1^{(i)} \in (t_0, T]$ and his admissible open-loop control $u^{(i)} = u^{(i)}(t)$ on the time interval $[t_0, t_1^{(i)}]$. Then, if $t_1^{(i)} < T$, possessing information about the state of the game at the time instant $t_1^{(i)}$, he chooses the next moment of correction $t_2^{(i)}$ and his admissible open-loop control $u^{(i)} = u^{(i)}(t)$ on the time interval $(t_1^{(i)}, t_2^{(i)}]$, and so on. Thus the admissible control on the time interval $[t_0, T]$ is either formed at the $k$th step ($k \le n - 1$), or player $i$ ends the process at step $n$ by choosing at the time instant $t_{n-1}^{(i)}$ his admissible control on the remaining time interval $(t_{n-1}^{(i)}, T]$.
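The procedure just described is essentially algorithmic, and a schematic rendering may help; in the sketch below the decision rules `next_correction` and `choose_control` are left abstract, since the chapter specifies only the information pattern, not their concrete form.

```python
def recursive_strategy(t0, T, state_at, next_correction, choose_control, n):
    """Form an admissible open-loop control with at most n corrections (sketch).

    state_at(t)            -> observed state of the game at time t
    next_correction(k, s)  -> correction instant in (t, T], chosen at step k
    choose_control(k, s)   -> open-loop control for the coming interval
    """
    t, pieces = t0, []
    for k in range(1, n + 1):
        s = state_at(t)
        t_next = T if k == n else next_correction(k, s)  # step n must reach T
        pieces.append(((t, t_next), choose_control(k, s)))
        t = t_next
        if t >= T:               # the control on [t0, T] may be completed early
            break
    return pieces                # concatenation of the pieces is the control
```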
For each given state $(t_*, x_*) \in D$ and nonvoid coalition $S \subset I$, consider the zero-sum differential game $\Gamma_S(t_*, x_*)$ between the coalitions $S$ and $I\setminus S$ with the same dynamics as in $\Gamma(t_*, x_*)$ and the payoff of coalition $S$ equal to the sum of the payoffs of the players $i \in S$ in the game $\Gamma(t_*, x_*)$:
$$\sum_{i\in S} H^{(i)}_{t_* x_*}\bigl(u^{(S)}(\cdot), u^{(I\setminus S)}(\cdot)\bigr) = \sum_{i\in S} H^{(i)}_{t_* x_*}(u(\cdot)) = \sum_{i\in S}\int_{t_*}^{T} h^{(i)}(t, x(t), u(t))\,\mathrm{d}t,$$
where $u^{(S)}(\cdot) = \{u^{(i)}(\cdot)\}_{i\in S}$ and $u^{(I\setminus S)}(\cdot) = \{u^{(j)}(\cdot)\}_{j\in I\setminus S}$. The value of this game is denoted by $\mathrm{val}\,\Gamma_S(t_*, x_*)$.
and call it the conditionally optimal cooperative trajectory. This trajectory is not necessarily unique. Then on the set $D$ the mapping $v(\cdot): D \to \mathbb{R}^{2^I}$ is defined with coordinate functions $v_S(\cdot): D \to \mathbb{R}$, $S \subset I$, $v_S(t_*, x_*) = \mathrm{val}\,\Gamma_S(t_*, x_*)$.
Along any admissible trajectory $x(\cdot)$, consider the functions $\varphi_S: [t_0, T] \to \mathbb{R}$, $S \subset I$, $\varphi_S(t) = v_S(t, x(t))$.
We shall connect the realization of the single-valued solution of the game $\Gamma(t_0, x_0)$ with the known imputation distribution procedure (IDP) [7, 8].
By the IDP of the solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ along the conditionally optimal trajectory $x^0(\cdot)$ we understand a function
$$\beta(t) = (\beta_1(t), \ldots, \beta_m(t)), \qquad t \in [t_0, T], \tag{5.4}$$
satisfying
$$M(t_0, x_0) = \int_{t_0}^{T}\beta(t)\,\mathrm{d}t \tag{5.5}$$
and
$$\int_{t}^{T}\beta(\tau)\,\mathrm{d}\tau \in E(t, x^0(t)) \qquad \forall t \in [t_0, T], \tag{5.6}$$
where $E(t, x^0(t))$ is the set of imputations in the game $(I, v(t, x^0(t)))$.
The IDP $\beta(t)$, $t \in [t_0, T]$, of the solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is called dynamically stable (time-consistent) along the conditionally optimal trajectory $x^0(\cdot)$ if
$$\int_{t}^{T}\beta(\tau)\,\mathrm{d}\tau = M(t, x^0(t)) \qquad \forall t \in [t_0, T]. \tag{5.7}$$
The solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is dynamically stable (time-consistent) if a dynamically stable IDP exists along at least one conditionally optimal trajectory.
Using the corollary of Theorem 5.1, we have the following result.
Theorem 5.2. For any conditionally optimal trajectory $x^0(\cdot)$, the following IDP of the solution $\mathrm{Sh}(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$:
$$\beta(t) = -\frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{Sh}(t, x^0(t)), \qquad t \in [t_0, T], \tag{5.8}$$
is dynamically stable along this trajectory. Therefore, the solution $\mathrm{Sh}(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is dynamically stable.
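The time consistency of (5.8) can be verified by a one-line computation, assuming (as is natural for the integral payoffs considered here) that the value of the subgame starting at the terminal instant vanishes, so that $\mathrm{Sh}(T, x^0(T)) = 0$:

```latex
\int_t^T \beta(\tau)\,\mathrm{d}\tau
  = -\int_t^T \frac{\mathrm{d}}{\mathrm{d}\tau}\,\mathrm{Sh}(\tau, x^0(\tau))\,\mathrm{d}\tau
  = \mathrm{Sh}(t, x^0(t)) - \mathrm{Sh}(T, x^0(T))
  = \mathrm{Sh}(t, x^0(t)),
```

which is exactly condition (5.7) with $M(t, x^0(t)) = \mathrm{Sh}(t, x^0(t))$.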
If a cooperative agreement is reached in the game and each player receives his payoff according to the IDP (5.8), then it is natural to suppose that those who violate this agreement are to be punished. The effectiveness of the punishment (sanctions) comes down to the question of the existence of a strong Nash equilibrium in the differential game $\Gamma^{\mathrm{Sh}}(t_0, x_0)$, which differs from $\Gamma(t_0, x_0)$ only in the player payoffs.
The payoff of player $i$ in $\Gamma^{\mathrm{Sh}}(t_0, x_0)$ is equal to
$$H^{(\mathrm{Sh}, i)}_{t_0, x_0}(u(\cdot)) = -\int_{t_0}^{t(u(\cdot))}\frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{Sh}_i(t, x^0(t))\,\mathrm{d}t + \int_{t(u(\cdot))}^{T} h^{(i)}\bigl(t, x(t, t_0, x_0, u(\cdot))\bigr)\,\mathrm{d}t,$$
In this paper we use the following definition of a strong Nash equilibrium.
Definition 5.1. Let $\gamma = \langle I, \{X_i\}_{i\in I}, \{K_i\}_{i\in I}\rangle$ be an $m$-person game in normal form; here $I = [1:m]$ is the set of players, $X_i$ the set of strategies of player $i$, and $K_i: X = X_1\times X_2\times\cdots\times X_m \to \mathbb{R}$ the payoff function of player $i$. We shall say that in the game $\gamma$ there exists a strong Nash equilibrium if for every $\varepsilon > 0$ there exists $x^\varepsilon = (x_1^\varepsilon, x_2^\varepsilon, \ldots, x_m^\varepsilon) \in X$ such that for all $S \subset I$ and all $x_S \in X_S = \prod_{i\in S}X_i$,
$$\sum_{i\in S} K_i\bigl(x_S, x^\varepsilon_{I\setminus S}\bigr) \le \sum_{i\in S} K_i(x^\varepsilon) + \varepsilon,$$
where $x^\varepsilon_{I\setminus S} = \{x_j^\varepsilon\}_{j\in I\setminus S}$ ($x^\varepsilon_{I\setminus S} \in X_{I\setminus S}$).
If condition (5.9) holds, then in the game $\Gamma^{\mathrm{Sh}}(t_0, x_0)$ there exists a strong Nash equilibrium.
The idea of the proof is as follows. Condition (5.9) can be rewritten in an equivalent form meaning that at each time instant $t \in [t_0, T]$, moving along the conditionally optimal trajectory $x^0(\cdot)$, no coalition can guarantee itself on $[t, T]$ a payoff larger than the one prescribed by the IDP (5.8), i.e., larger than
$$\sum_{i\in S}\int_{t}^{T}\beta_i(\tau)\,\mathrm{d}\tau = -\sum_{i\in S}\int_{t}^{T}\frac{\mathrm{d}}{\mathrm{d}\tau}\,\mathrm{Sh}_i(\tau, x^0(\tau))\,\mathrm{d}\tau = \sum_{i\in S}\mathrm{Sh}_i(t, x^0(t));$$
at the same time, on the time interval $[t_0, t]$, according to the IDP, the coalition has already received a payoff equal to
$$\sum_{i\in S}\int_{t_0}^{t}\beta_i(\tau)\,\mathrm{d}\tau = -\sum_{i\in S}\int_{t_0}^{t}\frac{\mathrm{d}}{\mathrm{d}\tau}\,\mathrm{Sh}_i(\tau, x^0(\tau))\,\mathrm{d}\tau,$$
so that in total no coalition can count on more than $\sum_{i\in S}\mathrm{Sh}_i(t_0, x_0)$. Moving always in the game $\Gamma^{\mathrm{Sh}}(t_0, x_0)$ along the conditionally optimal trajectory $x^0(\cdot)$, each
coalition will receive its payoff according to the Shapley value. Thus no coalition can benefit from deviating from the conditionally optimal trajectory, which in this case it is natural to call a "strong equilibrium trajectory."
5.6 Conclusion
Let us conclude with some remarks about the limits of our approach. The main
condition that guarantees a strong strategic support of the Shapley value in the
m-person differential game under consideration is the fact that the Shapley value
belongs to the core of any subgame along a cooperative trajectory. This can
be guaranteed only if the cores are nonvoid and the characteristic functions in the subgames are convex. At the same time, one can easily verify that if, instead of the Shapley value, any fixed imputation from the core is taken as the optimality principle, then for strong strategic support of this imputation the principal condition is that the cores in the subgames along the cooperative trajectory be nonempty.
In addition, strategic support of the cooperation proposed here based on the
notion of a strong Nash equilibrium is coalition proof in the sense that no coalition
can force its members to deviate from the cooperative trajectory because in any
deviating coalition there will be at least one player who is not interested in the
deviation.
References
1. Chistyakov, S.V.: To the solution of the game problem of pursuit. Prikl. Mat. i Mekh. 41(5), 825–832 (1977) (in Russian)
2. Chistyakov, S.V.: Operatory znacheniya antagonisticheskikh igr (Value Operators in Two-Person Zero-Sum Differential Games). St. Petersburg Univ., St. Petersburg (1999)
3. Chentsov, A.G.: On a game problem of converging at a given instant of time. Math. USSR Sbornik 28(3), 353–376 (1976)
4. Chistyakov, S.V.: O beskoalizionnikh differenzial'nikh igrakh (On coalition-free differential games). Dokl. Akad. Nauk 259(5), 1052–1055 (1981). English transl. in Soviet Math. Dokl. 24(1), 166–169 (1981)
5. Friedman, A.: Differential Games. Wiley, New York (1971)
6. Petrosjan, L.A.: Differential Games of Pursuit. World Scientific, Singapore (1993)
7. Petrosjan, L.A.: The Shapley value for differential games. In: Olsder, G.J. (ed.) Annals of the International Society of Dynamic Games, vol. 3, pp. 409–417. Birkhäuser (1995)
8. Petrosjan, L.A., Danilov, N.N.: Stability of solutions in nonzero-sum differential games with integral payoffs. Vestnik Leningrad Univ. 1, 52–59 (1979)
9. Petrosjan, L.A., Zenkevich, N.A.: Principles of stable cooperation. Math. Games Theory Appl. 1(1), 102–117 (2009) (in Russian)
10. Pshenichnyi, B.N.: ε-strategies in differential games. In: Topics in Differential Games, pp. 45–56. North-Holland, New York (1973)
Chapter 6
Characterization of Feedback Nash Equilibrium
for Differential Games
Yurii Averboukh
6.1 Introduction
Y. Averboukh ()
Institute of Mathematics and Mechanics UrB RAS, S. Kovalevskaya Street 16,
GSP-384, Ekaterinburg 620990, Russia
e-mail: [email protected]
constructed solutions [4]. Bressan and Shen investigated the Nash equilibrium using
a hyperbolic system of conservation laws [2, 3]. An approach based on singular
surfaces was considered by Olsder [13].
In this paper, we develop the approach of Kononenko [11], Kleimenov [10], and
Chistyakov [8]. The main result is the characterization of the set of Nash equilibrium
payoffs in terms of nonsmooth analysis. In addition, we obtain sufficient
conditions for a pair of continuous functions to provide a Nash equilibrium. This
result generalizes the method of systems of Hamilton–Jacobi equations.
Here u and v are the controls of Players I and II, respectively. Payoffs are terminal.
Player I wants to maximize σ1 (x(ϑ0 )), whereas Player II wants to maximize
σ2(x(ϑ0)). We assume that the sets P and Q are compact, the functions f, σ1, and σ2 are
continuous, and f is Lipschitz continuous with respect to the phase variable and
satisfies a sublinear growth condition with respect to x.
We use the control design suggested in [10]. This control design follows the
Krasovskii discontinuous feedback formalization. A feedback strategy of Player I
is a pair of functions U = (u(t, x, ε), β1(ε)). Here u(t, x, ε) is a function of the position
(t, x) ∈ [t0, ϑ0] × Rn and the precision parameter ε, and β1(ε) is a continuous function
of the precision parameter. We suppose that β1(ε) → 0 as ε → 0. Analogously, a
feedback strategy of Player II is a pair V = (v(t, x, ε), β2(ε)).
Let a position (t∗ , x∗ ) be chosen. The step-by-step motion is defined in the
following way. We suppose that the ith player chooses his own precision parameter
εi. Let Player I choose a partition Δ1 = {τj}j=0,...,r of the interval [t∗, ϑ0]. Assume
that the mesh of the partition Δ1 is less than ε1. Suppose that Player II chooses a
partition Δ2 = {ξk}k=0,...,ν with mesh less than ε2. The solution x[·] of Eq. (6.1)
with initial data x[t∗] = x∗ such that the control of Player I is equal to u(τj, x[τj], ε1)
on [τ j , τ j+1 ), and the control of Player II is equal to v(ξk , x[ξk ], ε2 ) on [ξk , ξk+1 )
is called a step-by-step motion. Denote it by x[·,t∗ , x∗ ;U, ε1 , Δ1 ;V, ε2 ; Δ2 ]. The set
of all step-by-step motions from the position (t∗ , x∗ ) under strategies U and V and
precision parameters ε1 and ε2 is denoted by X(t∗ , x∗ ;U, ε1 ;V, ε2 ). The step-by-step
motion is called consistent if ε1 = ε2 .
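A minimal simulation sketch may clarify this formalization: each player resamples his feedback only at the points of his own partition and holds the control constant in between. The dynamics and feedbacks below are hypothetical stand-ins, not objects from the text:

```python
import numpy as np

def f(t, x, u, v):
    return np.array([u, v])                  # toy dynamics in R^2

def u_fb(t, x, eps):  return np.clip(-x[0], -1.0, 1.0)   # Player I feedback
def v_fb(t, x, eps):  return np.clip(-x[1], -1.0, 1.0)   # Player II feedback

def step_by_step_motion(t0, x0, T, eps1, eps2, dt=1e-3):
    grid1 = np.arange(t0, T, eps1)           # Player I's partition (mesh <= eps1)
    grid2 = np.arange(t0, T, eps2)           # Player II's partition (mesh <= eps2)
    t, x = t0, np.array(x0, float)
    u = u_fb(t0, x, eps1); v = v_fb(t0, x, eps2)
    i1 = i2 = 0
    while t < T:
        if i1 < len(grid1) and t >= grid1[i1]:
            u = u_fb(grid1[i1], x, eps1); i1 += 1    # resample at tau_j
        if i2 < len(grid2) and t >= grid2[i2]:
            v = v_fb(grid2[i2], x, eps2); i2 += 1    # resample at xi_k
        x = x + dt * f(t, x, u, v)           # Euler step between sampling times
        t += dt
    return x

print(step_by_step_motion(0.0, [1.0, -0.5], 1.0, eps1=0.05, eps2=0.02))
```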
A limit of step-by-step motions x[·, t^k, x^k; U, ε1^k, Δ1^k; V, ε2^k, Δ2^k] is called a constructive
motion if t^k → t∗, x^k → x∗, ε1^k → 0, ε2^k → 0, as k → ∞. Denote by X(t∗, x∗; U, V) the
set of constructive motions. By the Arzelà–Ascoli theorem, the set of constructive
motions is nonempty. A consistent constructive motion is a limit of step-by-step
motions x[·,t k , xk ;U, ε k , Δ1k ;V, ε k , Δ2k ] such that t k → t∗ , xk → x∗ , ε1k → 0, ε2k → 0,
as k → ∞. Denote the set of all consistent constructive motions by X c (t∗ , x∗ ;U,V ).
This set is also nonempty.
The payoff (σ1 (x[ϑ0 ]), σ2 (x[ϑ0 ])) determined by a Nash equilibrium solution is
called a Nash equilibrium payoff of a game. In the typical case, there are many Nash
equilibria with different payoffs. The set of all Nash equilibrium payoffs is called a
Nash value of a game and is denoted by N(t∗ , x∗ ). One can consider a multivalued
map taking (t∗ , x∗ ) to N(t∗ , x∗ ).
The set N(t∗ , x∗ ) is nonempty under the Isaacs condition [10, 11]. The proof is
based on the punishment strategy technique. If the Isaacs condition is not fulfilled,
then the Nash equilibrium solution exists in the class of mixed strategies or in the
class of the pair counterstrategy/strategy [10].
Below we suppose that the Isaacs condition holds: for all t ∈ [t0 , ϑ0 ], x, s ∈ Rn
\[
\min_{u\in P}\max_{v\in Q}\,\bigl\langle s, f(t,x,u,v)\bigr\rangle \;=\; \max_{v\in Q}\min_{u\in P}\,\bigl\langle s, f(t,x,u,v)\bigr\rangle .
\]
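On discretized control sets the Isaacs condition can be checked numerically by comparing the two iterated extrema of ⟨s, f(t, x, u, v)⟩. A minimal sketch; the dynamics below are a hypothetical example, affine in (u, v), for which the condition holds:

```python
import numpy as np

P = np.linspace(-1.0, 1.0, 51)      # discretized control set of Player I
Q = np.linspace(-1.0, 1.0, 51)      # discretized control set of Player II

def f(t, x, u, v):
    return np.array([u + v, x[0] * v - u])

def check_isaacs(t, x, s):
    H = np.array([[np.dot(s, f(t, x, u, v)) for v in Q] for u in P])
    minmax = H.max(axis=1).min()    # min over u of max over v
    maxmin = H.min(axis=0).max()    # max over v of min over u
    return minmax, maxmin

print(check_isaacs(0.5, np.array([0.3, -0.2]), np.array([1.0, 2.0])))
```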
Remark 6.1. If the Isaacs condition does not hold, then one can consider the
solution in the class of mixed strategies. To this end, we consider the doubly
controlled system
\[
\dot{x} \;=\; \int_P\!\int_Q f(t,x,u,v)\,\nu(dv)\,\mu(du), \qquad t\in[t_0,\vartheta_0],\; x\in\mathbb{R}^n,\; \mu\in\mathrm{rpm}(P),\; \nu\in\mathrm{rpm}(Q). \tag{6.2}
\]
σ2(x(ϑ0)), whereas Player I wants to minimize it. Denote the value function of this
game by ω2 .
Consider the multivalued map
\[
T^{*}(t, x) \;=\; \bigcup_{\alpha \in I} T^{\alpha}(t, x).
\]
The multivalued map T ∗ is closed, has compact images, and satisfies conditions
(N1)–(N3). By T + denote the closure of the pointwise union of all upper semicon-
tinuous multivalued maps from [t0 , ϑ0 ] × Rn to R2 satisfying conditions (N1)–(N3).
It follows from [10] that T+(t, x) = N(t, x) for all (t, x) ∈ [t0, ϑ0] × Rn.
Condition (N1) is a boundary condition, and condition (N2) is connected with
the theory of zero-sum differential games. Further, we formulate condition (N3) in
terms of viability theory and obtain the infinitesimal form of this condition.
Theorem 6.1. Let the map T : [t0, ϑ0] × Rn → P(R2) be closed. Then condition (N3) is equivalent to the following one: for all (t∗, x∗) ∈ [t0, ϑ0] × Rn, (J1, J2) ∈ T(t∗, x∗) there exist θ > t∗ and y(·) ∈ Sol(t∗, x∗) such that (J1, J2) ∈ T(t, y(t)) for all t ∈ [t∗, θ].
\[
D_H T\bigl(t, x; (J_1, J_2), w\bigr) \;\triangleq\; \liminf_{\delta \downarrow 0,\; w' \to w} \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta,\, x+\delta w')\bigr]}{\delta}.
\]
Theorem 6.2. Let T : [t0 , ϑ0 ] × Rn → P(R2 ) be closed. Then condition (N3) at the
position (t∗ , x∗ ) ∈ [t0 , ϑ0 ] × Rn is equivalent to the following one:
Then for all (t, x) ∈ [t0 , ϑ0 ] × Rn the couple (c1 (t, x), c2 (t, x)) is a Nash equilibrium
payoff of the game.
Corollary 6.1 follows from the definition of modulus derivative and the
property of the upper solution of equation (6.4) [14]: ωi (t, x) ≤ ci (t, x) for all
(t, x) ∈ [t0 , ϑ0 ] × Rn.
Let us show that the proposed method is a generalization of the method based on
the system of Hamilton–Jacobi equations. This method provides a Nash solution in
the class of continuous strategies [1].
Proposition 6.2. Let the function (ϕ1 , ϕ2 ) : [t0 , ϑ0 ]×Rn → R2 be differentiable, and
(ϕ1 (ϑ0 , ·), ϕ2 (ϑ0 , ·)) = (σ1 (·), σ2 (·)). Suppose that the function (ϕ1 , ϕ2 ) satisfies the
following condition: for all positions (t, x) ∈ [t0 , ϑ0 ] × Rn there exist un ∈ P, vn ∈ Q
such that
\[
\max_{u\in P}\,\bigl\langle \nabla\varphi_1(t,x),\, f(t,x,u,v^n)\bigr\rangle \;=\; \bigl\langle \nabla\varphi_1(t,x),\, f(t,x,u^n,v^n)\bigr\rangle, \tag{6.5}
\]
\[
\max_{v\in Q}\,\bigl\langle \nabla\varphi_2(t,x),\, f(t,x,u^n,v)\bigr\rangle \;=\; \bigl\langle \nabla\varphi_2(t,x),\, f(t,x,u^n,v^n)\bigr\rangle, \tag{6.6}
\]
\[
\frac{\partial \varphi_i(t,x)}{\partial t} + \bigl\langle \nabla\varphi_i(t,x),\, f(t,x,u^n,v^n)\bigr\rangle \;=\; 0, \qquad i = 1, 2. \tag{6.7}
\]
In this case, condition (6.7) is equivalent to the following one: (ϕ1, ϕ2) is a solution of
the system
\[
\frac{\partial \varphi_i}{\partial t} + H_i\bigl(t, x, \nabla\varphi_1, \nabla\varphi_2\bigr) \;=\; 0, \qquad i = 1, 2.
\]
6.3 Example
t ∈ [0, 1], u, v ∈ [−1, 1]. Payoffs are determined by the formulas σ1(x, y) ≜ −|x − y|,
σ2(x, y) ≜ y. We recall that each player wants to maximize his payoff.
To determine the multivalued map N : [0, 1] × R2 → P(R2 ), we use auxiliary
multivalued maps Si : [0, 1] × R2 → P(R) such that
\[
S_i(t, x_*, y_*) \;\triangleq\; \bigl\{\, z \in \mathbb{R} : \omega_i(t, x_*, y_*) \le z \le c_i^{+}(t, x_*, y_*) \,\bigr\}.
\]
Here
\[
c_i^{+}(t, x, y) \;\triangleq\; \sup_{u\in \mathcal{U},\, v\in \mathcal{V}} \sigma_i\Bigl( x + \int_t^{\vartheta_0} u(\xi)\,d\xi,\;\; y + \int_t^{\vartheta_0} v(\xi)\,d\xi \Bigr).
\]
Obviously,
N(t, x∗ , y∗ ) ⊂ S1 (t, x∗ , y∗ ) × S2 (t, x∗ , y∗ ). (6.9)
First we determine the map S2. The value function of the game Γ2 is equal to
ω2(t, x∗, y∗) = y∗ + (1 − t). In addition, c2+(t, x∗, y∗) = y∗ + (1 − t). Consequently,
\[
S_2(t, x_*, y_*) \;=\; \{\, y_* + (1 - t) \,\}. \tag{6.10}
\]
Let us determine the set S1 . The programmed iteration method [7] yields that
ω1 (t, x∗ , y∗ ) = −|x∗ − y∗ |.
Moreover,
\[
c_1^{+}(t, x_*, y_*) \;=\; \min\bigl\{\, -|x_* - y_*| + 2(1 - t),\; 0 \,\bigr\}.
\]
We obtain that
\[
S_1(t, x_*, y_*) \;=\; \bigl[\, \omega_1(t, x_*, y_*),\; c_1^{+}(t, x_*, y_*) \,\bigr]. \tag{6.11}
\]
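Assuming, as the example suggests, the simple dynamics ẋ = u, ẏ = v with u, v ∈ [−1, 1] (the display of (6.8) is not reproduced above), the bounds of S1 can be evaluated directly; a minimal sketch under that assumption:

```python
def omega1(t, xs, ys):
    # Lower (guaranteed) payoff of Player I: the gap |x - y| cannot be forced shut.
    return -abs(xs - ys)

def c1_plus(t, xs, ys):
    # Best jointly reachable payoff by time 1: the gap closes at rate at most 2,
    # and the payoff -|x - y| never exceeds 0.
    return min(-abs(xs - ys) + 2.0 * (1.0 - t), 0.0)

def S1(t, xs, ys):
    return (omega1(t, xs, ys), c1_plus(t, xs, ys))

print(S1(0.25, 0.8, -0.4))   # interval of candidate payoffs for Player I
```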
Now we determine the map N(t, x∗ , y∗ ). The linearity of the right-hand side
of (6.8) and the convexity of control spaces yield that any measurable control
functions u(·) ∈ U and v(·) ∈ V can be substituted by the constant controls u ∈ P,
v ∈ Q. We have that for all (J1 , J2 ) ∈ S1 (t, x∗ , y∗ ) × S2 (t, x∗ , y∗ )
The equality is achieved only if u = 1. Condition (N1) yields that the inclusion
is fulfilled. Substituting the value (1, 1) for w in the formula for DH N(t, x∗, y∗;
(−|x∗ − y∗|, y∗ + (1 − t)), w), we claim that for y∗ ≥ x∗
\[
N(t, x_*, y_*) \;=\; \bigl\{\, \bigl(-|x_* - y_*|,\; y_* + (1 - t)\bigr) \,\bigr\}.
\]
Clearly, conditions (N1) and (N2) hold for this map. Let γ0 be the maximal number
in the segment [0, 2] such that −|x∗ − y∗| + γ0(1 − t) ≤ 0. If (J1, J2) ∈ N(t, x∗, y∗), then
J2 = y∗ + (1 − t), J1 = −|x∗ − y∗| + d(1 − t) for some d ∈ [0, γ0]. Let us prove that
there exists a number δ > 0 with the property
\[
J_1 \;=\; y_* - x_* + d(1 - t) \;\le\; y_* - x_* + \delta d + \gamma_0 (1 - t - \delta).
\]
Also,
\[
y_* - x_* + \delta d + \gamma_0 (1 - t - \delta) \;\le\; \min\bigl\{\, y_* - x_* + d\delta + 2(1 - t - \delta);\; 0 \,\bigr\}
\]
and
\[
y_* - x_* + \delta d + \gamma_0 (1 - t - \delta) \;\le\; y_* - x_* + d\delta + 2(1 - t - \delta).
\]
Since N(t, x∗, y∗) coincides with the set S1(t, x∗, y∗) × S2(t, x∗, y∗) in this case, we
claim that the set N(t, x∗ , y∗ ) is a Nash value of the game at the position (t, x∗ , y∗ ).
Let us compare the obtained result with the method based on the system of
Hamilton–Jacobi equations [1]. In the considered case the system of equations is
given by
\[
\begin{cases}
\dfrac{\partial \varphi_1}{\partial t} + \dfrac{\partial \varphi_1}{\partial x}\, u^n(t,x,y) + \dfrac{\partial \varphi_1}{\partial y}\, v^n(t,x,y) = 0,\\[6pt]
\dfrac{\partial \varphi_2}{\partial t} + \dfrac{\partial \varphi_2}{\partial x}\, u^n(t,x,y) + \dfrac{\partial \varphi_2}{\partial y}\, v^n(t,x,y) = 0.
\end{cases} \tag{6.13}
\]
Here the values u^n(t, x, y) and v^n(t, x, y) are determined by the following conditions:
\[
\frac{\partial \varphi_1(t,x,y)}{\partial x}\, u^n(t,x,y) \;=\; \max_{u\in P} \frac{\partial \varphi_1(t,x,y)}{\partial x}\, u,
\qquad
\frac{\partial \varphi_2(t,x,y)}{\partial y}\, v^n(t,x,y) \;=\; \max_{v\in Q} \frac{\partial \varphi_2(t,x,y)}{\partial y}\, v.
\]
In other words, the value (ϕ1 (t, x, y), ϕ2 (t, x, y)) is the maximal Nash equilibrium
payoff of the game at the position (t, x, y).
One can check that the pair of functions (ϕ1 , ϕ2 ) satisfies the conditions of
Corollary 6.1. Simultaneously, there exists a family of functions satisfying the
conditions of Corollary 6.1. Actually, for γ ∈ [0, 2] define
\[
c_1^{\gamma}(t, x_*, y_*) \;=\;
\begin{cases}
-|x_* - y_*|, & y_* \ge x_*,\\
\min\bigl\{\, -|x_* - y_*| + \gamma(1 - t);\; 0 \,\bigr\}, & y_* < x_*,
\end{cases}
\qquad
c_2^{\gamma}(t, x_*, y_*) \;=\; y_* + (1 - t).
\]
Let us show that the pair of functions (c1^γ, c2^γ) satisfies the conditions of Corollary 6.1 in our case. First we prove that the functions ci^γ are supersolutions of equations (6.4). By [14, condition U4] it suffices to show that for all (t, x, y) ∈ [t0, ϑ0] × R2 and all (a, sx, sy) ∈ D−ci^γ(t, x, y) the following inequality holds:
\[
a + H_i(s_x, s_y) \;\le\; 0, \qquad i = 1, 2. \tag{6.15}
\]
The closedness of the map T gives that for all k, (J1, J2) ∈ T(t, y∗(t)) for t ∈ [t∗, θk]. By
the same argument we claim that (J1, J2) ∈ T(τ, y∗(τ)). Denote x∗ = y∗(τ).
Let us show that τ = ϑ0. If τ < ϑ0, then there exist a motion ŷ(·) ∈ Sol(τ, x∗) and
a moment θ > τ such that (J1, J2) ∈ T(t, ŷ(t)), t ∈ [τ, θ]. Consider the motion
\[
\tilde{y}(t) \;\triangleq\;
\begin{cases}
y^{*}(t), & t \in [t_*, \tau],\\
\hat{y}(t), & t \in [\tau, \theta].
\end{cases}
\]
One can reformulate the condition of Theorem 6.1 in the following way: the
graph of T is weakly invariant under the differential inclusion
\[
\begin{pmatrix} \dot{x}\\ \dot{J}_1\\ \dot{J}_2 \end{pmatrix} \;\in\; \mathcal{F}(t, x) \;\triangleq\; \operatorname{co}\left\{ \begin{pmatrix} f(t,x,u,v)\\ 0\\ 0 \end{pmatrix} : u \in P,\; v \in Q \right\}.
\]
\[
D_t(\operatorname{gr} T)(t, x, J_1, J_2) \,\cap\, \mathcal{F}(t, x) \;\ne\; \emptyset \tag{6.17}
\]
for all (t, x) ∈ [t0 , ϑ0 ] × Rn , (J1 , J2 ) ∈ T (t, x). Here Dt denotes the right-hand
derivative in t. It is defined in the following way. Let G ⊂ [t0, ϑ0] × Rm, and let G[t] denote
the section of G at t:
\[
G[t] \;\triangleq\; \{\, w \in \mathbb{R}^m : (t, w) \in G \,\},
\]
and let the symbol d denote the Euclidean distance between a point and a set. Following
[9, 14], set
\[
(D_t G)(t, y) \;\triangleq\; \Bigl\{\, h \in \mathbb{R}^m : \liminf_{\delta \to 0} \frac{d\bigl(y + \delta h;\; G[t + \delta]\bigr)}{\delta} = 0 \,\Bigr\}.
\]
\[
b^{r} \;\triangleq\; \liminf_{\delta \downarrow 0,\; \gamma \in \mathbb{R}^n,\; \gamma \to 0} \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta,\, x+\delta(w^r+\gamma))\bigr]}{\delta}
\;=\; \lim_{k\to\infty} \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta^{r,k},\, x+\delta^{r,k}(w^r+\gamma^{r,k}))\bigr]}{\delta^{r,k}} .
\]
Further,
\[
\frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\hat\delta^{r},\, x+\hat\delta^{r}(w^{*}+\hat\gamma^{r}))\bigr]}{\hat\delta^{r}}
\;=\; \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta^{r,\hat k(r)},\, x+\delta^{r,\hat k(r)}(w^{*}+\gamma^{r,\hat k(r)}+w^r-w^{*}))\bigr]}{\delta^{r,\hat k(r)}}
\]
\[
\;=\; \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta^{r,\hat k(r)},\, x+\delta^{r,\hat k(r)}(w^r+\gamma^{r,\hat k(r)}))\bigr]}{\delta^{r,\hat k(r)}}
\;\le\; b^{r} + 2^{-r} \;\to\; \tilde b, \qquad r \to \infty.
\]
We have that in (6.19) the right- and left-hand sides are equal. This means that
condition (6.18) is valid.
Thus, condition (6.3) is equivalent to the following one: for all (J1, J2) ∈ T(t, x)
there exists w ∈ F(t, x) such that
\[
\liminf_{\delta \downarrow 0,\; \gamma \in \mathbb{R}^n,\; \gamma \to 0} \frac{\operatorname{dist}\bigl[(J_1, J_2),\, T(t+\delta,\, x+\delta(w+\gamma))\bigr]}{\delta}
\;=\; \liminf_{\delta \downarrow 0,\; \gamma \in \mathbb{R}^n,\; \gamma \to 0} \inf\Bigl\{ \frac{|\zeta_1 - J_1| + |\zeta_2 - J_2|}{\delta} : (\zeta_1, \zeta_2) \in T(t+\delta,\, x+\delta(w+\gamma)) \Bigr\} \;=\; 0. \tag{6.20}
\]
• δ^k, γ^k, ε1^k, ε2^k → 0, as k → ∞;
• (t + δ^k, x + δ^k(w + γ^k), J1 + δ^k ε1^k, J2 + δ^k ε2^k) ∈ gr T.
Thus,
\[
\inf\Bigl\{ \frac{|\zeta_1 - J_1| + |\zeta_2 - J_2|}{\delta^k} : (\zeta_1, \zeta_2) \in T\bigl(t+\delta^k,\, x+\delta^k(w+\gamma^k)\bigr) \Bigr\} \;\le\; \varepsilon_1^k + \varepsilon_2^k,
\]
since
\[
\bigl( J_1 + \delta^k \varepsilon_1^k,\; J_2 + \delta^k \varepsilon_2^k \bigr) \;\in\; T\bigl(t+\delta^k,\; x+\delta^k(w+\gamma^k)\bigr).
\]
Consequently,
\[
d\bigl( (x + \delta^k w,\, J_1,\, J_2),\; \operatorname{gr}T[t+\delta^k] \bigr) \;\le\; \delta^k \sqrt{ \|\gamma^k\|^2 + (\varepsilon_1^k)^2 + (\varepsilon_2^k)^2 } .
\]
\[
\frac{\partial \varphi_i(t,x)}{\partial t} + H_i\bigl(t, x, \nabla\varphi_i(t,x)\bigr) \;\le\; 0, \qquad i = 1, 2.
\]
Since the function ϕ1 is differentiable, its subdifferential at the position (t, x) is equal
to {(∂ϕ1(t, x)/∂t, ∇ϕ1(t, x))}. Consequently, the function ϕ1 is the upper solution of
Eq. (6.4) for i = 1 [14, Condition (U4)]. Analogously, the function ϕ2 is the upper
solution of Eq. (6.4) for i = 2.
Now let us show that dabs(ϕ1, ϕ2)(t, x; w) = 0 for w ∈ F(t, x). Put w = f(t, x, u^n, v^n), and let {δ^k}k=1..∞ ⊂ R, {γ^k}k=1..∞ ⊂ Rn be a minimizing sequence. Then indeed
\[
\frac{\partial \varphi_1(t,x)}{\partial t} + \bigl\langle \nabla\varphi_1(t,x), w \bigr\rangle \;=\; \frac{\partial \varphi_2(t,x)}{\partial t} + \bigl\langle \nabla\varphi_2(t,x), w \bigr\rangle \;=\; 0.
\]
Acknowledgements This work was supported by the Russian Foundation for Basic Research
(Grant No. 09-01-00436-a), a grant of the president of the Russian Federation (Project MK-
7320.2010.1), and the Russian Academy of Sciences Presidium Programs of Fundamental
Research, Mathematical Theory of Control.
References
1. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory. SIAM, Philadelphia (1999)
2. Bressan, A., Shen, W.: Semi-cooperative strategies for differential games. Int. J. Game Theory 32, 561–593 (2004)
3. Bressan, A., Shen, W.: Small BV solutions of hyperbolic noncooperative differential games.
SIAM J. Control Optim. 43, 194–215 (2004)
4. Cardaliaguet, P.: On the instability of the feedback equilibrium payoff in a nonzero-sum
differential game on the line. Ann. Int. Soc. Dyn. Games 9, 57–67 (2007)
5. Cardaliaguet, P., Buckdahn, R., Rainer, C.: Nash equilibrium payoffs for nonzero-sum
stochastic differential games. SIAM J. Control Optim. 43, 624–642 (2004)
6. Cardaliaguet, P., Plaskacz, S.: Existence and uniqueness of a Nash equilibrium feedback for a
simple nonzero-sum differential game. Int. J. Game Theory 32, 33–71 (2003)
7. Chentsov, A.G.: On a game problem of converging at a given instant of time. Mat. USSR Sb. 28, 353–376 (1976)
8. Chistyakov, S.V.: On noncooperative differential games. Dokl. AN USSR 259, 1052–1055
(1981) (in Russian)
9. Guseinov, H.G., Subbotin, A.I., Ushakov, V.N.: Derivatives for multivalued mappings with
applications to game-theoretical problems of control. Problems Control Inform. Theory 14,
155–167 (1985)
10. Kleimenov, A.F.: Nonzero-Sum Differential Games. Nauka, Ekaterinburg (1993) (in Russian)
11. Kononenko, A.F.: On equilibrium positional strategies in nonantagonistic differential games.
Dokl. AN USSR 231, 285–288 (1976) (in Russian)
12. Krasovskii, N.N., Subbotin, A.I.: Game-Theoretical Control Problems. Springer, New York
(1988)
13. Olsder, G.J.: On open- and closed-loop bang-bang control in nonzero-sum differential games.
SIAM J. Control Optim. 49(4), 1087–1106 (2001)
14. Subbotin, A.I.: Generalized Solutions of First-Order PDEs. The Dynamical Perspective. Birkhäuser, Boston (1995)
Chapter 7
Nash Equilibrium Payoffs in Mixed Strategies
Anne Souquière
Abstract We consider two-player non zero sum differential games. We study
Nash equilibrium payoffs and publicly correlated equilibrium payoffs. If players
use deterministic strategies, it has been proved that the Nash equilibrium payoffs
are precisely the reachable and consistent payoffs. Referring to repeated games, we
introduce mixed strategies which are probability distributions over pure strategies.
We give a characterization of the set of Nash equilibrium payoffs in mixed strategies.
Unexpectedly, this set is larger than the closed convex hull of the set of Nash
equilibrium payoffs in pure strategies. Finally, we study the set of publicly correlated
equilibrium payoffs for differential games and show that it is the same as the set of
Nash equilibrium payoffs using mixed strategies.
7.1 Introduction
We study equilibria for non zero sum differential games. In general, for a given
equilibrium concept, existence and characterization of the equilibria highly depend
on the strategies used by the players. There are mainly three types of strategies:
• Non-anticipative strategies or memory-strategies where the control depends on
the entire past history of the game (trajectory and controls played so far).
A. Souquière ()
Institut TELECOM, TELECOM Bretagne, UMR CNRS 3192 Lab-STICC,
Technopole Brest Iroise, CS 83818, 29238 Brest Cedex, France
Laboratoire de Mathématiques, Université de Bretagne Occidentale,
UMR 6205, 6 Avenue Victor Le Gorgeu, CS 93837, 29238 Brest Cedex, France
e-mail: [email protected]
• Feed-back strategies where the current control depends only on the actual state
of the system.
• Open-loop controls where the control depends only on time.
Looking for Nash equilibrium payoffs in feedback strategies, one usually computes
Nash equilibrium payoffs as functions of time and space. This leads to a system
of non linear partial differential equations for which there is no general result for
existence nor uniqueness of a solution. If the system admits regular enough solu-
tions, they allow to compute the optimal feedbacks [3, 12]. There are few examples
for this approach, the results essentially deal with linear quadratic differential games
where solutions are sought amongst quadratic functions. For linear quadratic games,
there are conditions for existence of Nash equilibria in feedback strategies and
for existence and uniqueness of Nash equilibria in open-loops. Some numerical
methods can be applied to compute equilibria [11]. The drawback is that feedback
equilibria are highly unstable [5], except in some particular cases of one dimensional
games [6].
In the case of deterministic differential games where players use non-anticipative
strategies, there are existence and characterization results for Nash equilibrium
payoffs in [15, 16, 20]. Our aim is to extend this characterization to the case
where players use mixed non-anticipative strategies, namely random combinations
of memory-strategies. The disadvantage of using non-anticipative strategies is that
the associated equilibria lack weak consistency compared to feedback strategies.
Their main interest is that they allow one to characterize some kind of upper hull of all
Nash equilibrium payoffs using reasonable strategies.
The notion of mixed strategies is strongly inspired by repeated games. The folk
theorem for repeated games characterizes Nash equilibrium payoffs as feasible and
individually rational [2, 18]. As in repeated games, the difficulty is that mixed
strategies are unobservable [13].
Deterministic nonzero sum differential games are close to stochastic games,
where there is a characterization of the set of correlated equilibria in case the
punishment levels do not depend on the past history. This characterization, relying
on “rational payoffs” [19] is close to ours and to the characterization of Nash
equilibrium payoffs for stochastic games [10]. Our point is to give the link between
these two sets. However, in our case, the punishment level varies with time and the
specific conditions on the game comparable to the ones in [10] do not hold.
The notion of publicly correlated strategies has strong links with non zero sum
stochastic differential games. As for the deterministic case, there is a general result
of existence and characterization [7] in case players use non-anticipative strategies
which is quite close to ours. For non degenerate stochastic differential games,
there is a general result for existence of a Nash equilibrium in feedback strategies
[4] based on the existence of smooth enough solutions for the system of partial
differential equations defining the equilibrium. Another approach [14] uses BSDEs
to check the existence of solutions, to prove the existence of a Nash equilibrium, and to
construct optimal feedbacks. Note that the equilibria defined through this last approach are in
fact equilibria in non-anticipative strategies [17] when they both exist.
Here we deal with deterministic non zero sum differential games in mixed
strategies and we study Nash equilibria and publicly correlated equilibria. We now
describe the framework of our game.
We consider a two players non zero sum differential game in RN that runs for
t ∈ [t0 , T ]. The dynamics of the game is given by:
\[
\begin{cases}
\dot{x}(t) = f\bigl(x(t), u(t), v(t)\bigr), & t \in [t_0, T],\; u(t) \in U,\; v(t) \in V,\\
x(t_0) = x_0.
\end{cases} \tag{7.1}
\]
We first define open-loop controls: we denote by U(t0 ) (respectively V(t0 )) the set
of measurable controls of Player I (respectively Player II):
• A map α : ṼC(t0) → ŨC(t0) which is strongly non-anticipative with delay [7]:
there exists τ(α) > 0 such that for every (Ft)-stopping time S and for all ṽ1, ṽ2 ∈ ṼC(t0),
if ṽ1 ≡ ṽ2 on [t0, S], then α(ṽ1) ≡ α(ṽ2) on [t0, (S + τ(α)) ∧ T];
• A map β : ŨC(t0) → ṼC(t0) which is a strongly non-anticipative strategy with
delay.
We denote by Ac (t0 ) the set of publicly correlated strategies.
Note that our definition is somehow broader than the usual definition of correlated
strategies in repeated games where the correlation signal is given only at the
beginning of the game. Our correlation device is closer to the autonomous corre-
lation device described in [19]. We will call a C-correlated strategy any publicly
correlated strategy using the correlation device C. Note that we can associate a
unique pair of C-admissible controls to any C-correlated strategy and therefore
define a unique payoff associated to any publicly correlated strategy, as recalled in
Lemma 7.1.
We assume that the payoff functions g1 and g2 are Lipschitz continuous and
bounded, and assume Isaacs' condition: for all (x, ξ) ∈ RN × RN
In this case the two-player zero sum game whose payoff function is g1 (respectively
g2) has a value. We denote by
the value of the zero sum game with payoff function g1 where Player I aims at
maximizing his payoff and
the value of the zero sum game with payoff function g2 where Player II is the
maximizer. We recall that these definitions remain unchanged whether α ∈ A(t)
or Ar (t) and β ∈ B(t) or Br (t) [8]. Our assumptions also guarantee that these value
functions are Lipschitz continuous.
As we are interested in nonzero sum games, we need equilibrium concepts:
Definition 7.7 (Nash Equilibrium Payoff in Pure Strategies). The pair (e1, e2) ∈ R2 is a Nash equilibrium payoff in pure strategies (PNEP) for the initial conditions (t0, x0) if for all ε > 0, there exists (α, β) ∈ A(t0) × B(t0) such that:
1. For i = 1, 2, |Ji(t0, x0, α, β) − ei| ≤ ε.
2. For all α′ ∈ A(t0): J1(t0, x0, α′, β) ≤ J1(t0, x0, α, β) + ε.
3. For all β′ ∈ B(t0): J2(t0, x0, α, β′) ≤ J2(t0, x0, α, β) + ε.
We denote by Ep(t0, x0) the set of all PNEPs for the initial conditions (t0, x0).
where Vi refers to (7.2) or (7.3). Furthermore, the set of PNEPs is non empty.
In this paper, we study MNEPs. First of all, noticing that any pure strategy can be
considered as a trivial mixed strategy, the set Em (t0 , x0 ) is a non empty superset of
E p (t0 , x0 ). It appears that the set Em (t0 , x0 ) is in fact compact, convex and generally
strictly larger than the closed convex hull of the set E p (t0 , x0 ). Our main result
(Theorem 7.1 below) states that:
The payoff e = (e1, e2) ∈ R2 is a MNEP iff for all ε > 0, there exists a random
control ((Ω, F, P), (u^ε, v^ε)) such that for i = 1, 2:
• e is ε-reachable: |Ji(t0, x0, u^ε, v^ε) − ei| ≤ ε;
• (u^ε, v^ε) is ε-consistent: for all t ∈ [t0, T], denoting Ft = σ((u^ε, v^ε)(s), s ∈ [t0, t]),
\[
P\Bigl[ V_i\bigl(t, X_t^{t_0,x_0,u^{\varepsilon},v^{\varepsilon}}\bigr) \le E\bigl[ g_i\bigl(X_T^{t_0,x_0,u^{\varepsilon},v^{\varepsilon}}\bigr) \,\big|\, F_t \bigr] + \varepsilon \Bigr] \;\ge\; 1 - \varepsilon .
\]
The proof heavily relies on techniques introduced for repeated games in [1] known
as “jointly controlled lotteries” and on the fact that we work with non-anticipative
strategies with delay.
Finally, studying publicly correlated equilibria, we show that the set of PCEPs
is equal to the set of MNEPs. The idea of the proof uses the similarity between
correlated equilibrium payoffs and equilibrium payoffs of stochastic non zero sum
differential games.
We complete this introduction by describing the outline of the paper. In Sect. 7.2,
we recall the assumptions on the differential game we study. In Sect. 7.3, we give
the main properties of the set of MNEPs and present an example where the set of
7 Nash Equilibrium Payoffs in Mixed Strategies 133
MNEPs is strictly larger than the convex hull of the set of PNEPs. In Sect. 7.4, we
prove the equivalence between the sets of MNEPs and of PCEPs. We postpone to
the last section the proof of the characterization of the set of MNEPs.
7.2 Definitions
where:
• U and V are compact subsets of some finite dimensional spaces;
• U and V have infinite cardinality;
• f : RN × U × V → RN is bounded, continuous, and uniformly Lipschitz continuous with respect to x.
In order to study equilibrium payoffs of this game we have introduced pure and
mixed strategies. The major interest of working with non-anticipative strategies with
delay is the following useful result:
Lemma 7.1 (Controls Associated to a Pair of Strategies). 1. For any pair of pure
strategies (α , β ) ∈ A(t0 ) × B(t0 ) there is a unique pair of controls (uαβ , vαβ ) ∈
U(t0 ) × V(t0 ) such that α (vαβ ) = uαβ and β (uαβ ) = vαβ .
Notice that pure strategies are degenerate correlated strategies using some trivial
correlation device. Finally, note that in a zero sum game, using correlated strategies
with a fixed correlation device leads to the same value as using pure strategies.
Indeed, fix the device ((Ω, F, P), C) and denote by (C, α̃, β̃) any C-correlated
strategies and by (α, β) any pair of pure strategies. For i = 1, 2:
\[
\sup_{\tilde\alpha}\inf_{\tilde\beta} E\bigl[g_i\bigl(X_T^{t,x,\tilde\alpha,\tilde\beta}\bigr)\bigr]
\;\ge\; \sup_{\tilde\alpha}\inf_{\beta} E\bigl[g_i\bigl(X_T^{t,x,\tilde\alpha,\beta}\bigr)\bigr]
\;=\; \sup_{\alpha}\inf_{\beta} g_i\bigl(X_T^{t,x,\alpha,\beta}\bigr) \;=\; V_i(t,x)
\]
\[
\;=\; \inf_{\beta}\sup_{\alpha} g_i\bigl(X_T^{t,x,\alpha,\beta}\bigr)
\;=\; \inf_{\tilde\beta}\sup_{\alpha} E\bigl[g_i\bigl(X_T^{t,x,\alpha,\tilde\beta}\bigr)\bigr]
\;\ge\; \inf_{\tilde\beta}\sup_{\tilde\alpha} E\bigl[g_i\bigl(X_T^{t,x,\tilde\alpha,\tilde\beta}\bigr)\bigr].
\]
7.2.3 Definitions
will be called ε-optimal. Note that we just have to check the ε-optimality of α
(respectively β) against pure strategies β′ ∈ B(t0) (respectively α′ ∈ A(t0)) if α
and β are defined on a finite probability space.
7.3.1 Characterization
Note that the characterization could be given using trajectories following [20] rather
than controls, provided the trajectory stems from the dynamics (7.1).
We just give the idea of the proof which is postponed to Sect. 7.5.
136 A. Souquière
The fact that any MNEP satisfies such a characterization is in fact quite natural
if we extend the definition to any random control. Otherwise, there would exist
profitable deviations for one of the players. The way to restrict the definition only to
finite random controls is given through an appropriate projection, as shown in Sect. 7.4.
The sufficient condition is not intuitive. We have to build non anticipative
strategies with delay such that no unilateral deviation is profitable. The idea is to
build a trigger strategy: follow the same trajectory as the one defined through the
consistent controls (u , v ) as long as no deviation occurs and punish any deviation
in such a way that if a deviation occurred at the point (t, x(t)) the deviating player,
say i, will be rewarded with his guaranteed payoff Vi (t, x(t)). The unique difficulty
is to coordinate the choice of the trajectory to be followed each time there is some
node in the trajectories generated by (u , v ). To this end, players will use some
small delay at each node in order to communicate through a jointly controlled lottery.
Assume for example that the trajectory splits in two: one branch generated by ω1 with
probability 1/2 and another generated by ω2 with probability 1/2. During the small
communication delay, Player I chooses either the control u1 or u2 and Player II
selects v1 or v2 . If (u1 , v1 ) or (u2 , v2 ) are played, players will follow the trajectory
generated by ω1 and the one generated by ω2 otherwise. Note that if each player
selects each communication control with probability 1/2 no unilateral cheating in
the use of the control may change the probability of the outcome: each trajectory
will be followed with probability 1/2. This jointly controlled lottery procedure is
easily extended to any finite probability over the trajectories. Of course, if one player
does not use the communication control, he will be punished and get his guaranteed
payoff which, by assumption, is not profitable.
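A minimal sketch of the jointly controlled lottery: each player sends an independent uniform bit during the communication delay, and the parity of the two bits selects the branch, so a unilateral deviation cannot bias the outcome. The encoding below is illustrative:

```python
import random

def lottery(bit1, bit2):
    """Branch omega_1 if the bits agree, omega_2 otherwise."""
    return "omega1" if bit1 == bit2 else "omega2"

def frequency(strategy1, strategy2, trials=100_000):
    hits = sum(lottery(strategy1(), strategy2()) == "omega1"
               for _ in range(trials))
    return hits / trials

honest = lambda: random.randint(0, 1)
cheater = lambda: 1                     # always sends 1

print(frequency(honest, honest))        # ~0.5
print(frequency(honest, cheater))       # still ~0.5: unilateral cheating is useless
```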
Proposition 7.1. The set Em (t0 , x0 ) of all MNEPs for the initial conditions (t0 , x0 )
is convex and compact in R2 .
Proof. Compactness comes from the fact that the payoff functions are bounded.
Let (e¹, e²) ∈ R² × R² be a pair of Nash equilibrium payoffs in mixed strategies. We
will prove that λe¹ + (1 − λ)e² is a Nash equilibrium payoff in mixed strategies
for all λ ∈ (0, 1). We will simply build a finite random control satisfying the
characterization property of Theorem 7.1. As e^j, j = 1, 2, is a Nash equilibrium
payoff, we may choose random controls ((Ω^j, P(Ω^j), P^j), (u^j, v^j)) such that for all i, j = 1, 2:
• for all t ∈ [t0, T], denoting by Ft^j = σ((u^j, v^j)(s), s ∈ [t0, t]):
\[
P^j\Bigl[ V_i\bigl(t, X_t^{t_0,x_0,u^j,v^j}\bigr) \le E^j\bigl[ g_i\bigl(X_T^{t_0,x_0,u^j,v^j}\bigr) \,\big|\, F_t^j \bigr] + \frac{\varepsilon}{3} \Bigr] \;\ge\; 1 - \frac{\varepsilon}{3} .
\]
We need to build controls close to the initial pairs (u j , v j ), j = 1, 2, but with some
tag in order to distinguish them. Set some small delay δ > 0 such that for all x ∈
B(x0, δ‖f‖∞), for all (u, v) ∈ U(t0) × V(t0), for all i = 1, 2, and for all t ≥ t0 + δ:
\[
\begin{cases}
\bigl| V_i\bigl(t, X_t^{t_0,x_0,u,v}\bigr) - V_i\bigl(t - \delta, X_{t-\delta}^{t_0,x,u,v}\bigr) \bigr| \le \dfrac{\varepsilon}{3},\\[4pt]
\bigl| g_i\bigl(X_T^{t_0,x_0,u,v}\bigr) - g_i\bigl(X_{T-\delta}^{t_0,x,u,v}\bigr) \bigr| \le \dfrac{\varepsilon}{3}.
\end{cases} \tag{7.4}
\]
For i, j = 1, 2, denote by
\[
\Sigma_t^{ij} \;=\; \Bigl\{\, V_i\bigl(t, \bar X_t^j\bigr) \le E^j\bigl[ g_i\bigl(\bar X_T^j\bigr) \,\big|\, \bar F_t^j \bigr] + \varepsilon \,\Bigr\}.
\]
We now define a new finite random space Ω = {1, 2} × Ω1 × Ω2 endowed with the
probability P defined for all ω = (j, ω¹, ω²) by:
\[
P(j, \omega^1, \omega^2) \;=\;
\begin{cases}
\lambda\, P^1(\omega^1)\, P^2(\omega^2), & j = 1,\\
(1 - \lambda)\, P^1(\omega^1)\, P^2(\omega^2), & j = 2.
\end{cases}
\]
Therefore, assuming w.l.o.g. that the functions gi are nonnegative and using (7.6):
\[
E\bigl[ g_i(X_T) \,\big|\, F_t \bigr] \;\ge\; \bigl[ V_i(t, \bar X_t^1) - \varepsilon \bigr] \mathbf{1}_{\{1\}\times\Sigma_t^{i1}\times\Omega^2} + \bigl[ V_i(t, \bar X_t^2) - \varepsilon \bigr] \mathbf{1}_{\{2\}\times\Omega^1\times\Sigma_t^{i2}} .
\]
And finally:
\[
P\bigl( \Sigma_t^i \bigr) \;\ge\; P\Bigl( \{1\}\times\Sigma_t^{i1}\times\Omega^2 \,\cup\, \{2\}\times\Omega^1\times\Sigma_t^{i2} \Bigr) \;\ge\; \lambda(1-\varepsilon) + (1-\lambda)(1-\varepsilon) \;\ge\; 1-\varepsilon .
\]
We have just proven that the set of MNEPs is convex; therefore it contains the closed
convex hull of the set of PNEPs. When trying to compare these two sets, it appears
that in general, they are not equal. This result is not intuitive because the guaranteed
payoffs are exactly the same whether players use pure or mixed strategies. It appears
because players may correlate their strategies throughout the whole game and not
only at the beginning of it.
Proposition 7.2. There exist nonzero sum differential games in which the set of
MNEPs is larger than the convex hull of the set of PNEPs.
Proof. We will build a counter-example where an MNEP does not belong to the
closed convex hull of the PNEPs.
Consider the simple game in finite time in R2 with dynamics:
\[
\dot{x} \;=\; u + v, \qquad u, v \in [-1/2, 1/2]^2,
\]
starting from the origin O = (0, 0) at time t = 0 and ending at time t = T = 1. The
set of all reachable points in this game is the unit ball in R2 for the L1 norm.
The payoff functions are the Lipschitz continuous functions defined as follows:
\[
g_1:\quad
\begin{cases}
g_1(x) = 1 - 4|x_2| & \text{for } |x_2| \le 1/4 \text{ and } |x_2| \ge |x_1|,\\
g_1(x) = 1 - 4|x_1| & \text{for } |x_1| \le 1/4 \text{ and } |x_1| \ge |x_2|,\\
g_1(x) = x_2 + 2|x_1| - 1 & \text{for } x_2 \ge -2|x_1| + 1,\\
g_1(x) = 0 & \text{elsewhere.}
\end{cases}
\]
In fact, g1 is the nonnegative function defined on the unit square shown in Fig. 7.1, and
\[
g_2:\quad
\begin{cases}
g_2(x) = 0 & \text{for } x_2 \ge 0,\\
g_2(x) = -x_2 & \text{for } x_2 \le 0.
\end{cases}
\]
The game clearly fulfills the regularity assumptions listed in the introduction. We
will denote by Lg the larger of the Lipschitz constants of g1 and g2 for the L1-norm.
The set of all reachable payoffs is \([0, 2] \times \{0\} \,\cup\, \bigcup_{y \in (0,1]} \bigl([0, 1-y] \times \{y\}\bigr)\). It is also
clear that
\[
V_1(t, x) = g_1(x), \qquad V_2(t, x) = g_2(x).
\]
The initial values are V1 (0, O) = 1 and V2 (0, O) = 0, implying any Nash
equilibrium payoff has to reward Player I with at least 1 and Player II with a non-
negative payoff. In pure strategies, no trajectory can end up at time T at some x such
that x2 < 0 because this would cause Player I to earn strictly less than 1. We then
have e2 = 0 corresponding to x2 ≥ 0 for every PNEP. We can easily compute
and
\[
\begin{cases}
\text{for } t \in [0, 3/4]: & V_2(t, X_t) = 0 \text{ and } E\bigl[g_2(X_T)\,\big|\,F_t\bigr] = 1/8,\\
\text{for } t \in (3/4, 1]: & \text{either } V_2(t, X_t) = 0 \text{ and } E\bigl[g_2(X_T)\,\big|\,F_t\bigr] = 0,\\
& \text{or } V_2(t, X_t) = t - 3/4 \in [0, 1/4] \text{ and } E\bigl[g_2(X_T)\,\big|\,F_t\bigr] = 1/4.
\end{cases}
\]
This proves that the final payoff (e1 , e2 ) = (1, 1/8) is a MNEP.
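The arithmetic behind the payoff (1, 1/8) is a lottery over two terminal branches. A minimal sketch; the branch payoffs used below, (2, 0) and (0, 1/4) with probability 1/2 each, are an assumption consistent with the computations above rather than values stated explicitly in the text:

```python
# Each entry: ((g1, g2) at the terminal point of the branch, probability).
branches = [((2.0, 0.0), 0.5), ((0.0, 0.25), 0.5)]

e1 = sum(p * g[0] for g, p in branches)
e2 = sum(p * g[1] for g, p in branches)
print(e1, e2)    # 1.0 0.125 -- the claimed MNEP (1, 1/8)
```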
This characterization and Theorem 7.1 ensure that any MNEP is in fact a PCEP.
We now will prove that Em (t0 , x0 ) ⊇ Ec (t0 , x0 ). Note that the only difference
between the characterizations of MNEPs and PCEPs is that the latter relies on
a random control possibly defined on an infinite underlying probability space,
whereas MNEPs are characterized through finite random controls. We will consider
some PCEP satisfying the characterization of Proposition 7.3 and we will prove
that we are able to build a finite random control satisfying the characterization of
Theorem 7.1, implying it will be a MNEP.
Consider some PCEP e. Fix ε and consider the ε²-optimal random control
((Ω, F, P), (u^ε, v^ε)). Denote by X·^ε = X·^{t0,x0,u^ε,v^ε} and set for all ω ∈ Ω:
X·^ε(ω) = X·^{t0,x0,(u^ε,v^ε)(ω)}. Note that this random control satisfies
\[
\bigl| E\bigl[g_i(X_T^{\varepsilon})\bigr] - e_i \bigr| \;\le\; \varepsilon^2 .
\]
If Ω is finite, there is nothing left to prove. Else, we will build a finite random control
rewarding a payoff close to e and consistent.
We set h > 0 and h̄ > 0, to be fixed later, such that there exist Nh, Nh̄ ∈ N* with
T − t0 = Nh h and (T − t0)‖f‖∞ = Nh̄ h̄. We build the time partition Gh = {tk = t0 + kh}k=0,...,Nh and the grid in RN: Gh̄ = {x0 + ∑_{i=1}^{n} ki h̄ ei}_{(ki)∈{−Nh̄,...,0,...,Nh̄}^n}, where (ei)i=1...n is a basis of RN. We now introduce a projection on the grid:
\[
\Pi : \mathbb{R}^N \to G_{\bar h}, \qquad x \mapsto \min\bigl\{\, x^i \in G_{\bar h} : d_1(x, x^i) = \inf_{x^j \in G_{\bar h}} d_1(x, x^j) \,\bigr\},
\]
where the minimum is taken with respect to the lexicographic order and d1 is the
distance associated to the norm ‖·‖1.
To any (tk, x^i, x^j) ∈ Gh × Gh̄ × Gh̄ we associate, if it exists, some ϕ(tk, x^i, x^j) =
(x, u, v) ∈ RN × U(tk) × V(tk) such that Π(x) = x^i and Π(X_{tk+1}^{tk,x,u,v}) = x^j. We will
set ϕx(tk, x^i, x^j) = x and ϕc(tk, x^i, x^j) = (u, v).
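A minimal sketch of the projection Π with the L1 distance and lexicographic tie-breaking, on a hypothetical two-dimensional grid:

```python
import numpy as np
from itertools import product

def make_grid(x0, h_bar, N_bar, dim):
    offsets = product(range(-N_bar, N_bar + 1), repeat=dim)
    return [np.array(x0) + h_bar * np.array(k) for k in offsets]

def project(x, grid):
    d_min = min(np.sum(np.abs(x - g)) for g in grid)           # L1 distance
    nearest = [g for g in grid if np.isclose(np.sum(np.abs(x - g)), d_min)]
    return min(nearest, key=lambda g: tuple(g))                # lexicographic tie-break

grid = make_grid([0.0, 0.0], h_bar=0.5, N_bar=2, dim=2)
print(project(np.array([0.3, -0.2]), grid))                    # -> [0.5 0. ]
```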
We now are able to build a finite random control on (Ω, F , P). To any ω ∈ Ω we
associate (uη , vη )(ω ) in the following way:
• Fix (u0 , v0 ) ∈ U × V
• (uη , vη )(ω )|[t0 ,t1 ) = (u0 , v0 )
• For all k = 1, ..., Nh − 1, for all s ∈ [tk, tk+1):
\[
(u_\eta, v_\eta)(\omega)(s) \;=\; \varphi_c\bigl( t_{k-1},\, \Pi(X_{t_{k-1}}^{\varepsilon}(\omega)),\, \Pi(X_{t_k}^{\varepsilon}(\omega)) \bigr)(s - h).
\]
Note that the definition of (u_η, v_η) is non-anticipative. From now on, we will
denote by X·^η = X·^{t0,x0,u_η,v_η} and set for all ω ∈ Ω: X·^η(ω) = X·^{t0,x0,(u_η,v_η)(ω)}.
We now would like to prove that the finite random control (u_η, v_η) defined on
(Ω, F, P) satisfies, for i = 1, 2 and some constants C1, C2, C3:
• |E[gi(X_T^η)] − ei| ≤ C1 ε;
• for all t ∈ [t0, T], if we denote by Ft^η = σ{(u_η, v_η)(s), s ∈ [t0, t]}:
\[
P\Bigl[ E\bigl[ g_i(X_T^{\eta}) \,\big|\, F_t^{\eta} \bigr] \ge V_i(t, X_t^{\eta}) - C_2\,\varepsilon \Bigr] \;\ge\; 1 - C_3\,\varepsilon .
\]
First of all, we shall prove that the trajectories generated by (u_η, v_η) and (u^ε, v^ε)
are close for sufficiently small values of h and h̄.
For all k = 0, ..., Nh − 1, by induction, and noticing that ‖X_{t1}^η(ω) − X_{t0}(ω)‖1 ≤ ‖f‖∞ h, we have:
\[
\bigl\| X_{t_{k+1}}^{\eta}(\omega) - X_{t_{k+1}}^{\varepsilon}(\omega) \bigr\|_1 \;\le\; \bar h\,(1 + e^{L_f h}) \sum_{i=0}^{k-1} e^{i L_f h} + h\, e^{k L_f h}\, \|f\|_\infty
\;\le\; 2\bar h\, \frac{T - t_0}{h}\, e^{L_f (T-t_0)} + h\, e^{L_f (T-t_0)}\, \|f\|_\infty .
\]
In order to minimize the distance between X·^ε(ω) and X·^η(ω), we set for example
h̄ = h² in order to get, for all k = 0, ..., Nh:
\[
\bigl\| X_{t_k}^{\varepsilon}(\omega) - X_{t_k}^{\eta}(\omega) \bigr\|_1 \;\le\; h \Bigl( e^{L_f (T-t_0)} \bigl( 2(T - t_0) + \|f\|_\infty \bigr) + \|f\|_\infty \Bigr).
\]
It is now easy to check that the final payoff using (u_η, v_η) is close to the payoff
generated by (u^ε, v^ε). Indeed, for all i = 1, 2:
\[
\bigl| J_i(t_0,x_0,u^{\varepsilon},v^{\varepsilon}) - J_i(t_0,x_0,u_\eta,v_\eta) \bigr| \;\le\; \int_\Omega \bigl| g_i(X_T^{\varepsilon}(\omega)) - g_i(X_T^{\eta}(\omega)) \bigr| \, dP(\omega)
\;\le\; L_g \int_\Omega \bigl\| X_T^{\varepsilon}(\omega) - X_T^{\eta}(\omega) \bigr\|_1 \, dP(\omega) \;\le\; L_g\,\varepsilon
\]
and
\[
E\bigl[ g_i(X_T^{\varepsilon}) \,\big|\, F_t^{\eta} \bigr] \;\le\; E\bigl[ g_i(X_T^{\eta}) \,\big|\, F_t^{\eta} \bigr] + L_g\,\varepsilon . \tag{7.10}
\]
We now have to use the assumption (7.7) on (u^ε, v^ε): if we denote by
\[
\Sigma_t^{i,\varepsilon} \;:=\; \Bigl\{\, \omega : V_i(t, X_t^{\varepsilon}) \le E\bigl[ g_i(X_T^{\varepsilon}) \,\big|\, F_t \bigr] + \varepsilon^2 \,\Bigr\},
\]
we obtain
\[
V_i(t, X_t^{\eta}) \;\le\; E\bigl[ g_i(X_T^{\eta}) \,\big|\, F_t^{\eta} \bigr] + K\, P\bigl[ (\Sigma_t^{i,\varepsilon})^c \,\big|\, F_t^{\eta} \bigr] + (L_V + L_g + 1)\,\varepsilon \quad \text{due to (7.10).}
\]
Finally, for all ε > 0, we have built finitely many controls (u_η, v_η) defining a finite
random control satisfying, for ε < 1 and i = 1, 2:
\[
\bigl| E\bigl[g_i(X_T^{\eta})\bigr] - e_i \bigr| \;\le\; 2C^{*}\varepsilon .
\]
optimal mixed strategies (α^ε, β^ε). We will consider the random control defined
on Ω = Ωα × Ωβ with the probability P = Pα ⊗ Pβ by (u^ε, v^ε)(ωα, ωβ) =
(u^{ωα ωβ}, v^{ωα ωβ}). We will denote the associated trajectories by X·^ε = X·^{t0,x0,u^ε,v^ε}.
We will prove that these controls are ε-consistent. Suppose on the contrary that
there exists t̄ ∈ [t0, T] such that, for example,
\[
P\Bigl[ E\bigl[ g_1(X_T^{\varepsilon}) \,\big|\, F_{\bar t} \bigr] \ge V_1(\bar t, X_{\bar t}^{\varepsilon}) - \varepsilon \Bigr] \;<\; 1 - \varepsilon .
\]
Denote by
\[
\Sigma \;:=\; \Bigl\{\, (\omega_\alpha, \omega_\beta) : E\bigl[ g_1(X_T^{\varepsilon}) \,\big|\, F_{\bar t} \bigr] \ge V_1(\bar t, X_{\bar t}^{\varepsilon}) - \varepsilon \,\Bigr\}.
\]
Proof of Lemma 7.2. We will build the Maximin strategy αg,t(·) as a collection of
finitely many pure strategies with delay. For all x ∈ B(x0, (t − t0)‖f‖∞), there exists
some pure strategy αx ∈ A(t) such that:
\[
\inf_{v \in V(t)} g_1\bigl( X_T^{t,x,\alpha_x(v),v} \bigr) \;\ge\; V_1(t, x) - \varepsilon/2 .
\]
For continuity reasons, there exists a Borelian partition (Oi)i=1,...,I of the ball
B(x0, (t − t0)‖f‖∞) such that for any i there exists some xi ∈ Oi with
\[
\forall z \in O_i, \qquad \inf_{v \in V(t)} g_1\bigl( X_T^{t,z,\alpha_{x_i}(v),v} \bigr) \;\ge\; V_1(t, z) - \varepsilon,
\]
and for all x ∈ B(x0 , (t − t0 ) f ∞ ), we define the Maximin strategy αg,t (x) as the
strategy that associates to any v ∈ V(t) the control:
\[
\begin{aligned}
&\ge E\bigl( V_1(\bar t + \delta, X_{\bar t+\delta}^{\varepsilon}) \mathbf{1}_{\Sigma^c} \bigr) - \tfrac{\varepsilon}{4}\bigl(1 - P(\Sigma)\bigr) + E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma} \bigr)\\
&\ge E\bigl( V_1(\bar t, X_{\bar t}^{\varepsilon}) \mathbf{1}_{\Sigma^c} \bigr) - L(1 + \|f\|_\infty)\delta - \tfrac{\varepsilon}{4}\bigl(1 - P(\Sigma)\bigr) + E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma} \bigr)\\
&\ge E\bigl( E\bigl( g_1(X_T^{\varepsilon}) \,\big|\, F_{\bar t} \bigr) \mathbf{1}_{\Sigma^c} \bigr) + E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma} \bigr) + \tfrac{3\varepsilon}{4}\bigl(1 - P(\Sigma)\bigr) - \tfrac{\varepsilon^2}{4}\\
&\ge E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma^c} \bigr) + E\bigl( g_1(X_T^{\varepsilon}) \mathbf{1}_{\Sigma} \bigr) + \tfrac{3\varepsilon}{4}\bigl(1 - P(\Sigma)\bigr) - \tfrac{\varepsilon^2}{4}\\
&> J_1(t_0, x_0, \alpha^{\varepsilon}, \beta^{\varepsilon}) + \tfrac{\varepsilon^2}{2} .
\end{aligned}
\]
We will set X·^η = X·^{t0,x0,u_η,v_η} and for any ω ∈ Ω: X·^η(ω) = X·^{t0,x0,(u_η,v_η)(ω)}.
If the random control is in fact deterministic, we already know a way to build
some pure strategies (α, β) that are ε-optimal and reward a payoff ε-close to e (cf.
the construction of Proposition 6.1 in [20], for example). If the controls (u_η, v_η) are
genuinely random, we have to build ε-optimal mixed strategies rewarding a payoff
ε-close to e. The idea of the ε-optimal strategies (α^ε, β^ε) is to build “trigger” mixed
strategies that are correlated in order to generate controls close to (u_η, v_η). We
will use some jointly controlled lottery at each “node” of the trajectories generated
by (u_η, v_η) and, if the opponent does not play the expected control, the player
who detected the deviation swaps to the “punitive strategy.” The proof proceeds
in several steps. First of all, we have to build jointly controlled lotteries for each
“node.” Then we build the ε-optimal strategies, and check that they reward a payoff
close to e and that they are ε-optimal.
To begin with, we introduce the explosions, which are a kind of “node” in the
trajectories generated by (u_η, v_η):
Definition 7.10 (Explosion). Consider a finite random control ((Ω, P(Ω), P),
(u^ε, v^ε)) associated to its natural filtration (Ft). We set F_{t0^-} = {∅, Ω}. An explosion
is any t ∈ [t0, T) such that F_{t^-} ≠ F_{t^+}.
Assume that (u_η, v_η) generates M̄ distinct pairs of deterministic controls with
M̄ ≥ 2 and M explosions with 1 ≤ M ≤ M̄ − 1, denoted by {τi}. We introduce an
auxiliary time step τ, to be fixed later, such that τ < min_{j≠k} |τj − τk|/2, τ < T −
max_j τj, and there exists N̄ ∈ N\{0, 1} such that N̄τ = δ. This ensures that there is no explosion
on [T − τ, T]. We introduce another time partition (t0, ..., tk = t0 + kτ, ..., t_{NδN̄} = T).
We now will explain how to correlate the strategies at each explosion using
jointly controlled lotteries.
First note that we can approximate the real probability P through a probability Q
taking rational values, in such a way that the random control ((Ω, FT, Q), (u_η, v_η))
rewards a payoff 2η-close to e and is 2η-consistent: for all t ∈ [t0, T],
\[
Q\Bigl[ V_i(t, X_t^{\eta}) \le E_Q\bigl[ g_i(X_T^{\eta}) \,\big|\, F_t \bigr] + 2\eta \Bigr] \;\ge\; 1 - 2\eta . \tag{7.15}
\]
Proof of Lemma 7.3. The proof is similar to the proof of Lemma 7.2.
We now have everything needed to define the ε-optimal strategies.
We recall that the idea of the strategy for Player I is to play the same control as
u_η(ω), ω ∈ Ω, as long as there is no explosion and as long as Player II plays
v_η(ω). If an explosion takes place on [tk, tk+1), meaning Ftk+1 is generated by the
atoms (Ωi)i∈I, play on this interval some correlation control as defined by the
corresponding explosion procedure. Then observe at tk+1 the control played by
the opponent on [tk, tk+1), deduce from the explosion procedure on which Ωi
the game is now correlated, and play u_η(ωi), ωi ∈ Ωi, from tk+1 on until the next
explosion, as long as Player II plays v_η(ωi). Player I repeats the same procedure at
each explosion. As soon as Player I detects that Player II played some unexpected
control, he swaps to the punitive strategy.
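The control flow of this trigger construction can be illustrated in a toy repeated-game analogue (this is not the differential-game strategy itself; actions and payoff conventions below are hypothetical):

```python
def trigger(agreed, punish_action, opponent_history):
    """Follow the agreed action while the opponent always has; otherwise punish."""
    keeping_faith = all(a == agreed for a in opponent_history)
    return agreed if keeping_faith else punish_action

def play(opponent_plan, rounds=6):
    mine, theirs = [], []
    for k in range(rounds):
        mine.append(trigger("C", "P", theirs))   # my move, given observed history
        theirs.append(opponent_plan[k])          # opponent's move this round
    return list(zip(mine, theirs))

print(play(["C"] * 6))                           # cooperation throughout
print(play(["C", "C", "D", "C", "C", "C"]))      # punished from round 3 on
```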
In order to define the strategy in a more convenient way, we have to introduce
some auxiliary random processes depending only on the past, namely Ω̄, keeping the
information on which trajectory generated by (u_η, v_η) is currently being followed,
and S, such that S = ∅ if no deviation was observed in the past, and S = (tk, x), where
tk ∈ {t0, ..., t_{NδN̄}}, means that some deviation occurred on [tk, tk+1) and the punitive
strategy is to be played from the state (tk+2, x), because there is a delay between
the time at which the deviation is detected and the time from which the punitive strategy is
played.
First of all, in order to build the strategy α^ε, for example, we will define the
associated underlying finite probability space. We will define it by induction on
the number of explosions. We will always assume that an explosion procedure is
defined using constant correlation controls that are not used in any other explosion
procedure. This allows us to build the set Ωα by backward induction, adding new
correlation controls for each explosion.
Any ωα ∈ Ωα prescribes one correlation control for any of the possible
explosion procedures. Fix any sequence of correlation controls (u^i) possibly leading
to the explosion τ̄ ∈ [tk, tk+1) associated to the atom Ωl of Ftk. Consider the set
of correlation controls {u^a} associated to this explosion. Then, the conditional
probability of each u^a given (u^i) is by definition 1/q(tk^l):
\[
P_\alpha\bigl[\, \omega_\alpha \mapsto u^a \,\big|\, \omega_\alpha \mapsto (u^i) \,\bigr] \;=\; \frac{1}{q(t_k^l)} \tag{7.16}
\]
and
\[
\bar\Omega^{\alpha} : \Omega_\alpha \times V(t_0) \times \{t_k\}_{k=0,\dots,N_\delta \bar N} \to F_T .
\]
At time t0, for any ωα and any control v ∈ V(t0), we set S_{t0}^α(ωα, v) = ∅ and
Ω̄_{t0}^α(ωα, v) = Ω, and fix u0 ∈ U. For all k ∈ {0, ..., NδN̄ − 1}, if α^ε(ωα)(v) is built
on [t0, tk), we define α^ε(ωα)(v) further by:
1. If S_{tk}^α(ωα, v) ≠ ∅, for example S_{tk}^α(ωα, v) = (ti, x), this means that Player II
did not play the expected control from ts ∈ [ti, ti+1) on; then play the punitive
strategy α^ε(v)|[tk,tk+1) = αp^{η,ti+2}(x)(v|[ti+2,T])|[tk,tk+1) as defined in Lemma 7.3, and
set Ω̄_{tk+1}^α(ωα, v) = ∅ and S_{tk+1}^α(ωα, v) = S_{tk}^α(ωα, v).
2. If S_{tk}^α(ωα, v) = ∅, then:
• If there is no explosion on [tk, tk+1) for (u_η, v_η)(ω), ω ∈ Ω̄_{tk}^α(ωα, v),
then play α^ε(ωα)(v)|[tk,tk+1) = u_η(ω)|[tk,tk+1) for some ω ∈ Ω̄_{tk}^α(ωα, v) and
set Ω̄_{tk+1}^α(ωα, v) = Ω̄_{tk}^α(ωα, v). If k ≥ 1 and if v|[tk−1,tk] ≢ v_η(ω)|[tk−1,tk]
for all ω ∈ Ω̄_{tk}^α(ωα, v), then set S_{tk+1}^α(ωα, v) = (tk−1, X_{tk+1}^{t0,x0,α^ε,v}); else set
S_{tk+1}^α(ωα, v) = ∅.
• If there is an explosion on [tk, tk+1) for (u_η, v_η)(ω), ω ∈ Ω̄_{tk}^α(ωα, v), play
one of the correlation controls prescribed by the corresponding explosion procedure.
By construction these processes are adapted; in particular {S_T^α(v) ∈ {tk} × RN} ∈ F_{tk}^{α,v}, where F_{tk}^{α,v} = σ((α^ε(v), v)(s), s ∈ [t0, tk]).
The strategy β^ε is built symmetrically, using the auxiliary random processes Ω̄^β
and S^β.
We will first study the controls generated if Player I plays α^ε and Player II plays
some pure strategy β with delay τ(β) such that β generates no deviation. We will
say that β generates no deviation as soon as, for all k ∈ {0, ..., NδN̄}, S_{tk}^α(β) = ∅
(equivalently S_T^α(β) = ∅), even if S_T^α(β) = ∅ does not imply that the control
generated by β on [T − τ, T] is one of the v_η(ω).
We will first consider the values taken by the process Ω̄^α(β).
Lemma 7.4. If the strategies (α^ε, β) are played, where β is some pure strategy with
delay such that for all k ∈ {0, ..., NδN̄}, S_{tk}^α(β) = ∅, then for all k ∈ {0, ..., NδN̄}
and all F ∈ Ftk:
\[
P_\alpha\bigl[\, \bar\Omega_{t_k}^{\alpha}(\beta) \subset F \,\bigr] \;=\; Q(F).
\]
Proof. We will prove the Lemma by induction on k, for all F such that F is an atom
of the filtration Ftk.
For k = 0, this is obviously true, for the filtration Ft0 is trivial and Ω̄_{t0}^α(β) = Ω.
Assume that the property of the Lemma is true at stage k, k < NδN̄ − 1, and that
Ftk is generated by the atoms {Ωi^k}i∈I. We know that for all k, Ω̄_{tk}^α(β) ∈ {Ωi^k}i∈I ∪ {∅}.
Assume now that Ftk+1 = σ({Ωj^{k+1}}j∈J), where the Ωj^{k+1} are the atoms of Ftk+1.
Assume that Ω̄_{tk}^α(β) = Ωi^k.
• If there exists j ∈ J such that Ωi^k = Ωj^{k+1}, this means that no explosion takes
place on [tk, tk+1) for the controls (u_η, v_η)(ω), ω ∈ Ωi^k. As S_{tk}^α(β) = ∅, the
strategy α^ε will generate on [tk, tk+1) the control u_η(ω) for any ω ∈ Ωi^k, and
we will get Ω̄_{tk+1}^α(β) = Ω̄_{tk}^α(β) = Ωi^k. This implies Pα[Ω̄_{tk+1}^α(β) = Ωi^k] ≥
Pα[Ω̄_{tk}^α(β) = Ωi^k]. On the other hand, the definition of the process Ω̄^α(β)
ensures that Ω̄_{tk+1}^α(β) ⊆ Ω̄_{tk}^α(β), leading to
\[
P_\alpha\bigl[\, \bar\Omega_{t_{k+1}}^{\alpha}(\beta) = \Omega_i^k \,\bigr] \;=\; P_\alpha\bigl[\, \bar\Omega_{t_k}^{\alpha}(\beta) = \Omega_i^k \,\bigr] \;=\; Q(\Omega_i^k).
\]
• Assume now that Ωi^k ≠ Ωj^{k+1} for all j ∈ J. This means there is an explosion
on [tk, tk+1) for the controls (u_η, v_η)(ω), ω ∈ Ωi^k, and Ωi^k = ⋃_{j=j0}^{ji} Ωj^{k+1}. Assume
that we have, for some ωα, Ω̄_{tk}^α(ωα, β) = Ωi^k. Recall that F_{tk}^{α,v} =
σ((α^ε(v), v)(s), s ∈ [t0, tk]). Note that S_{tk}^α(ωα, β) = ∅, implying that on [tk, tk+1)
the strategy α^ε will generate one of the correlation controls u^a ∈ Ωα prescribed
by the explosion procedure for Ωi^k. The conditional probability that the control
generated by α^ε at time tk is u^a, given all correlation controls played so far, is
\[
P_\alpha\Bigl[ \alpha^{\varepsilon}(\beta)|_{[t_k,t_{k+1})} = u^a \,\Big|\, F_{t_k}^{\alpha,\beta} \Bigr] \;=\; \frac{1}{q(t_k^i)} \times P_\alpha\Bigl[ \bar\Omega_{t_k}^{\alpha}(\beta) = \Omega_i^k \,\Big|\, F_{t_k}^{\alpha,\beta} \Bigr]
\]
due to (7.16), because every correlation control being unique, the only way to play
u^a is when Ω̄_{tk}^α(β) = Ωi^k. Given the controls played on [t0, tk), for any trajectory
such that Ω̄_{tk}^α(β) = Ωi^k, at time tk the pure strategy β, being a strategy with
delay, will generate on [tk, tk + τ(β)] the same control, say v^b, whatever
the control u^a chosen by Player I on [tk, tk+1). Note that we must have that v|[tk, tk+τ/2)
is equivalent to one of the constant correlation controls; else, Player I would
detect some deviation at time tk+1 and set S_{tk+2}^α(β) ≠ ∅. In the end, Player II has
to play on [tk, tk + τ/2) one of the correlation controls v^b, and always plays the
same control whatever the control u^a played by Player I. Finally, we will get for
all j = j0, ..., ji:
\[
P_\alpha\Bigl[ \bar\Omega_{t_{k+1}}^{\alpha}(\beta) = \Omega_j^{k+1} \,\Big|\, F_{t_k}^{\alpha,\beta} \Bigr]
\;=\; Q\bigl( \Omega_j^{k+1} \,\big|\, \Omega_i^k \bigr) \times P_\alpha\Bigl[ \bar\Omega_{t_k}^{\alpha}(\beta) = \Omega_i^k \,\Big|\, F_{t_k}^{\alpha,\beta} \Bigr]
\;=\; Q\bigl( \Omega_j^{k+1} \,\big|\, \Omega_i^k \bigr)\, Q(\Omega_i^k) \;=\; Q(\Omega_j^{k+1}).
\]
We have proven that for all k ∈ {0, ..., NδN̄ − 1} and every atom Ωi^k of the
filtration Ftk,
\[
P_\alpha\bigl[\, \bar\Omega_{t_k}^{\alpha}(\beta) = \Omega_i^k \,\bigr] \;=\; Q(\Omega_i^k).
\]
Noticing that there is no explosion on [T − τ, T], we get FT = F_{t_{NδN̄−1}}, and due to the
definition of the strategy and the fact that S^α(β) = ∅, we get Ω̄_T^α(β) = Ω̄_{t_{NδN̄−1}}^α(β);
hence the result.
We still assume that Player I plays α^ε and Player II plays some pure strategy β
such that β generates no deviation, and we will compute the payoff Ji(t0, x0, α^ε, β)
for i = 1, 2.
Lemma 7.5. If the strategies (α^ε, β) are played, where β is some pure strategy with
delay such that S_T^α(β) = ∅, then for all i = 1, 2:
\[
\bigl| J_i(t_0, x_0, \alpha^{\varepsilon}, \beta) - e_i \bigr| \;\le\; \frac{3\varepsilon}{N_\delta},
\]
and consequently
\[
\bigl| J_i(t_0, x_0, \alpha^{\varepsilon}, \beta^{\varepsilon}) - e_i \bigr| \;\le\; \int_{\Omega_\beta} \bigl| J_i(t_0, x_0, \alpha^{\varepsilon}, \beta^{\varepsilon}(\omega_\beta)) - e_i \bigr| \, dP_\beta(\omega_\beta) \;\le\; \frac{3\varepsilon}{N_\delta} .
\]
Assume that FT = σ({Ωj}j=1,...,M̄), where the Ωj are the atoms of FT, and players are
using (α^ε, β) as in the assumptions of the Lemma. Notice that for all ωα ∈ {Ω̄_T^α(β) = Ωj}:
\[
\bigl\| X_t^{t_0,x_0,(\alpha^{\varepsilon},\beta)(\omega_\alpha)} - X_t^{\eta}(\omega_j) \bigr\| \;\le\; M\tau\,(1 + \|f\|_\infty)\, e^{L_f (T - t_0)},
\]
so that
\[
\bigl| J_i(t_0,x_0,\alpha^{\varepsilon},\beta) - E_Q\bigl[ g_i(X_T^{\eta}) \bigr] \bigr| \;\le\; \sum_{j=1}^{\bar M} \eta\, Q(\Omega_j) \;=\; \eta .
\]
It remains to prove that the strategies (α^ε, β^ε) are ε-optimal. We will prove it for β^ε:
there exists some constant Cα satisfying
Consider some pure strategy with delay β. If β generates no deviation (S_T^α(β) = ∅),
then we have just proven that:
\[
J_2(t_0, x_0, \alpha^{\varepsilon}, \beta) \;\le\; e_2 + \frac{3\varepsilon}{N_\delta} \;\le\; J_2(t_0, x_0, \alpha^{\varepsilon}, \beta^{\varepsilon}) + \frac{6\varepsilon}{N_\delta} . \tag{7.19}
\]
It remains to prove the same kind of result as (7.18) for any pure strategy β
generating some unexpected controls (leading, for some ωα, to S_T^α(ωα, β) ≠ ∅).
The idea of the proof is first to build some pure strategy β̃ generating the same
controls as β against α^ε as long as no deviation occurs and generating no deviation
against α^ε, that is, some non-deviating extension of β. We then will compare the
payoffs induced by β and β̃.
Lemma 7.6 (Non-Deviating Extension β̃ of Some Pure Strategy β of Player II).
To any pure strategy with delay β, one can associate a pure strategy with delay β̃
satisfying:
1. S^α(β̃) = ∅.
2. The pairs of strategies (α^ε(ωα), β) and (α^ε(ωα), β̃) generate the same pairs of
controls on [t0, T − τ] × {S_T^α(β) = ∅} ∪ ⋃_{k∈{0,...,NδN̄}} [t0, tk] × {S_T^α(β) ∈ {tk} × RN}.
Proof. We just give the sketch of the proof. The strategy β̃ is built the following
way. We need auxiliary random processes in order to keep in mind:
• Which trajectory generated by (u_η, v_η) is followed.
• Whether β deviated in the past, i.e., whether there exists t ∈ (t0, tk) such that S^α(β̃)_t ≠ ∅.
• Whether the strategy played by Player I is α^ε.
For all time intervals [tk, tk+1]:
• If Player I deviated from α^ε, play any control.
• If β deviated, then play the expected control (either v_η(ω) for the ω corresponding
to the followed trajectory, or any expected correlation control in case there
is some explosion). Then check whether Player I played the expected controls
corresponding to α^ε.
• If β did not deviate, then: if there is no explosion, first play v_η(ω) for the ω to be
followed, then check if β deviated and if Player I played the expected strategy. If
there is some explosion, and if β is going to play some expected correlation control
on [tk, tk + τ(β)], then play this control on the first half of the time interval, then
check if β deviates on [tk, tk + τ/2]; if it does not deviate, play β for the remaining
time interval, and otherwise play any correct correlation control; then check if
Player I deviated.
In this way we are able to build a pure strategy with delay. Indeed, β̃ is anticipative
with respect to β but non-anticipative with respect to the control u of the opponent.
Furthermore, β̃ satisfies S_T^α(β̃) = ∅ and Ω̄_T^α(β̃) ≠ ∅. As long as β generates no
deviation, the controls generated by (α^ε, β) and (α^ε, β̃) are the same.
We have, for any deviating pure strategy β:
\[
J_2(t_0,x_0,\alpha^{\varepsilon},\beta) \;=\; \sum_{i=0}^{N_\delta \bar N - 1} E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \Bigr] + E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=\emptyset} \Bigr]. \tag{7.20}
\]
Assume for example that S_T^α(β) = (ti, x). This means that some deviation occurred
on [ti, ti+1). There exists k ∈ {1, ..., Nδ} such that [ti, ti+1) ⊂ [θk−1, θk). Using the
definition of the strategy α^ε and introducing the non-deviating extension β̃ of β:
\[
g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=(t_i,x)} \;=\; g_2\bigl( X_T^{t_{i+2},x,\alpha_p^{\eta,t_{i+2}}(x),\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=(t_i,x)} .
\]
We introduce this last inequality because our estimate of V2(ti, X_{ti}^{t0,x0,α^ε,β̃}) induces
some error term of length η; therefore we need to sum up at most Nδ such error
terms in order to bound the global error by some ε.
In the end we have, for all ti ∈ [θk−1, θk):
\[
g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \;\le\; E_\alpha\Bigl[ V_2\bigl( \theta_k, X_{\theta_k}^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr) \,\Big|\, F_{t_i}^{\alpha,\tilde\beta} \Bigr] + 3\varepsilon\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} . \tag{7.21}
\]
The point now is to get an estimate of V2(θk, X_{θk}^{t0,x0,α^ε,β̃}). We will prove the
following Lemma:
Lemma 7.7. For all t ∈ {tk}k=0,...,NδN̄ and all pure strategies β̃ generating no
deviation against α^ε, we have:
\[
P_\alpha\Bigl[ V_2\bigl( t, X_t^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr) \le E_\alpha\bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr) \,\big|\, F_t^{\alpha,\tilde\beta} \bigr] + 4\eta \Bigr] \;\ge\; 1 - 2\eta .
\]
Consequently,
\[
g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \;\le\; E_\alpha\bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr) \,\big|\, F_{t_i}^{\alpha,\beta} \bigr]\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N}
+ \|g\|_\infty\, P_\alpha\bigl[ (\Sigma_{\theta_k}^{\alpha,\tilde\beta})^c \,\big|\, F_{t_i}^{\alpha,\beta} \bigr]\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} + 7\varepsilon\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} .
\]
Using the fact that {S_T^α(β) ∈ {ti} × RN} is F_{ti}^{α,β}-measurable due to the definition
of the strategies, we now use this estimate to compute the expectation of the payoff
in case there is some deviation:
\[
\begin{aligned}
\sum_{i=0}^{N_\delta \bar N - 1} E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \Bigr]
&\le \sum_{i=0}^{N_\delta \bar N - 1} E_\alpha\Bigl[ E_\alpha\bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \,\big|\, F_{t_i}^{\alpha,\beta} \bigr] \Bigr]\\
&\quad + \sum_{k=1}^{N_\delta} \sum_{i=(k-1)\bar N}^{k\bar N - 1} E_\alpha\Bigl[ \|g\|_\infty\, E_\alpha\bigl[ \mathbf{1}_{(\Sigma_{\theta_k}^{\alpha,\tilde\beta})^c}\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \,\big|\, F_{t_i} \bigr] \Bigr] + 7\varepsilon \quad \text{due to (7.23)}\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + \|g\|_\infty \sum_{k=1}^{N_\delta} E_\alpha\Bigl[ \mathbf{1}_{(\Sigma_{\theta_k}^{\alpha,\tilde\beta})^c}\, \mathbf{1}_{S_T^{\alpha}(\beta)\in[\theta_{k-1},\theta_k)\times\mathbb{R}^N} \Bigr] + 7\varepsilon\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + \|g\|_\infty \sum_{k=1}^{N_\delta} P_\alpha\Bigl[ \bigl( \Sigma_{\theta_k}^{\alpha,\tilde\beta} \bigr)^c \Bigr] + 7\varepsilon\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + \|g\|_\infty \sum_{k=1}^{N_\delta} \frac{2\varepsilon}{N_\delta} + 7\varepsilon \quad \text{thanks to Lemma 7.7}\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + 2\|g\|_\infty\,\varepsilon + 7\varepsilon . \tag{7.24}
\end{aligned}
\]
\[
\begin{aligned}
J_2(t_0,x_0,\alpha^{\varepsilon},\beta)
&= \sum_{i=0}^{N_\delta \bar N - 1} E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\in\{t_i\}\times\mathbb{R}^N} \Bigr] + E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=\emptyset} \Bigr]\\
&\le E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)\ne\emptyset} \Bigr] + (2\|g\|_\infty + 7)\varepsilon + E_\alpha\Bigl[ g_2\bigl( X_T^{t_0,x_0,\alpha^{\varepsilon},\tilde\beta} \bigr)\, \mathbf{1}_{S_T^{\alpha}(\beta)=\emptyset} \Bigr] \quad \text{due to (7.24)}\\
&= J_2(t_0,x_0,\alpha^{\varepsilon},\tilde\beta) + (2\|g\|_\infty + 7)\varepsilon \;\le\; J_2(t_0,x_0,\alpha^{\varepsilon},\beta^{\varepsilon}) + (13 + 2\|g\|_\infty)\varepsilon .
\end{aligned}
\]
This proves that β^ε is (13 + 2‖g‖∞)ε-optimal. The proof is symmetric to state
that α^ε is (13 + 2‖g‖∞)ε-optimal.
Finally, we have built mixed strategies (α^ε, β^ε) rewarding a payoff 3ε-close to e
and (13 + 2‖g‖∞)ε-optimal. This proves that e is a Nash equilibrium payoff.
References
1. Aumann, R.J., Maschler, M.B.: Repeated Games with Incomplete Information. MIT Press,
Cambridge (1995)
2. Aumann, R.J., Shapley, L.S.: Long-Term Competition – A Game-Theoretic Analysis. Mimeo, Hebrew University (1976); reprinted in Essays in Game Theory in Honor of Michael Maschler (N. Megiddo, ed.), pp. 1–15. Springer-Verlag (1994)
3. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd ed. Academic, London
(1995)
4. Bensoussan, A., Frehse, J.: Stochastic games for N players. J. Optim. Theory Appl. 105(3),
543–565 (2000)
5. Bressan, A., Shen, W.: Semi-cooperative strategies for differential games. Int. J. Game Theory
32, 561–593 (2004)
6. Bressan, A., Shen, W.: Small BV solutions of hyperbolic noncooperative differential games.
SIAM J. Control Optim. 43(1), 194–215 (2004)
7. Buckdahn, R., Cardaliaguet, P., Rainer, C.: Nash equilibrium payoffs for nonzero sum
stochastic differential games. SIAM J. Control Optim. 43(2), 624–642 (2004)
8. Cardaliaguet, P.: Representations formulae for some differential games with asymetric infor-
mation. J. Optim. Theory Appl. 138(1), 1–16 (2008)
9. Cardaliaguet, P., Quincampoix, M.: Deterministic differential games under probability knowl-
edge of initial condition. Int. Game Theory Rev. 10(1), 1–16 (2008)
10. Dutta, P.K.: A folk theorem for stochastic games. J. Econ. Theory 66(1), 1–32 (1995)
11. Engwerda, J.C.: LQ Dynamic Optimization and Differential Games. Wiley, New York (2005)
12. Friedman, A.: Differential Games, Wiley-Interscience, New York (1971)
7 Nash Equilibrium Payoffs in Mixed Strategies 159
13. Gossner, O.: The folk theorem for finitely repeated games with mixed strategies. Int. J. Game
Theory 24, 95–107 (1995)
14. Hamadène, S., Lepeltier, J.-P., Peng, S.: BSDEs with continuous coefficients and stochastic
differential games. In: El Karoui et al. (eds.) Backward Stochastic Differential Equations,
Pitman Res. Notes in Math. Series, vol. 364, pp.161–175. Longman, Harlow (1997)
15. Kleimenov, A.F.: Nonantagonist Differential Games. “Nauka” Uralprime skoje Otdelenie
Ekaterinburg (1993) (in russian)
16. Kononenko, A.F.: Equilibrium positional strategies in non-antagonistic differential games
Dokl. Akad. Nauk SSSR 231(2), 285–288 (1976). English translation: Soviet Math. Dokl.
17(6), 1557–1560 (1977) (in Russian)
17. Rainer, C.: On two different approaches to nonzero sum stochastic differential games. Appl.
Math. Optim. 56, 131–144 (2007)
18. Rubinstein, A.: Equilibrium in supergames with the overtaking criterion. J. Econ. Theory 31,
227–250 (1979)
19. Solan, E.: Characterization of correlated equilibria in stochastic games. Int. J. Game Theory
30, 259–277 (2001)
20. Tolwinski, B., Haurie, A., Leitmann, G.: Cooperative equilibria in differential games. J. Math.
Anal. Appl. 119, 182–202 (1986)
Chapter 8
A Penalty Method Approach for Open-Loop
Variational Games with Equality Constraints
8.1 Introduction
For the past several years, we have been studying a class of variational games which
may be viewed as an extension of the calculus of variations. In particular, our focus
has been on exploiting a direct solution method, originally due to G. Leitmann in [4],
to investigate sufficient conditions for open-loop Nash equilibria. The study of such
problems pre-dates J. Nash’s work in non-cooperative games, and their study can be
P. Cardaliaguet and R. Cressman (eds.), Advances in Dynamic Games, Annals of the 161
International Society of Dynamic Games 12, DOI 10.1007/978-0-8176-8355-9 8,
© Springer Science+Business Media New York 2012
162 D.A. Carlson and G. Leitmann
found in the 1920s with a series of mathematical papers by Roos [5–10] exploring
the dynamics of competition in economics. The last of Roo’s papers, provides an
extensive investigation into general variational games and provides analogues of
the standard first-order necessary conditions, such as the Euler–Lagrange equations,
the Weierstrass necessary condition, transversality conditions, Legendre’s necessary
condition and the Jacobi necessary condition. To date, most of these papers dealt
only with unconstrained problems (i.e., free problems of Lagrange type). In this
paper we investigate problems with equality constraints. Our approach is to consider
the feasibility of a penalty method for these problems which extends our recent
paper Carlson and Leitmann [1] from the case of a single-player game to an N-player
game. Penalty methods, of course, are not new and they have been used in a variety
of settings. However, in the study of games a quick search of MathSciNet produced
only 22 papers pertaining to penalty methods and games.
The remainder of the paper is organized as follows. In Sect. 8.2, we define
the class of games we consider and introduce the penalized game. In the next
section we digress to discuss some relevant results concerning growth conditions
and sequentially weak relative compactness. We prove our main result in Sect. 8.4.
In Sect. 8.5 we present an example illustrating our results and we conclude with
some brief remarks indicating how other known techniques might be useful.
over all of his/her possible admissible trajectories (see below), ẋ j (·) satisfying the
fixed end condition x(a) = xa and the equality constraint
Clearly, the trajectories of the other players influences the decision of the jth player
and so each player is unable to minimize independently of the other players. As
a consequence, the players seek to play a (open-loop) Nash equilibrium instead.
To introduce this concept we first introduce the following notation. For each fixed
j = 1, 2, . . . , N, x = (x1 , x2 , . . . , xN ) ∈ Rn , and y j ∈ Rn j we use the notation
.
[x j , y j ] = (x1 , x2 , . . . , x j−1 , y j , x j+1 , . . . xN )
x j (a) = xa j , j = 1, 2, . . . , N, (8.3)
satisfies the equality constraints (8.2), satisfies (t, x j (t), ẋ j (t)) ∈ A j for almost all
t ∈ [a, b] and such that I j (x(·)) exists for all j = 1, 2, . . . , N .
Definition 8.2. Given an admissible trajectory x(·) for the constrained variational
game (8.1), (8.2) we say a function y j (·) : [a, b] → Rn j is an admissible trajectory
for player j relative to x(·) if and only if the function [x j , y j ](·) is an admissible
trajectory for the constrained variational game.
With these definitions we can now give the definition of a Nash equilibrium.
Definition 8.3. An admissible trajectory for the constrained variational game (8.1),
(8.2) x∗ (·) : [a, b] → Rn is called a Nash equilibrium if and only if for each player
j = 1, 2, . . . , N and each function y j (·) : [a, b] → Rn j that is an admissible trajectory
for player j relative to x∗ (·) one has
b
I j (x∗ (·)) = L j (t, x∗ (t), ẋ∗j (t)) dt
a
b
≤ L j (t, [x∗ (t) j , y j (t)], ẏ j (t)) dt
a
Remark 8.2. From the above definitions it is clear that when all of the players “play”
a Nash equilibrium, then each player’s strategy is his best response to that of the
other players. In other words, if player j applies any other admissible trajectory
relative to the Nash equilibrium, than his equilibrium trajectory, his cost functional
will not decrease.
164 D.A. Carlson and G. Leitmann
Remark 8.3. The above dynamic game clearly is not the most general structure one
can imagine, even in a variational framework. In particular, the cost functionals
are coupled only through their state variables and not through their strategies (i.e.,
their time derivatives). While not the most general, one can argue that this form is
general enough to cover many cases of interest since in a “real-world setting,” an
individual player will not know the strategies of the other players (see e.g., Dockner
and Leitmann [3]).
To solve games of the type described above, one usually tries to solve the first-
order necessary conditions to obtain a candidate for the Nash equilibrium and then
apply a sufficient condition to verify that it is one. For the above constrained
problem, the first-order necessary conditions are complicated as a result of the
equality constraints. For such problems one must find a multiplier for each of the
constraints which in its most general form is a measure. As a consequence of this
fact we choose to consider a family of unconstrained games in which the objective
of each player incorporates the constraint multiplied by a positive constant. We now
describe this family of games.
To define the penalized games for each λ > 0 define the function Lλ , j : A j → R by
the formula
Lλ , j (t, x, p j ) = L(t, x, p j ) + λ g j (t, x, p j ), (8.5)
for each (t, x, p j ) ∈ A j . With this integrand we consider the unconstrained game in
which each player tries to minimize the integral functional
b
Iλ , j (x(·)) = Lλ , j (t, x(t), ẋ j (t)) dt, j = 1, 2, . . . N, (8.6)
a
over all of his admissible trajectories x j (·) satisfying the fixed end condition
x j (a) = xa j . Of course, the set of admissible trajectories for this family of un-
constrained games is larger than the set of admissible trajectories for the original
constrained game. For completeness we give the following definitions.
Definition 8.4. For a given λ > 0, a function x(·) : [a, b] → Rn is an admissible
trajectory for the unconstrained game (8.6) if it is absolutely continuous, satisfies
the fixed end condition (8.3), satisfies (t, x(t), ẋ j (t)) ∈ A j for almost all t ∈ [a, b] and
Iλ , j (x(·)) exists for all i = 1, 2, . . . , N.
Definition 8.5. Given an admissible trajectory x(·) for the unconstrained game
(8.6), we say a function y j (·) : [a, b] → Rn j is admissible for player j relative to
x(·) if the trajectory [x(·) j , y j (·)] is an admissible trajectory for the unconstrained
game (8.6).
8 A Penalty Method Approach for Open-Loop Variational Games 165
Definition 8.6. Given a fixed λ > 0, we say an admissible trajectory x∗λ (·) : [a, b] →
Rn for the unconstrained variational (8.6) is a Nash equilibrium of for each j =
1, 2, . . . N and any function y j (·) : [a, b] → Rn j that is admissible for player j relative
to x∗λ (·) one has
b
Iλ , j (x∗λ (·)) = Lλ , j (t, x∗λ (t), ẋ∗λ j (t)) dt
a
b
≤ Lλ , j (t, [x∗λ (t) j , y j (t)], ẏ j (t)) dt
a
We notice that if y(·) is an admissible trajectory for the constrained game (8.1),
(8.2) then it is an admissible trajectory for the unconstrained game (8.6) for any
value of λ ≥ 0 and I j (y(·)) = Iλ , j (y(·)). Thus, if it is the case that x∗λ (·) is both a
Nash equilibrium for the unconstrained game (8.6) and if it is also an admissible
trajectory for the constrained game (8.1), (8.2), then it is a Nash equilibrium for the
constrained game. Indeed if y(·) is admissible for player j for the constrained game
relative to x∗λ (·) (which implies that g j (t, [x∗λ (t) j , y(t) j ], ẏ(t)) = 0) then we have
∗j ∗j
I j (x∗λ (·)) = Iλ , j (x∗λ (·)) ≤ Iλ , j ([xλ , y](·)) = I j ([xλ , y](·)).
The above observation is useful only if we find that a Nash equilibrium for one
of the penalized games is an admissible trajectory for the constrained game. The
idea of a penalty method is that as the penalty parameter λ grows to infinity the
penalized term tends to zero. We now give conditions for when this occurs.
Lemma 8.1. Assume for each j = 1, 2, . . . , N that there exists constants A∗j and
B∗j such that for each admissible trajectory for the unconstrained games, x(·) one
has A∗j ≤ I j (x(·)) ≤ B∗j . Further assume that there exists a λ0 > 0 such that for
all λ > λ0 the unconstrained penalized games have Nash equilibria x∗λ (·) and that
corresponding to each there exists an absolutely continuous function yλ (·) such
that for each j = 1, 2, . . . , N the trajectories [x∗ j , yλ , j ](·) are admissible for the
constrained game (i.e., g(t, [x∗ j , yλ , j ](t), ẏλ j (t)) = 0 a.e. t ∈ [a, b]). Then one has,
b
lim g j (t, x∗λ (t), ẋ∗λ , j (t)) dt = 0, j = 1, 2, . . . , N.
λ →+∞ a
Proof. To prove this result we proceed by contradiction and assume that for some
j = 1, 2 . . . , N there exists a sequence {λk } and an 0 > 0 such that
b
g j t, x∗λk (t), ẋ∗λk , j (t) dt > 0 .
a
166 D.A. Carlson and G. Leitmann
≤ B∗j
for almost all t ∈ [a, b]. Furthermore, we also know nothing about the convergence
of the trajectories {x∗λ (·)} as λ → ∞.
Remark 8.5. The existence of the A∗j ’s can be realized by assuming that the
integrands L j (·, ·, ·) are bounded below, which is not an unusual assumption for
minimization problems. The existence of the admissible trajectories yλ (·) is much
more difficult to satisfy, but it is easy to see that such a trajectory exists if the
equality constraints are not dependent on the other players and if there exists feasible
trajectories for the original constrained game. That is, g j (·, ·, ·) : [a, b] × Rn j ×
Rn j → R and there exists at least one trajectory y(·) satisfying the fixed end
condition (8.3) such that
In this case one only needs to take yλ (·) = y(·) for all λ > λ0 . Finally, the existence
of the constants B∗j is perhaps the most difficult to verify, unless one assumes that
the integrands L j (·, ·, ·) are also bounded above. However, we note that in our proof,
this condition can be weakend slightly by assuming that for each j = 1, 2, . . . , N
one has I j (x(·)) ≤ B∗j for all feasible trajectories for the original constrained game
(8.1), (8.2).
We now begin to investigate the convergence properties of the family of Nash
equilibria x∗λ (·) .
8 A Penalty Method Approach for Open-Loop Variational Games 167
In this section we begin by reviewing some classical notions concerning the weak
topology of absolutely continuous functions and criteria for compactness of a
sequence of absolutely continuous functions. Following this discussion we apply
these ideas to our game model and the compactness of the set of Nash equilibria
{xλ (·)}λ >0 . Following this result we discuss the lower semicontinuity properties
of the integral functionals I j (·) and Iλ , j (·) with respect to the weak topology of
absolutely continuous funcntions. This will allow us to present our main result in
the next section. These questions have their roots in the classical existence results
of the calculus of variations.
The existence theory of the calculus of variations is a delicate balance between
the compactness properties of sets of admissible trajectories and the conditions
imposed on the integral functional to insure lower semicontinuity. Fortunately, this
is a well studied problem for the cases we consider here and indeed the results are
now classical. We begin first by discussing growth conditions and the weak topology
in the class of absolutely continuous functions.
The space of absolutely continuous functions, denoted as AC([a, b]; Rm ), is a
subspace of the set of continuous functions z(·) : [a, b] → Rm with the property that
their first derivatives ż(·) are Lebesgue integrable. Clearly they include the class of
piecewise smooth trajectories. Further, we also know that the fundamental theorem
of calculus holds, i.e.,
t
z(t) = z(a) + ż(s) ds
a
for every t ∈ [a, b] and moreover, whenever
t
z(t) = z(a) + ξ (s) ds, a ≤ t ≤ b,
a
holds for some Lebesgue integrable function ξ (·) : [a, b] → Rm , then necessarily we
have ż(t) = ξ (t) for almost all t ∈ [a, b]. The convergence structure imposed on this
space of functions is the usual weak topology which we define as follows.
Definition 8.7. A sequence {zk (·)}+∞ k=1 in AC([a, b]; R ) converges weakly to a
m
We make the following observations concerning the above definition. First, since we
are only interested in absolutely continuous functions satisfying the fixed endpoint
conditions (8.3), for any sequence of interest for us here we can take tk = a so
that the first condition in the above definition is automatically satisfied. Secondly,
the convergence property of the sequence of derivatives is referred to as weak
convergence in L1 ([a, b]; Rn ) (the space of Lebesgue integrable functions) of the
derivatives. As a consequence of these two observations, we need to consider the
weak compactness of a set of integrable functions. To this end we have the following
well known theorem.
Theorem 8.1. Let {h(·) : [a, b] → Rm } be a family of Lebesgue integrable functions.
The following two statements are equivalent.
1. The family {h(·)} is sequentially weakly relatively compact in L1 ([a, b]; Rm ).
2. There is a constant M ∈ R and a function Φ (·) : [0, ∞) → R such that
b
Φ (ζ )
lim = +∞ and Φ (h(s)) ds ≤ M
ζ →∞ ζ a
has been established in Theorem 2. Moreover, as a result of Lemma 1 (see also the
remarks following its proof) we also have that
b
lim g j (t, x∗λk , j (t), ẋ∗λk , j (t)) dt = 0, j = 1, 2, . . . N.
k→∞ a
Now, since the functions g j (·, ·, ·) are nonnegative and convex in their last n j
arguments we have that the integral functionals
b
G j ((z(·), p(·))) = g j (t, z(t), p(·)) dt, j = 1, 2, . . . N,
a
are lower semicontinuous on C([a, b]; Rn j ) × L([a, b]; Rn j ). In particular this means
that
b b
0 = lim g j (t, x∗λk , j (t), ẋ∗λk , j (t)) dt ≥ g j (t, x∗j (t), ẋ∗j (t)) dt ≥ 0,
k→+∞ a a
which implies that g j (t, x∗j (t), x∗j (t)) = 0 for almost all t ∈ [a, b]. This of course
says that x∗ (·) is an admissible trajectory for the constrained variational game. It
remains to show that it is a Nash equilibrium. To see this fix j = 1, 2, . . . N and
let y j (·) : [a, b] → Rn j be an admissible trajectory for player j relative to x∗ (·) and
consider the following inequalities for λk
b
I j (x∗λk (·)) = L j (t, x∗λ (t), ẋ∗λk , j (t)) dt
a
≤ Iλk , j (x∗λk (·))
b
= L j (t, x∗λk (t), ẋ∗ λk , j(t)) + λk g j (t, x∗λk , j (t), ẋ∗λk , j (t)) dt
a
where the first inequality is a result of the nonnegativeness of g j (·, ·, ·), the second
inequality is a consequence of the fact that x∗λ (·) is a Nash equilibrium for the
k
unconstrained variational game with λ = λk and the last equality follows because
g j (t, y j (t), ẏ j (t)) = 0 for almost all t ∈ [a, b] by the definition of y j (·). Letting k → ∞
in the above gives
172 D.A. Carlson and G. Leitmann
= I j ([x∗ j , y j ](t)).
in which f (k(t)) is a production rate and c(t) denotes a rate of external investments
required for the production (i.e., amount of raw materials). The goal of the firm is
to maximize its profit. The price per unit of each unit is given by a demand p =
p(k(t)), a function that depends on the available inventory of the firm, and the cost
of production C(c(t)) depends on the external investment rate at time t. Thus the
objective of the firm is to maximize a functional of the form
b
I(k(·)) = p(k(t))k(t) − C(c(t)) dt
0
b
= p(k(t))k(t) − C(k̇(t) − f (k(t))) dt. (8.8)
a
generates a pollutant s(t) at each time t which the government has mandated must
be reduced to a fraction of its initial level (i.e. to α s(0), α ∈ (0, 1)) over the time
interval [0, b]. That is, s(b) = α s(0). Each firm generates pollution according to the
following process:
in which g(k(t)) denotes the rate of production of the pollutant by the firm and
μ > 0 is a constant representing the “natural” abatement of the pollutant. This gives
our differential constraint. Thus the problem for the firm is to maximize its profit
given by (8.8) while satisfying the pollution constraint (8.9) and the end conditions
(k(0), s(0)) = (k0 , s0 ) and s(b) = sb (here of course we interpret sb = α s0 but this is
not necessary for the formulation of the problem).
If we choose for specificity p(k) = π k, f (k) = α k + β , g(k) = γ k and C(c) = 12 c2
with all of the coefficients positive constants, the above calculus of variations
problem becomes one in which the objective functional is quadratic and the
differential side constraint becomes linear.
To apply our theory we consider the family of unconstrained variational problems
(Pλ ) of minimizing
b
1
[k̇(t) − α k(t)) − β ]2 − π k(t)2 + λ [ṡ(t) + μ s(t) − γ k(t)]2 dt, (8.10)
0 2
over all piecewise continuous (k(·), s(·)) : [0, b] → R2 satisfying the end conditions
1
Lλ ((k, s), (p, q)) = [p − α k − β ]2 − π k2 + λ [q + μ s − γ k]2.
2
Further we note that since k(b) is unspecified the solution (kλ (·), sλ (·)) must satisfy
the transversality condition
∂ Lλ
= k̇λ (b) − α kλ (b) − β = 0.
∂ p ((kλ (b),sλ (b)),(k̇λ (b),ṡλ (b)))
This supplies us with the terminal condition for the state kλ (t) at t = b.
174 D.A. Carlson and G. Leitmann
d
[k̇(t) − α k(t) − β ] = −α (k̇(t) − α k(t) − β ) − 2π k(t)
dt
−2λ γ (ṡ(t) + μ s(t) − γ k(t))
d
2λ [ṡ(t) + μ s(t) − γ k(t)] = 2μλ (ṡ(t) + μ s(t) − γ k(t))).
dt
For a solution (kλ (·), sλ (·)) of the above system define Λ (·) = ṡ(·) + μ s(·) − γ k(·)
and observe that the second equation becomes
d
Λ (t) = μΛ (t), t ∈ (0, b),
dt
which has the general solution Λ (t) = Λ0λ eμ t , where Λ0λ is a constant to be
determined. Observe that the constant Λ0λ does depend on λ since in general kλ (·)
will. Substituting Λ (·) for (ṡλ (t) + μ sλ (t) − γ kλ (t)) into the first equation gives us
the uncoupled equation,
d λ
[k̇ (t) − α kλ (t) − β ] = −α (k̇λ (t) − α kλ (t) − β ) − 2π kλ (t) − 2λ γΛ0λ eμ t ,
dt
or after simplifying becomes
The general solution of this equation has the form kλ (t) = kc (t)+k1 (t)+k2 (t) where
kc (·) is the general solution of the homogeneous equation k̈(t) + (2π − α 2 )k(t) = 0,
k1 (·) solves the nonhomogeneous equation k̈(t) + (2π − α 2 )k(t) = αβ and k2 (·)
solves the nonhomogeneous equation k̈(t) + (2π − α 2 )k(t) = −2λ γΛ0λ eμ t . Each
of these equations are easy to solve. For simplicity we assume α 2 − 2π > 0 (to
insure real roots of the characteristic equation). Using elementary techniques we
have
αβ 2λ γΛ0λ
kc (t) = A∗ ert + B∗e−rt , k1 (t) = , and k2 (t) = eμt .
α 2 − 2π α − 2π − μ 2
2
giving us
αβ 2λ γΛ0λ
kλ (t) = − + eμ t + A∗ ert + B∗ e−rt ,
α 2 − 2π α 2 − 2π − μ 2
√
in which r = α 2 − 2π . Using the initial value for k(0) = k0 and the transversality
condition we obtain the following two equations for A∗ and B∗ .
8 A Penalty Method Approach for Open-Loop Variational Games 175
αβ 2λ γΛ0λ
A ∗ + B ∗ = k0 + −
α 2 − 2π α 2 − 2π − μ 2
α 2β 2λ γΛ (μ − α ) μ b
(r − α ) erb A∗ + (−r − α ) e−rb B∗ = β − − 2 0λ e .
α − 2π
2 α − 2π − μ 2
Using Cramer’s rule we get the following expressions for A∗ and B∗ :
1 αβ
A∗ = k0 (−r − α ) e−rb + 2 (−r − α ) e−rb + α
Δ α − 2π
2λ γΛ0λ
μb −rb
+ 2 (μ − α ) e − (−r − α ) e
α − 2π − μ 2
∗ 1 αβ
B = β − k0 (r − α ) erb − 2 α + (r − α ) erb
Δ α − 2π
2λ γΛ0λ
μb
(r − α ) e rb
− ( μ − α ) e ,
α 2 − 2π − μ 2
αβ
kλ (t) = − + EΛ0λ λ eμ t + (A + BΛ0λ λ ) ert + (C + DΛ0λ λ ) e−rt ,
α 2 − 2π
in which E is also a constant that is independent of Λ0λ and λ .
We now determine sλ (·). To this end we use the definition of Λ (t) = Λ0λ eμ t to
obtain the differential equation
αβ 1
sλ (t) = s0 e−μ t − (1 − e−μ t ) + (EΛ0λ λ + Λ0λ )(eμ t − e−μ t )
μ (α 2 − 2π ) 2μ
1 1
(A + BΛ0λ λ )(ert − e−μ t ) + (C + DΛ0λ λ )(e−rt − e−μ t ).
r+μ −r + μ
176 D.A. Carlson and G. Leitmann
lim ṡλ (t) + μ sλ (t) − γ kλ (t) = lim Λ0λ eμ t = 0, for all t ∈ [a, b],
λ →+∞ λ →+∞
which implies that in the limit the constraint is satisfied. Further, we notice that
Λ0λ λ → F /H as λ → +∞ from which one easily sees that (kλ (·), sλ (·)) →
(k∗ (·), s∗ (·)) as λ → +∞ where
αβ EF μ t BF DF
k∗ (t) = − + e + A + e rt
+ C + e−rt
α 2 − 2π H H H
αβ F E μt
s∗ (t) = s0 e−μ t − (1 − e−μ t ) + (e − e−μ t )
μ (α 2 − 2π ) 2μ H
1 BF 1 DF
A+ (ert − e−μ t ) + C+ (e−rt − e−μ t ).
r+μ H −r + μ H
If we extend the above example to two players each firm produces an equivalent
good whose quantity at time t ∈ [a, b] is given by a production process
8 A Penalty Method Approach for Open-Loop Variational Games 177
over all inventory streams t → k j (t) satisfying a given initial level k j (a) = ka j . This
gives a simple unconstrained variational game. The production processes of each
firm generate the same pollutant s(t) = s1 (t)+ s2 (t) at each time t (here si (t) denotes
the pollutant level due to the ith player) which the government has mandated must
be reduced to a fraction of its initial level (i.e. to α s(a), α ∈ (0, 1)) over the time
interval [a, b]. That is, s(b) = α s(a). Each firm generates pollution according to the
following process:
8.6 Conclusions
In this paper we explored the use of a penalty method to find open-loop Nash
equilibria for a class of variational games. We showed that using classical as-
sumptions, with roots in the calculus of variations, it was possible to establish our
results. We presented an example of a single-player game in detail and gave some
indication of the difficulties encountered when treating the multi-player case. Our
analysis suggests that a new extension of Leitmann’s direct method to problems
with unspecified right endpoint conditions could prove useful in using this penalty
method to determine Nash equilibria.
Acknowledgements The authors would like to dedicate this paper to the memory of our friend
and colleague Thomas L. Vincent. Additionally, the first author would like to offer his best wishes
to his friend and co-author George Leitmann on the occasion of his eighty-fifth birthday.
References
1. Carlson, D.A., Leitmann, G.: An equivalent problem approach to absolute extrema for calculus
of variations problems with differential constraints. Dyn. Contin. Discrete Impuls. Syst. Ser. B
Appl. Algorithms 18(1), 1–15 (2011)
2. Cesari, L.: Optimization-theory and applications: problems with ordinary differential equa-
tions. In: Applications of Applied Mathematics, vol. 17. Springer, New York (1983)
3. Dockner, E.J., Leitmann, G.: Coordinate transformation and derivation of open-loop Nash
equilibrium. J. Optim. Theory Appl. 110(1), 1–16 (2001)
4. Leitmann, G.: A note on absolute extrema of certain integrals. Int. J. Non-Linear Mech. 2,
55–59 (1967)
5. Roos, C.F.: A mathematical theory of competition. Am. J. Math. 46, 163–175 (1925)
6. Roos, C.F.: Dynamical economics. Proc. Natl. Acad. Sci. 13, 145–150 (1927)
7. Roos, C.F.: A dynamical theory of economic equilibrium. Proceedings of the National
Academy of Sciences 13, 280–285 (1927)
8. Roos, C.F.: A dynamical theory of economics. J. Polit. Econ. 35(5), 632–656 (1927)
9. Roos, C.F.: Generalized Lagrange problems in the calculus of variations. Trans. Am. Math.
Soc. 30(2), 360–384 (1928)
10. Roos, C.F.: Generalized Lagrange problems in the calculus of variations. Trans. Am. Math.
Soc. 31(1), 58–70 (1929)
11. Wagener, F.O.O.: On the Leitmann equivalent problem approach. J. Optim. Theory Appl.
142(1), 229–242 (2009)
Chapter 9
Nash Equilibrium Seeking for Dynamic Systems
with Non-quadratic Payoffs
P. Cardaliaguet and R. Cressman (eds.), Advances in Dynamic Games, Annals of the 179
International Society of Dynamic Games 12, DOI 10.1007/978-0-8176-8355-9 9,
© Springer Science+Business Media New York 2012
180 P. Frihauf et al.
9.1 Introduction
Consider a noncooperative game with N players and a dynamic mapping from the
players’ actions ui to their payoff values Ji , which the players wish to maximize.
Specifically, we consider a general nonlinear model,
Assumption 9.2. For each u ∈ RN , the equilibrium x = l(u) of the system (9.1) is
locally exponentially stable.
Hence, we assume that for any action by the players, the plant is able to stabilize
the equilibrium. We can relax the requirement for each u ∈ RN as we need to only be
concerned with the action sets of the players, namely, u ∈ U = U1 × · · · ×UN ⊂ RN .
The following assumptions are central to our Nash seeking scheme as they ensure
that at least one stable Nash equilibrium exists.
Assumption 9.3. There exists at least one, possibly multiple, isolated stable Nash
equilibria u∗ = [u∗1 , . . . , u∗N ] such that
∂ (hi ◦ l) ∗
(u ) = 0, (9.4)
∂ ui
∂ 2 (hi ◦ l) ∗
(u ) < 0, (9.5)
∂ u2i
dx
ω = f (x, u∗ + ũ + μ (τ )), (9.10)
dτ
dũi
= ε Ki μi (τ )hi (x), (9.11)
dτ
For the averaging analysis, we first “freeze” x in (9.10) at its quasi-steady state
x = l(u∗ + ũ + μ (τ )) (9.12)
dũi
= ε Ki μi (τ )pi (u∗ + ũ + μ (τ )), (9.13)
dτ
where pi (u∗ + ũ + μ (τ )) = (hi ◦ l)(u∗ + ũ + μ (τ )). This system’s form allows for the
use of general averaging theory [12, 21] and leads to the result:
Theorem 9.1. Consider the system (9.13) for an N-player game under Assumptions
9.3 and 9.4 and where ωi = ω j , ωi
= ω j + ωk , 2 ωi
= ω j + ωk , and ωi = 2 ω j + ωk
for all distinct i, j, k ∈ {1, . . . , N}. There exist M, m > 0 and ε̄ , ā such that, for all
ε ∈ (0, ε̄ ) and ai ∈ (0, ā), if |Δ (0)| is sufficiently small, then for all τ ≥ 0,
where
Δ (τ ) = ũ1 (τ ) − ∑Nj=1 c1j j a2j , . . . , ũN (τ ) − ∑Nj=1 cNjj a2j , (9.15)
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 185
and
⎡ ⎤
∂ 3 p1
⎡ ⎤ ∂ u1 ∂ u2j
(u∗ )
⎢ ⎥
c1j j ⎢ .. ⎥
⎢ . ⎥ ⎢ ⎥
⎢ .. ⎥ ⎢ . ⎥
⎢ ⎥ ⎢ ∂ 3 p j−1 ∗ ⎥
⎢ ⎥
⎢ j−1 ⎥
⎢c j j ⎥ ⎢ ∂ u j−1 ∂ u2j (u )⎥
⎢ j ⎥ ⎢ ⎥
⎢ c ⎥ = − 1 Λ −1 ⎢ 1 ∂ pj ∗ ⎥
3
⎢ jj ⎥ ⎢ (u ) ⎥. (9.16)
⎢ j+1 ⎥ 4 ⎢ 2 ∂uj 3
⎥
⎢c j j ⎥ ⎢ ⎥
⎢ ∂ 2 p j+1 (u∗ )⎥
3
⎢ . ⎥ ⎢ ∂ ∂ ⎥
⎢ . ⎥ u
⎢ j j+1
u
⎥
⎣ . ⎦ ⎢ .. ⎥
N ⎢ . ⎥
cjj ⎣ 3 ⎦
∂ pN ∗
∂u ∂u
2 (u )
j N
Proof. As already noted, the form of (9.13) allows for the application of general
averaging theory, which yields the average system,
T
dũave 1
i
= ε Ki lim μi (τ )pi (u∗ + ũave + μ (τ )) dτ . (9.17)
dτ T →∞ T 0
for all i ∈ {1, . . . , N}, and we postulate that ũe has the form,
N N N
ũei = ∑ bij a j + ∑∑ cijk a j ak + O max ai .
3
i
(9.19)
j=1 j=1 k≥ j
k
Dα pi (u∗ ) e
pi (u∗ + ũe + μ (τ )) = ∑ α !
(ũ + μ (τ ))α
|α |=0
Dα pi (ζ ) e
+ ∑ α !
(ũ + μ (τ ))α ,
|α |=k+1
k
Dα pi (u∗ ) e
= ∑ α! (ũ + μ (τ ))α
+ O max
i
a k+1
i , (9.20)
|α |=0
186 P. Frihauf et al.
where ζ is a point on the line segment that connects the points u∗ and u∗ + ũe + μ (τ ).
In (9.20), we have used multi-index notation, namely, α = (α1 , . . . , αN ), |α | = α1 +
· · · + αN , α ! = α1 ! · · · αN !, uα = uα1 1 · · · uαNN , and Dα (hi ◦ l) = ∂ |α | pi /∂ uα1 1 · · · ∂ uαNN .
The second term on the last line of (9.20) follows by substituting the postulated form
of ũe (9.19).
We select k = 3 to capture the effect of the third order derivative on the system
as a representative case. The effect of higher-order derivatives can be studied if
the third order derivative is zero. Substituting (9.20) into (9.18) and computing the
average of each term gives
a2i e ∂ 2 pi ∗ N
∂ 2 pi
0=
2
ũi
∂ ui
2
(u ) + ∑ ũej
∂ u i ∂ u j
(u∗ )
=i
j
1 e 2 a2i ∂ 3 pi ∗ N
∂ 3 pi
+ (ũi ) + (u ) + ũei ∑ ũej 2 (u∗ )
2 8 ∂ ui 3
=i
j ∂ u i ∂ u j
1 e
2 a j
2
N
∂ 3 pi
+∑ ũ j + (u∗ )
=i
j
2 4 ∂ ui ∂ u2j
N N
∂ 3 pi
+ ∑ ∑ ũ j ũk e e ∗
(u ) + O(max a5i ), (9.21)
=i
j k > j ∂ u i ∂ u j ∂ u k i
k
=i
where we have noted (9.4), utilized (9.19), and computed the integrals shown in the
appendix.
Substituting (9.19) into (9.21) and matching first order powers of ai gives
⎡ ⎤ ⎡ 1⎤ ⎡ 1⎤
0 b1 b1
⎢ .. ⎥ ⎢ .. ⎥ ⎢ .. ⎥
⎣ . ⎦ = a1 Λ ⎣ . ⎦ + · · · + aN Λ ⎣ . ⎦ , (9.22)
0 bN1 bNN
which implies that bij = 0 for all i, j since Λ is nonsingular by Assumption 9.4.
Similarly, matching second order terms of ai , and substituting bij = 0 to simplify the
resulting expressions, yields
⎛ ⎡ 3 ⎤⎞
∂ p1 ∗
(u )
⎢ ∂ u1 ∂ u j
2
⎜ ⎥⎟
⎜ ⎢ .. ⎥⎟
⎜ ⎢ ⎥⎟
⎜ ⎢ . ⎥⎟
⎡ ⎤ ⎜ ⎢ ∂ 3 p j−1 ∗ ⎥⎟
⎡ ⎤ ⎜ ⎡ ⎤ ⎢ ⎥⎟
0 c1jk ⎜ c1j j ⎢ ∂ u j−1 ∂ u2j (u )⎥⎟
⎢ .. ⎥ N N ⎢ . ⎥ N 2 ⎜ ⎢ . ⎥ 1 ⎢ 1 ∂3pj ∗ ⎥
⎜ ⎢ ⎟
⎥⎟
⎣ . ⎦ = ∑ ∑ a j ak Λ ⎢ . ⎥ + ∑ a ⎜Λ ⎣ . ⎦ + ⎢
⎣ . ⎦ j=1 j ⎜ . (u ) ⎥⎟. (9.23)
4 ⎢ 2 ∂uj ⎥⎟
3
j=1 k> j
⎜ ⎢ ⎥⎟
⎢ ∂ 2 p j+1 (u∗ )⎥⎟
N N 3
0 c jk ⎜ cjj
⎜ ⎢ ∂ u j ∂ u j+1 ⎥⎟
⎜ ⎢ ⎥⎟
⎜ ⎢ .. ⎥⎟
⎜ ⎢ . ⎥⎟
⎝ ⎣ 3 ⎦⎠
∂ pN ∗
∂u ∂u2 (u )
j N
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 187
N
ũei = ∑ cij j a2j + O max ai .
i
3
(9.24)
j=1
By again utilizing a Taylor polynomial approximation, one can show that the
Jacobian Ψ ave = [ψi, j ]N×N of (9.17) at ũe has elements given by
T
1 ∂ pi ∗
ψi, j = ε Ki lim μi (τ )
(u + ũe + μ (τ )) dτ ,
T →∞ T 0 ∂ uj
2 ∂ pi
1 2
∗
= ε Ki ai (u ) + O ε max ai ,3
(9.25)
2 ∂ ui ∂ u j i
and is Hurwitz by Assumptions 9.3 and 9.4 for sufficiently small ai , which implies
that the equilibrium (9.24) of the average system (9.17) is exponentially stable, i.e.,
there exist constants M, m > 0 such that
From Theorem 9.1, we see that u of reduced system (9.13) converges to a region
that is biased away from the Nash equilibrium u∗ . This bias is in proportion to the
perturbation magnitudes ai and the third derivatives of the payoff functions, which
are captured by the coefficients cij j . Specifically, ûi of the reduced system converges
to u∗i + ∑Nj=1 cij j a2j + O(ε + maxi a3i ) as t → ∞.
For a two-player game, Theorem 9.1 holds with the obvious change to omit any
reference to ωk and with the less obvious inclusion of the requirement ωi = 3ω j . The
requirement ωi = 3ω j is not explicitly stated in Theorem 9.1 since the combination
of ωi = ω j and ωi = 2ω j + ωk for all distinct i, j, k implies that ωi
= 3ω j . If the
payoff functions were quadratic, rather than non-quadratic, the requirements for the
perturbation frequencies would be simply, ωi = ω j , ωi
= ω j + ωk for the N-player
game, and ωi = ω j , ωi = 2ω j for the two-player game.
188 P. Frihauf et al.
We analyze the full system (9.10)–(9.11) in the time scale τ = ω t using singular
perturbation theory. First, we note that by [12, Theorem 14.4] and Theorem 9.1 there
exists an exponentially stable almost periodic solution ũa = [ũa1 , . . . , ũaN ] such that
dũai
= ε Ki μi (τ )pi (u∗ + ũa + μ (τ )). (9.28)
dτ
Moreover, ũa is unique within a neighborhood of the average solution ũave [16].
We define zi = ũi − ũai and obtain
dzi
= ε Ki μi (τ ) [hi (x) − pi (u∗ + ũa + μ (τ ))] , (9.29)
dτ
dx
ω = f (x, u∗ + z + ũa + μ (τ )), (9.30)
dτ
which from Assumption 9.1, the quasi-steady state is
dzi
= ε Ki μi (τ ) [pi (u∗ + z + ũa + μ (τ )) − pi(u∗ + ũa + μ (τ ))] , (9.32)
dτ
which has an equilibrium at z = 0 that is exponentially stable for sufficiently small
ai as shown in Sect. 9.4.
To formulate the boundary layer model, let y = x − l(u∗ + z + ũa + μ (τ )), and
then in the time scale t = τ /ω ,
dy
= f (y + l(u∗ + z + ũa + μ (τ )), u∗ + z + ũa + μ (τ )),
dt
= f (y + l(u), u), (9.33)
From the convergence properties of ũ(τ ) and because y(t) is exponentially decaying,
x(τ ) − l(u∗ ) exponentially converges to an O(ω + ε + maxi ai )-neighborhood of
the origin. Thus, Ji = hi (x) exponentially converges to an O(ω + ε + maxi ai )-
neighborhood of the payoff value (hi ◦ l)(u∗ ).
We summarize with the following theorem:
Theorem 9.2. Consider the system (9.1)–(9.2), (9.7)–(9.8) for an N-player game
under Assumptions 9.1–9.4 and where ωi = ω j , ωi = ω j + ωk , 2 ωi
= ω j + ωk , and
= 2ω j + ωk for all distinct i, j, k ∈ {1, . . . , N}. There exists ω ∗ > 0 and for any
ωi
ω ∈ (0, ω ∗ ) there exist ε ∗ , a∗ > 0 such that for the given ω and any ε ∈ (0, ε ∗ ),
maxi ai ∈ (0, a∗ ), the solution (x(t), u1 (t), . . . , uN (t)) converges exponentially to
an O(ω + ε + maxi ai )-neighborhood of the point (l(u∗ ), u∗1 , . . . , u∗N ), provided the
initial conditions are sufficiently close to this point.
Due to the Nash seeking strategy’s continual perturbation of the players’ actions,
we achieve exponential convergence to a neighborhood of u∗ , rather than u∗ itself.
The size of this neighborhood depends directly on the selected Nash seeking
parameters, as seen by Theorem 9.2. Thus, smaller parameters lead to a smaller
convergence neighborhood, but they also lead to slower convergence rates. (The
reader is referred to [41] for detailed analysis of this design trade-off for extremum
seeking controllers.) If another algorithm were used in parallel to detect convergence
of a player’s actions on the average, a player could either decrease the size of its
perturbation, making the convergence neighborhood smaller, or choose a constant
action based on its convergence detection. However, with a constant action, a player
will not be able to adapt to any future changes in the game.
By achieving exponential convergence, the players are able to achieve con-
vergence in the presence of a broader class of perturbations to the game than if
convergence were merely asymptotic (see [21, Chap. 9]).
For an example game with players that employ the extremum seeking strategy
(9.7)–(9.8), we consider the system,
190 P. Frihauf et al.
4u1 1
x̄1 = , x̄2 = u2 . (9.40)
16 − u2 4
which is Hurwitz for u2 < 16. Thus, (x̄1 , x̄2 ) is locally exponentially stable, but
not for all (u1 , u2 ) ∈ R2 , violating Assumption 9.2. However, as noted earlier, this
restrictive requirement of local exponential stability for all u ∈ RN was done merely
for notational convenience; we actually only require this assumption to hold for the
players’ action sets. In this example, we restrict the players’ actions to the set
which, on U, yield a Nash equilibrium: (u∗1 , u∗2 ) = (1, 1). At u∗ , this game satisfies
Assumptions 9.3 and 9.4, implying the stability of the reduced model average
system, which can be found explicitly according to (9.17) to be
dũave 1 ave
1
= ε K1 a1 −ũ1 + ũ2
2 ave
, (9.45)
dτ 2
dũave 2 3 ave ∗ ave 3 ave 2 3 2
2
= ε K2 a2 ũ − 3u2ũ2 − (ũ2 ) − a2 , (9.46)
dτ 2 1 2 8
with equilibria,
1 1
ũe1 = (1 − 4u∗2) ± (1 − 4u∗2)2 − 4a22, (9.47)
8 8
ũe2 = 2ũe1 , (9.48)
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 191
action û i
1
0.8 û 1
1
0.6
0.98
0.4
0.96
900 950 1000
0.2
0 200 400 600 800 1000
time (sec)
4
ũe,p
2 = a2 + O(max a3i ). (9.50)
1 − 4u∗2 2 i
For sufficiently small a2 , ũe ≈ (0, 0), (−3/4, −3/2), whereas the postulated form
ũe,p ≈ (0, 0) only. The equilibrium at (−3/4, −3/2) corresponds to the point
(1/4, −1/2), which lies outside of U and is an intersection of the extremals
∂ J1 /∂ u1 = 0, ∂ J2 /∂ u2 = 0.
The Jacobian Ψ ave of the average system is
⎡ ⎤
1
⎢−κ1 κ1 ⎥
Ψ ave = ⎣ 3 2 ⎦, (9.51)
∗
κ2 −3κ2 (ũ2 + u2)
e
2
where κ1 = ε K1 a21 and κ2 = ε K2 a22 , and its characteristic equation is given by,
1
λ 2 + (κ1 + 3κ2 (ũe2 + u∗2)) λ + 3κ1κ2 ũe2 + u∗2 − = 0.
4
α1
α2
Thus, Ψ ave is Hurwitz if and only if α1 and α2 are positive. For sufficiently small a2
(so that ũe ≈ (0, 0)), α1 , α2 > 0, which implies that u∗ is a stable Nash equilibrium.
For the simulations, we select k1 = 1.5, k2 = 2, a1 = 0.09, a2 = 0.03, ω1 = 0.5,
and ω2 = 1.3, where the parameters are chosen to be small, in particular the
perturbation frequencies ωi , since the perturbation must occur at a time scale
that is slower than fast time scale of the nonlinear system. Figures 9.2 and 9.3
192 P. Frihauf et al.
û 1 û 2
action û i
0.6
1
0.4 0.98
0.96
0.2
2900 2950 3000
0
0 500 1000 1500 2000 2500 3000
time (sec)
depict the evolution of the players’ actions û1 and û2 initialized at (u1 (0), u2 (0)) =
(û1 (0), û2 (0)) = (0.25, 1.5) and (0.1, 0.05). The state (x1 , x2 ) is initialized at the
origin in both cases. We show ûi instead of ui to better illustrate the convergence of
the players’ actions to a neighborhood about the Nash strategies since ui contains
the additive signal μi (t).
The slow initial convergence in Fig. 9.3 can best be explained by examining
the phase portrait of the average of the reduced model û-system, which can be
shown to be
2 1 1 ave
1 = ε K1 a1 − û1 + û2 ,
˙ûave ave
(9.52)
2 2
2 3 ave 3 ave 2 3 2
2 = ε K2 a2 û − (û2 ) − a2 ,
˙ûave (9.53)
2 1 2 8
For a2 = 0.03, (ûe1 , ûe2 ) = (0.9999, 0.9998) and (0.2501, −0.4998). Figure 9.4 is the
phase portrait of this system with the stable Nash equilibrium represented by a green
circle, and the point (1/4, −1/2) a red square, which is an unstable equilibrium in
the phase portrait. The boundary of U is denoted by dashed red lines. The initial
condition for Fig. 9.3 lies near the unstable point, so the trajectory travels almost
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 193
û 2ave
0
2
2 1 0 1 2 3
û ave
1
entirely along the eigenvector that points towards the stable equilibrium. We also
see that the trajectories remain in U for initial conditions suitably close to the Nash
equilibrium.
9.7 Conclusions
Acknowledgements This research was made with Government support under and awarded by
DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate
(NDSEG) Fellowship, 32 CFR 168a, and by grants from National Science Foundation, DOE, and
AFOSR.
194 P. Frihauf et al.
Appendix
The following integrals are computed to obtain (9.21), where we have assumed the
frequencies satisfy ωi = ω j , 2 ωi = ω j , 3 ωi = ω j , ωi
= ω j + ωk , ωi
= 2 ω j + ωk ,
2 ωi
= ω j + ωk , for distinct i, j, k ∈ {1, . . . , N} and defined γi = ωi / mini {ωi }:
T
T
1 ai
lim μi (τ ) dτ = lim sin(γi τ + ϕi ) dτ
T →∞ T 0 T →∞ T 0
= 0, (9.55)
1 T a2i T
lim μi2 (τ ) dτ = lim [1 − cos(2γi τ + 2ϕi )] dτ ,
T →∞ T 0 T →∞ 2T 0
a2i
= , (9.56)
2
T
T
1 a3i
lim μi3 (τ ) dτ = lim [3 sin(γi τ + ϕi ) − sin(3γi τ + 3ϕi )] dτ ,
T →∞ T 0 T →∞ 4T 0
= 0, (9.57)
T
1 T a4i
lim μi4 (τ ) dτ = lim [3 − 4 cos(2γi τ + 2ϕi )
T →∞ T 0 T →∞ 8T 0
+ cos(4γi τ + 4ϕi )] dτ ,
3a4i
= , (9.58)
8
T
T
1 ai a j
lim μi (τ )μ j (τ ) dτ = [cos((γi − γ j )τ + ϕi − ϕ j )
T →∞ T 0 2T 0
− cos((γi + γ j )τ + ϕi + ϕ j )] dτ ,
= 0, (9.59)
T
1 T a2i a j
lim μi2 (τ )μ j (τ ) dτ = [sin(γ j τ + ϕ j )
T →∞ T 0 2T 0
− cos(2γi τ + 2ϕi ) sin(γ j τ + ϕ j )] dτ ,
T
a2i a j
= [2 sin(γ j τ + ϕ j )
4T 0
− sin((2γi + γ j )τ + 2ϕi + ϕ j )
+ sin((2γi − γ j )τ + 2ϕi − ϕ j )] dτ ,
= 0, (9.60)
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 195
T
T
1 a3i a j
lim μi3 (τ )μ j (τ ) dτ = [3 sin(γi τ + ϕ j ) sin(γ j τ + ϕ j )
T →∞ T 0 4T 0
− sin(3γi τ + 3ϕi ) sin(γ j τ + ϕ j )] dτ ,
T
a3i a j
= [3 cos((γi − γ j )τ + ϕi − ϕ j )
8T 0
−3 cos((γi + γ j )τ + ϕi + ϕ j )
− cos((3γi − γ j )τ + 3ϕi − ϕ j )
+ cos((3γi + γ j )τ + 3ϕi + ϕ j )] dτ ,
= 0, (9.61)
a2i a2j
T
1 T
lim μi2 (τ )μ 2j (τ ) dτ = lim [2 − 2 cos(2γi τ + 2ϕi )
T →∞ T 0 T →∞ 8T 0
−2 cos(2γ j τ + 2ϕ j )
+ cos(2(γi − γ j )τ + 2(ϕi − ϕ j ))
+ cos(2(γi + γ j )τ + 2(ϕi + ϕ j ))] dτ ,
a2i a2j
= , (9.62)
4
T
T
1 ai a j ak
lim μi (τ )μ j (τ )μk (τ ) dτ , = lim [cos((γi − γ j )τ + ϕi − ϕ j )
T →∞ T 0 T →∞ 2T 0
− cos((γi + γ j )τ + ϕi + ϕ j )] sin(γk τ + ϕk ) dτ ,
ai a j ak
= lim
T →∞ 4T
T
× [sin((γi − γ j + ωk )τ + ϕi − ϕ j + ϕk )
0
− sin((γi − γ j − γk )τ + ϕi − ϕ j − ϕk )
− sin((γi + γ j + γk )τ + ϕi + ϕ j + ϕk )
+ sin((γi + γ j − γk )τ + ϕi + ϕ j − ϕk )] dτ ,
= 0, (9.63)
T ai a2j ak
T
1
lim μi (τ )μ 2j (τ )μk (τ ) dτ , = lim sin(γi τ + ϕi )
T →∞ T 0 T →∞ 2T 0
×(1 − cos(2γ j τ + 2ϕ j )) sin(γk τ + ϕk ) dτ ,
ai a2j ak
T
= lim [cos((γi − γk )τ + ϕi − ϕk )
T →∞ 4T 0
196 P. Frihauf et al.
− cos((γi + γk )τ + ϕi + ϕk )]
×(1 − cos(2γ j τ + 2ϕ j )) dτ
ai a2j ak
T
= lim [2 cos((γi − γk )τ + ϕi − ϕk )
T →∞ 8T 0
− cos((γi − 2γ j − γk )τ + ϕi − 2ϕ j − ϕk )
− cos((γi + 2γ j − γk )τ + ϕi + 2ϕ j − ϕk )
+ cos((γi − 2γ j + γk )τ + ϕi − 2ϕ j + ϕk )
+ cos((γi + 2γ j + γk )τ + ϕi + 2ϕ j + ϕk )
−2 cos((γi + γk )τ + ϕi + ϕk )] dτ ,
= 0. (9.64)
References
1. Altman, E., Başar, T., Srikant, R.: Nash equilibria for combined flow control and routing in
networks: asymptotic behavior for a large number of users. IEEE Trans. Autom. Control 47,
917–930 (2002)
2. Apostol, T.M.: Mathematical Analysis, 2nd ed. Addison-Wesley, Reading (1974)
3. Ariyur, K.B., Kristic, M.: Real-Time Optimization by Extremum-Seeking Control. Wiley-
Interscience, Hoboken (2003)
4. Başar, T.: Control and game-theoretic tools for communication networks (overview). Appl.
Comput. Math. 6, 104–125 (2007)
5. Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd ed. SIAM, Philadelphia
(1999)
6. Bauso, D., Giarré, L., Pesenti, R.: Consensus in noncooperative dynamic games: a multiretailer
inventory application. IEEE Trans. Autom. Control 53, 998–1003 (2008)
7. Becker, R., King, R., Petz, R., Nitsche, W.: Adaptive closed-loop separation control on a high-
lift configuration using extremum seeking, AIAA J. 45, 1382–1392 (2007)
8. Carnevale, D., Astolfi, A., Centioli, C., Podda, S., Vitale, V., Zaccarian, L.: A new extremum
seeking technique and its application to maximize RF heating on FTU. Fusing Eng. Design 84,
554–558 (2009)
9. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press,
New York (2006)
10. Cochran, J., Kanso, E., Kelly, S.D., Xiong, H., Krstic, M.: Source seeking for two nonholo-
nomic models of fish locomotion. IEEE Trans. Robot. 25, 1166–1176 (2009)
11. Cochran, J., Krstic, M.: Nonholonomic source seeking with tuning of angular velocity. IEEE
Trans. Autom. Control 54, 717–731 (2009)
12. Fink, A.M.: Almost Periodic Differential Equations, Lecture Notes in Mathematics, vol. 377.
Springer, New York (1974)
13. Foster, D.P., Young, H.P.: Regret testing: learning to play Nash equilibrium without knowing
you have an opponent. Theor. Econ. 1, 341–367 (2006)
9 Nash Equilibrium Seeking for Dynamic Systems with Non-quadratic Payoffs 197
14. Fudenberg, D., Levine, D.K.: The Theory of Learning in Games. The MIT Press, Cambridge
(1998)
15. Guay, M., Perrier, M., Dochain, D.: Adaptive extremum seeking control of nonisothermal
continuous stirred reactors. Chem. Eng. Sci. 60, 3671–3681 (2005)
16. Hale, J.K.: Ordinary Differential Equations. Wiley-Interscience, New York (1969)
17. Hart, S., Mansour, Y.: How long to equilibrium? The communication complexity of uncoupled
equilibrium procedures. Games Econ. Behav. 69, 107–126 (2010)
18. Hart, S., Mas-Colell, A.: Uncoupled dynamics do not lead to Nash equilibrium. Am. Econ.
Rev. 95, 1830–1836 (2003)
19. Hart, S., Mas-Colell, A.: Stochastic uncoupled dynamics and Nash equilibrium. Games Econ.
Behav. 57, 286–303 (2006)
20. Jafari, A., Greenwald, A., Gondek, D., Ercal, G.: On no-regret learning, fictitious play, and
Nash equilibrium. In: Proceedings of the 18th International Conference on Machine Learning
(2001)
21. Khalil, H.K.: Nonlinear Systems, 3rd ed. Prentice Hall, Upper Saddle River (2002)
22. Killingsworth, N.J., Aceves, S.M., Flowers, D.L., Espinosa-Loza, F., Krstic, M.: HCCI engine
combustion-timing control: optimizing gains and fuel consumption via extremum seeking.
IEEE Trans. Control Syst. Technol. 17, 1350–1361 (2009)
23. Krstic, M., Frihauf, P., Krieger, J., Başar, T.: Nash equilibrium seeking with finitely- and
infinitely-many players. In: Proceedings of the 8th IFAC Symposium on Nonlinear Control
Systems, Bologna (2010)
24. Li, S., Başar, T.: Distributed algorithms for the computation of noncooperative equilibria.
Automatica 23, 523–533 (1987)
25. Luenberger, D.G.: Complete stability of noncooperative games. J. Optim. Theory Appl. 25,
485–505 (1978)
26. Luo, L., Schuster, E.: Mixing enhancement in 2D magnetohydrodynamic channel flow by
extremum seeking boundary control. In: Proceedings of the American Control Conference,
St. Louis (2009)
27. MacKenzie, A.B., Wicker, S.B.: Game theory and the design of self-configuring, adaptive
wireless networks. IEEE Commun. Mag. 39, 126–131 (2001)
28. Marden, J.R., Arslan, G., Shamma, J.S.: Cooperative control and potential games. IEEE Trans.
Syst. Man Cybern. B Cybern. 39, 1393–1407 (2009)
29. Moase, W.H., Manzie, C., Brear, M.J.: Newton-like extremum-seeking part I: theory. In:
Proceedings of the IEEE Conference on Decision and Control, Shanghai (2009)
30. Naimzada, A.K., Sbragia, L.: Oligopoly games with nonlinear demand and cost functions: two
boundedly rational adjustment processes. Chaos Solitons Fract. 29, 707–722 (2006)
31. Nešić, D., Tan, Y., Moase, W.H., Manzie, C.: A unifying approach to extremum seeking:
adaptive schemes based on estimation of derivatives. In: Proceedings of the IEEE Conference
on Decision and Control, Atlanta (2010)
32. Peterson, K., Stefanopoulou, A.: Extremum seeking control for soft landing of an electrome-
chanical valve actuator. Automatica 29, 1063–1069 (2004)
33. Rao, S.S., Venkayya, V.B., Khot, N.S.: Game theory approach for the integrated design of
structures and controls. AIAA J. 26, 463–469 (1988)
34. Rosen, J.B.: Existence and uniqueness of equilibrium points for concave N-person games.
Econometrica 33, 520–534 (1965)
35. Scutari, G., Palomar, D.P., Barbarossa, S.: The MIMO iterative waterfilling algorithm. IEEE
Trans. Signal Process. 57, 1917–1935 (2009)
36. Semsar-Kazerooni, E., Khorasani, K.: Multi-agent team cooperation: a game theory approach.
Automatica 45, 2205–2213 (2009)
37. Shamma, J.S., Arslan, G.: Dynamic fictitious play, dynamic gradient play, and distributed
convergence to Nash equilibria. IEEE Trans. Autom. Control 53, 312–327 (2005)
38. Sharma, R., Gopal, M.: Synergizing reinforcement learning and game theory—a new direction
for control. Appl. Soft Comput. 10, 675–688 (2010)
198 P. Frihauf et al.
39. Stanković, M.S., Johansson, K.H., Stipanović, D.M.: Distributed seeking of Nash equilibria
in mobile sensor networks. In: Proceedings of the IEEE Conference on Decision and Control,
Atlanta (2010)
40. Stanković, M.S., Stipanović, D.M.: Extremum seeking under stochastic noise and applications
to mobile sensors. Automatica 46, 1243–1251 (2010)
41. Tan, Y., Nešić, D., Mareels, I.: On non-local stability properties of extremum seeking control.
Automatica 42, 889–903 (2006)
42. Young, H.P.: Learning by trial and error. Games Econ. Behav. 65, 626–643 (2009)
43. Zhang, C., Arnold, D., Ghods, N., Siranosian, A., Krstic, M.: Source seeking with nonholo-
nomic unicycle without position measurement and with tuning of forward velocity. Syst.
Control Lett. 56, 245–252 (2007)
44. Zhu, M., Martı́nez, S.: Distributed coverage games for mobile visual sensors (I): Reaching
the set of Nash equilibria. In: Proceedings of the IEEE Conference on Decision and Control,
Shanghai, China (2009)
Chapter 10
A Uniform Tauberian Theorem in Optimal
Control
10.1 Introduction
Finite horizon problems of optimal control have been studied intensively since the
pioneer work of Stekhov, Pontryagin, Boltyanskii [27], Hestenes [18], Bellman
[9] and Isaacs [19, 20] during the cold war—see for instance [7, 22, 23] for major
references, or [14] for a short, clear introduction. A classical model considers the
following controlled dynamic over R+
M. Oliu-Barton ()
Institut Mathématique de Jussieu, UFR 929, Université Paris 6, Paris, France
e-mail: [email protected]
G. Vigeral
CEREMADE, Université Paris-Dauphine, Paris, France
e-mail: [email protected]
P. Cardaliaguet and R. Cressman (eds.), Advances in Dynamic Games, Annals of the 199
International Society of Dynamic Games 12, DOI 10.1007/978-0-8176-8355-9 10,
© Springer Science+Business Media New York 2012
200 M. Oliu-Barton and G. Vigeral
y (s) = f (y(s), u(s))
(10.1)
y(0) = y0
It is quite natural to define, whenever the trajectories considered are infinite, for any
discount factor λ > 0, the λ -discounted value of the optimal control problem, as
+∞
Wλ (y0 ) = inf λ e−λ s g(y(s, u, y0 ), u(s)) ds. (10.3)
u∈U 0
In this framework the problem was initially to know whether, for a given finite
horizon T and a given starting point y0 , a minimizing control u existed, solution
of the optimal control problem (T, y0 ). Systems with large, but fixed horizons
were considered and, in particular, the class of “ergodic” systems (that is, those
in which any starting point in the state space Ω is controllable to any point in Ω )
has been thoroughly studied [2, 3, 5, 6, 8, 11, 25]. These systems are asymptotically
independent of the starting point as the horizon goes to infinite. When the horizon is
infinite, the literature on optimal control has mainly focussed on properties of given
trajectories as the time tends to infinity. This approach corresponds to the uniform
approach in a game theoretical framework and is often opposed to the asymptotic
approach (described below), which we have considered in what follows, and which
has received considerably less attention.
In a game-theoretical, discrete time framework, the same kind of problem was
considered since [29], but with several differences in the approach: (1) the starting
point may be chosen at random (a probability μ may be given on Ω , which
randomly determines the point from which the controller will start the play); (2)
the controllability-ergodicity condition is generally not assumed; (3) because of the
inherent recursive structure of the process played in discrete time, the problem is
generally considered for all initial states and time horizons.
For these reasons, what is called the ”asymptotic approach”—the behavior of
Vt (·) as the horizon t tends to infinity, or of Wλ (·) as the discount factor λ tends
to zero—has been more studied in this discrete-time setup. Moreover, when it is
10 A Uniform Tauberian Theorem in Optimal Control 201
considered in Optimal Control, in most cases [4, 10] an ergodic assumption is made
which not only ensures the convergence of Vt (y0 ) to some V , but also forces the limit
function V to be independent of the starting point y0 . The general asymptotic case,
in which no ergodicity condition is assumed, has been to our knowledge studied
for the first time recently. In [11, 28] the authors prove in different frameworks the
convergence of Vt (·) and Wλ (·) to some non-constant function V (y0 ).
Some important, closely related questions are the following : does the con-
vergence of Vt (·) imply the convergence of Wλ (·)? Or vice versa? If they both
converge, does the limit coincide? A partial answer to these questions goes back
to the beginning of the twentieth century, when Hardy and Littlewood proved (see
[17]) that for any sequence of bounded real numbers, the convergence of the Cesaro
means is equivalent to the convergence of their Abel means, and that the limits are
then the same :
Theorem 10.1 ([17]). For any bounded sequence of reals {an }n≥1 , define Vn =
+∞
n ∑i=1 ai and Wλ = λ ∑i=1 (1 − λ )
1 n i−1 a . Then,
i
Moreover, if the central inequality is an equality, then all inequalities are equalities.
Noticing that {an} can be viewed as a sequence of costs for some deterministic
(uncontrolled) dynamic in discrete-time, this results gives the equivalence between
the convergence
of Vt and the convergence
of Wλ , to the same limit. In 1971, setting
Vt = 1t 0t g(s) ds and Wλ = λ 0+∞ e−λ s g(s) ds, for a given Lebesgue-measurable,
bounded, real function g, Feller proved that the same result holds for continuous-
time uncontrolled dynamics (particular case of Theorem 2, p. 445 in [15]).
Theorem 10.2 ([15]).
Moreover, if the central inequality is an equality, then all inequalities are equalities.
In 1992, Lehrer and Sorin [24] considered a discrete-time controlled dynamic,
defined by a correspondence Γ : Ω ⇒ Ω , with nonempty values, and by g, a bounded
real cost function defined on Ω . A feasible play at z ∈ Ω is an infinite sequence
y = {yn }n≥1 such that y1 = z and yn+1 ∈ Γ (yn ). The average and discounted
value functions are defined respectively by Vn (z) = inf 1n ∑ni=1 g(yi ) and Wλ (y0 ) =
inf λ ∑+∞
i=1 (1 − λ )
i−1 g(y ), where the infima are taken over the feasible plays at z.
i
the general case where the limit may depend on the starting point y0 . The uniform
condition is necessary: in the same article, the authors provide an example where
only pointwise convergence holds and the limits differs.
In 1998, Arisawa (see [4]) considered a continuous-time controlled dynamic and
proved the equivalence between the uniform convergence of Wλ and the uniform
convergence of Vt in the specific case of limits independent of the starting point.
Theorem 10.4 ([4]). Let d ∈ R, then
This does not settle the general case, in which the limit function may depend on the
starting point.1 For a continuous-time controlled dynamic in which Vt (y0 ) converges
to some function V (y0 ), dependent on the state variable y0 , as t goes to infinity, we
prove the following
Theorem 10.5. Vt (y0 ) converges to V (y0 ) uniformly on Ω , if and only if Wλ (y0 )
converges to V (y0 ) uniformly on Ω .
In fact, we will prove this result in a more general framework, as described
in Sect. 10.2. Some basic lemmas which occur to be important tools will also be
proven on that section. Section 10.3 will be devoted to the proof of our main result.
Section 10.4 will conclude by pointing out, via an example, the fact that uniform
convergence is a necessary requirement for the Theorem 10.5 to hold. A very simple
dynamic is described, in which the pointwise limits of Vt (·) and Wλ (·) exist but
differ. It should be noted that our proofs (as well as the counterexample in Sect. 10.4)
are adaptations in this continuous-time framework of ideas employed in a discrete-
time setting in [24]. In the appendix we also point out that an alternative proof of our
theorem is obtained using the main theorem in [24] as well as a discrete/continuous
equivalence argument.
For completeness, let us mention briefly this other approach, mentioned
above as the uniform approach, and which has also been deeply studied, see
for exemple [12, 13, 16]. In these models, the optimal average cost value is not
taken over a finite period of time [0,t], which is then studied for t growing
to infinite, as in [4, 15, 17, 24, 28] or in our framework. On the contrary, only
infinite trajectories
are considered, among which the value Vt is defined as
infu∈U supτ ≥t τ1 0τ g(y(s, u, y0 ), u(s)) ds, or some other closely related variation.
The asymptotic behavior, as t tends to infinity, of the function Vt has also been
studied in that framework. In [16], both λ -discounted and average evaluations of an
infinite trajectory are considered and their limits are compared. However, we stress
1 Lemma 6 and Theorem 8 in [4] deal with this general setting, but we believe them to be incorrect
since they are stated for pointwise convergence and, consequently, are contradicted by the example
in Sect. 10.4.
10 A Uniform Tauberian Theorem in Optimal Control 203
out that the asymptotic behavior of those quantities is in general2 not related to the
asymptotic behavior of Vt and Wλ .
Finally, let us point out that in the framework of zero-sum differential games, that
is when the dynamic is controlled by two players with opposite goals, a Tauberian
theorem is given in the ergodic case by Theorem 2.1 in [1]. However, to our
knowledge the general, non ergodic case is still an open problem.
10.2 Model
Given x ∈ Ω and a feasible trajectory X ∈ Γ(x), define the average payoff γt(X) = (1/t) ∫_0^t g(X(s)) ds and the discounted payoff νλ(X) = λ ∫_0^{+∞} e^{−λs} g(X(s)) ds. This is defined for t, λ ∈ ]0, +∞[. Naturally, we define the values as Vt(x) = inf_{X∈Γ(x)} γt(X) and Wλ(x) = inf_{X∈Γ(x)} νλ(X).
2 The reader may verify that this is indeed not the case in the example of Sect. 10.4.
We follow the ideas of [24], and start by proving two simple yet important lemmas that will be used in the proof. The first establishes that the value increases along the trajectories. Then, we prove a convexity result linking the finite horizon average payoffs and the discounted evaluations on any given trajectory.
Lemma 10.1. Monotonicity (compare with Proposition 1 in [24]). For all X ∈ T, for all s ≥ 0, we have, with x := X(0) and y := X(s),

lim sup_{t→+∞} Vt(x) ≤ lim sup_{t→+∞} Vt(y)   and   lim sup_{λ→0} Wλ(x) ≤ lim sup_{λ→0} Wλ(y),

and the same inequalities hold for the inferior limits.
Proof. Set y := X(s) and x := X(0). For ε > 0, take T ∈ R+ such that s/(s + T) < ε. Let t > T and take an ε-optimal trajectory for Vt, i.e. Y ∈ Γ(y) such that γt(Y) ≤ Vt(y) + ε. Define the concatenation of X and Y at time s as in (10.4); X ∘s Y is in Γ(x) by assumption. Hence

Vt+s(x) ≤ γt+s(X ∘s Y) = (s/(t + s)) γs(X) + (t/(t + s)) γt(Y)
        ≤ ε + γt(Y)
        ≤ 2ε + Vt(y).

This is true for any t > T, and the first part of the result follows. Similarly, choose λ0 > 0 such that 1 − e^{−λs} ≤ ε for every λ ∈ ]0, λ0].
Let λ ∈ ]0, λ0] and take Y ∈ Γ(y) an ε-optimal trajectory for Wλ(y). Then:

Wλ(x) ≤ νλ(X ∘s Y) = λ ∫_0^s e^{−λr} g(X(r)) dr + λ ∫_s^{+∞} e^{−λr} g(Y(r − s)) dr
      ≤ ε + e^{−λs} νλ(Y)
      ≤ 2ε + Wλ(y).
Again, this is true for any λ ∈]0, λ0 ], and the result follows.
Lemma 10.2. Convexity (compare with Eq. (10.1) in [24]). For any play X ∈ T, for any λ > 0:

νλ(X) = ∫_0^{+∞} γs(X) μλ(s) ds,    (10.12)

where μλ(s) ds := λ² s e^{−λs} ds is a probability density on [0, +∞[.
Proof. It is enough to notice that the following relation holds, by integration by parts:

νλ(X) = λ ∫_0^{+∞} e^{−λs} g(X(s)) ds = λ² ∫_0^{+∞} s e^{−λs} ( (1/s) ∫_0^s g(X(r)) dr ) ds,

and that ∫_0^{+∞} λ² s e^{−λs} ds = 1.
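The identity (10.12) is also easy to check numerically. Below is a small sketch (our own; the trajectory cost s ↦ g(X(s)) is an arbitrary bounded function) comparing both sides of (10.12) by trapezoidal quadrature.

import numpy as np

lam = 0.3
s = np.linspace(1e-9, 80.0, 400001)          # e^{-lam s} is negligible past s = 80
gX = 0.5 + 0.5 * np.sin(s)                   # stand-in for s -> g(X(s)), valued in [0, 1]

def trap(f, x):                               # trapezoidal rule
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

# left side of (10.12): nu_lam(X) = lam * int e^{-lam s} g(X(s)) ds
lhs = trap(lam * np.exp(-lam * s) * gX, s)

# right side: int gamma_s(X) mu_lam(s) ds, with gamma_s the running average of g
cumulative = np.concatenate([[0.0], np.cumsum(0.5 * (gX[1:] + gX[:-1]) * np.diff(s))])
gamma = cumulative / s                        # gamma_s(X); the s -> 0 endpoint carries
mu = lam**2 * s * np.exp(-lam * s)            # negligible weight under mu_lam
rhs = trap(gamma * mu, s)

print(lhs, rhs)                               # the two sides agree to quadrature error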
The probability measure μλ plays an important role in the rest of the paper.
Denoting

M(α, β; λ) := ∫_α^β μλ(s) ds = e^{−λα} (1 + λα) − e^{−λβ} (1 + λβ),
we prove here two estimates that will be helpful in the next section.
Lemma 10.3. The two following results hold (compare with Lemma 3 in [24]):
(i) ∀t > 0, ∃ε0 such that ∀ε ≤ ε0, M((1 − ε)t, t; 1/t) ≥ ε/(2e).
(ii) ∀δ > 0, ∃ε0 such that ∀ε ≤ ε0, ∀t > 0, M(εt, (1 − ε)t; 1/(t√ε)) ≥ 1 − δ.
10.3.1 From Vt to Wλ
Assume (B) : Vt (·) converges to some V (·) as t goes to infinity, uniformly on Ω . Our
proof follows Proposition 4 and Lemmas 8 and 9 in [24].
Proposition 10.1. For all ε > 0, there exists λ0 > 0 such that Wλ (x) ≥ V (x) − ε for
every x ∈ Ω and for all λ ∈]0, λ0 ].
Proof. Let T be such that ‖Vt − V‖∞ ≤ ε/2 for every t ≥ T. Choose λ0 > 0 such that

λ² ∫_T^{+∞} s e^{−λs} ds = (1 + λT) e^{−λT} ≥ 1 − ε/4

for every λ ∈ ]0, λ0]. Fix λ ∈ ]0, λ0] and take a play Y ∈ Γ(x) which is ε/4-optimal for Wλ(x). Since γs(Y) ≥ 0, the convexity formula (10.12) from Lemma 10.2 gives:

Wλ(x) + ε/4 ≥ νλ(Y) ≥ 0 + λ² ∫_T^{+∞} s e^{−λs} γs(Y) ds
            ≥ λ² ∫_T^{+∞} s e^{−λs} Vs(x) ds
            ≥ (1 − ε/4)(V(x) − ε/2)
            = V(x) − (ε/4) V(x) − ε/2 + ε²/8
            ≥ V(x) − 3ε/4.
Lemma 10.4. ∀ε > 0, ∃M such that for all t ≥ M, ∀x ∈ Ω , there is a play X ∈ Γ (x)
such that γs (X) ≤ V (x) + ε for all s ∈ [ε t, (1 − ε )t].
Proof. By (B) there exists M such that ‖Vr − V‖ ≤ ε²/3 for all r ≥ εM. Given t ≥ M and x ∈ Ω, let X ∈ Γ(x) be a play (from x) such that γt(X) ≤ Vt(x) + ε²/3. For any s ≤ (1 − ε)t, we have that t − s ≥ εt ≥ εM, so Lemma 10.1 (Monotonicity) implies that

V_{t−s}(X(s)) ≥ V(X(s)) − ε²/3 ≥ V(x) − ε²/3.    (10.13)
Since V(x) + ε²/3 ≥ Vt(x), we also have:

t (V(x) + 2ε²/3) ≥ t (Vt(x) + ε²/3) ≥ t γt(X) = ∫_0^s g(X(r)) dr + ∫_s^t g(X(r)) dr
                ≥ s γs(X) + (t − s) V_{t−s}(X(s))
                ≥ s γs(X) + (t − s)(V(x) − ε²/3),

where the second inequality uses that the trajectory r ↦ X(s + r) is feasible at X(s), and the last one uses (10.13). Rearranging,

γs(X) ≤ V(x) + t ε²/s ≤ V(x) + ε, for s ≥ εt,

which completes the proof.
Proposition 10.2. ∀δ > 0, ∃λ0 such that ∀x ∈ Ω , for all λ ∈]0, λ0 ], we have
Wλ (x) ≤ V (x) + δ .
Proof. By Lemma 10.3(ii), one can choose ε small enough such that

M(εt, (1 − ε)t; 1/(t√ε)) ≥ 1 − δ/2

for any t. In particular, we can take ε ≤ δ/2. Using Lemma 10.4 with δ/2, we get that for t ≥ t0 (and thus for λ(t) := 1/(t√ε) ≤ 1/(t0√ε)) and for any x ∈ Ω, there exists a play X ∈ Γ(x) such that

ν_{λ(t)}(X) ≤ δ/2 + λ(t)² ∫_{εt}^{(1−ε)t} s e^{−sλ(t)} γs(X) ds
           ≤ δ/2 + V(x) + δ/2.

Hence W_{λ(t)}(x) ≤ V(x) + δ for every t ≥ t0, which proves the claim.
Propositions 10.1 and 10.2 establish the first part of Theorem 10.5: (B) ⇒ (A).
10.3.2 From Wλ to Vt
Lemma 10.5. For every ε > 0, t > 0 and x ∈ Ω, there exist a play Y ∈ Γ(x) and a time L ∈ [0, t(1 − ε/2)] such that, for all T ∈ [0, t − L], (1/T) ∫_L^{L+T} g(Y(s)) ds ≤ Vt(x) + ε.

Proof. Fix Y ∈ Γ(x) some ε/2-optimal play for Vt(x). The function s → γs(Y) is
continuous on ]0,t] and satisfies γt (Y ) ≤ Vt (x) + ε /2. The bound on g implies that
γr (Y ) ≤ Vt (x) + ε for all r ∈ [t(1 − ε /2),t].
Consider now the set {s ∈ ]0, t] | γs(Y) > Vt(x) + ε}. If this set is empty, then take L = 0 and observe that for any r ∈ ]0, t],

(1/r) ∫_0^r g(Y(s)) ds ≤ Vt(x) + ε.
Otherwise, let L be the supremum of this set. Notice that L < t(1 − ε/2) and that by continuity γL(Y) = Vt(x) + ε. Now, for any T ∈ [0, t − L],

Vt(x) + ε ≥ γ_{L+T}(Y)
          = (L/(L + T)) γL(Y) + (T/(L + T)) (1/T) ∫_L^{L+T} g(Y(s)) ds
          = (L/(L + T)) (Vt(x) + ε) + (T/(L + T)) (1/T) ∫_L^{L+T} g(Y(s)) ds,

hence (1/T) ∫_L^{L+T} g(Y(s)) ds ≤ Vt(x) + ε, as claimed.
Proposition 10.3. ∀ε > 0, ∃T such that for all t ≥ T we have Vt (x) ≥ W (x) − ε , for
all x ∈ Ω .
Proof. Proceed by contradiction and suppose that ε > 0 is such that for every T, there exists t0 ≥ T and a state x0 ∈ Ω such that V_{t0}(x0) < W(x0) − ε. Let λ be such that ‖Wλ − W‖ ≤ ε/8, and T such that

λ² ∫_{Tε/4}^{+∞} s e^{−λs} ds < ε/8.

Using Lemma 10.5 with ε/2, we get a play Y ∈ Γ(x0) and a time L ∈ [0, t0(1 − ε/4)] such that, ∀s ∈ [0, t0 − L] (and, in particular, ∀s ∈ [0, t0 ε/4]),

(1/s) ∫_L^{L+s} g(Y(r)) dr ≤ V_{t0}(x0) + ε/2 < W(x0) − ε/2.
Thus,

W(Y(L)) − ε/8 ≤ Wλ(Y(L))
             ≤ λ ∫_0^{+∞} e^{−λs} g(Y(L + s)) ds
             ≤ λ² ∫_0^{t0 ε/4} s e^{−λs} ( (1/s) ∫_L^{L+s} g(Y(r)) dr ) ds + ε/8
             ≤ W(x0) − ε/2 + ε/8
             = W(x0) − 3ε/8.

Hence W(Y(L)) ≤ W(x0) − ε/4 < W(x0), which contradicts the monotonicity of the limit value along trajectories (Lemma 10.1), since W(Y(L)) ≥ W(x0).
Proposition 10.4. ∀ε > 0, ∃T such that for all t ≥ T we have Vt (x) ≤ W (x) + ε , for
all x ∈ Ω .
Proof. Otherwise, ∃ε > 0 such that ∀T, ∃t ≥ T and x ∈ Ω with Vt(x) > W(x) + ε. For any X ∈ Γ(x) consider the (continuous in s) payoff function γs(X) = (1/s) ∫_0^s g(X(r)) dr. Of course, γt(X) ≥ Vt(x) > W(x) + ε. Furthermore, because of the bound on g,

γs(X) ≥ W(x) + ε/2 for every s ∈ R := [(1 − ε/2)t, t]

holds. We set δ := ε/(4e), so that, by Lemma 10.3(i), the set R has μ_λ̃-mass at least δ for λ̃ := 1/t and t large enough. By Proposition 10.3, there is a K such that Vt ≥ W(x) − δε/8 for all t ≥ K. Fix K and consider the decomposition of [0, +∞[ into K := [0, K], R, and the complement (K ∪ R)^c; then

γs(X)|K ≥ 0,
γs(X)|_{(K∪R)^c} ≥ W(x) − δε/8,
γs(X)|R ≥ W(x) + ε/2.

Integrating these estimates against μ_λ̃ via the convexity formula (10.12) yields, for t large enough, ν_λ̃(X) ≥ W(x) + δε/4. This is true for any play, so its infimum also satisfies W_λ̃(x) ≥ W(x) + δε/4, which is a contradiction, for we assumed that W_λ̃(x) < W(x) + δε/5.
Propositions 10.3 and 10.4 establish the second half of Theorem 10.5: (A) ⇒ (B).
3 We thank Marc Quincampoix for pointing out this example to us, which is simpler than our original one.
W(x0, y0) = 1                                        if y0 > 0 or x0 > 2,
W(x0, y0) = 0                                        if y0 = 0 and 1 ≤ x0 ≤ 2,
W(x0, y0) = 1 − (1 − x0)^{1−x0} / (2 − x0)^{2−x0}    if y0 = 0 and x0 < 1.
Here we only prove that V(0, 0) = 1/2 and W(0, 0) = 3/4; the proof for y0 = 0 and 0 < x0 < 1 is similar and the other cases are easy.
First of all we prove that for any t or λ and any admissible trajectory (that is, any function X(t) = (x(t), y(t)) compatible with a control u(t)) starting from (0, 0), γt(X) ≥ 1/2 and νλ(X) ≥ 3/4. This is clear if x(t) is identically 0, so assume this is not the case. Since the speed y(t) is increasing, we can define t1 and t2 as the times at which x(t1) = 1 and x(t2) = 2 respectively, and moreover we have t2 ≤ 2t1. Then,

γt(X) = (1/t) ( ∫_0^{min(t,t1)} ds + ∫_{min(t,t2)}^t ds )
      = 1 + min(1, t1/t) − min(1, t2/t)
      ≥ 1 + min(1, t2/(2t)) − min(1, t2/t)
      ≥ 1/2
and

νλ(X) = λ ∫_0^{t1} e^{−λs} ds + λ ∫_{t2}^{+∞} e^{−λs} ds
      = 1 − exp(−λt1) + exp(−λt2)
      ≥ 1 − exp(−λt1) + exp(−2λt1)
      ≥ min_{a>0} (1 − a + a²)
      = 3/4.
On the other hand, one can prove [28] that lim supVt (0, 0) ≤ 1/2 : in the problem
with horizon t, consider the control “u(s) = 1 until s = 2/t and then 0”. Similarly
one proves that lim supWλ (0, 0) ≤ 3/4: in the λ -discounted problem, consider the
control “u(s) = 1 until s = λ /ln 2 and then 0”.
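These values are easy to reproduce numerically. The sketch below is our own check; it assumes, consistently with the computations above, that the dynamics of the example are ẋ = y, ẏ = u with u ∈ [0, 1] and running cost g = 1 outside {1 ≤ x ≤ 2}, and it evaluates γt(X) and νλ(X) for the two controls just quoted.

import numpy as np

def cost_along(t, tau):
    # trajectory of xdot = y, ydot = u from (0,0) with "u = 1 until tau, then 0":
    # y(s) = min(s, tau), hence x in closed form below; g = 1 outside [1, 2]
    x = np.where(t <= tau, t**2 / 2.0, tau**2 / 2.0 + tau * (t - tau))
    return np.where((x >= 1.0) & (x <= 2.0), 0.0, 1.0)

for T in (10.0, 100.0, 1000.0):               # gamma_T for "u = 1 until s = 2/T"
    t = np.linspace(0.0, T, 200001)
    print("gamma_%g = %.3f" % (T, np.mean(cost_along(t, 2.0 / T))))   # -> 1/2

for lam in (0.1, 0.01, 0.001):                # nu_lam for "u = 1 until s = lam/ln 2"
    t = np.linspace(0.0, 60.0 / lam, 400001)
    g = cost_along(t, lam / np.log(2.0))
    dt = t[1] - t[0]
    print("nu_%g = %.3f" % (lam, lam * np.sum(np.exp(-lam * t) * g) * dt))  # -> 3/4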
So the functions Vt and Wλ converge pointwise on Ω, but their limits V and W are different, since we have just shown V(0, 0) ≠ W(0, 0). One can verify that neither convergence is uniform on Ω by considering Vt(1, ε) and Wλ(1, ε) for small positive ε.
Remark 10.1. One may object that this example is not very regular since the payoff g is not continuous and the state space is not compact. However a related, smoother example can easily be constructed:
1. The set of controls is still [0, 1].
2. The continuous cost g(x) is equal to 1 outside the segment [0.9, 2.1], to 0 on [1, 2], and linear on the two remaining intervals.
3. The compact state space is Ω = {(x, y) | 0 ≤ y ≤ √(2x) ≤ 2√2}.
4. The dynamic is the same as in the original example for x ∈ [0, 3], and f(x, y, u) = ((4 − x)y, (4 − x)u) for 3 ≤ x ≤ 4. The inequality y(t)y′(t) ≤ x′(t) is thus satisfied on any trajectory, which implies that Ω is forward invariant under this dynamic.
With these changes the values Vt(·) and Wλ(·) still both converge pointwise on Ω to some V(·) and W(·) respectively, and V(0, 0) ≠ W(0, 0).
• We considered the finite horizon problem and the discounted one, but it should
be possible to establish similar Tauberian theorems for other, more complex,
evaluations of the payoff. This was settled in the discrete time case in [26].
• It would be very fruitful to establish necessary or sufficient conditions for
uniform convergence to hold. In this direction we mention [28] in which
sufficient conditions for the stronger notion of Uniform Value (meaning that
there are controls that are nearly optimal no matter the horizon, provided it is
large enough) are given in a general setting.
• In the discrete case an example is constructed in [26] in which there is no uniform
value despite uniform convergence of the families Vt and Wλ . It would be of
interest to construct such an example in continuous time, in particular in the
framework of Sect. 10.1.
• It would be very interesting to study Tauberian theorems for dynamic systems
that are controlled by two conflicting controllers. In the framework of differential
games this has been done recently (Theorem 2.1 in [1]): an extension of
Theorem 10.4 has been accomplished for two player games in which the limit
of VT or Wλ is assumed to be independent of the starting point. The analogous result in the discrete time framework is a consequence of Theorems 1.1 and 3.5 in [21]. Existence of Tauberian theorems in the general setup of two-person zero-sum games with no ergodicity condition remains open in both the discrete and the continuous settings.
Acknowledgements This article was written as part of the PhD of the first author. Both authors wish to express their many thanks to Sylvain Sorin for his numerous comments and his great help. We also thank Hélène Frankowska and Marc Quincampoix for helpful remarks on earlier drafts.
Appendix
We give here another proof⁴ of Theorem 10.5 by using the analogous result in discrete time [24] as well as an argument of equivalence between discrete and continuous dynamics.
Consider a deterministic dynamic programming problem in continuous time as defined in Sect. 10.2.1, with a state space Ω, a payoff g and a dynamic Γ. Recall that, for any ω ∈ Ω, Γ(ω) is the non empty set of feasible trajectories starting from ω. We construct an associated deterministic dynamic programming problem in discrete time as follows.
Let Ω̃ = Ω × [0, 1] be the new state space and let g̃ be the new cost function, given by g̃(ω, x) = x. We define a multivalued function with nonempty values Γ̃ : Ω̃ ⇒ Ω̃ by

(ω, x) ∈ Γ̃(ω′, x′) ⟺ ∃X ∈ Γ(ω′), with X(1) = ω and ∫_0^1 g(X(t)) dt = x.
The associated values are

vn(ω̃) = inf (1/n) ∑_{i=1}^n g̃(ω̃i),
wλ(ω̃) = inf λ ∑_{i=1}^{+∞} (1 − λ)^{i−1} g̃(ω̃i),

where the infima are taken over the set of sequences {ω̃i}i∈N such that ω̃0 = ω̃ and ω̃_{i+1} ∈ Γ̃(ω̃i) for every i ≥ 0.
Theorem 10.5 is then the consequence of the following three facts.
Firstly, the main theorem of Lehrer and Sorin in [24], which states that uniform convergence (on Ω̃) of vn to some v is equivalent to uniform convergence of wλ to the same v.
Secondly, the concatenation hypothesis (10.4) on Γ implies that for any (ω, x) ∈ Ω̃,

vn(ω, x) = Vn(ω),

where Vt(ω) = inf_{X∈Γ(ω)} (1/t) ∫_0^t g(X(s)) ds, as defined in equation (10.7). Consequently, because of the bound on g, for any t ∈ R+ we have

|Vt(ω) − v_{⌊t⌋}(ω, x)| ≤ 2/t.
Thirdly, by equation (10.8) and the bound on the cost function, for any λ ∈ ]0, 1],

|Wλ(ω) − wλ(ω, x)| ≤ λ ∫_0^{+∞} |(1 − λ)^{⌊t⌋} − e^{−λt}| dt,

and the right-hand side converges to 0 as λ tends to 0, as the following lemma shows.

Lemma. λ ∫_0^{+∞} |(1 − λ)^{⌊t⌋} − e^{−λt}| dt converges to 0 as λ tends to 0.
Proof. Since λ ∫_0^{+∞} (1 − λ)^{⌊t⌋} dt = λ ∑_{i=0}^{+∞} (1 − λ)^i = 1 = λ ∫_0^{+∞} e^{−λt} dt for any λ > 0, the lemma is equivalent to the convergence to 0 of

E(λ) := λ ∫_0^{+∞} [(1 − λ)^{⌊t⌋} − e^{−λt}]_+ dt,

where [x]_+ denotes the positive part of x. Now, from the relation 1 − λ ≤ e^{−λ}, true for any λ, one can easily deduce that, for any λ > 0, t ≥ 0, the relation (1 − λ)^{⌊t⌋} e^{λt} ≤ e^λ holds. Hence,

E(λ) = λ ∫_0^{+∞} e^{−λt} [(1 − λ)^{⌊t⌋} e^{λt} − 1]_+ dt
     ≤ λ ∫_0^{+∞} e^{−λt} (e^λ − 1) dt
     = e^λ − 1,

which converges to 0 as λ tends to 0.
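The bound E(λ) ≤ e^λ − 1 is also easy to confirm numerically; the sketch below (our own, with the floor function supplied by numpy) evaluates E(λ) on a truncated grid.

import numpy as np

def E(lam, horizon=40.0):
    # integrate lam * [(1-lam)^floor(t) - e^{-lam t}]_+ on [0, horizon/lam]
    t = np.linspace(0.0, horizon / lam, 1_000_001)
    diff = (1.0 - lam) ** np.floor(t) - np.exp(-lam * t)
    return lam * np.sum(np.maximum(diff, 0.0)) * (t[1] - t[0])

for lam in (0.5, 0.1, 0.01):
    print(lam, E(lam), np.exp(lam) - 1.0)   # E(lam) below e^lam - 1, both tend to 0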
References
1. Alvarez, O., Bardi, M.: Ergodic Problems in Differential Games. Advances in Dynamic Game
Theory, pp. 131–152. Ann. Int’l. Soc. Dynam. Games, vol. 9, Birkhäuser Boston (2007)
2. Alvarez, O., Bardi, M.: Ergodicity, stabilization, and singular perturbations for Bellman-Isaacs
equations. Mem. Am. Math. Soc. 960(204), 1–90 (2010)
3. Arisawa, M.: Ergodic problem for the Hamilton-Jacobi-Bellman equation I. Ann. Inst. Henri
Poincare 14, 415–438 (1997)
4. Arisawa, M.: Ergodic problem for the Hamilton-Jacobi-Bellman equation II. Ann. Inst. Henri
Poincare 15, 1–24 (1998)
5. Arisawa, M., Lions, P.-L.: On ergodic stochastic control. Comm. Partial Diff. Eq. 23(11–12),
2187–2217 (1998)
6. Artstein, Z., Gaitsgory, V.: The value function of singularly perturbed control systems. Appl.
Math. Optim. 41(3), 425–445 (2000)
7. Bardi, M., Capuzzo-Dolcetta, I.: Optimal Control and Viscosity Solutions of Hamilton-Jacobi-
Bellman Equations. Systems & Control: Foundations & Applications. Birkhäuser Boston, Inc.,
Boston, MA (1997)
8. Barles, G.: Some homogenization results for non-coercive Hamilton-Jacobi equations.
Calculus Variat. Partial Diff. Eq. 30(4), 449–466 (2007)
9. Bellman, R.: On the theory of dynamic programming. Proc. Natl. Acad. Sci. U.S.A, 38,
716–719 (1952)
10. Bettiol, P.: On ergodic problem for Hamilton-Jacobi-Isaacs equations. ESAIM: COCV 11,
522–541 (2005)
11. Cardaliaguet, P.: Ergodicity of Hamilton-Jacobi equations with a non coercive non convex
Hamiltonian in R2 /Z2 . Ann. l’Inst. Henri Poincare (C) Non Linear Anal. 27, 837–856 (2010)
12. Carlson, D.A., Haurie, A.B., Leizarowitz, A.: Optimal Control on Infinite Time Horizon.
Springer, Berlin (1991)
13. Colonius, F., Kliemann, W.: Infinite time optimal control and periodicity. Appl. Math. Optim.
20, 113–130 (1989)
14. Evans, L.C.: An Introduction to Mathematical Optimal Control Theory. Unpublished Lecture Notes, U.C. Berkeley (1983). Available at http://math.berkeley.edu/~evans/control.course.pdf
15. Feller, W.: An Introduction to Probability Theory and its Applications, vol. II, 2nd ed. Wiley,
New York (1971)
16. Grüne, L.: On the Relation between Discounted and Average Optimal Value Functions. J. Diff.
Eq. 148, 65–99 (1998)
17. Hardy, G.H., Littlewood, J.E.: Tauberian theorems concerning power series and Dirichlet’s
series whose coefficients are positive. Proc. London Math. Soc. 13, 174–191 (1914)
18. Hestenes, M.: A General Problem in the Calculus of Variations with Applications to the
Paths of Least Time, vol. 100. RAND Corporation, Research Memorandum, Santa Monica,
CA (1950)
19. Isaacs, R.: Games of Pursuit. Paper P-257. RAND Corporation, Santa Monica (1951)
20. Isaacs, R.: Differential Games. A Mathematical Theory with Applications to Warfare and
Pursuit, Control and Optimization. Wiley, New York (1965)
21. Kohlberg, E., Neyman, A.: Asymptotic behavior of nonexpansive mappings in normed linear
spaces. Isr. J. Math. 38, 269–275 (1981)
22. Kirk, D.E.: Optimal Control Theory: An Introduction. Englewood Cliffs, N.J. Prentice Hall
(1970)
23. Lee, E.B., Markus, L.: Foundations of Optimal Control Theory. SIAM, Philadelphia (1967)
24. Lehrer, E., Sorin, S.: A uniform Tauberian theorem in dynamic programming. Math. Oper.
Res. 17, 303–307 (1992)
25. Lions, P.-L., Papanicolaou, G., Varadhan, S.R.S.: Homogenization of Hamilton-Jacobi
Equations. Unpublished (1986)
26. Monderer, M., Sorin, S.: Asymptotic Properties in Dynamic Programming. Int. J. Game Theory
22, 1–11 (1993)
27. Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V.: The Mathematical Theory of Optimal Processes. Nauka, Moscow (1962) (Engl. Trans. Wiley)
28. Quincampoix, M., Renault, J.: On the existence of a limit value in some non expansive optimal
control problems. SIAM J. Control Optim. 49, 2118–2132 (2011)
29. Shapley, L.S.: Stochastic games. Proc. Natl. Acad. Sci. 39, 1095–1100 (1953)
Chapter 11
E-Equilibria for Multicriteria Games
11.1 Introduction
We know that Game Theory studies conflicts, behaviour and decisions of more
than one rational agent. Players choose their actions in order to achieve preferred
outcomes. Often players have to “optimize” not one but more than one objective and
these are often not comparable, so multicriteria games help us to make decisions in
multi-objective problems. The first observation to make in studying these topics is
that in general there is not an optimal solution from all points of view.
Let us consider, for example, an interactive decision between a seller and a buyer.
The latter wishes to buy a car and he chooses among many models. He has to take
into account the price, the power, the petrol consumption and the dimension of the
car: it must be large enough to be comfortable for his family but not too large to
L. Pusillo ()
Dima – Department of Mathematics, University of Genoa, Via Dodecaneso 35,
16146 Genoa, Italy
e-mail: [email protected]
S. Tijs
CentER and Department of Econometrics and Operations Research, Tilburg University,
P.O. Box 90153, 5000 LE Tilburg, The Netherlands
e-mail: [email protected]
have parking problems. The seller wishes to maximize his gain and to sell a good and useful car in order to satisfy the buyer, so that in the future he will come back or recommend this car dealer to his friends, and thus the seller will have more buyers. We can consider this interaction as a game where the two players have to "optimize" many criteria. Starting from Vector Optimization (see [3–5]), the theory of multi-objective games helps us in these multi-objective situations.
Shapley, in 1959, gave a generalization of the classical definition of Nash
equilibrium (the most widely accepted solution for non cooperative scalar games), to
Pareto equilibrium (weak and strong), for non cooperative games with many criteria.
Intuitively a feasible point in Rn is a Pareto equilibrium if there is no other feasible
point which is larger in every coordinate (we will be more precise in the subsequent
pages).
Let us consider the following game in matrix form, where player I has one criterion and player II has two criteria (in each cell the first entry is player I's payoff and the pair is player II's vector payoff):

        C                D
A   (2), (5, 0)      (0), (0, 1)
B   (0), (−1, 0)     (0), (0, 1)

Consider now the following game, where player II has countably many strategies (columns):

(1, 2)   (1, 2 + 1/2)   (1, 2 + 3/4)   ···   (1, 2 + (n − 1)/n)   ···
(0, 0)   (0, 0)         (0, 0)         ···   (0, 0)               ···
In this game there are no Nash equilibria (player II does not reach the payoff 3) but there is an infinite number of ε-NE. Intuitively, approximate equilibria mean that deviations improve the payoff by at most ε.
There are also games without approximate equilibria, for example the game:

(1, 2)   (1, 3)   (1, 4)   (1, 5)   ···
(0, 0)   (0, 1)   (0, 2)   (0, 3)   ···

This game has neither NE nor ε-NE.
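For the first of the two games above this can be checked mechanically. The sketch below is our own (we parametrize player II's strategies by the column index n, with payoff pairs as displayed): no profile is a Nash equilibrium, while every profile (A, n) with 1/n ≤ ε is an ε-equilibrium.

# Payoffs of the first infinite game above: player I gets 1 in row A, 0 in row B;
# player II gets 2 + (n-1)/n in column n of row A (supremum 3, never attained).
def u1(row, n):
    return 1.0 if row == "A" else 0.0

def u2(row, n):
    return 2.0 + (n - 1.0) / n if row == "A" else 0.0

def is_eps_equilibrium(row, n, eps):
    gain1 = max(u1(r, n) for r in ("A", "B")) - u1(row, n)   # player I's best gain
    gain2 = (3.0 if row == "A" else 0.0) - u2(row, n)        # sup over all columns
    return gain1 <= eps and gain2 <= eps

print([n for n in range(1, 101) if is_eps_equilibrium("A", n, 0.05)])  # n >= 20
print([n for n in range(1, 101) if is_eps_equilibrium("A", n, 0.0)])   # empty: no NE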
Many auction situations lead to games without NE but with approximate NE (see
[6]); in [18], the author proved some theorems about approximate solutions.
a ≽ b ⟺ ai ≥ bi ∀ i = 1, . . . , n;
a ≥ b ⟺ a ≽ b and a ≠ b;
a > b ⟺ ai > bi ∀ i = 1, . . . , n.
Obviously we mean:
Given a game G = ⟨(Xi)i∈N, (ui)i∈N⟩ and a strategy profile x ∈ X = ∏k∈N Xk, let (x̂i, x−i) denote the profile (x1, . . . , xi−1, x̂i, xi+1, . . . , xn).
We recall the definitions of weak and strong Pareto equilibrium:
Definition 11.1. Let G = ⟨(Xi)i∈N, (ui)i∈N⟩ be a multicriteria game with n players, where Xi is the strategy space for player i ∈ N, X = ∏k∈N Xk, and ui : X → R^{mi} is the utility function for player i, who has mi criteria "to optimize". A strategy profile x̂ ∈ X is a
1. Weak Pareto equilibrium if, for each player i, there is no xi ∈ Xi such that ui(xi, x̂−i) > ui(x̂);
2. Strong Pareto equilibrium if, for each player i, there is no xi ∈ Xi such that ui(xi, x̂−i) ≥ ui(x̂).
A potential function is

P:   (1, 0, 0)   (0, 2, 1)
     (0, 5, 1)   (0, 2, 1)
Given the game G = ⟨(Xi)i∈N, (ui)i∈N⟩, we denote by GP = ⟨(Xi)i∈N, P⟩ the game where the utility function is P for all players.
For a finite multicriteria potential game G the set of weak Pareto equilibria is not empty, wPE(G) ≠ ∅ (see [13]).
Let us give the definition of optimal points of a function with respect to an
improvement set.
Definition 11.6. Given a function P : ∏ Xi → Rm, X = Xi × X−i, we say that a ∈ OE(P), that is, a is an optimal point for the function P with respect to the improvement set E, if (a + E) ∩ P(X) = ∅.
Definition 11.7. A strategy profile x̂ ∈ X is an E-equilibrium for the multicriteria game G, where E = (E1, . . . , Ei, . . . , En) and Ei is the improvement set for player i, if for each player i and for each xi ∈ Xi it turns out that ui(xi, x̂−i) ∉ ui(x̂) + Ei.
We write x̂ ∈ OE(G).
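Definition 11.7 can be tested directly on finite games. The following sketch is our own illustration (the 2 × 2 bicriteria payoffs and the improvement sets are arbitrary choices, with Ei of the type used in Example 11.4(f) below): a profile x̂ is reported as an E-equilibrium when no unilateral deviation lands in ui(x̂) + Ei.

import numpy as np

# u[i][a][b] = player i's two-criteria payoff when players choose (a, b)
u = [
    np.array([[[2.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [0.0, 0.0]]]),      # player 1 (arbitrary values)
    np.array([[[5.0, 0.0], [0.0, 1.0]],
              [[-1.0, 0.0], [0.0, 1.0]]]),     # player 2 (arbitrary values)
]
eps = np.array([0.25, 0.25])                   # E_i = {y : y_k >= eps_k, k = 1, 2}

def improves(delta):
    # membership of a payoff difference in the improvement set E_i
    return bool(np.all(delta >= eps))

def is_E_equilibrium(a, b):
    if any(improves(u[0][a2][b] - u[0][a][b]) for a2 in range(2)):
        return False                           # player 1 has an E-improving deviation
    if any(improves(u[1][a][b2] - u[1][a][b]) for b2 in range(2)):
        return False                           # player 2 has an E-improving deviation
    return True

print([(a, b) for a in range(2) for b in range(2) if is_E_equilibrium(a, b)])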
Theorem 11.1. Let G = ⟨(Xi)i∈N, (ui)i∈N⟩ be a multicriteria potential game and suppose that the potential function is upper bounded. Let us suppose that there is a hyperplane which separates E from 0.
Then OE(P(X)) ≠ ∅.
Proof. The proof follows by considering the known results about separation theorems (see for example [3]).
Remark that the condition "P upper bounded" is not equivalent to ui upper bounded, as the following example shows.
Example 11.3. Let us consider G = (R, R, u1, u2) where u1(x, y) = min{x, y} − y and u2(x, y) = u1(y, x). So u1(x, y) ≤ 0 and u2(x, y) ≤ 0, but the potential function P(x, y) = min{x, y} is not upper bounded.
For multicriteria potential games we have the following existence theorem:
Theorem 11.2. Let G = ⟨(Xi)i∈N, (ui)i∈N⟩ be a multicriteria potential game and suppose that the potential function P is w-upper bounded. Furthermore, suppose that w strongly separates E from {0}. Then OE(G) ≠ ∅.
Proof. To make the notation easier, we write the proof only in the case of two players and two objectives, but the proof is analogous for n players with more than two objectives.
Let y = P(x̂) with ⟨w, y⟩ ≥ sup{⟨w, a⟩ : a ∈ P(X)} − t/2.
We want to prove that x̂ = (x̂1, x̂2) ∈ OE(G).
In this section we study E-optimal points for multicriteria games. In general it is not easy to find the E-optimal points of a multicriteria game G, but in some important classes of problems we can reduce the search for E-optimal points to the search for classical equilibria, as shown in the following example.
Example 11.4. (a) If E1 = E2 = (0, +∞) and m1 = m2 = 1 (that is, the game has one criterion), it turns out that the E-optimal points are the Nash equilibria of the game G, for short OE(G) = NE.
(b) If E1 = (ε, +∞) = E2, ε > 0, it turns out that the E-optimal points are the approximate Nash equilibria, for short OE(G) = εNE.
(c) If E1 = E2 = Rn+ \ {0}, then the E-optimal points are the strong Pareto equilibria of the game G, for short OE(G) = sPE(G).
(d) If E1 = E2 = Rn++, then the E-optimal points are the weak Pareto equilibria of the game G, for short OE(G) = wPE(G).
(e) In the paper [13], the improvement set considered for multicriteria potential games is E = R2+ \ ([0, ε] × [0, ε]) (see Fig. 11.3).
(f) In the paper [11] the improvement set considered for multicriteria games is E = {x ∈ R2 : xi ≥ εi, εi ∈ R++, i = 1, 2} (see Fig. 11.4).
∀ xi ∈ Xi, C ∩ (ui(x̂) + Ei) = ∅, where C = {z = ui(xi, x̂−i)} ⊂ Rn.
Proof. The proof follows from the definition of E-optimal point of a set.
We recall that a relation >E defined on a set E is a preorder if the transitivity property is valid:
Proof. The proof follows from the definitions and the properties given.
References
1. Borm, P.E.M., Tijs, S.H., van den Aarssen, J.C.M.: Pareto equilibria in multiobjective games.
Methods Oper. Res. 60, 303–312 (1989)
2. Chicco, M., Mignanego, F., Pusillo, L., Tijs, S.: Vector optimization problems via improvement
sets. JOTA 150(3), 516–529 (2011)
3. Ehrgott, M.: Multicriteria Optimization, 2nd ed. Springer, Berlin (2005)
4. Gutierrez, C., Jimenez, B., Novo, V.: A unified approach and optimality conditions for approximate solutions of vector optimization problems. SIAM J. Optim. 17(3), 688–710 (2006)
5. Loridan, P.: Well posedness in vector optimization. In: Lucchetti, R., Revalski, J. (eds.)
Recent Developments in Well Posed-Variational Problems, pp. 171–192. Kluwer Academic,
Dordrecht, The Netherlands (1995)
6. Klemperer, P.: Auctions: Theory and Practice. Princeton University Press, Princeton, USA
(2004)
7. Mallozzi, L., Pusillo, L., Tijs, S.: Approximate equilibria for Bayesian games. JMAA 342(2),
1098–1102 (2007)
8. Margiocco, M., Pusillo, L.: Stackelberg well posedness and hierarchical potential games. In:
Jorgensen, S., Quincampoix, M., Vincent, T. (eds.) Advances in Dynamic Games – Series:
Annals of the International Society of Dynamic Games, pp. 111–128. Birkhauser, Boston
(2007)
9. Margiocco, M., Pusillo, L.: Potential games and well-posedness. Optimization 57, 571–579
(2008)
10. Monderer, D., Shapley, L.S.: Potential games. Games Econ. Behav. 14, 124–143 (1996)
11. Morgan, J.: Approximations and well-posedness in multicriteria games. Ann. Oper. Res. 137,
257–268 (2005)
12. Owen, G.: Game Theory, 3rd ed. Academic, New York (1995)
13. Patrone, F., Pusillo, L., Tijs, S.: Multicriteria games and potentials. TOP 15, 138–145 (2007)
14. Pieri, G., Pusillo, L.: Interval values for multicriteria cooperative games. Auco Czech Econ.
Rev. 4, 144–155 (2010)
15. Puerto Albandoz, J., Fernandez Garcia F.: Teoria de Juegos Multiobjetivo, Imagraf Impresores
S.A., Sevilla (2006)
16. Pusillo, L.: Interactive decisions and potential games. J. Global Optim. 40, 339–352 (2008)
17. Shapley, L.S.: Equilibrium points in games with vector payoffs. Naval Res. Logistic Quart. 6,
57–61 (1959)
18. Tijs, S.H.: ε-equilibrium point theorems for two-person games. Methods Oper. Res. 26, 755–766 (1977)
19. Voorneveld, M.: Potential games and interactive decisions with multiple criteria. Dissertation
series n. 61. CentER of Economic Research- Tilburg University (1999)
Chapter 12
Mean Field Games with a Quadratic
Hamiltonian: A Constructive Scheme
Olivier Guéant
Abstract Mean field games models describing the limit case of a large class of
stochastic differential games, as the number of players goes to +∞, were introduced
by Lasry and Lions [C R Acad Sci Paris 343(9/10) (2006); Jpn. J. Math. 2(1)
(2007)]. We use a change of variables to transform the mean field games equations
into a system of simpler coupled partial differential equations in the case of a
quadratic Hamiltonian. This system is then used to exhibit a monotonic scheme
to build solutions of the mean field games equations.
12.1 Introduction
Mean field games (MFG) equations were introduced by Lasry and Lions [4–6]
to describe the dynamic equilibrium of stochastic differential games involving a
continuum of players.
Formally, we consider a continuum of agents, each agent being described by a
position Xt ∈ Ω [Ω being typically (0, 1)d ] following a stochastic process dXt =
at dt + σ dWt . In this stochastic process, at is controlled by the agent and Wt is
a Brownian motion specific to the agent under investigation—this independence
hypothesis being central in the sequel. These agents will interact in a mean field
O. Guéant ()
UFR de Mathématiques, Laboratoire Jacques-Louis Lions, Université Paris-Diderot,
175 rue du Chevaleret, 75013 Paris, France
e-mail: [email protected]
fashion, and we denote by m(t, ·) the probability distribution function describing the
distribution of the agents at time t.1
Each agent optimizes the same objective function, though possibly starting from
a different position x0 :
sup_{(at)t} E [ ∫_0^T ( f(Xt, m(t, Xt)) − C(at) ) dt + uT(XT) | X0 = x0 ],
where f and uT are functions whose regularity will be described subsequently and
where C is a convex (cost) function.
To this problem we associate the so-called MFG equations. These equations consist in a backward Hamilton–Jacobi–Bellman (HJB) equation coupled with a forward Kolmogorov (K) transport equation, with prescribed initial condition m(0, ·) = m0(·) ≥ 0 and terminal condition u(T, ·) = uT(·), where H is the Legendre transform of the cost function C (the system is written out in Sect. 12.2 in the quadratic case).
2
In this paper, we focus on the particular case of the quadratic cost C(a) = a2 and
2
hence a quadratic Hamiltonian H(p) = p2 . In this special case, a change of variables
was introduced by Guéant et al. in [3] to write the MFG equations as two coupled
heat equations with similar source terms. If indeed we introduce φ = exp σu2 and
ψ = m exp − σu2 , then the system reduces to
σ2 1
∂t φ + Δ φ = − 2 f (x, φ ψ )φ ,
2 σ
σ2 1
∂t ψ − Δ ψ = 2 f (x, φ ψ )ψ ,
2 σ
with φ (T, ·) = exp uσT (·)
2
0 (·)
and ψ (0, ·) = φm(0,·) .
We use this system to exhibit a constructive scheme for solutions to the MFG equations. This constructive scheme, proposed by Lasry and Lions in [7], starts with ψ⁰ = 0 and builds recursively two sequences (φ^{n+1/2})n and (ψ^{n+1})n using the following equations:
1 In our case, this assumption consists only in assuming that the initial datum is a probability
distribution function m0 .
∂t φ^{n+1/2} + (σ²/2) Δφ^{n+1/2} = −(1/σ²) f(x, φ^{n+1/2} ψⁿ) φ^{n+1/2},
∂t ψ^{n+1} − (σ²/2) Δψ^{n+1} = (1/σ²) f(x, φ^{n+1/2} ψ^{n+1}) ψ^{n+1},

with φ^{n+1/2}(T, ·) = exp(uT(·)/σ²) and ψ^{n+1}(0, ·) = m0(·)/φ^{n+1/2}(0, ·).
Then, φ and ψ are obtained as the monotonic limit of the two sequences (φ^{n+1/2})n and (ψⁿ)n under the usual assumptions on f.
In Sect. 12.2, we recall the change of variables and derive the associated system
of coupled parabolic equations. Section 12.3 is devoted to the introduction of the
functional framework, and we prove the main monotonicity properties of the system.
Section 12.4 presents a constructive scheme and proves that we can have two
monotonic sequences converging toward φ and ψ . Section 12.5 then gives additional
properties on the constructive scheme regarding the absence of mass conservation.
(HJB) ∂t u + (σ²/2) Δu + (1/2) |∇u|² = −f(x, m),
(K)   ∂t m + ∇·(m ∇u) = (σ²/2) Δm,

with:
• Boundary conditions: ∂u/∂n = ∂m/∂n = 0 on (0, T) × ∂Ω;
• Terminal condition: u(T, ·) = uT(·), a given payoff whose regularity is to be specified;
• Initial condition: m(0, ·) = m0(·) ≥ 0, a given positive function in L¹(Ω), typically a probability distribution function.
The change of variables introduced in [3] is recalled in the following proposition:
Proposition 12.1. Let us consider a smooth solution (φ, ψ) of the following system (S), with φ > 0:

∂t φ + (σ²/2) Δφ = −(1/σ²) f(x, φψ) φ    (Eφ),
∂t ψ − (σ²/2) Δψ = (1/σ²) f(x, φψ) ψ    (Eψ),

with:
• Boundary conditions: ∂φ/∂n = ∂ψ/∂n = 0 on (0, T) × ∂Ω.
• Terminal condition: φ(T, ·) = exp(uT(·)/σ²).
• Initial condition: ψ(0, ·) = m0(·)/φ(0, ·).
Then (u, m) = (σ² ln(φ), φψ) solves the MFG system (HJB)–(K).
Proof. For the (HJB) equation, we compute
∂t u = σ² (∂t φ)/φ,   ∇u = σ² (∇φ)/φ,   Δu = σ² (Δφ)/φ − σ² |∇φ|²/φ².

Hence

∂t u + (σ²/2) Δu + (1/2) |∇u|² = (σ²/φ) ( ∂t φ + (σ²/2) Δφ )
                               = (σ²/φ) ( −(1/σ²) f(x, φψ) φ )
                               = −f(x, m).
For the (K) equation,

∂t m = ∂t φ ψ + φ ∂t ψ,   ∇·(∇u m) = σ² ∇·(ψ ∇φ) = σ² [Δφ ψ + ∇φ·∇ψ],
Δm = Δφ ψ + 2 ∇φ·∇ψ + φ Δψ.

Hence

∂t m + ∇·(∇u m) = ∂t φ ψ + φ ∂t ψ + σ² [Δφ ψ + ∇φ·∇ψ]
= ψ (∂t φ + σ² Δφ) + φ ∂t ψ + σ² ∇φ·∇ψ
= ψ ( (σ²/2) Δφ − (1/σ²) f(x, φψ) φ )
  + φ ( (σ²/2) Δψ + (1/σ²) f(x, φψ) ψ ) + σ² ∇φ·∇ψ
= (σ²/2) Δφ ψ + σ² ∇φ·∇ψ + (σ²/2) φ Δψ
= (σ²/2) Δm.
This proves the result since the boundary conditions and the initial and terminal
conditions are coherent.
Now we will focus our attention on the study of the preceding system of
equations (S) and use it to design a constructive scheme for the couple (φ , ψ ) and
thus for the couple (u, m) under regularity assumptions.
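In one space dimension, the algebra of Proposition 12.1 can be verified symbolically. The following sketch is our addition, using sympy; F(t, x) stands for the composed coupling f(x, m(t, x)), which enters both equations only as a scalar field. It substitutes the time derivatives prescribed by (Eφ) and (Eψ) into the (HJB) and (K) residuals and checks that both vanish.

import sympy as sp

t, x, sigma = sp.symbols('t x sigma', positive=True)
phi = sp.Function('phi')(t, x)
psi = sp.Function('psi')(t, x)
F = sp.Function('F')(t, x)          # stands for the scalar field f(x, m(t, x))

u = sigma**2 * sp.log(phi)          # the change of variables
m = phi * psi

# time derivatives prescribed by (E_phi) and (E_psi)
dphi_dt = -sigma**2/2 * sp.diff(phi, x, 2) - F/sigma**2 * phi
dpsi_dt = sigma**2/2 * sp.diff(psi, x, 2) + F/sigma**2 * psi
rules = {sp.Derivative(phi, t): dphi_dt, sp.Derivative(psi, t): dpsi_dt}

hjb = sp.diff(u, t) + sigma**2/2 * sp.diff(u, x, 2) \
      + sp.Rational(1, 2) * sp.diff(u, x)**2 + F
kol = sp.diff(m, t) + sp.diff(m * sp.diff(u, x), x) - sigma**2/2 * sp.diff(m, x, 2)

print(sp.simplify(hjb.subs(rules)))   # prints 0
print(sp.simplify(kol.subs(rules)))   # prints 0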
Consider first, for ψ given, the equation

∂t φ + (σ²/2) Δφ = −(1/σ²) f(x, φψ) φ    (Eφ),

with ∂φ/∂n = 0 on (0, T) × ∂Ω and φ(T, ·) = exp(uT(·)/σ²).
Hence Φ : ψ ∈ P0 ↦ φ ∈ P is well defined.
Moreover, ∀ψ ∈ P0, φ = Φ(ψ) ∈ P_η for η = exp( −(1/σ²) ( ‖uT‖∞ + ‖f‖∞ T ) ).
2 In terms of the initial MFG problem, the optimal control ∇u and the subsequent distribution m are not changed if we subtract ‖f‖∞ from f.
12.3.1 Compactness
Common energy estimates [1] give that there exists a constant C that only depends on ‖uT‖∞, σ, and ‖f‖∞ such that the corresponding bounds hold for all (ψ, ϕ) ∈ P0 × L²(0, T, L²(Ω)).
Hence Fψ maps the closed ball B_{L²(0,T,L²(Ω))}(0, C) to a compact subset of B_{L²(0,T,L²(Ω))}(0, C).
12.3.2 Continuity
compactness result that we can extract from (φn)n a new sequence, still denoted (φn)n, that converges in the L²(0, T, L²(Ω)) sense toward a function φ. To prove that Fψ is continuous, we then need to show that φ cannot be different from Fψ(ϕ).
Now, because of the energy estimates, we know that φ is in P and that we can extract another subsequence (still denoted (φn)n) such that:
• φn → φ in the L²(0, T, L²(Ω)) sense;
• ∇φn ⇀ ∇φ weakly in L²(0, T, L²(Ω));
• ∂t φn ⇀ ∂t φ weakly in L²(0, T, H⁻¹(Ω));
and
• ϕn → ϕ almost everywhere.
By definition, we have that ∀w ∈ L²(0, T, H¹(Ω)):

∫_0^T ⟨∂t φn(t, ·), w(t, ·)⟩_{H⁻¹(Ω),H¹(Ω)} dt − (σ²/2) ∫_0^T ∫_Ω ∇φn(t, x)·∇w(t, x) dx dt
= −(1/σ²) ∫_0^T ∫_Ω f(x, ϕn(t, x) ψ(t, x)) φn(t, x) w(t, x) dx dt.
Hence φ = Fψ (ϕ ).
I(t) = −∫_Ω ∂t φ(t, x) φ(t, x)⁻ dx
     = −∫_Ω [ (σ²/2) ∇φ(t, x)·∇(φ(t, x)⁻) − (1/σ²) f(x, φ(t, x)ψ(t, x)) φ(t, x) φ(t, x)⁻ ] dx
     = −∫_Ω [ −(σ²/2) |∇φ(t, x)|² 1_{φ(t,x)≤0} + (1/σ²) f(x, φ(t, x)ψ(t, x)) (φ(t, x)⁻)² ] dx
     = (σ²/2) ∫_Ω |∇φ(t, x)|² 1_{φ(t,x)≤0} dx − (1/σ²) ∫_Ω f(x, φ(t, x)ψ(t, x)) (φ(t, x)⁻)² dx
     ≥ 0.
⋯ + (1/σ²) ∫_Ω f(x, φ(t, x)ψ(t, x)) φ(t, x) (φ̄(t) − φ(t, x))⁺ dx
 ≥ (1/σ²) ∫_Ω ( ‖f‖∞ φ̄(t) + f(x, φ(t, x)ψ(t, x)) φ(t, x) ) (φ̄(t) − φ(t, x))⁺ dx
 ≥ (1/σ²) ∫_Ω ( ‖f‖∞ + f(x, φ(t, x)ψ(t, x)) ) φ(t, x) (φ̄(t) − φ(t, x))⁺ dx
 ≥ 0.
≥ (1/σ²) ∫_Ω ( f(x, φ1(t, x)ψ1(t, x)) φ1(t, x) − f(x, φ2(t, x)ψ2(t, x)) φ2(t, x) ) (φ2(t, x) − φ1(t, x))⁺ dx
≥ (1/σ²) ∫_Ω ( f(x, φ1(t, x)ψ2(t, x)) φ1(t, x) − f(x, φ2(t, x)ψ2(t, x)) φ2(t, x) ) (φ2(t, x) − φ1(t, x))⁺ dx
≥ 0.
∂t ψ − (σ²/2) Δψ = (1/σ²) f(x, φψ) ψ    (Eψ)
The scheme we consider involves two sequences (φ^{n+1/2})n and (ψⁿ)n that are built using the following recursive equations:

ψ⁰ = 0,
∀n ∈ N, φ^{n+1/2} = Φ(ψⁿ),
∀n ∈ N, ψ^{n+1} = Ψ(φ^{n+1/2}).

Theorem 12.1. The two sequences are well defined, (φ^{n+1/2})n is nonincreasing, (ψⁿ)n is nondecreasing, and they converge monotonically toward a solution (φ, ψ) of the system (S).
Proof. By immediate induction, we obtain from Propositions 12.2 and 12.4 that the two sequences are well defined and in the appropriate spaces.
Now, as far as monotonicity is concerned, we have that ψ¹ = Ψ(φ^{1/2}) ≥ 0 = ψ⁰. Hence, if for a given n ∈ N we have ψ^{n+1} ≥ ψⁿ, then Proposition 12.3 gives

φ^{n+3/2} = Φ(ψ^{n+1}) ≤ Φ(ψⁿ) = φ^{n+1/2}.
Passing to the limit in the weak formulation, for all w ∈ L²(0, T, H¹(Ω)),

∫_0^T ⟨∂t φ^{n+1/2}(t, ·), w(t, ·)⟩ dt − (σ²/2) ∫_0^T ∫_Ω ∇φ^{n+1/2}·∇w dx dt
= −(1/σ²) ∫_0^T ∫_Ω f(x, φ^{n+1/2}(t, x) ψⁿ(t, x)) φ^{n+1/2}(t, x) w(t, x) dx dt

becomes, as n → +∞,

∫_0^T ⟨∂t φ(t, ·), w(t, ·)⟩ dt − (σ²/2) ∫_0^T ∫_Ω ∇φ·∇w dx dt
= −(1/σ²) ∫_0^T ∫_Ω f(x, φ(t, x) ψ(t, x)) φ(t, x) w(t, x) dx dt.
In this chapter, we exhibited a monotonic way to build a solution to the system (S). To understand the nature of the change of variables and of the constructive scheme we used, let us introduce the sequence (m^{n+1})n, where m^{n+1} = φ^{n+1/2} ψ^{n+1}. From Theorem 12.1, we know that (m^{n+1})n converges almost everywhere and in L¹ toward the function m = φψ, for which we have the conservation of mass along the trajectory. However, this property is not true for m^{n+1}, as stated in the following proposition.
Proposition 12.6. Let us consider n ∈ N and let us denote by M^{n+1}(t) = ∫_Ω m^{n+1}(t, x) dx the total mass of m^{n+1} at date t.
Then, there may be a loss of mass along the trajectory in the sense that:

(d/dt) M^{n+1}(t) = (1/σ²) ∫_Ω ψ^{n+1}(t, x) φ^{n+1/2}(t, x) [ f(x, ψ^{n+1}(t, x) φ^{n+1/2}(t, x)) − f(x, ψⁿ(t, x) φ^{n+1/2}(t, x)) ] dx ≤ 0.
Proof. We have

(d/dt) M^{n+1}(t) = ⟨∂t φ^{n+1/2}(t, ·), ψ^{n+1}(t, ·)⟩_{H⁻¹(Ω),H¹(Ω)} + ⟨∂t ψ^{n+1}(t, ·), φ^{n+1/2}(t, ·)⟩_{H⁻¹(Ω),H¹(Ω)}
= (σ²/2) ∫_Ω ∇φ^{n+1/2}(t, x)·∇ψ^{n+1}(t, x) dx
  − (1/σ²) ∫_Ω φ^{n+1/2}(t, x) ψ^{n+1}(t, x) f(x, φ^{n+1/2}(t, x) ψⁿ(t, x)) dx
  − (σ²/2) ∫_Ω ∇φ^{n+1/2}(t, x)·∇ψ^{n+1}(t, x) dx
  + (1/σ²) ∫_Ω φ^{n+1/2}(t, x) ψ^{n+1}(t, x) f(x, φ^{n+1/2}(t, x) ψ^{n+1}(t, x)) dx
= (1/σ²) ∫_Ω ψ^{n+1}(t, x) φ^{n+1/2}(t, x) [ f(x, ψ^{n+1}(t, x) φ^{n+1/2}(t, x)) − f(x, ψⁿ(t, x) φ^{n+1/2}(t, x)) ] dx
≤ 0.
This property shows that the constructive scheme is rather original since it
basically consists in building probability distribution functions using sequences
of functions in L1 that only have the right total mass asymptotically. Despite this
absence of mass conservation, a discrete counterpart of this constructive scheme is
developed in a work in progress [2] to numerically compute approximations of the
solutions.
References
1. Evans, L.C.: Partial Differential Equations (Graduate Studies in Mathematics, vol. 19).
American Mathematical Society, Providence, RI (2009)
2. Guéant, O.: Mean field games equations with quadratic Hamiltonian: a specific approach. Math.
Models Methods Appl. Sci., 22, (2012)
3. Guéant, O., Lasry, J.M., Lions, P.L.: Mean field games and applications. In: Paris Princeton
Lectures on Mathematical Finance (2010)
4. Lasry, J.-M., Lions, P.-L.: Jeux à champ moyen. I. Le cas stationnaire. C. R. Acad. Sci. Paris 343(9), 619–625 (2006)
5. Lasry, J.-M., Lions, P.-L.: Jeux à champ moyen. II. Horizon fini et contrôle optimal. C. R. Acad. Sci. Paris 343(10), 679–684 (2006)
6. Lasry, J.-M., Lions, P.-L.: Mean field games. Jpn. J. Math. 2(1), 229–260 (2007)
7. Lasry, J.-M., Lions, P.-L.: Cours au Collège de France: théorie des jeux à champ moyen. http://www.college-de-france.fr/default/EN/all/equ_der/audio_video.jsp (2008)
Chapter 13
Differential Game-Theoretic Approach
to a Spatial Jamming Problem
13.1 Introduction
In the past few years, considerable research has been done to deploy multiple
UAVs in a decentralized manner to carry out tasks in military as well as civilian
scenarios. UAVs have shown promise in a wide range of applications. The recent
availability of low-cost UAVs suggests the use of teams of vehicles to perform
various tasks such as mapping, surveillance, search and tracking operations [10,43].
For these applications, there has been much focus to deploy teams of multiple UAVs
In this section, we first introduce a communication model between two mobile nodes
in the presence of a jammer. Then we present the mobility models for the nodes. We
conclude the section by formally formulating the problems we study in the paper.
Consider a mobile node (receiver) receiving messages from another mobile node
(transmitter) at some frequency. Both communicating nodes are assumed to be
lying on a plane. Consider a third node that is attempting to jam the communication
channel in between the transmitter and the receiver by sending a high power noise
at the same frequency. This kind of jamming is referred to as trivial jamming. Two
other types of jamming are:
1. Periodic jamming: This refers to a periodic noise pulse being generated by the
jammer irrespective of the packets that are put on the network.
2. Intelligent jamming: In this mode of jamming a jammer is put in a promiscuous
mode to destroy primarily the control packets.
A variety of metrics can be used to compare the effectiveness of various jamming
attacks. Some of these metrics are energy efficiency, low probability of detection,
and strong denial of service [27, 29]. In this paper, we use the ratio of the jamming-
power to the signal-power (JSR) as the metric. From [32], we have the following
models for the JSR (ξ ) at the receiver’s antenna.
1. Rn model
PJT GJR GRJ n log10 ( DDTR )
ξ= 10 JR
PT GTR GRT
3. Nicholson
PJT GJR GRJ 4 log10 ( DDTR )
ξ= 10 JR
PT GTR GRT
where PJT is the power of the jammer transmitting antenna, PT is the power of the transmitter, GTR is the antenna gain from transmitter to receiver, GRT is the antenna gain from receiver to transmitter, GJR is the antenna gain from jammer to receiver, GRJ is the antenna gain from receiver to jammer, hJ is the height of the jammer antenna above the ground, hT is the height of the transmitter antenna above the ground, DTR is the Euclidean distance between the transmitter and the receiver, and DJR is the Euclidean distance between the jammer and the receiver. All the above models are based on the propagation loss depending on the distance of the jammer and the transmitter from the receiver. In all the above models JSR is dependent on the ratio DTR/DJR.
For digital signals, the jammer’s goal is to raise the ratio to a level such that the
bit error rate [33] is above a certain threshold. For analog voice communication,
(Fig. 13.1: the configuration (xi, yi) of UAVi in the global coordinate frame with origin O and axes x, y)
the goal is to reduce the articulation performance so that the signals are difficult to
understand. Hence we assume that the communication channel between a receiver
and a transmitter is considered to be jammed in the presence of a jammer if ξ ≥ ξtr
where ξtr is a threshold determined by many factors including application scenario
and communication hardware. If all the parameters except the mutual distances
between the jammer, transmitter and receiver are kept constant, we can conclude the following from all the above models: if the ratio DTR/DJR ≥ η, then the communication channel between a transmitter and a receiver is considered to be jammed. Here η is a function of ξtr, PJT, PT, GTR, GRT, GJR, GRJ and DTR. Hence if the transmitter is not within a disc of radius η DJR centered around the receiver, then the communication channel is considered to be jammed. We call this disc the perception range. The perception range for any node depends on the distance between the jammer and the node. For effective communication between two nodes, each node should be able to transmit as well as receive messages from the other node. Hence two nodes can communicate if they lie in each other's perception range.
We will adopt the above jamming and communication model, for the rest of the
paper.
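In code, the resulting link test is immediate. The following sketch is our own (positions and the threshold η are arbitrary choices): a link exists between nodes i and j when each lies in the other's perception range, i.e. when D_ij ≤ η D_{J,i} and D_ij ≤ η D_{J,j}, and connectivity of the induced graph G is then checked by search.

import numpy as np
from itertools import combinations

eta = 0.8                                               # threshold ratio (assumed)
nodes = np.array([[0., 0.], [2., 0.], [1., 1.5], [3., 1.]])   # UAV positions
jammer = np.array([4., 0.])

def adjacency(nodes, jammer, eta):
    n = len(nodes)
    A = np.zeros((n, n), dtype=int)
    dJ = np.linalg.norm(nodes - jammer, axis=1)         # D_{J,i} for each node
    for i, j in combinations(range(n), 2):
        dij = np.linalg.norm(nodes[i] - nodes[j])       # D_ij = D_TR
        # link iff each node lies in the other's perception range
        if dij <= eta * dJ[i] and dij <= eta * dJ[j]:
            A[i, j] = A[j, i] = 1
    return A

def connected(A):
    # depth-first search from node 0
    n = len(A); seen = {0}; stack = [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if A[i, j] and j not in seen:
                seen.add(j); stack.append(j)
    return len(seen) == n

A = adjacency(nodes, jammer, eta)
print(A, connected(A))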
We now describe the kinematic model of the nodes. In our analysis, each node is a UAV. We assume that the UAVs maintain constant-altitude flight. This assumption helps to simplify our analysis to a planar case. Referring to Fig. 13.1, the configuration of the ith UAV in the network can be expressed in terms of the variables (xi, yi, φi) in the global coordinate frame. The pair (xi, yi) represents the position of a reference point on UAVi with respect to the origin of the global
reference frame and φi denotes the instantaneous heading of UAVi in the global reference frame. Hence the state space for UAVi is Xi = R² × S¹. In our analysis, we assume that the UAVs are a kinematic system and hence the dynamics of the UAVs are not taken into account in the differential equation governing the evolution of the system. The kinematics of the UAVs are assumed to be the following:

Ẋi := [ẋi, ẏi, φ̇i]ᵀ = [cos φi, sin φi, σi1]ᵀ =: fi(xi, σi1)    (13.1)
where σi1 is the angular speed of UAVi. We assume that σi1 ∈ Ui = {φ : [0, t] → [−1, +1] | φ(·) is measurable}. Since the jammer is also an aerial vehicle, we model its kinematics as a UAV. The motion of the jammer is governed by a set of equations similar to (13.1). The configuration of each jammer is expressed in terms of the variables (xi, yi, φi) and the kinematics is given by the following equation:

Ẋi := [ẋi, ẏi, φ̇i]ᵀ = [cos φi, sin φi, σi2]ᵀ =: fi(xi, σi2)

where σi2 ∈ Ui as defined earlier. The state space for jammer i is Xi = R² × S¹. The state space of the entire system is X = X1 × ··· × Xn × X1 × ··· × Xm = R^{2(n+m)} × (S¹)^{n+m}. We use the following notation in order to represent the dynamics of the entire system:
From the communication model presented in the previous section, the connectivity
of the network of UAVs depends on their position relative to the jammers. Given
m UAVs in the network, we define a graph G on m vertices as follows. The
vertex corresponding to UAVi is labeled as i. An edge exists between vertices i
and j iff there is a communication link between UAVi and UAV j . We define the
communication network to be disconnected when G has more than one component.
In this problem, the existence of a communication link between two nodes depends
on the relative distance of the two nodes from the jammers. Using the above model
for the communication network, we present the following problem statement.
Assume that G is initially connected. The jammers intend to move in order
to disconnect the communication network in minimum amount of time possible.
The UAVs must move in order to maintain the connectivity of the network for the
maximum possible amount of time. We want to compute the motion strategies for
the UAVs in the network. Our interest lies in understanding the spatial reconfigura-
tion of the formation so that the jammers can be evaded.
In this section, we introduce the concept of optimal strategies for the vehicles. Given the control histories of the vehicles, {σi1(·)}ⁿi=1, {σi2(·)}ᵐi=1, the outcome of the game is denoted by π : X × U^{n+m} → R and is defined as the time of termination of the game:

π(x0, {σi1(·)}ⁿi=1, {σi2(·)}ᵐi=1) = tf,

where tf denotes the time of termination of the game when the players play ({σi1(·)}ⁿi=1, {σi2(·)}ᵐi=1) starting from an initial point x0 ∈ X. The game terminates when the communication network gets disconnected. The objective of the jammer is to minimize the termination time and the objective of the UAVs is to maximize it.
Since the objective function of the team of UAVs is in conflict with that of the team of jammers, the problem can be formulated as a multi-player zero-sum team game. A play ({σi1*(·)}ⁿi=1, {σi2*(·)}ᵐi=1) is said to be optimal for the players if it satisfies the following conditions:

{σi1*}ⁿi=1 = arg max_{{σi1}ⁿi=1} π[x0, {σi1(·)}ⁿi=1, {σi2*(·)}ᵐi=1],
{σi2*}ᵐi=1 = arg min_{{σi2}ᵐi=1} π[x0, {σi1*(·)}ⁿi=1, {σi2(·)}ᵐi=1].
In the above expressions σ¹−j is used to represent the controls of all the UAVs except UAVj. Similarly, σ²−j is used to represent the controls of all the jammers except the jth jammer. From (13.5), we can conclude that there is no motivation for a player to deviate from its equilibrium strategy. In general, there may be multiple sets of strategies for the players that are in Nash equilibrium. Assuming the existence of a value, as defined in (13.4), and the existence of a unique Nash equilibrium, we can conclude that the Nash equilibrium concept of person-by-person optimality given in (13.5) is a necessary condition to be satisfied for the set of optimal strategies for the players; furthermore, computing the set of strategies that are in Nash equilibrium also gives us the set of optimal strategies. In the following analysis, we assume the aforementioned conditions in order to compute the optimal strategies.
The following theorem provides a relation between the optimal strategy of each
player and the gradient of the value function, ∇J.
Theorem 13.1. Assuming that J(x) is a smooth function of x, the optimal strategies
({σi1∗ }ni=1 , {σi2∗ }m
i=1 ) satisfy the following condition:
Proof. The proof essentially follows the two-player version as provided in [19].
Let us consider a play after time t has elapsed from the beginning of the game at
which point the players are at a state x. The outcome functional is provided by the
following expression:
π(x(t), {σi1(·)}ⁿi=1, {σi2(·)}ᵐi=1) = ∫_t^{t+h} dt + J(x(t + h))
Using a Taylor series approximation of J we obtain the relation (13.7), where the remainder is a vector with each entry belonging to o(h). Let δ denote the increment of the state over [t, t + h]. Using the Taylor series approximation for J around the point x(t), we get the following expression for the RHS of (13.7):

= ∑ᵢ₌₁^{n+m} ∇J · δ + |δ| o(|δ|)
= h [ ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1) + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2) ] + α(h)
(13.8)
First, let us consider the controls of the jammer. From the Nash property, we can conclude that if σj1 = σj1*, ∀j ∈ {1, . . . , n}, and σ²−i = σ²−i*, then σi2 = σi2* minimizes the left hand side of the above equation. Therefore, we can conclude the following:
1. The optimal control of the jammers satisfies the condition (13.9). In a similar manner the controls of the UAVs satisfy the condition (13.10).
2. In the case when σij = σij* ∀i and j = 1, 2 in (13.8), we obtain the following relation:

π(x(t), {σi1*}ⁿi=1, {σi2*}ᵐi=1) = J(x(t)) + h [ 1 + ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1*)
  + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2*) ] + α(h).

Since π(x(t), {σi1*}ⁿi=1, {σi2*}ᵐi=1) = J(x(t)), this gives

h [ 1 + ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1*) + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2*) ] + α(h) = 0

⟹ 1 + ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1*) + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2*) = 0.    (13.11)
Equations (13.9), (13.10) and (13.11) extend the Isaacs conditions that provide the optimal controls for two-player zero-sum differential games to the case of three-player zero-sum differential games.
From [19], the Hamiltonian of the system is given by the following expression:

H(x, ∇J, {σi1*}ⁿi=1, {σi2*}ᵐi=1) = 1 + ∑ᵢ₌₁ⁿ (Jxi cos φi + Jyi sin φi + Jφi σi1*)
  + ∑ᵢ₌₁ᵐ (Jxi cos φi + Jyi sin φi + Jφi σi2*),

which is the left side of (13.11). Hence, (13.11) can equivalently be expressed as:

H(x, ∇J, {σi1*}ⁿi=1, {σi2*}ᵐi=1) = 0.    (13.12)
In addition to the above conditions, the value function also satisfies the PDE
given in the following theorem.
Theorem 13.2. The value function follows the following partial differential equa-
tion (PDE) along the optimal trajectory, namely the retrogressive path equation
(RPE)
(∇J)ᵒ = ∂H/∂x    (13.13)

J̊xi = 0,  J̊yi = 0,  J̊φi = −Jxi sin φi + Jyi cos φi    (13.14)
J˚xi = 0, J˚yi = 0
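Equations (13.13) and (13.14), combined with the sign form of the optimal controls from Theorem 13.1, suggest a direct way of tracing candidate optimal trajectories: integrate the state and the gradient components in retrograde time from a terminal state. A minimal sketch follows (our own; the terminal states and terminal gradients ∇J⁰ are illustrative placeholders, to be supplied from the terminal-manifold analysis of Sect. 13.4).

import numpy as np

def retrograde_path(state, grad, jammer, dt=0.01, steps=1000):
    # state = (x, y, phi); grad = (Jx, Jy, Jphi)
    # retrograde equations (13.14): Jx, Jy constant, Jphi' = -Jx sin phi + Jy cos phi
    path = [state.copy()]
    for _ in range(steps):
        x, y, phi = state
        Jx, Jy, Jphi = grad
        sigma = -np.sign(Jphi) if jammer else np.sign(Jphi)   # sign controls
        state = state + dt * np.array([-np.cos(phi), -np.sin(phi), -sigma])
        grad = grad + dt * np.array([0.0, 0.0, -Jx * np.sin(phi) + Jy * np.cos(phi)])
        path.append(state.copy())
    return np.array(path)

# illustrative terminal data (placeholders, not taken from the chapter)
uav = retrograde_path(np.array([20.0, -10.0, 0.0]), np.array([0.0, 1.0, 0.1]), False)
jam = retrograde_path(np.array([50.0, -10.0, -0.17]), np.array([0.0, 1.0, -0.1]), True)
print(uav[-1], jam[-1])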
In many problems the value functions are not smooth enough to satisfy the Isaacs
equations. Many papers have worked around this difficulty, especially Fleming
[15, 16], Friedman [17], Elliott and Kalton [11, 12], Krassovski and Subbotin [21],
and Subbotin [41]. In [9], the authors present a new notion of "viscosity" solution for Hamilton-Jacobi equations and prove the uniqueness of such solutions in a wide variety of situations. In [24], the author shows that the dynamic programming optimality condition for the value function in differential control theory problems implies that this value function is the viscosity solution of the associated HJB PDE.
The foregoing conclusions turn out to extend to differential game theory. In [36], the
authors show that in the context of differential games, the dynamic programming
optimality conditions imply that the values are viscosity solutions of appropriate
partial differential equations. In [13], the authors present a simplification of the
previous work. This work is based on the smoothness assumption of the value
function.
In the next section, we discuss the computation of the terminal manifold for the
game.
G̃ = {G | λ2(L(G)) = 0}
G̃ = {G | Fλ(λ, G)|λ=0 = 0}

where Mii is the minor of L(G) corresponding to the diagonal element in the ith row. Substituting the above relation in Theorem 13.3 leads to the following equation for the variables {aij}ᵐi,j=1 at the terminal manifold:

∑ᵢ₌₁ᵐ Mii = 0    (13.16)
Each disconnection condition can be written as an inequality

gij(xi, yi, xj, yj, {xk}ᵐk=1, {yk}ᵐk=1) ≥ 0.

The set of states of the UAVs and the jammers that represent a disconnected communication network is given by the following expression:

R = ⋃_{i,j} { gij(xi, yi, xj, yj, {xk}ᵐk=1, {yk}ᵐk=1) ≥ 0 }
The terminal manifold of the game is given by the boundary of the region R, ∂R. The above expression characterizes the terminal manifold of the game. The value of the game at termination is identically zero. In this analysis, we compute the gradient of the value at an interior point in a connected component of the terminal manifold. Since the terminal manifold is discontinuous, optimal trajectories emanating from them will give rise to singular surfaces [1, 23], which is a topic of ongoing research. Assuming that a single connected component of the terminal manifold, M, is a hypersurface in R^{3(n+m)}, we can parametrize it using 3(m + n) − 1 variables. Therefore, the tangent space at any point on M has a basis containing 3(m + n) − 1 elements, ti. Since J ≡ 0 along M, we obtain the following set of 3(m + n) − 1 equations:

∇J⁰ · ti = 0  ∀ i    (13.17)
(13.18)
Given a termination condition, we can compute ∇J 0 from (13.17) and (13.18). This
provides the boundary conditions for the RPE presented in the previous section.
In the next section, we present some examples.
13.5 Examples
In this section, we compute the optimal trajectories for the aerial vehicles in two
scenarios involving different numbers of UAVs and jammers.
(Fig. 13.2: feedback control loop; a controller computes u = umax sign(Jxi fi(x)) from the fed-back state, and the plant evolves as ẋ = f(x, u, t))
13.5.1 n = 3, m = 1
First, we consider the case when a single jammer tries to disconnect the communi-
cation network in a team containing three UAVs. As stated in the previous section,
the cost function of the game is the time of termination. The equations of motion of
the vehicles are given as follows:
1. UAVs:
ẋi = cos φi,  ẏi = sin φi,  φ̇i = σi1,  i = 1, 2, 3
2. Jammer:
ẋ = cos φ,  ẏ = sin φ,  φ̇ = σ2
Under appropriate assumptions on the value function, let J(x) represent the value at
the point x in the state space.
From Theorem 13.1, the expression for the optimal controls can be given as follows:

σi1* = sign(Jφi), i = 1, 2, 3;   σ2* = −sign(Jφ)    (13.20)
This control is then fed into the plant of the respective UAV. The plant updates the
state variables based on the kinematic equations governing the UAV. Finally the
sensors feed back the state variables into the controllers. In this case the sensors
measure the position and the orientation of each UAV (Table 13.1).
The Laplacian of the graph representing the connectivity of the communication
network is given by the following matrix:
L(G) =
⎡ −(a12 + a13)    a12             a13           ⎤
⎢ a12             −(a12 + a23)    a23           ⎥
⎣ a13             a23             −(a13 + a23)  ⎦
From the above form of the Laplacian, we obtain the following expression for
F(λ , G).
The set of triples (a12 , a23 , a13 ) that satisfy Eq. (13.22) are (1, 0, 0), (0, 1, 0),
(0, 0, 1) and (0, 0, 0); see Table 13.1. The first three values represent the situation
in which the communication exists only between a pair of UAVs at termination
(Fig. 13.3). The last triple represents the scenario in which there is no communica-
tion between any pair of UAVs (Fig. 13.4).
Let us consider a termination situation corresponding to the triple (1, 0, 0). Let Di^r represent the closed disk of radius r centered at UAVi. Let ∂R denote the boundary of a region R. From the jamming model, we can infer that the jammer must lie in the region R1 = (D3^{ηr31} ∪ D1^{ηr31}) ∩ (D3^{ηr32} ∪ D2^{ηr32}) \ (D1^{ηr12} ∪ D2^{ηr12}). The termination manifold is represented by the hypersurfaces ∂R1. An example of such a situation is when the jammer is on the boundary ∂D3^{ηr}, where r = min{r31, r32}. The terminal manifold is characterized by the following equation:

(x − x3)² + (y − y3)² = r²    (13.23)
Fig. 13.3 The jammers can lie in the shaded region for a network graph of the form shown on the right hand side
Fig. 13.4 The jammers can lie in the shaded region for a network graph of the form shown on the right hand side
∂y/∂x₃ = (x − x₃)/√(r² − (x − x₃)²)
∂y/∂y₃ = 1
∂y/∂x = (x − x₃)/√(r² − (x − x₃)²) (13.25)
Substituting the above values of the gradients in Eq. (13.23), we get the following expression for J⁰_y:
J⁰_y = 1 / ( cos φ₃ (∂y/∂x₃) + sin φ₃ (∂y/∂y₃) + cos φ (∂y/∂x) − sin φ ) (13.26)
J⁰_y = 1 / ( ((x − x₃)/√(r² − (x − x₃)²)) (cos φ₃ + cos φ) + (sin φ₃ − sin φ) ) (13.27)
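For convenience, the boundary value (13.27) can be packaged as a small function; this is our sketch, with the function name and argument order being assumptions:

```python
import numpy as np

def Jy0(x, x3, phi3, phi, r):
    """Boundary value J_y^0 on the terminal manifold, Eq. (13.27)."""
    slope = (x - x3) / np.sqrt(r**2 - (x - x3)**2)   # the gradient of (13.25)
    return 1.0 / (slope * (np.cos(phi3) + np.cos(phi)) + (np.sin(phi3) - np.sin(phi)))
```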
For the triple (0, 0, 0), the jammer must lie in the region R₂ = (D₃^{ηr₃₁} ∪ D₁^{ηr₃₁}) ∩ (D₃^{ηr₃₂} ∪ D₂^{ηr₃₂}) ∩ (D₁^{ηr₁₂} ∪ D₂^{ηr₁₂}). An analysis similar to the above can be carried out in order to compute the trajectories emanating back from the terminal conditions.
Figure 13.5 shows a simulation of the trajectories of the UAVs and the jammer from a terminal state. The final states (x, y, φ) of the three UAVs are given by (20, −10, 0), (40, 30, 0.14) and (−20, 10, 0.15). The final state of the jammer is given by (50, −10, −0.17). The figure on the left shows the trajectories of the UAVs. The jammer traces the path shown on the extreme right. The figure on the right shows the connectivity of the UAVs. The network of UAVs is initially connected. At termination, the jammer disconnects the network by isolating one of the UAVs.
Next, we consider the case when two jammers try to disconnect a team of four UAVs.
13.5.2 n = 4, m = 2
As in the earlier section, the equations of motion of the vehicles are given as follows:
1. UAVs
ẋᵢ = cos φᵢ, ẏᵢ = sin φᵢ, φ̇ᵢ = σᵢ¹, i = 1, 2, 3, 4
2. Jammers
ẋᵢ = cos φᵢ, ẏᵢ = sin φᵢ, φ̇ᵢ = σᵢ², i = 1, 2
UAVᵢ is used to represent the ith UAV in the formation and UAVJᵢ is used to represent the ith jammer. Under appropriate assumptions on the value function, as discussed in Sect. 13.3, let J(x) represent the value at the point x in the state space.
From Theorem 13.1, the expression for the optimal controls can be given as
follows:
σᵢ¹∗ = sign(Jφᵢ), i = 1, 2, 3, 4
σᵢ²∗ = −sign(Jφᵢ), i = 1, 2 (13.28)
where "˚" denotes the derivative with respect to retrograde time. The Laplacian of the graph is given by the following:
L(G) = ⎡ −(a₁₂ + a₁₃ + a₁₄)   a₁₂                  a₁₃                  a₁₄                ⎤
       ⎢ a₁₂                  −(a₁₂ + a₂₃ + a₂₄)   a₂₃                  a₂₄                ⎥
       ⎢ a₁₃                  a₂₃                  −(a₁₃ + a₂₃ + a₃₄)   a₃₄                ⎥
       ⎣ a₁₄                  a₂₄                  a₃₄                  −(a₁₄ + a₂₄ + a₃₄) ⎦
From the above form of the Laplacian, we obtain the following expression for F(λ, G).
Table 13.2 The value of Fλ(0, G) for all combinations of (a₁₂, a₁₃, a₁₄, a₂₃, a₂₄, a₃₄) for which Fλ(0, G) = 0

a₁₂  a₁₃  a₁₄  a₂₃  a₂₄  a₃₄   Fλ(0, G)
 0    0    0    0    0    0       0
 1    0    0    0    0    0       0
 0    1    0    0    0    0       0
 0    0    1    0    0    0       0
 0    0    0    1    0    0       0
 0    0    0    0    1    0       0
 0    0    0    0    0    1       0
 1    1    0    0    0    0       0
 1    0    1    0    0    0       0
 1    0    0    1    0    0       0
 1    0    0    0    1    0       0
 1    0    0    0    0    1       0
 0    1    1    0    0    0       0
 0    1    0    1    0    0       0
 0    1    0    0    1    0       0
 0    1    0    0    0    1       0
 0    0    1    1    0    0       0
 0    0    1    0    1    0       0
 0    0    1    0    0    1       0
 0    0    0    1    1    0       0
 0    0    0    1    0    1       0
 0    0    0    0    1    1       0
 0    0    0    1    1    1       0
 0    1    1    0    0    1       0
 1    0    1    0    1    0       0
 1    1    0    1    0    0       0
Fλ(0, G) = 4(a₁₂a₁₃a₁₄ + a₁₂a₁₃a₂₄ + a₁₂a₁₃a₃₄ + a₁₂a₁₄a₂₃ + a₁₂a₁₄a₃₄
+ a₁₂a₂₃a₂₄ + a₁₂a₂₃a₃₄ + a₁₂a₂₄a₃₄ + a₁₃a₁₄a₂₃ + a₁₃a₁₄a₂₄
+ a₁₃a₂₃a₂₄ + a₁₃a₂₃a₃₄ + a₁₃a₂₄a₃₄ + a₁₄a₂₃a₂₄ + a₁₄a₂₃a₃₄ + a₁₄a₂₄a₃₄) = 0
Table 13.2 enumerates all the combinations of (a12 , a13 , a14 , a23 , a24 , a34 ) for
which Fλ (0, G) = 0. From these values of the edge variables, we can construct all
the graphs that are disconnected. Figure 13.6 shows all the equivalence classes of
disconnected graphs under the equivalence relation of isomorphism.
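This enumeration is easy to reproduce mechanically. The sketch below is our illustration, relying on the matrix-tree theorem rather than on the explicit polynomial above; it recovers exactly the edge combinations of Table 13.2:

```python
import itertools
import numpy as np

EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # a12, a13, a14, a23, a24, a34

def laplacian(a):
    L = np.zeros((4, 4))
    for w, (i, j) in zip(a, EDGES):
        L[i, j] = L[j, i] = w
        L[i, i] -= w
        L[j, j] -= w
    return L

# Matrix-tree theorem: any cofactor of -L(G) equals the number of spanning trees,
# which vanishes exactly when the graph is disconnected, i.e. when F_lambda(0, G) = 0.
disconnected = [a for a in itertools.product((0, 1), repeat=6)
                if abs(np.linalg.det(-laplacian(a)[1:, 1:])) < 1e-9]
print(len(disconnected))   # 26 combinations, matching Table 13.2
```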
Fig. 13.6 All the equivalence classes of graphs that are disconnected under the equivalence
relation of isomorphism
Fig. 13.7 The jammers can lie in the shaded region for a network graph of the form shown on the right-hand side
Now we consider a situation in which the network graph G has the edge structure
as shown in Fig. 13.6(4) at termination time. In order to attain the target network
structure, at least one of the jammers has to lie in the shaded region shown in
Fig. 13.7. Let R = (D₁^{ηr₁₂} ∩ D₂^{ηr₁₂} ∩ D₃^{ηr₃₄} ∩ D₄^{ηr₃₄}) \ (D₁^{ηr₁₃} ∪ D₂^{ηr₂₄} ∪ D₃^{ηr₁₃} ∪ D₄^{ηr₂₄}). Consider a termination situation in which UAVJ₁ ∈ ∂D₂^{ηr₁₂} ∩ ∂R and UAVJ₂ ∉ (D₁^{ηr₁₂} ∪ D₂^{ηr₁₂} ∪ D₃^{ηr₃₄} ∪ D₄^{ηr₃₄}). In other words, only UAVJ₁ is responsible for disconnecting the communication network.
The terminal manifold is characterized by the following equation:
(x₁ − x₂)² + (y₁ − y₂)² = r² (13.30)
From (13.30), we obtain the following expressions for the derivatives of the dependent variable y₁:
∂y₁/∂x₂ = (x₁ − x₂)/√(r² − (x₁ − x₂)²)
∂y₁/∂y₂ = 1
∂y₁/∂x₁ = (x₁ − x₂)/√(r² − (x₁ − x₂)²) (13.32)
Substituting the above values of the gradients in Eq. (13.11), we get the following expression for J⁰_y₁:
J⁰_y₁ = 1 / ( cos φ₂ (∂y₁/∂x₂) + sin φ₂ (∂y₁/∂y₂) + cos φ₁ (∂y₁/∂x₁) − sin φ₁ ) (13.33)
J⁰_y₁ = 1 / ( ((x₁ − x₂)/√(r² − (x₁ − x₂)²)) (cos φ₂ + cos φ₁) + (sin φ₂ − sin φ₁) ) (13.34)
Figure 13.8 shows a simulation of the trajectories of the UAVs and the jammers from a terminal state. The final states (x, y, φ) of the four UAVs are given by (40, 10, 0), (20, 20, 0.14), (20, −20, −0.25) and (40, −10, −0.55). The final states of the two jammers are given by (30, 0, 0.15) and (10, 0, −0.17). The figure on the left shows the trajectories of the UAVs. The two vehicles on the extreme right represent the jammers. The figure on the right shows the connectivity of the UAVs. The network of UAVs is initially connected. At termination, the jammers disconnect the network into two disjoint components.
13.6 Conclusions
Chapter 14
Study of Linear Game with Two Pursuers
and One Evader: Different Strength of Pursuers
Abstract The paper deals with a problem of pursuit-evasion with two pursuers and one evader having linear dynamics. The pursuers try to minimize the final miss (the ideal situation is exact capture), while the evader counteracts them. Results of numerical construction of level sets (Lebesgue sets) of the value function are given. A feedback method for producing optimal controls is suggested. The paper also includes numerical simulations of optimal motions of the objects in various situations.
14.1 Introduction
Group pursuit-evasion games (several pursuers and/or several evaders) are studied
intensively in the theory of differential games [2, 4, 6, 7, 11, 16, 17, 20].
From a general point of view, a group pursuit-evasion game (without any hierarchy among players) can often be treated as an antagonistic differential game where all pursuers are joined into one player, whose objective is to minimize some functional, and, similarly, all evaders are joined into another player, who is the opponent of the first one. The theory of differential games gives an existence theorem for the value function of such a game. But, usually, any more concrete results
In Fig. 14.1, one can see a possible initial location of the pursuers and evader when they move towards each other. The evader can also move away from both pursuers, or away from one of them but towards the other pursuer.
Let us assume that the initial velocities are parallel and quite large, and that the control accelerations affect only the lateral components of the object velocities. Thus, one can suppose that the instants of passage of the evader by each of the pursuers are fixed. Below, we call them termination instants and denote them by T_f1 and T_f2, respectively. We consider both cases of equal and different termination instants. The players' controls define the lateral deviations of the evader from the first and second pursuers at the termination instants. The minimum of the absolute values of these deviations is called the resulting miss. The objective of the pursuers is to minimize the resulting miss; the evader maximizes it. The pursuers generate their controls by a coordinated effort (from one control center).
In the relative linearized system, the dynamics is the following (see [12, 13]):
Here, y1 and y2 are the current lateral deviations of the evader from the first and
second pursuers; aP1 , aP2 , aE are the lateral accelerations of the pursuers and evader;
u1 , u2 , v are the players’ command controls; AP1 , AP2 , AE are the maximal values
of the accelerations; l_P1, l_P2, l_E are the time constants describing the inertia of the servomechanisms.
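For concreteness, a plausible form of the relative system, assembled from the variable definitions above in the standard DGL manner of [12, 13], is shown below; this is our hedged reconstruction, and the original display (14.1) may differ in signs and notation:

```latex
\ddot y_i = a_E - a_{P_i}, \qquad
\dot a_{P_i} = \frac{A_{P_i}\,u_i - a_{P_i}}{l_{P_i}}, \qquad
\dot a_E = \frac{A_E\,v - a_E}{l_E}, \qquad i = 1, 2 .
```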
The controls have bounded absolute values: |u₁| ≤ 1, |u₂| ≤ 1, |v| ≤ 1. Below we use the notation
τᵢ = T_fᵢ − t, h(α) = e^(−α) + α − 1.
We have xᵢ(T_fᵢ) = yᵢ(T_fᵢ).
Passing to a new dynamics in “equivalent” coordinates x1 and x2 (see [12, 13]),
we obtain
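A reconstruction consistent with the definitions of τᵢ and h(α) above is the following; again, this is our sketch rather than the authors' verbatim display:

```latex
x_i = y_i + \dot y_i\,\tau_i
      - a_{P_i}\,l_{P_i}^2\, h\!\left(\tfrac{\tau_i}{l_{P_i}}\right)
      + a_E\,l_E^2\, h\!\left(\tfrac{\tau_i}{l_E}\right),
\qquad
\dot x_i = -A_{P_i}\,l_{P_i}\, h\!\left(\tfrac{\tau_i}{l_{P_i}}\right) u_i
           + A_E\,l_E\, h\!\left(\tfrac{\tau_i}{l_E}\right) v, \qquad i = 1, 2 .
```

Since h(0) = 0, this gives xᵢ(T_fᵢ) = yᵢ(T_fᵢ), matching the statement above; the drift cancels, leaving control-only dynamics.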
Join both pursuers P1 and P2 into one player which will be called the first player.
The evader E is the second player. The first player governs the controls u1 and u2 ;
the second one governs the control v. We introduce the following payoff functional:
ϕ(x₁(T_f1), x₂(T_f2)) = min{ |x₁(T_f1)|, |x₂(T_f2)| }. (14.3)
It is minimized by the first player and maximized by the second one. Thus, we get
a standard antagonistic game with dynamics (14.2) and payoff functional (14.3).
This game has [1, 8–10] a value function V(t, x), where x = (x₁, x₂). For each initial position (t₀, x₀), the value V(t₀, x₀) equals the payoff guaranteed for the first (second) player by its optimal feedback control. Each level set
Fig. 14.2 Various variants of the stable bridge evolution in an individual game
Wc = {(t, x) : V(t, x) ≤ c}
of the value function coincides with the maximal stable bridge (see [9, 10]) built
from the target set
Mc = {(t, x) : t = T_f1, |x₁| ≤ c} ∪ {(t, x) : t = T_f2, |x₂| ≤ c}.
The set Wc can be treated as the solvability set for the pursuit-evasion game with the
result c.
When c = 0, we have the situation of exact capture. Exact capture means that at least one of the deviations yᵢ equals zero at the corresponding instant T_fᵢ, i = 1, 2.
The works [12, 13] consider only cases with the exact capture, and pursuers
“stronger” than the evader. The latter means that the parameters APi , AE , and lPi ,
lE (i = 1, 2) are such that the maximal stable bridges in the individual games (P1 vs.
E and P2 vs. E) grow monotonically in the backward time.
Considering individual games of each pursuer vs. the evader, one can introduce
parameters [18] μi = APi /AE and εi = lE /lPi . They and only they define the structure
of the maximal stable bridges in the individual games. Namely, depending on values
of μi and μi εi , there are four cases of the bridge evolution (see Fig. 14.2):
• Expansion in the backward time (a strong pursuer)
• Contraction in the backward time (a weak pursuer)
• Expansion of the bridge until some backward time instant and further contraction
• Contraction of the bridge until some backward time instant and further expansion
(if the bridge still has not broken).
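These four cases can be separated by two threshold tests. The sketch below is our summary of that classification (the thresholds μ > 1 away from termination and με > 1 near termination follow from the sign of the bridge-boundary derivative, consistent with the role of the parameters in [18]); it is not code from the paper:

```python
def bridge_case(mu, eps):
    """Classify the evolution of the maximal stable bridge in an individual game.

    mu = A_P / A_E, eps = l_E / l_P.  Near termination (small backward time) the
    bridge grows iff mu * eps > 1; far from termination it grows iff mu > 1.
    """
    near, far = mu * eps > 1.0, mu > 1.0
    if near and far:
        return "expansion in backward time (strong pursuer)"
    if not near and not far:
        return "contraction in backward time (weak pursuer)"
    if near:
        return "expansion until some backward time instant, then contraction"
    return "contraction until some backward time instant, then expansion"

# Example: the 'two weak pursuers' variant below uses A_P1 = 0.9, l_P1 = 1/0.7, l_E = 1
print(bridge_case(mu=0.9, eps=0.7))   # -> contraction in backward time (weak pursuer)
```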
Respectively, given the combinations of the pursuers' capabilities and the individual game durations (equal/different), there is a significant number of variants of the problem with two pursuers and one evader. Some of them are considered below.
The main objective of this paper is to construct the sets Wc for typical cases
of the game under consideration. The difficulty of the problem is that the time
sections Wc (t) of these sets are non-convex. Constructions are made by means of
an algorithm for constructing maximal stable bridges worked out by the authors for problems with a two-dimensional state variable. The algorithm is similar to the one used in [15]. Another objective is to build optimal feedback controls of the first player
(that is, of the pursuers P1 and P2) and the second one (the evader E).
As it was mentioned above, a level set Wc of the value function is the maximal
stable bridge for dynamics (14.2) built in the space t, x from the target set Mc . A time
section Wc (t) of the bridge Wc at the instant t is a set in the plane of two-dimensional
variable x.
To be definite, let T f 1 ≥ T f 2 . Then for any t ∈ (T f 2 , T f 1 ], the set Wc (t) is a vertical
strip around the axis x2 . Its width along the axis x1 equals the width of the bridge in
the individual game P1 vs. E at the instant τ = T f 1 − t of the backward time. At the
instant t = T f 1 , the half-width of Wc (T f 1 ) is equal to c.
Denote by Wc(T_f2 + 0) the right limit of the set Wc(t) as t → T_f2 + 0. Then the set Wc(T_f2) is cross-like, obtained as the union of the vertical strip Wc(T_f2 + 0) and a horizontal strip around the axis x₁ whose width along the axis x₂ equals 2c.
When t ≤ T f 2 , the backward construction of the sets Wc (t) is made starting from
the set Wc (T f 2 ).
The algorithm suggested by the authors for constructing the approximating sets W̄c(t) uses a time grid in the interval [0, T_f1]: t_N = T_f1, t_{N−1}, ..., t_S = T_f2, t_{S−1}, t_{S−2}, .... For any instant t_k from the taken grid, the set W̄c(t_k) is built on the basis of the previous set W̄c(t_{k+1}) and a dynamics obtained from (14.2) by fixing its value at the instant t_{k+1}. So, dynamics (14.2), which varies in the interval (t_k, t_{k+1}], is replaced by a dynamics with simple motions [8]. The set W̄c(t_k) is regarded as the collection of all positions at the instant t_k from which the first player guarantees guiding the system to the set W̄c(t_{k+1}) under the "frozen" dynamics (14.2) and discrimination of the second player, that is, when the second player announces its constant control v, |v| ≤ 1, on the interval [t_k, t_{k+1}].
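A minimal grid realization of one such backward step is sketched below. This is our illustration, not the authors' implementation: the bridge section is stored as a 0/1 indicator on a rectangular grid, and only the extreme values of the frozen controls are sampled, which makes it a coarse approximation of the construction just described:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def bridge_step(inside, xs, ys, D1, D2, E, dt):
    """One backward step: a grid point stays in W(t_k) iff for every frozen
    control v of the evader there are pursuer controls u1, u2 steering the
    point into W(t_{k+1}).  `inside` is a 0/1 indicator of W(t_{k+1});
    D1, D2, E are the 2-vectors of the dynamics frozen at t_{k+1}."""
    interp = RegularGridInterpolator((xs, ys), inside,
                                     bounds_error=False, fill_value=0.0)
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    pts = np.stack([X, Y], axis=-1)
    result = np.ones(inside.shape)
    for v in (-1.0, 1.0):                      # second player announces v
        reach = np.zeros(inside.shape)
        for u1 in (-1.0, 1.0):                 # first player answers with u1, u2
            for u2 in (-1.0, 1.0):
                shift = dt * (D1 * u1 + D2 * u2 + E * v)
                reach = np.maximum(reach, (interp(pts + shift) > 0.5).astype(float))
        result = np.minimum(result, reach)
    return result
```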
Due to the symmetry of dynamics (14.2) and of the sets Wc(T_f1), Wc(T_f2) with respect to the origin, for any t ≤ T_f1 the time section Wc(t) is also symmetric.
To date, different research groups have suggested many algorithms for constructing the value function in differential games of quite generic type (see, for example, [3, 5, 14, 21]). The problem under consideration has linear dynamics and a two-dimensional phase variable. Due to this, we use a specific method, which allows us to perform very fast computations for many variants of the game.
Fig. 14.3 Two strong pursuers, equal termination instants: time sections of the bridge W
target set M0 which is the union of two coordinate axes. Further, at the instants
t = 4, 2, 0, the cross thickens, and two triangles are added to it. The widths of
the vertical and horizontal parts of the cross correspond to sizes of the maximal
stable bridges in the individual games with the first and second pursuers. These
triangles are located in the II and IV quadrants (where the signs of x1 and x2 are
different, in other words, when the evader is between the pursuers) and give the
zone where the capture is possible only under collective actions of both pursuers (trying to avoid one of the pursuers, the evader is captured by the other one).
These additional triangles have a simple explanation from the point of view of
problem (14.1). Their hypotenuses have slope equal to 45◦ , that is, are described
by the equation |x₁| + |x₂| = const. Consider the instant τ when the hypotenuse reaches a point (x₁, x₂). It corresponds to the instant when the pursuers have jointly covered the distance |x₁(0)| + |x₂(0)| separating them at the initial instant t = 0. Therefore, at this instant, both pursuers arrive at the same point. Since the evader was initially between the pursuers, it is captured at this instant.
The set W (maximal stable bridge) built in the coordinates of system (14.2)
coincides with the description of the solvability set obtained analytically in [12,
13]. The solvability set for system (14.1) is defined as follows: if in the current
position of system (14.1) at the instant t, the forecasted coordinates x1 , x2 are
inside the time section W (t), then under the controls u1 , u2 the motion is guided
to the target set M0 ; on the contrary, if the forecasted coordinates are outside the
set W (t), then there is an evader’s control v which deviates system (14.2) from
the target set. Therefore, there is no exact capture in the original system (14.1).
Time sections Wc (t) of other bridges Wc , c > 0, have the shape similar to W (t).
In Fig. 14.4, one can see the sections Wc (t) at t = 2 (τ = 4) for a collection {Wc }
corresponding to some series of values of the parameter c. For other instants t,
the structure of the sections Wc (t) is similar. The sets Wc (t) describe the value
function x → V (t, x).
2. Feedback control of the first player. Rewrite system (14.2) as
ẋ = D₁(t)u₁ + D₂(t)u₂ + E(t)v.
We see that the vector D₁(t) (D₂(t)) is directed along the horizontal (vertical) axis; when T_f1 = T_f2, the angle between the axis x₁ and the vector E(t) equals 45°; when T_f1 ≠ T_f2, the angle changes in time.
Analyzing the change of the value function V along a horizontal line in the
plane x1 , x2 for a fixed instant t, one can conclude that the minimum of the
function is reached in the segment of intersection of this line and the set W (t).
Fig. 14.4 Two strong pursuers, equal termination instants: level sets of the value function, t = 2
Moreover, the function is monotone on both sides of the segment. For points to the right (to the left) of the segment, the control u₁ = 1 (u₁ = −1) directs the vector D₁(t)u₁ towards the minimum.
Splitting the plane into horizontal lines and extracting for each line the seg-
ment of minimum of the value function, one can gather these segments into a set
in the plane and draw a switching line through this set which separates the plane
into two parts at the instant t. At the right from this switching line, we choose the
control u1 = 1, and at the left the control is u1 = −1. On the switching line, the
control u1 can be arbitrary obeying the constraint |u1 | ≤ 1. The easiest way is to
take the vertical axis x2 as the switching line.
In the same way, using the vector D2 (t), we can conclude that the horizontal
axis x1 can be taken as the switching line for the control u2 .
Thus,
u∗ᵢ(t, x) = 1 if xᵢ > 0; −1 if xᵢ < 0; any uᵢ ∈ [−1, 1] if xᵢ = 0. (14.4)
The switching lines (the coordinate axes) at any t divide the plane x1 , x2 into
4 cells. In each of these cells, the optimal control (u∗1 , u∗2 ) of the first player is
constant.
The vector control (u∗₁(t, x), u∗₂(t, x)) is applied in a discrete scheme (see [9, 10]) with some time step Δ: the chosen control is kept constant during a step Δ; then, on the basis of the new position, a new control is chosen, etc. As Δ → 0, this control guarantees the first player a result not greater than V(t₀, x₀) for any initial position (t₀, x₀).
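In code, the discrete scheme might look as follows; this is a sketch in which the callables `D1`, `D2`, `E` for the dynamics coefficients and `v_of_tx` for the opponent's feedback are hypothetical:

```python
import numpy as np

def u_star(x):
    """Feedback (14.4): bang-bang with the coordinate axes as switching lines."""
    return np.sign(x)    # component-wise; any value in [-1, 1] is admissible on an axis

def simulate(x0, D1, D2, E, v_of_tx, t0, tf, dt):
    """Discrete scheme: the chosen control is held constant over each step dt."""
    t, x = t0, np.asarray(x0, dtype=float)
    while t < tf:
        u1, u2 = u_star(x)
        x = x + dt * (D1(t) * u1 + D2(t) * u2 + E(t) * v_of_tx(t, x))
        t += dt
    return x
```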
3. Feedback control of the second player. Now let us describe the optimal control
of the second player. When T_f1 = T_f2, the vectogram {E(t)v : v ∈ [−1, 1]} of the second player in system (14.2) is a segment parallel to the diagonal of the I and III quadrants. Thus, the second player can shift the system along this line only.
Using the sets Wc (t) at some instant t, let us analyze the change of the func-
tion x → V (t, x) along the lines parallel to this diagonal. Consider an arbitrary
line from this collection such that it passes through the II quadrant. One can
see that local minima are attained at points where the line crosses the axes Ox1
and Ox2 , and a local maximum is in the segment where the line passes through the
rectilinear diagonal part of the boundary of some level set of the value function.
The situation is similar for lines passing through the IV quadrant.
Thus, the switching lines for the second player’s control v can be constructed
from three parts: the axes Ox1 and Ox2 , and some slope line Π (t). The latter
has two half-lines passing through the middles of the diagonal parts on the level
set boundaries in the II and IV quadrants. In our case, when the position of the
system is on the switching line, the control v can take arbitrary values |v| ≤ 1.
Inside each of the 6 cells into which the plane is divided by the switching lines, the control is either v = +1 or v = −1. Such a control pulls the system towards
the points of maximum. Applying this control in a discrete scheme with time
step Δ , the second player guarantees that the result will be not less than V (t0 , x0 )
for any initial position (t0 , x0 ) as Δ → 0.
Note. Since W(t) ≠ ∅, the global minimum of the function x → V(t, x) is attained at any x ∈ W(t) and equals 0. Thus, when the position (t, x) of the system is such that x ∈ W(t), the players can choose, generally speaking, any controls within their constraints. If x ∉ W(t), the choices should be made according to the rules described above, based on the switching lines.
4. Optimal motions. In Fig. 14.5, one can see the results of optimal motion simula-
tions. This figure contains time sections W (t) (thin solid lines; the same sections
as in Fig. 14.3), switching lines Π (0) at the initial instant and Π (6) at the ter-
mination instant of the direct time (dotted lines), and two trajectories for two
different initial positions: ξI (t) (thick solid line) and ξII (t) (dashed line). The
motion ξI (t) starts from the point x01 = 40, x02 = −25 (marked by a black circle)
which is inside the initial section W (0) of the set W . So, the evader is captured:
the endpoint of the motion (also marked by a black circle) is at the origin. The
initial point of the motion ξII (t) has coordinates x01 = 25, x02 = −50 (marked by a
star). This position is outside the section W (0), and the evader escapes from the
exact capture: the endpoint of the motion (also marked by a star) has non-zero
coordinates.
Fig. 14.5 Two strong pursuers, equal termination instants: result of optimal motion simulation
Fig. 14.6 Two strong pursuers, equal termination instants: trajectories in the original space
Figure 14.6 gives the trajectories of the objects in the original space. Values of
longitudinal components of the velocities are taken such that the evader moves
towards the pursuers. For all simulations here and below, we take
Solid lines correspond to the first case when the evader is successfully captured
(at the termination instant, the positions of both pursuers coincide with the position of the evader). Dashed lines show the case when the evader escapes: at the termination instant, neither of the pursuers coincides with the evader. In this case, one can see that the evader aims at the middle point between the terminal positions of the pursuers (this guarantees the maximum of the payoff functional ϕ).
Take the parameters as in the previous section, except the termination instants. Now
they are T f 1 = 7 and T f 2 = 5. Investigation results are shown in Figs. 14.7–14.9.
The maximal stable bridge W = W0 for system (14.2) with the taken target set
M0 = {t = T f 1 , x1 = 0} ∪ {t = T f 2 , x2 = 0}
is built in the following way. At the instant τ₁ = 0 (that is, t = T_f1), the section of the bridge coincides with the vertical axis x₁ = 0. At the instant τ₁ = 2 (that is, t = T_f2), we add the horizontal axis x₂ = 0 to the bridge expanded during the elapsed time period. Further, the time sections of the bridge are constructed using the standard procedure under the relation τ₂ = τ₁ − 2.
In the same way, bridges Wc , c > 0, corresponding to the target sets
Mc = {t = T f 1 , |x1 | ≤ c} ∪ {t = T f 2 , |x2 | ≤ c}
can be built.
Fig. 14.7 Two strong pursuers, different termination instants: the bridge W and optimal motions
Fig. 14.8 Two strong pursuers, different termination instants: level sets of the value function, t = 2
Fig. 14.9 Two strong pursuers, different termination instants: trajectories in the original space
Results of construction of the set W are given in Fig. 14.7. When τ1 > 2, time
sections W (t) grow both horizontally and vertically; two additional triangles appear,
but now they are curvilinear. An analytical description of these curvilinear parts of the boundary is difficult. Due to this, [12, 13] give only an upper estimate of the solvability set for this variant of the game.
The overall structure of the sections Wc(t) at t = 2 (τ₁ = 5, τ₂ = 3) is shown in Fig. 14.8. Optimal feedback controls of the pursuers and evader are constructed in the same way as in the previous example, except that the switching line Π(t) for the evader is formed by the corner points of the additional curvilinear triangles of the sets Wc(t), c ≥ 0.
In Fig. 14.7, the trajectory for the initial point x⁰₁ = 50, x⁰₂ = −25 is shown as a solid line between two points marked by stars. The trajectories in the original space are shown in Fig. 14.9. One can see that at the beginning the evader escapes from the second pursuer and goes down; after that, the evader's control changes to escape from the first pursuer, and the evader goes up.
Now we consider a variant of the game when both pursuers are weaker than the
evader. Let us take the parameters
AP1 = 0.9, AP2 = 0.8, AE = 1, lP1 = lP2 = 1/0.7, lE = 1,
Fig. 14.10 Two weak pursuers, different termination instants: time sections of the maximal stable
bridge W2.0
beginning of the pursuit, the evader closes on the first (lower) pursuer. This is done to increase the miss from the second (upper) pursuer at the instant T_f2. Further closing is not reasonable, and the evader switches its control to increase the miss from the first pursuer at the instant T_f1.
Fig. 14.11 Two weak pursuers, different termination instants: switching lines and optimal controls
for the first player (the pursuers), t = 0
Let us change the parameters of the second pursuer in the previous example and
take the following parameters of the game:
Now the evader is more maneuverable than the second pursuer, and exact capture by this pursuer is impossible. Assume T_f1 = 5, T_f2 = 7.
In Fig. 14.14, sections of the maximal stable bridge W₅.₀ (that is, for c = 5.0) are shown for six instants: t = 7.0, 5.0, 2.5, 1.4, 1.0, 0.0. The horizontal part of its time section W₅.₀(τ) decreases as τ grows and eventually breaks. The vertical part grows. Even after the individual stable bridge of the second pursuer breaks (with the respective collapse of the horizontal part of the cross), additional capture zones still exist and persist in time.
Fig. 14.12 Two weak pursuers, different termination instants: switching lines and optimal controls
for the second player (the evader), t = 0
Fig. 14.13 Two weak pursuers, different termination instants: trajectories of the objects in the
original space
Switching lines of the first and second players for the instant t = 1 are given in Figs. 14.15 and 14.16. These lines are obtained by processing the collection Wc(t = 1) computed for different values of c. In comparison with the previous case of two weak pursuers, the switching lines for the first player have a simpler structure.
Fig. 14.14 One strong and one weak pursuers, different termination instants: time sections of the
maximal stable bridge W5.0
Here, as in the previous section, the trajectories of the objects are drawn in the original space only (see Fig. 14.17). For the simulations, the initial lateral deviations are taken as x⁰₁ = 20, x⁰₂ = −20. The longitudinal components of the velocities are such that the evader moves towards one pursuer but away from the other.
Fig. 14.15 One strong and one weak pursuers, different termination instants: switching lines and
optimal controls for the first player (the pursuers), t = 1
Another interesting case is when the pursuers have equal capabilities such that, at the beginning of the backward time, the bridges in the individual games contract and further expand. That is, at the beginning of the direct time, the pursuers have an advantage over the evader, but at the final stage the evader is stronger.
Parameters of the game are taken as follows:
Fig. 14.16 One strong and one weak pursuers, different termination instants: switching lines and
optimal controls for the second player (the evader), t = 1
Fig. 14.17 One strong and one weak pursuers, different termination instants: trajectories of the
objects in the original space
Fig. 14.18 Varying advantage of the pursuers, equal termination instants: time sections of the
maximal stable bridge W1.5
only the triangles constitute the time section of the bridge (the central left subfigure).
Further, the triangles continue to contract, so they become two pentagons separated
by an empty space near the origin (the central right subfigure in Fig. 14.18). Trans-
formation to pentagons can be explained in the following way: the first player using
Fig. 14.19 Varying advantage of the pursuers, equal termination instants: switching lines and
optimal controls for the first player (the pursuers), t = 0
its controls expands the triangles vertically and horizontally, and the second player contracts them in the diagonal direction. So, vertical and horizontal edges appear, but the diagonal part becomes shorter. Also, in general, the size of each figure slowly decreases.
Due to the action of the second player, at some instant the diagonal disappears, and the pentagons convert to squares (this is not shown in Fig. 14.18). After that, the pursuers gain the advantage, and the overall contraction is replaced by growth: the squares start to enlarge. After some time, due to the growth, the squares touch each other at the origin (the lower-left subfigure in Fig. 14.18). As the enlargement continues, their sizes grow, and the squares start to overlap, forming one "eight-like" shape (the lower-right subfigure in Fig. 14.18).
Figures 14.19 and 14.20 show time sections of a collection of maximal stable
bridges and switching lines for the first and second players, respectively, for the
instant t = 0.
As above, the simulated trajectories are shown in the original space only. For the simulation, the following initial conditions are taken: x⁰₁ = 5, x⁰₂ = −20. The longitudinal components of the velocities are such that the evader moves away from both pursuers.
Fig. 14.20 Varying advantage of the pursuers, equal termination instants: switching lines and
optimal controls for the second player (the evader), t = 0
Fig. 14.21 Varying advantage of the pursuers, equal termination instants: trajectories of the
objects in the original space
The computed trajectories are given in Fig. 14.21. As was said earlier, since at the final stage of the interception the pursuers are weaker than the evader, they cannot guarantee exact capture, but only some non-zero level of the miss.
14.9 Conclusion
The presence of two pursuers acting together and minimizing the miss from the evader leads to non-convexity of the time sections of the level sets of the value function when the situation is considered as a standard antagonistic differential game in which both pursuers are joined into one player. In the paper, results of a numerical study of this problem are given for several variants of the parameters. The structure of the solution depends on the presence or absence of a dynamic advantage of one or both pursuers over the evader. Optimal feedback control methods for the pursuers and evader are built by preliminary construction and processing of the level (Lebesgue) sets of the value function (maximal stable bridges) for some quite fine grid of values of the payoff. The switching lines obtained for each scalar component of the controls depend on time, and only they, not the level sets, are used for generating the controls. Optimal controls are produced at any current instant depending on the location of the state point relative to the switching lines at this instant. An accurate proof of the suggested optimal control method requires additional study.
Acknowledgements This work was supported by Program of Presidium RAS “Dynamic Systems
and Control Theory” under financial support of UrB RAS (project no. 12-Π -1-1002) and also by
the Russian Foundation for Basic Research under grants nos. 10-01-96006 and 11-01-12088.
References
1. Bardi, M., Capuzzo-Dolcetta, I.: Optimal Control and Viscosity Solutions of Hamilton–Jacobi–
Bellman Equations. Birkhauser, Boston (1997)
2. Blagodatskih, A.I., Petrov, N.N.: Conflict Interaction Controlled Objects Groups. Udmurt State
University, Izhevsk, Russia (2009). (in Russian)
3. Cardaliaguet, P., Quincampoix, M., Saint-Pierre, P.: Set-valued numerical analysis for optimal
control and differential games. In: Bardi, M., Raghavan, T.E., Parthasarathy, T. (eds.) Annals of
the International Society of Dynamic Games, vol. 4, pp. 177–247. Birkhauser, Boston (1999)
4. Chikrii, A.A.: Conflict-Controlled Processes, Mathematics and its Applications, vol. 405.
Kluwer Academic Publishers Group, Dordrecht (1997)
5. Cristiani, E., Falcone, M.: Fully-discrete schemes for the value function of pursuit-evasion
games with state constraints. In: Annals of the International Society of Dynamic Games, vol.
10: Advances in Dynamic Games and Applications, pp. 177–206. Birkhauser, Boston (2009)
6. Grigorenko, N.L.: The problem of pursuit by several objects. In: Differential Games—
Developments in Modelling and Computation (Espoo, 1990), Lecture Notes in Control and
Inform. Sci., vol. 156, pp. 71–80. Springer, Berlin (1991)
7. Hagedorn, P., Breakwell, J.V.: A differential game with two pursuers and one evader. J. Optim.
Theory Appl. 18(2), 15–29 (1976)
8. Isaacs, R.: Differential Games. Wiley, New York (1965)
9. Krasovskii, N.N., Subbotin, A.I.: Positional Differential Games. Nauka, Moscow (1974).
(in Russian)
10. Krasovskii, N.N., Subbotin, A.I.: Game-Theoretical Control Problems. Springer-Verlag,
New York (1988)
11. Levchenkov, A.Y., Pashkov, A.G.: Differential game of optimal approach of two inertial
pursuers to a noninertial evader. J. Optim. Theory Appl. 65, 501–518 (1990)
12. Le Ménec, S.: Linear differential game with two pursuers and one evader. In: Abstracts of
13th International Symposium on Dynamic Games and Applications, pp. 149–151. Wroclaw
University of Technology, Wroclaw (2008)
13. Le Ménec, S.: Linear differential game with two pursuers and one evader. In: Annals of
the International Society of Dynamic Games, vol. 11: Advances in Dynamic Games and
Applications, pp. 209–226. Birkhauser, Boston (2011)
14. Mitchell, I.: Application of level set methods to control and reachability problems in continuous
and hybrid systems. PhD Thesis. Stanford University (2002)
15. Patsko, V.S., Turova, V.L.: Level sets of the value function in differential games with the
homicidal chauffeur dynamics. Int. Game Theory Rev. 3(1), 67–112 (2001)
16. Petrosjan L.A.: Differential Games of Pursuit, Leningrad University, Leningrad (1977).
(in Russian)
17. Pschenichnyi, B.N.: Simple pursuit by several objects. Kibernetika 3, 145–146 (1976).
(in Russian)
18. Shima, T., Shinar, J.: Time varying linear pursuit-evasion game models with bounded controls.
J. Guidance Control Dyn. 25(3), 425–432 (2002)
19. Shinar, J., Shima, T.: Non-orthodox guidance law development approach for the interception
of maneuvering anti-surface missiles. J. Guidance Control Dyn. 25(4), 658–666 (2002)
20. Stipanovic, D.M., Melikyan, A.A., Hovakimyan, N.: Some sufficient conditions for multiplayer
pursuit-evasion games with continuous and discrete observations. In: Bernhard, P., Gaitsgory,
V., Pourtallier, O. (eds.) Annals of the International Society of Dynamic Games, vol. 10:
Advances in Dynamic Games and Applications, pp. 133–145. Springer, Berlin (2009)
21. Taras’ev, A.M., Tokmantsev, T.B., Uspenskii, A.A., Ushakov, V.N.: On procedures for
constructing solutions in differential games on a finite interval of time. J. Math. Sci. 139(5),
6954–6975 (2006)
Chapter 15
Salvo Enhanced No Escape Zone
Stéphane Le Ménec
15.1 Introduction
This research program has focused on the problem of naval-based air defense
systems which must defend against attacks from multiple targets. Modern anti-air
warfare systems, capable of tackling the most sophisticated anti ship missiles are
S. Le Ménec ()
EADS/MBDA, 1 Avenue Réaumur, 92 358 Le Plessis-Robinson Cedex, France
e-mail: [email protected]
based on homing missiles which employ inertial navigation with low frequency or
no command update during their mid course phase before becoming autonomous,
employing an active seeker for the terminal phase. Technology developments in the
field of modular data links may allow the creation of a multi-link communication
network to be established between anti-air missiles and the launch platform. The
future prospect of such ad hoc networks makes it possible to consider cooperative
strategies for missile guidance. Many existing guidance schemes are developed on
the basis of one-on-one engagements which are then optimized for many-on-many
scenarios [6, 8]. A priori allocation rules and natural missile dispersion can allow a
salvo of missiles to engage a swarm of targets; however, this does not always avoid
some targets leaking through the salvo, whilst other targets may experience overkill.
Cooperative guidance combines a number of guidance technology strands, and these have been studied as part of the research program outlined below:
• Prediction of the target behavior;
• Mid-course guidance to place the missile in position to acquire and engage the
target;
• Allocation/re-allocation processes based on estimated target behavior and NEZ;
• Terminal homing guidance to achieve an intercept.
In the terminal phase, guidance has been achieved by handover to the DGL
guidance law [16] based on the differential game theory [7]. Two approaches to
missile allocation have been considered. The first one relates to Earliest Interception Geometry (EIG) concepts [15]. This article focuses on the second one, exploiting the NEZ defined by the linear differential game (DGL) guidance law, which either acts to define an Allocation Before Launch (ABL) plan or refines an earlier plan to produce an In-Flight Allocation (IFA) plan.
A statement of the problem is given in Sect. 15.2, SENEZ Concept. In Sect. 15.6, Matrix Game Target Allocation Algorithm, details of pre-flight and in-flight allocation planning are described. Missile guidance, both mid-course and terminal, is discussed in Sect. 15.7, Guidance Logics. The simulation results from a Simulink 6DoF (six-degree-of-freedom) model are reviewed in Sect. 15.9, SENEZ Results. Sections 15.10, SENEZ Perspective, and 15.11, Conclusion, present the study conclusions and some remarks concerning the exploitation of these cooperative guidance technologies. Finally, Sect. 15.12, Acronyms and Notations, summarizes the meaning of the abbreviations and the variables used in the various mathematical formulas.
There are occasions when the weapon system policy for defending against threats
involves firing two or more missiles at the same target. Without any action taken,
the missiles will naturally disperse en-route to the target, each arriving at the
point of homing with a slightly different geometry. In such a case, there will be
Fig. 15.1 Example of multi-shoot in the SENEZ firing policy; we optimize the management of
missile salvos (target allocation process and the guidance laws) to cover uncertainties about the
evolution of targets; indeed, at the beginning of the flight the target positions are updated at a low
cadence; the missile control systems are provided with target measurements at high data rate only
after target acquisition by their on-board sensor; the missile seekers with limited range are depicted
by blue cones
a significant overlap of the NEZs. A SENEZ was introduced to optimize this type of engagement, with the cooperating missiles increasing their chances of at least one missile intercepting the target (Fig. 15.1).
In the naval or ground application, it is often the case that a number of assets
may be situated in close vicinity to each other. In this situation, it may be difficult
to predict which asset an inbound threat is targeting. In the case of air-to-air
engagements, there are various break manoeuvres which a target aircraft could
execute to avoid an interceptor. These paths can be partitioned into a small number
of bundles determined by the number of missiles in the salvo.
By selecting well chosen geometric paths it should be possible to direct the
defending missiles in such a way that each partition of the possible target trajectory
bundles falls within the NEZ of at least one missile. Consider a naval case
of a two missile salvo, and a threat that is initially heading straight towards
the launch vessel; there is a possibility that the threat may break left or right
at some point. One defending missile can be directed to cover the break right
and straight-on possibilities; the second missile would defend against the break
left and straight-on possibilities. By guiding to bundle partitions prior to the start
of homing, the NEZ of the firing is enhanced. At least one of the missiles will be
able to intercept the target. This SENEZ firing policy differs from the more standard shoot-look-shoot policy, which considers the sequential firing of missiles, where a kill assessment is performed before each new missile launch.
Different approaches have been studied to predict target positions [14]. The results detailed in the following are based on the version implementing the goal-oriented approach, which rests on the hypothesis that the target will guide towards a goal. The target trajectories have been classified into three categories: threats coming from the left (with respect to the objective), from the front, and from the right (Fig. 15.2). We generate these three assumed target trajectories by defining one way-point per trajectory class. We compute the trajectories that lead to the threat object passing through the way-points using Trajectory Shaping Guidance (TSG) [17]. The basic TSG is similar to PN (Proportional Navigation) with an additional constraint on the final Line-Of-Sight (LOS) angle. This means that near impact, the LOS angle λ
equals a desired value λF . A 3D version of this law is applied from the threat’s
initial position to the way-point. When the way-point is reached, a switch is made
from TSG to standard PN to guide on the objective. The LOS final angle of the TSG
law is chosen to bring the threat aiming directly at the objective when it reaches
the way-point. Figure 15.3 illustrates how assumption target trajectories have been
generated.
A set of three way-points per target is defined using polar parameters (angle ψwpt
and radius Rwpt ). All way-points belong to a circle of radius Rwpt centered on the
supposed objective. Way-points are then spread with ψwpt as an angular gap, using
Fig. 15.3 2D target trajectory generation using way-points, TSG and PN as terminal homing
guidance
Fig. 15.5 Evolution of the engagement; the way-points do not move; some way-point trajectories become impossible
It is assumed that these way-points do not change as the engagement evolves. Some
hypotheses will become progressively less likely to be true and others appear to
be a good approximation of reality. In due course, some hypotheses will become
unachievable and will be discarded during the cost computation process (Fig. 15.5).
The SENEZ target allocation algorithm is in charge of evaluating all missile-target-
hypothesis engagements [9, 11]. This means the algorithm must be able to tell for
each case if successful interceptions are possible and to give a cost on a scale that
enables comparisons.
Usage of the following letters is now reserved:
• W is the number of way-points considered;
• N is the number of defensive missiles that can be allocated to a target (i.e. that are not already locked on a target, or destroyed);
• P is the number of active and detected threats.
We will now use the following notation to name engagements (i.e. guidance hypotheses):
Mᵢ Tⱼ Hₖ (15.1)
Table 15.1 Costs over target trajectory alternatives and defending missile beliefs
Tⱼ Hₖ (15.2)
This is used to name what the target does (in this case, target j is following hypothesis k). Based on the assumption that the target and missile may guide in three different ways H₁, H₂ and H₃, a three-by-three matrix leading to nine costs can be presented (Table 15.1). As the number of missiles and targets increases, the size of the matrix will grow accordingly.
The target allocation algorithms developed during this study require the evaluation
of many tentative engagements, considering both various target behavior assump-
tions and different defending missile assignments. The mid course trajectories are
extrapolated using simulation models. However, after seeker acquisition, NEZ are
used for the homing engagement kinematic evaluation. The well known classical
DGL1 model [16] is used, except that time varying control bounds are considered
to account for defending missile drag. Acceleration control bounds have been
computed in accordance with 6DoF simulation runs. The time derivative of the
standard DGL1 NEZ boundary, as a function of the normalized time to go θ is
given by:
dZlimit θ
(θ ) = τP2 aE max ε0 h (θ ) − μ0 h (15.3)
dθ ε0
tgo
θ = (15.4)
τP
tgo = tF − t (15.5)
−α
h(α ) = e +α −1 (15.6)
300 S. Le Ménec
aP max
μ0 = (15.7)
aE max
τE
ε0 = (15.8)
τP
θ
Z(θ ) = y + ẏ τ − ÿP τP2 h(θ ) + ÿE τE2 h (15.9)
ε0
where y is the miss perpendicular to the initial Line Of Sight (LOS) direction and ẏ is its first-order time derivative (perpendicular velocity); ÿP and ÿE are respectively the missile and target components of the acceleration perpendicular to the initial LOS. Z is the ZEM (Zero Effort Miss), a well-known concept in missile guidance [17]. aP max is the maximum missile acceleration and aE max is the maximum target acceleration. τP and τE are respectively the pursuer and the evader time constants, μ₀ is the pursuer-to-evader maneuvering ratio, and ε₀ the ratio of time constants (evader to pursuer). t is the regular forward time and tF is the final time: a fixed terminal time defined by the longitudinal (along the initial LOS) missile-target range being equal to 0 (see [16] for more details).
By integrating Eq. (15.3) in backward time with initial condition Z(θ = 0) = 0, the NEZ limits can be computed, as described by the upper and lower symmetric boundaries of Fig. 15.6. Then, a simple model for the maneuverability has been introduced as a linear function of θ:
μ(θ) = μ₀ + νθ (ν ≤ 0) (15.10)
The meaning of this equation is that μ, the ratio of the maximum pursuer acceleration aP max(θ) over the maximum evader acceleration aE max(θ), increases as the time to go θ decreases (the missile gets nearer to the target; μ₀ is the value of μ at t = tF). Thinking of the vehicles' maneuvering drag, we assume that this phenomenon has more impact on the evader than on the pursuer. After integration of this equation, one obtains the new NEZ limits (upper positive limit; the negative one is symmetric):
Z_limit(θ) = Z_limit(μ₀, ε₀)(θ) + ν τP² aE max ( θ³/3 − θ²/2 − (θ + 1) e^(−θ) + 1 ) (15.11)
The term Z_limit(μ₀, ε₀) is the standard DGL1 bound. The second term is the correction obtained due to the linear variation of μ. When ν < 0, this term actually closes the NEZ at a certain time, as shown in Fig. 15.6.
Other refinements exist for considering non-constant velocity profiles using DGL1 kinematics [12]. When running 3D Simulink simulations, the attainability calculus is performed by considering two orthogonal NEZs, associated with the horizontal and the vertical planes.
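As a numerical illustration of Eqs. (15.3), (15.10) and (15.11), the sketch below integrates the upper boundary from Z(0) = 0; it is our illustration with made-up parameter values and trapezoidal quadrature instead of the closed form:

```python
import numpy as np

def h(a):
    return np.exp(-a) + a - 1.0

def nez_boundary(mu0, eps0, nu, tauP, aEmax, theta_max=10.0, n=2000):
    """Upper NEZ boundary Z_limit(theta): Eq. (15.3) integrated from Z(0) = 0,
    with the time-varying ratio mu(theta) = mu0 + nu * theta of Eq. (15.10)."""
    th = np.linspace(0.0, theta_max, n)
    dZ = tauP**2 * aEmax * ((mu0 + nu * th) * h(th) - eps0 * h(th / eps0))
    Z = np.concatenate([[0.0], np.cumsum(0.5 * (dZ[1:] + dZ[:-1]) * np.diff(th))])
    return th, np.maximum(Z, 0.0)   # with nu < 0 the boundary returns to 0: the NEZ closes

# e.g. th, Z = nez_boundary(mu0=2.0, eps0=0.8, nu=-0.3, tauP=1.0, aEmax=100.0)
```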
tgo = RMT / Vc (15.12)
where RMT is the missile-target distance and Vc the closing velocity. Then, for any time t of the trajectory:
where XYZ_T are the target coordinates in the inertial frame. The PIP is assumed
to have both its velocity and acceleration equal to 0. For every time sample of
the target's trajectory, the PIP coordinates are calculated, then the PN command of the missile; finally, integrating this command generates the missile states at the next sample time. For initial extrapolations, i.e. when missiles are not already in flight, it is assumed that their velocity vector is aimed directly at the way-point of the chosen hypothesis. This is also used in the model when actually
shooting missiles. PN on PIP objective makes use of the assumed knowledge of the
target’s behavior and allows the SENEZ target allocation algorithm to launch several
defending missiles against the same real threat following different mid course paths.
The SENEZ principle is indeed to shoot multiple missiles to anticipate target’s
behavior such as doglegs, and new target detections. Once missile trajectories have
been computed, the costs are evaluated. The NEZ concept is applied as well as
a modeling of the field of view of the missile’s seeker. Two zones are defined;
the first zone determines if a target can be locked by the seeker (information); the
second zone determines if the target can be intercepted (attainability). The cost is
simply the relative time when the target enters the intersection of both zones. If it
never happens, the cost value is infinite. If the threat is already in both zones at the
first sample time, the cost is zero.
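A minimal sketch of this cost logic (our illustration; the predicates lock and reachable are hypothetical stand-ins for the seeker-cone and NEZ tests along the sampled trajectory):

import math

def hypothesis_cost(times, lock, reachable):
    # Cost of a guidance hypothesis: first relative time at which the target
    # is both within the seeker field of view and inside the NEZ.
    # Returns 0 if true at the first sample, math.inf if it never happens.
    for t in times:
        if lock(t) and reachable(t):
            return t - times[0]
    return math.inf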
When guiding on a hypothesis such as M1 T1 H1, it is supposed that the seeker always looks at the predicted position of threat T1 under hypothesis H1. This gives at every sample time the aiming direction of the seeker. This seeker direction is tested against all other hypotheses to check whether a target is within the field of view at this sample time. If so, an interception test using the NEZ evaluates whether interception is possible. As soon as a target enters the field of view and becomes reachable for a hypothesis, the cost is updated to the trajectory's current time. The cost computation ends when all costs, i.e., those of all hypotheses, have been computed, or when the last trajectory sample has been reached.
This cost logic has been chosen because of the following:
• It takes into account what the missile can or cannot lock on (seeker cone).
• It takes into account the missile’s ability to reach the threats (NEZ).
• In most cases, it can be assumed that low costs imply short interception times.
After costs have been computed, the algorithm has to find the best possible allocation plan. This means we need to construct allocation plans and combine costs. The overall criterion for discriminating allocation plans is minimizing the time to intercept all the threats, which is roughly equivalent to maximizing the range between the area to protect and the closest location of threat interception. Consider the following illustrative example. One threat T1 attacks one objective, with three possible hypotheses H1, H2, and H3. Two missiles M1 and M2 are allocated to it.
Here, i and j are target way-point beliefs defining the defending missile strategies (mid-course trajectories), and k is the way-point number defining the threat strategies (trajectories).
The best allocation plan in this simple case is thus M1 T1 H1 − M2 T1 H3 (i* = 1, j* = 3), which means guiding M1 based on hypothesis H1 of T1 and M2 on hypothesis H3 of the same target. By playing this plan, the second hypothesis is covered with a satisfactory cost of 1.8, and no additional missile is needed.
This algorithm could also be used to optimize the number of missiles to be involved: if no satisfactory solution exists, i.e., if the costs are higher than a threshold, the procedure can restart with an additional missile (three missiles in this case).
The same principle applies when there are more than two missiles, and more
than one target (the SENEZ algorithm has been written and evaluated in general
scenarios). The mathematical formulation for the construction and optimization of the allocation plan cost matrix then becomes as follows:

  find (A, B) | C(A, B) = min over (A, B) ( max over (i, j) ( min over k ( C(Mk TA(k) HB(k) | Ti Hj) ) ) ),   (15.15)
where
• k is the missile number (between 1 and N, the maximum number of defending missiles);
• A(k) is the index of the target allocated to missile k;
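For small scenarios, the min-max-min structure of (15.15) can be searched by brute force. The following sketch is ours; the layout of the cost array is an assumption, since the text only specifies the structure of the criterion.

import itertools, math

def best_allocation(cost, n_missiles, n_targets, n_hyps):
    # Brute-force search of Eq. (15.15).
    # cost[k][a][b][i][j]: time for missile k, guided on hypothesis b of
    # target a, to intercept target i when it actually flies hypothesis j
    # (math.inf if interception is impossible).
    best_plan, best_value = None, math.inf
    targets, hyps = range(n_targets), range(n_hyps)
    for A in itertools.product(targets, repeat=n_missiles):
        for B in itertools.product(hyps, repeat=n_missiles):
            # worst case over the threats' actual behaviours (i, j),
            # best defending missile k against each realization
            value = max(
                min(cost[k][A[k]][B[k]][i][j] for k in range(n_missiles))
                for i in targets for j in hyps)
            if value < best_value:
                best_plan, best_value = (A, B), value
    return best_plan, best_value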
The two diagrams, Figs. 15.7 and 15.8, summarize the defending missile guidance phases (mid-course and homing) and explain how the 6DoF Simulink Common Model operates.
Fig. 15.7 During mid-course, the guidance logics block extrapolates the target states and PIP coordinates. It also determines whether the seeker locks on one of the targets
Fig. 15.8 In the homing phase, the guidance logics block sends the true states of the locked target. The seeker block applies noise to compute the measurements. A Kalman filter estimates the target's states. Finally, a DGL1 command is applied
Several scenarios for air defense in ground and naval contexts have been defined. A benchmark target allocation policy, with neither re-allocation nor SENEZ features, has been defined for comparison purposes. Scenario 3 (Fig. 15.9) deals with ground defense, where Air Defense Units (ADUs) are located around the Defended Area (circle) containing the objective to be protected (RADAR, diamond mark in the center of the Defended Area). A threat aircraft launches a single missile and then escapes the radar zone. The aircraft and missile are supersonic.
The benchmark policy consists in launching a defending missile as soon as a threat appears in the radar detection range. The benchmark algorithm starts by launching one missile at the merged target. When the two targets split, a second missile is shot. This second defending missile will intercept the attacking missile. Due to the sharp escape manoeuvres of the aircraft, the first defending missile misses the aircraft. After missing the aircraft, the benchmark algorithm launches a third missile to chase the escaping aircraft. This last missile never reaches its target.
When the aircraft crosses the RADAR range, the SENEZ algorithm launches two defending missiles (Fig. 15.10). In ground scenarios, several ADUs are considered, the algorithm automatically deciding on geometric grounds which ADU to use when launching defending missiles. For simplicity, in naval and ground scenarios only one location is considered as the final target goal (ground objective to protect,
Fig. 15.9 Benchmark trajectories in scenario 3; the line entering from the right, which turns to its right, is the trajectory of an aircraft that launches a missile towards the area to protect (circle); the defense missiles are launched from the same launching base located on the border of the area to protect at the bottom of the figure; the first missile misses the aircraft; the second one intercepts the threat missile, and the last one also fails to reach the aircraft
RADAR diamond mark). Simple way-points are used to generate target trajectory assumptions, even though it is possible to extend the concept to more sophisticated target trajectory assumptions.
Figure 15.10 explains what happens when using the SENEZ algorithm and what the improvements with respect to the benchmark policy are. The defending missiles are M2 (on the left) and M3 (on the right). The aircraft trajectory is T1, turning to the right. T2 is the missile launched by the aircraft. The defending missiles intercept where the threat trajectories switch from solid to dotted lines. The dotted lines describe what happens when using the benchmark policy in place of the SENEZ algorithm. The remaining dotted lines are the target trajectory assumptions, continuously refined during the engagement. A straight-line assumption was also considered by the algorithm; however, the defending missiles assigned to the right and left threats are enough to cover the three way-point assumptions elaborated when the initial threat appears. The SENEZ algorithm intercepts the attacking missile at a longer distance than the benchmark algorithm, around a 1 km improvement. Moreover, SENEZ launches only two defending missiles and also intercepts the launching aircraft, which the benchmark algorithm fails to do. The fact that SENEZ directs missiles to the left and right sides, plus the fact that SENEZ launches earlier than the benchmark, explains the SENEZ performance improvement. The Monte Carlo simulations confirm these explanations.
Fig. 15.10 SENEZ target allocation algorithm on scenario 3; the dashed lines are the target assumption trajectories that the algorithm considers for launching the defense missiles; a third hypothesis, "flying straight" to counter the incoming aircraft, is also taken into account, but two defense missiles are sufficient to cover the three hypotheses; the "straight line" assumption is therefore removed from the figure; the black crosses mark where the SENEZ interceptions occur; the reference trajectories are superimposed for comparison
Both the benchmark strategy and the SENEZ algorithm are able to intercept the threat missile; the interception occurs a little earlier in the SENEZ case. The main difference is that with the SENEZ algorithm the aircraft is intercepted in most of the cases (see Fig. 15.11). The mean interception time is around 20 s with SENEZ, whereas interception never happens before 20 s with the benchmark strategy. Moreover, the aircraft is often missed altogether in the benchmark simulations (80 s is the maximum simulation time).
Monte Carlo runs have been executed for all the scenarios, comparing the interception times obtained with the benchmark model to those obtained with SENEZ. The disturbances for these runs were as follows:
• seeker noise;
• initial position of the targets (disturbance with standard deviation equal to 50 m);
• initial Euler angles of the target (disturbance with standard deviation equal to 2.5°).
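For illustration only, such disturbed initial conditions can be drawn as follows; the function name and state layout are ours, and only the standard deviations come from the text (seeker noise is applied separately inside the seeker model).

import numpy as np

rng = np.random.default_rng(0)

def perturb_initial_conditions(position, euler_angles):
    # One Monte Carlo sample of the disturbed target initial state:
    # position noise sigma = 50 m, Euler angle noise sigma = 2.5 deg.
    pos = np.asarray(position) + rng.normal(0.0, 50.0, size=3)
    ang = np.asarray(euler_angles) + rng.normal(0.0, np.radians(2.5), size=3)
    return pos, ang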
Fig. 15.11 Monte Carlo results on scenario 3; mean (μ) and standard deviation (σ) for the benchmark policy (top) and the SENEZ policy (bottom) when intercepting the target T1 (aircraft threat)
Performance analyses have also been executed on various other scenarios in ground and naval application contexts. Moreover, parametric studies have been conducted on the following aspects:
• way-point placements;
• the drag coefficient of the defensive missiles;
• the radius and range of the seekers;
• some variations of the scenario definition, such as the time of appearance of the second target in scenario 3.
Attention has also been paid to finding way-point placements that would be convenient for all ground-to-air scenarios, or all surface-to-air scenarios. The optimal placement of the way-points depends strongly on the scenario. This suggests there would be an advantage in increasing the number of way-points/missiles, corresponding to an increased number of SENEZ hypotheses.
Potential benefits were first illustrated on all the scenarios considered, against targets performing highly demanding evasive manoeuvres as well as apparent single targets that resolve into two splitting targets. The trajectories obtained gave a better idea of the SENEZ behavior. However, the way target hypotheses are issued proved to be critical. This was demonstrated by the parametric studies, as the placement of the way-points changed the results greatly from one scenario to another. The sensitivity to parameters such as drag and seeker features has also been investigated. The results obtained during these parametric studies suggest that the initial number of way-points/hypotheses per target (chosen as three) might be too low.
SENEZ guidance attempts to embed the future possible target behavior into the
guidance strategy by using goal oriented predictions of partitioned threat trajectories
to drive missile allocation and guidance commands. As such the SENEZ approach
offers an alternative to mid-course guidance schemes which guide the intercepting
missile or missiles towards a weighted track. The general application of SENEZ
would lead to a major change in weapon C2 philosophy for naval applications which
may not be justifiable.
The SENEZ engagement plan requires that a missile be fired at each partitioned set of trajectories. This differs from many existing naval firing policies, which would fire a single missile at the target at long range and would delay firing another missile until later when, if there were sufficient time, a kill assessment would be undertaken before firing a second round. Depending on the evolution of the target's behaviour, current C2 algorithms may fire a second missile before the potential interception by the first missile. Existing systems thus tend to follow a more sequential approach, the naval platform needing to preserve missile stocks so that salvo firings are limited; unlike an air platform, the naval platform cannot withdraw rapidly from an engagement. The proposed engagement plan is purely geometric in formulation, as opposed to current schemes, which use probabilities that the target is making for a particular goal [2]. This latter type of engagement plan will generally result in fewer missiles being launched. In the SENEZ scheme, a missile salvo will be fired more often because the potential target trajectories are all treated as equally likely. For instance, when the target is at long range, its choice of asset to attack is plausibly equiprobable, whereas at the inner range boundary, it is most likely that the target is flying straight towards its intended target.
Despite these potentially negative assessments of the SENEZ concept, there will
be occasions when current C2 algorithms will determine that it is necessary to
launch a salvo against a particular threat. For instance, a particularly high value
asset such as an aircraft carrier may be targeted and a high probability of successful
interception is required. In such circumstances there could be merit in the SENEZ
approach. Essentially, in the naval setting SENEZ may be considered as a possible
enhancement for the salvo firing determined by the engagement planning function
in existing C2 systems.
For air-to-air systems the scope for considering a SENEZ form of guidance may be greater. It is often policy for aircraft to fire two missiles at an opposing aircraft engaged at medium range. With a two-aircraft patrol, where the leader and the wing aircraft each fire a missile at the target, there is an opportunity to shape the guidance so that possible break manoeuvres are covered. With separate platforms firing the missiles, inter-platform communication would be necessary so that each missile could be allocated to a unique trajectory partition.
Finally, many extensions could be addressed. First of all, new mid-course guidance schemes, so-called particle guidance or trade-off mid-course guidance, could be considered to guide on several tracks rather than assuming one unique target [3]. Moreover, by considering allocation plans with two defending missiles on one target NEZ, as computed in [5, 10], it could be possible to involve defending missiles with diminished performance (no up-link during the mid-course, less kinematic performance, a low-cost seeker). Given inter-missile communication capabilities, the current centralized algorithm could be improved with decentralized features. Decentralization would allow the processing to be distributed to the individual missiles rather than concentrating the computation in one unique location, i.e., the frigate to protect [4].
15.11 Conclusion
Acknowledgements This work was funded by the French—UK Materials and Components for
Missiles—Innovation and Technology Partnership (MCM ITP) research programme.
References
1. Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd ed. Classics in Applied Mathematics, CL 23. Society for Industrial and Applied Mathematics, Philadelphia, ISBN 978-0-89871-429-6 (1999)
2. Bessière, P. and the BIBA – INRIA Research Group – Projet CYBERMOVE: Survey –
Probabilistic Methodology and Techniques for Artefact Conception and Development, INRIA
Research Report No. 4730, ISSN 0249-6399 ISRN INRIA/RR-4730-FR+ENG (February
2003)
3. Dionne, D., Michalska, H., Rabbath, C.-A.: Predictive guidance for pursuit-evasion engage-
ments involving multiple decoys. J. Guidance Control Dyn. 30(5), 1277–1286 (2007)
4. Farinelli, A., Rogers, A., Petcu A., Jennings, N.-R.: Decentralised coordination of low-power
embedded devices using the max-sum algorithm. In: Padgham, Parkes, Müller, Parsons (eds.)
Proceedings of the 7th International Conference on Autonomous Agents and Multiagent
Systems, AAMAS-08, May 12–16, 2008, Estoril, Portugal, pp. 639–646 (2008)
5. Ganebny, S.A., Kumkov, S.S., Le Ménec, S., Patsko, V.S.: Numerical study of two-on-one
pursuit-evasion game. In: Preprints of the 18th IFAC World Congress, Milano, Italy, August
28–September 2, 2011, pp. 9326–9333 (2011)
6. Ge, J., Tang, L., Reimann, J., Vachtsevanos, G.: Suboptimal approaches to multiplayer
pursuit-evasion differential games. In: AIAA 2006–676 Guidance, Navigation, and Control
Conference, August 21–24, 2006, Keystone, Colorado (2006)
7. Isaacs, R.: Differential Games, a Mathematical Theory with Applications to Warfare and
Pursuit, Control and Optimization. Wiley, New York (1965)
8. Jang, J.S., Tomlin, C.: Control strategies in multi-player pursuit and evasion games. In:
AIAA 2005-6239 Guidance, Navigation, and Control Conference, August 15–18, 2005, San
Francisco, California (2005)
9. Le Ménec, S.: Cooperative mid course guidance law based on attainability constraints:
invited session: advanced methods for the guidance and control of autonomous vehicles. In:
Proceedings of the European Control Conference, August 23–26, 2009, Hungary, MoA3.5, pp.
127–131, ISBN 978-963-311-369-1 (2009)
10. Le Ménec, S.: Linear differential game with two pursuers and one evader. In: Breton M.,
Szajowski, K. (eds.) Annals of the International Society of Dynamic Games, Advances in
Dynamic Games, Theory, Applications, and Numerical Methods for Differential and Stochastic
Games. Birkhäuser, Springer, New York, ISBN 978-0-8176-8088-6 (2011)
11. Le Ménec, S., Shin, H.-S., Tsourdos, A., White, B., Zbikowski, R., Markham K.: Cooperative
missile guidance strategies for maritime area air defense. In: 1st IFAC Workshop on Distributed
Estimation and Control in Networked Systems (NecSys09), 24–26 September 2009, Venice,
Italy (2009)
12. Shima, T., Shinar, J., Weiss, H.: New interceptor guidance law integrating time-varying and
estimation-delay models. J. Guidance Control Dyn. 26(2), 295–303 (2003)
13. Shin, H.-S., Le Ménec, S., Tsourdos, A., Markham, K., White, B., Zbikowski, R.: Cooperative
guidance for naval area defence. In: 18th IFAC Symposium on Automatic Control in
Aerospace, September 6–10, 2010, Nara, Japan (2010)
14. Shin, H.-S., Piet-Lahanier, H., Tsourdos, A., Le Ménec, S., Markham K., White B.-A.:
Membership set-based mid course guidance: application to manoeuvring target interception.
In: Preprints of the 18th IFAC World Congress, August 28–September 2, 2011 Milano, Italy,
pp. 3903–3908 (2011)
15. Shin, H.-S., Tsourdos, A., Le Ménec, S., Markham, K., White, B.: Cooperative mid course
guidance for area air defence. In: AIAA 2010–8056 Guidance, Navigation and Control, August
2–5, 2010, Toronto, Ontario, Canada (2010)
16. Shinar, J., Shima, T.: Non-orthodox guidance law development approach for the interception
of maneuvering anti-surface missiles. J. Guidance Control Dyn. 25(4), 658–666 (2002)
17. Zarchan, P.: Tactical and Strategic Missile Guidance, Progress in Astronautics and Aeronautics,
vol. 219, 5th revised ed. (2007)
Chapter 16
A Method of Solving Differential Games Under
Integrally Constrained Controls
Abstract This study deals with linear game problems under integral constraints on the controls. The proposed scheme leans upon the ideas of the method of resolving functions [Chikrii, Conflict Controlled Processes. Kluwer, Boston (1997)]. The analog of the Pontryagin condition formulated in the paper makes it feasible to derive sufficient conditions for the finite-time termination of the differential game. The results obtained are illustrated with the typical game situation of "simple motion" and continue the research of [Chikrii and Belousov, Mem. Inst. Math. Mech. Ural Div. Russ. Acad. Sci. 15(4):290–301 (2009) (in Russian); Nikol'sky, Diff. Eq. Minsk. 8(6):964–971 (1972) (in Russian); Subbotin and Chentsov, Optimization of Guarantee in Problems of Control. Nauka, Moscow (1981) (in Russian)].
For v = 0 this inclusion is evidently satisfied. Taking into account (16.3) and the inclusion 0 ∈ U, at v ≠ 0 we have a chain of inclusions:

  π e^(Aτ) C (v/‖v‖) ∈ π e^(Aτ) C V ⊂ λ π e^(Aτ) B U ⊂ √λ π e^(Aτ) B U,
whence, taking into account the finiteness of γ(t, τ, v) (for π e^(At) z⁰ ≠ 0), we conclude that δ(γ(t, τ, v), t, τ, v) = 0, and the upper bound in (16.5) is attained.
Let us consider the level set of the function γ(t, τ, v), Λa = {(t, τ, v) : γ(t, τ, v) < a}. We will show that this set is open and, therefore, Borel for any positive number a. This will imply the Borel measurability of the function γ(t, τ, v).
Let us fix a positive number a and let (t̄, τ̄, v̄) be an arbitrary point of Λa. Consequently, a ∉ Ω(t̄, τ̄, v̄) and δ(a, t̄, τ̄, v̄) > 0. The continuity of the function δ(·) ensures the existence of a neighborhood Δ of the point (t̄, τ̄, v̄) such that the inequality δ(a, t, τ, v) > 0 holds for all (t, τ, v) ∈ Δ, that is, γ(t, τ, v) < a for all (t, τ, v) ∈ Δ. This implies that the set Λa is open, as required.
The mappings on the left- and right-hand sides of this inclusion are Borel measurable in (τ, v) and continuous in u, u ∈ U. By the theorem of Kuratowski and Ryll-Nardzewski [2, 3] there exists a Borel-measurable selection, that is, a Borel-measurable mapping w(τ, v) ∈ U such that

  γ(T, τ, v) π e^(AT) z⁰ + π e^(Aτ) C v = √((1−λ)γ(T, τ, v) + λ‖v‖²) · π e^(Aτ) B w(τ, v)
for all (τ, v) ∈ ℝ₊ × ℝˡ. Also, from this theorem it may be concluded that there exists a Borel-measurable mapping w̃(τ, v) ∈ U such that

  π e^(Aτ) C v = √λ ‖v‖ π e^(Aτ) B w̃(τ, v).
By condition (16.7) of the theorem there exists an instant T* = T*(z⁰, v(·)) such that

  ∫₀^(T*) γ(T, T−τ, v(τ)) dτ = 1.
Then the control of the pursuer on the interval [0, T] is prescribed by the formula

  u(τ) = −√((1−λ)γ(T, T−τ, v(τ)) + λ‖v(τ)‖²) · w(T−τ, v(τ))   for τ ∈ [0, T*],
  u(τ) = −√λ ‖v(τ)‖ · w̃(T−τ, v(τ))                             for τ ∈ (T*, T].   (16.8)
Substituting this control into the dynamics and using the selections w and w̃, we obtain

  π z(T) = π e^(AT) z⁰ − ∫₀^(T*) √((1−λ)γ(T, T−τ, v(τ)) + λ‖v(τ)‖²) π e^(A(T−τ)) B w(T−τ, v(τ)) dτ
           − ∫_(T*)^T √λ ‖v(τ)‖ π e^(A(T−τ)) B w̃(T−τ, v(τ)) dτ + ∫₀^T π e^(A(T−τ)) C v(τ) dτ
         = π e^(AT) z⁰ − ∫₀^(T*) [γ(T, T−τ, v(τ)) π e^(AT) z⁰ + π e^(A(T−τ)) C v(τ)] dτ
           − ∫_(T*)^T π e^(A(T−τ)) C v(τ) dτ + ∫₀^T π e^(A(T−τ)) C v(τ) dτ
         = π e^(AT) z⁰ − ( ∫₀^(T*) γ(T, T−τ, v(τ)) dτ ) · π e^(AT) z⁰ = 0.
This equation proves that a solution of (16.1) is brought to the terminal set, z(T) ∈ M. Let us verify that the control u(τ) constructed in (16.8) meets the integral constraints (16.2):
  ∫₀^T ‖u(τ)‖² dτ = ∫₀^(T*) [(1−λ)γ(T, T−τ, v(τ)) + λ‖v(τ)‖²] ‖w(T−τ, v(τ))‖² dτ
                    + ∫_(T*)^T λ‖v(τ)‖² ‖w̃(T−τ, v(τ))‖² dτ
                  ≤ (1−λ) ∫₀^(T*) γ(T, T−τ, v(τ)) dτ + λ ∫₀^T ‖v(τ)‖² dτ ≤ 1.
The case π eAT z0 = 0 is analyzed in a similar way. In so doing, the pursuer control
on the interval [0, T ] is as follows:
  u(τ) = −√λ ‖v(τ)‖ · w̃(T−τ, v(τ)).   (16.9)
Analogously, it may be shown that in this case, too, the control (16.9) ensures
bringing a solution of (16.1) to the terminal set M at moment T (for any admissible
control v(τ )), and control u(τ ) meets integral constraint (16.2).
Remark 16.1. The theorem is easily transferred to the case of general constraints on the controls:

  ∫₀^∞ uᵀ(τ) G u(τ) dτ ≤ μ²,   ∫₀^∞ vᵀ(τ) H v(τ) dτ ≤ ρ².   (16.10)

A suitable change of variables transforms the differential game (16.1), (16.10) into the original form.
As an example, consider the game of simple motions

  ẋ = u, x(0) = x⁰, x ∈ ℝⁿ, u ∈ ℝⁿ,
  ẏ = v, y(0) = y⁰, y ∈ ℝⁿ, v ∈ ℝⁿ.   (16.11)

In the variable z = x − y, with the normalized controls u = μũ, v = ρṽ, the game takes the form

  ż = μũ − ρṽ, z⁰ = x⁰ − y⁰,   ∫₀^∞ ‖ũ(τ)‖² dτ ≤ 1,   ∫₀^∞ ‖ṽ(τ)‖² dτ ≤ 1.   (16.12)
The terminal set is M = {0}, and the operator π is the identity transformation. It is easy to see that Assumption 16.3 is satisfied for the parameter λ = ρ/μ < 1:

  −ρ D ⊂ λ · μ D,   D = {z ∈ ℝⁿ : ‖z‖² ≤ 1}.
The resolving function γ(·) can be found from the formula for the set-valued mapping

  Ω(t, τ, ṽ) = { γ ∈ ℝ : γ z⁰ − ρṽ ∈ √((1−λ)γ + λ‖ṽ‖²) · μ D }
             = { γ ∈ ℝ : ‖γ z⁰ − ρṽ‖² ≤ ((1−λ)γ + λ‖ṽ‖²) · μ² }
             = { γ : F(γ, ṽ) = ‖z⁰‖² γ² − 2γ ( ρ⟨z⁰, ṽ⟩ + μ(μ−ρ)/2 ) − ρ(μ−ρ)‖ṽ‖² ≤ 0 }.
The function F(γ, ṽ) is a quadratic polynomial in γ with a positive leading coefficient; therefore, γ(ṽ) in (16.5) is the largest root of the quadratic equation F(γ, ṽ) = 0. Note that F(0, v) ≤ 0 for all v ∈ ℝⁿ; therefore, the function γ(v) is defined for all v and γ(v) ≥ 0.
Let us find v* that yields a minimum of the function γ(v). To this end we differentiate the identity F(γ(v), v) = 0 and set the derivative of γ to zero, which gives

  v* = −(γ/(μ−ρ)) · z⁰.
The corresponding value γ(v*) can be found from the quadratic equation F(γ, v*) = 0:

  γ(v*) = (μ−ρ)² / ‖z⁰‖²,

whence

  v* = −((μ−ρ)/‖z⁰‖²) · z⁰.
In view of the fact that the function γ(·) is the largest root of the quadratic equation, it can easily be shown that

  γ(v) ≥ ρ(μ−ρ)‖v‖ / ‖z⁰‖² → ∞  when ‖v‖ → ∞.

From this it follows that the unique extremum of the function γ(v) is its minimum.
By Theorem 16.1 the time of the game termination is defined by the relationships

  ∫₀^T γ(v(τ)) dτ ≥ ∫₀^T γ(v*) dτ = ((μ−ρ)²/‖z⁰‖²) · T = 1,

whence

  T = ‖z⁰‖² / (μ−ρ)².   (16.13)
It should be noted that instant T coincides with the time of first absorption [4] for
the game (16.11), that is, it coincides with the first moment when the attainability
set of the pursuer x absorbs the attainability set of the evader y. This instant T is the
minimal guaranteed time of the game (16.11) termination.
Let us present an explicit form of a countercontrol u(v) of the pursuer on the interval [0, T] that solves the problem of approach. The strategy of the pursuer is defined by the relationships

  γ z⁰ − ρṽ = √((1−λ)γ + λ‖ṽ‖²) μ w,  ‖w‖ ≤ 1,
  ũ(ṽ) = −√((1−λ)γ + λ‖ṽ‖²) w,

whence

  ũ(ṽ) = (−γ z⁰ + ρṽ)/μ.
Then, using the substitution u = μũ, v = ρṽ and the quadratic equation F(γ, ṽ) = 0, we deduce that

  u(v) = v − γ(v) · z⁰,

where

  γ(v) = (1/‖z⁰‖²) { ⟨z⁰, v⟩ + μ(μ−ρ)/2 + √( (⟨z⁰, v⟩ + μ(μ−ρ)/2)² + ((μ−ρ)/ρ) ‖z⁰‖² ‖v‖² ) }.
Thus, this control assures solving the problem (16.11), (16.12) no later than at
time T (16.13).
References

Chapter 17
Anglers' Fishing Problem
A. Karpowicz and K. Szajowski
Abstract The model considered here will be formulated in relation to the “fishing
problem,” even if other applications of it are much more obvious. The angler goes
fishing, using various techniques, and has at most two fishing rods. He buys a
fishing pass for a fixed time. The fish are caught using different methods according
to renewal processes. The fish’s value and the interarrival times are given by the
sequences of independent, identically distributed random variables with known
distribution functions. This forms the marked renewal–reward process. The angler’s
measure of satisfaction is given by the difference between the utility function,
depending on the value of the fish caught, and the cost function connected with
the time of fishing. In this way, the angler’s relative opinion about the methods of
fishing is modeled. The angler’s aim is to derive as much satisfaction as possible, and
additionally he must leave the lake by a fixed time. Therefore, his goal is to find two
optimal stopping times to maximize his satisfaction. At the first moment, he changes
his technique, e.g., by discarding one rod and using the other one exclusively. Next,
he decides when he should end his outing. These stopping times must be shorter than
the fixed time of fishing. Dynamic programming methods are used to find these two
optimal stopping times and to specify the expected satisfaction of the angler at these
times.
A. Karpowicz
Bank Zachodni WBK, Rynek 9/11, 50-950 Wrocław, Poland
K. Szajowski ()
Institute of Mathematics and Computer Science, Wybrzeże
Wyspiańskiego 27, 50-370 Wrocław, Poland
17.1 Introduction
Before we start our analysis of the double optimal stopping problem (cf. the idea
of multiple stopping for stochastic sequences in Haggstrom [8] and Nikolaev [16])
for the marked renewal process related to the angler’s behavior, let us present the
so-called fishing problem. One of the first authors to consider the basic version
of this problem was Starr [19], and further generalizations were done by Starr
and Woodroofe [21], Starr et al. [20], and Kramer and Starr [14]. A detailed
review of papers related to the fishing problem was presented by Ferguson [7].
A simple formulation of the fishing problem, where the angler changes his location
or technique before leaving his fishing spot, was done by Karpowicz [12]. We extend
the problem to a more advanced model by taking into account the various fishing
techniques used at the same time (the parallel renewal–reward processes or the
multivariate renewal–reward process). This is motivated by the natural, more precise
models of the known, real applications of the fishing problem. The typical process of software testing consists of checking subroutines. Initially, many kinds of bugs are
searched for. Consecutive stopping times are moments when the expert stops general
testing of modules and starts checking the most important, dangerous types of errors.
Similarly, in proofreading, it is natural to look for typographic and grammatical
errors at the same time. Next, we look for errors in language use.
Since the various tasks are done by different groups of experts, it is natural that they would compete against each other. If the first-period work involves one group and the second period requires other experts, then they can be treated as players in a game between themselves. In this case, the proposed solution is to find the Nash equilibrium, where the players' strategies are the stopping times.
The applied techniques of modeling and finding the optimal solution are similar
to those used in the formulation and solution of the optimal stopping problem for the
risk process. Both models are based on the methodology explicated by Boshuizen
and Gouweleeuw [1]. The background mathematics for further reading are the
monographs by Brémaud [3], Davis [4], and Shiryaev [18]. The optimal stopping
problems for the risk process are considered in papers by Jensen [10], Ferenstein
and Sierociński [6], and Muciek [15]. A similar problem for a risk process with a
disruption (i.e., when the probability structure of the considered process is changed
at one moment θ ) was analyzed by Ferenstein and Pasternak-Winiarski [5]. The
model of the last paper brings to mind the change in fishing methods considered here. However, in the present model such a change is made by a decision maker; it is not an uncontrolled, automatic consequence of the type of environment.
The following two sections present details of the model. A slight modification
of the background assumption by the adoption of multivariate tools (two rods)
and the possible control of their numbers in use yields a different structure of the
base model (the underlying process, sets of strategies—admissible filtrations and
stopping times). This modified structure allows the introduction of a new kind of
knowledge selection, which consequently leads to a game model of the angler’s
expedition problem in Sects. 17.1.2 and 17.2.2. Following a general formulation
of the problem, a version of the problem for a detailed solution will be chosen. However, the solution is presented as a scalable procedure dependent on parameters that reflect various circumstances. It is not difficult to adapt the solution to a wide range of natural cases.
An angler goes fishing. He buys a fishing pass for a fixed time t0 , which gives him
the right to use at most two rods. The total cost of fishing depends on the real
time of each equipment usage and the number of rods used simultaneously. The
angler starts fishing with two rods up to moment s. The effect on each rod can be
modeled by the renewal processes {Ni (t),t ≥ 0}, where Ni (t) is the number of fish
caught with rod i, i ∈ A := {1, 2}, during time t. Let us combine them together
into a marked renewal process. The usage of the ith rod by time t generates cost c_i : [0, t₀] → ℝ (when a rod is used simultaneously with other rods, this will be denoted by an index depending on the set of rods, e.g., a, c_i^a), and the reward is represented by independent, identically distributed (i.i.d.) random variables X₁^(i), X₂^(i), … (the value of the fish caught with the ith rod) with cumulative distribution function H_i.
The streams of the two kinds of fish are mutually independent and are independent of the sequence of random moments at which the fish have been caught. The two-dimensional process N⃗(t) = (N₁(t), N₂(t)), t ≥ 0, can also be represented by a sequence of random variables Tₙ taking values in [0, ∞] such that

  T₀ = 0,   Tₙ < ∞ ⇒ Tₙ < Tₙ₊₁.   (17.1)
Both the two-variate process N⃗(t) and the double sequence {(Tₙ, zₙ)}_(n=0)^∞ are called a two-variate renewal process. Optimal stopping problems for the compound risk process based on the two-variate renewal process were considered by Szajowski [22].
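To make the construction concrete, the merged marked process {(Tₙ, zₙ, Xₙ)} can be simulated; the sketch below is ours, with exponential interarrival times and fish values as purely illustrative distributional choices.

import numpy as np

rng = np.random.default_rng(1)

def simulate_two_rods(t0, rates=(1.0, 0.5), value_means=(1.0, 2.0)):
    # Simulate the merged marked renewal process of two rods up to the pass
    # expiry t0: each event is (time T_n, rod mark z_n, fish value X_n).
    events = []
    for rod in (0, 1):
        t = rng.exponential(1.0 / rates[rod])
        while t <= t0:
            events.append((t, rod + 1, rng.exponential(value_means[rod])))
            t += rng.exponential(1.0 / rates[rod])
    return sorted(events)  # T_n increasing, z_n = rod that caught the fish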
change in the fishing method took place at time s. The natural filtration related to the double-indexed process Z(s, t) is F_t^s = σ{Z(u, v) : 0 ≤ u ≤ s ≤ v ≤ t}. Suppose the effect of extending the expedition after s is described by g_j^b : ℝ₊² × A × [0, t₀] × ℝ × [0, t₀] → [0, G_j^b], j ∈ B, minus the additional cost of time c_j^b(·), where c_j^b : [0, t₀] → [0, C_j^b] (when card(B) = 1, the index j will be dropped, and c^b = Σ_(j∈B) c_j^b will be used, which is adequate). Then the payoff can be expressed as
  Z(s, t) = g^a(M⃗_t, z_N(t), t) − c^a(t)                                           if t < s ≤ t₀,
  Z(s, t) = g^a(M⃗_s, z_N(s), s) − c^a(s) + g^b(M⃗_s, z_N(s), s, M_t^s, t) − c^b(t−s)  if s ≤ t ≤ t₀,
  Z(s, t) = −C                                                                     if t₀ < t,   (17.4)

where the function c^a(t), g^a(m⃗, i, t) and the constant C can be taken as follows: c^a(t) = Σ_(i=1)² c_i^a(t), g^a(M⃗_s, j, t) = Σ_(i=1)² g_i^a(M⃗_t, j, t), C = C₁^a + C₂^a + C^b. With the notation w^b(m⃗, i, s, m̃, t) = w^a(m⃗, i, s) + g^b(m⃗, i, s, m̃, t) − c^b(t−s) and w^a(m⃗, i, t) = g^a(m⃗, i, t) − c^a(t), formula (17.4) is reduced to the payoffs

  Z^(i)(s, t) = I_{t<s≤t₀} w^a(M⃗_t, i, t) + I_{s≤t≤t₀} w^b(M⃗_s, i, s, M_t^s, t) − I_{t₀<t} C,  i ∈ A.
When the methods of fishing are operated by separate anglers, a stopping random field can be built on the structure of the marked renewal–reward process as a model of the competitive expedition results. One possible definition of the payoff is based on the assumption that each player has his own account related to the exploration of the fishery. The states of the accounts depend on who forces the first stop for changing the technique, under what circumstances, and which techniques the players choose. The first stopping moment, the minimum of the stopping moments chosen by the players, occurs just after the moment Tₙ of an event (catching a fish) with rod zₙ, and the reward functions depend on the type of fishing that produced the most recently caught fish (i.e., j = zₙ). The player's payoff is w_i^a(m⃗, j, t) = g_i^a(m⃗, j, t) − c_i^a(t). The part of the payoff that depends on the second chosen moment, which stops the expedition, is different for the player who forces the change in fishing methods (the leader) and for his opponent. The leader is the one responsible for determining the expedition deadline.
Let us assume for a while that the ith player, i = 1, 2, takes his opponent's rod and gives his own rod to his opponent. This is not a crucial assumption: the method of fishing after the change can be different from both methods available before the considered moment. The treatment of the case without this assumption will be explained later (p. 335), when the behavior of the players in the second part of the expedition is formulated. Define the function
  w̃_i^b(m⃗, j, s, k, m̃, t) = w̃_i^a(m⃗, j, s) + g̃_i^b(m⃗, j, s, k, m̃, t) − c^b(t−s)

for j ∈ A, k ∈ B, where j is the rod with which the fish had been caught just before the moment of the first stop and k is the technique used by the ith player after the change (the denotation −k is used for the complementary rod or player, whichever is appropriate). It describes the case where the player deciding to change the method chooses the prospective technique of fishing first. Presumably he will explore the best methods with improvements, and the second angler will use the rod that is not used by the leader. The payoff of the players, when the ith player is the one who forces the first stop, has the following form:
  Z_i(j, s, t) = I_{t≤s≤t₀} g̃_i^a(M⃗_t, j, t) + I_{s<t≤t₀} w̃_i^b(M⃗_s, i, s, −i, M_t^s, t) − I_{t₀<t} C,   (17.5)
  Z_{−i}(j, s, t) = I_{t≤s≤t₀} g̃_{−i}^a(M⃗_t, j, t) + I_{s<t≤t₀} w̃_{−i}^b(M⃗_s, i, s, i, M_t^s, t) − I_{t₀<t} C.   (17.6)
In the preceding payoffs it is assumed that the final stop can be declared at any moment. Each player declares a change in technique right after an event on his rod (catching a fish with his rod), as long as there is no event on the opponent's rod. The details of the strategy sets and the solution concept are formulated in subsequent parts of this paper.
The extension considered here is motivated by natural, more precise models of known real applications of the fishing problem. The typical process of software testing consists of checking subroutines. Various types of bugs can be discovered in this way. Each problem with a subroutine generates the cost of bug removal and increases the value of the software, depending on the types of bugs found. Preliminary testing requires various types of experts. The stable version of the subroutines can be kept by less advanced computer scientists. The consecutive stopping times are moments when an expert of a certain class stops testing one module and another tester starts checking. The procedure for proofreading is similar.
by T_n^(zₙ) = T^(zₙ)_(N_{zₙ}(Tₙ)). There are also three renewal–reward processes {(T_n^(i), X_n^(i))}_(n=0)^∞, i = 1, 2, 3. By convention let us denote X_n^(zₙ) = X^(zₙ)_(N_{zₙ}(Tₙ)). The σ-fields F_t generated by the history of the A-marked renewal processes are defined by (17.7).
Definition 17.1. Let T be the set of stopping times with respect to the σ-fields {F_t}, t ≥ 0, defined by (17.7). The restricted sets of stopping times

  T_(n,K) = {τ ∈ T : Tₙ ≤ τ ≤ T_K}   (17.8)

for n ∈ ℕ and n < K are subsets of T. The elements of T_(n,K) are denoted τ_(n,K).
The stopping times τ ∈ T have a nice representation that will be helpful in the
solution of the optimal stopping problems for the renewal processes [3]. A crucial
role in our subsequent considerations will be played by such a representation. The
following lemma is for unrestricted stopping times.
Lemma 17.1. If τ ∈ T , then there exist Rn ∈ Mes(Fn ) such that the condition τ ∧
Tn+1 = (Tn + Rn ) ∧ Tn+1 on {τ ≥ Tn } a.s. is fulfilled.
Various restrictions in the class of admissible stopping times will change this rep-
resentation. Some examples of subclasses of T are formulated here (Lemma 17.1).
Only a few of them are used in the optimization problems investigated in this paper
(Corollary 17.1).
Let F_(s,t) = σ(F_s^A, X₀^(3), T₀^(3), …, X^(3)_(N₃((t−s)⁺)), T^(3)_(N₃((t−s)⁺))) be the σ-field generated by all events up to time t if the switch at time s from the two-variate renewal process to another renewal process has taken place. For simplicity of notation we set F_n^(i) := F^(i)_(T_n^(i)), F_n := F_(Tₙ), and F_n^s := F_(s, T_n^(3)).² Let Mes(F_n^(i)) (Mes(F_n)) denote the set of nonnegative F_n^(i)- (F_n-) measurable random variables. Henceforth, T and T^s will stand for the sets of stopping times with respect to the σ-fields {F_t, t ≥ 0} and {F_(s,t), 0 ≤ s ≤ t}, respectively. Furthermore, for n ∈ ℕ and n ≤ K we can define the sets
1. T_(n,K)^(i) = {τ ∈ T : T_n^(i) ≤ τ ≤ T_K^(i)};
2. T_n^(i) = {τ ∈ T : τ ≥ T_n^(i)};
² For the optimization problem there are two epochs: before the first stop, with its payoffs and its model of the stream of events, and after the first stop, with other payoffs and different streams of events. In Sect. 17.3 this will be emphasized by adopting adequate notation.
where A^(−i) := A \ {i} and T_k^(A^(−i)) := min_(j ∈ A^(−i)) T^(j)_(N_j(T_k^(i)));
The stopping times τ ∈ T^(i) and τ ∈ T̄^(i) can also be represented as shown in Lemma 17.1.

Lemma 17.2. Let the index i ∈ A be chosen and fixed.
1. For every τ ∈ T^(i) and n ∈ ℕ there exist R_n^(i) ∈ Mes(F_n^(i)) such that τ ∧ T_(n+1)^(i) = (T_n^(i) + R_n^(i)) ∧ T_(n+1)^(i) on {τ ≥ T_n^(i)} a.s.
2. If τ ∈ T̄^(i) and n ∈ ℕ, then there exist R_n^(i) ∈ Mes(F_n) such that τ ∧ T_(n+1)^(i) = (T_n^(i) + R_n^(i)) ∧ T_(n+1)^(i) on {τ ≥ T_n^(i)} a.s.
Obviously the angler wants to derive as much satisfaction as possible and must leave the lake before the fixed time. Therefore, his goal is to find two optimal stopping times τ^(a*) and τ^(b*) such that the expected gain is maximized:

  E Z(τ^(a*), τ^(b*)) = sup_(τ^a ∈ T) sup_(τ^b ∈ T^(τ^a)) E Z(τ^a, τ^b),   (17.9)

where τ^(a*) corresponds to the moment when he should change to the more effective rod and τ^(b*) to the moment when he should stop fishing.
should appear before the fixed time of fishing t0 . The process Z(s,t) is piecewise-
deterministic and belongs to the class of semi-Markov processes. The optimal
stopping time of similar semi-Markov processes was studied by Boshuizen and
Gouweleeuw [1] and the multivariate point process by Boshuizen [2]. Here the
structure of multivariate processes is revealed and their importance for the model is
shown. We use dynamic programming methods to find these two optimal stopping
times and to specify the expected satisfaction of the angler. The method of the
solution is similar to those used by Karpowicz and Szajowski [13], Karpowicz [12],
and Szajowski [22]. Let us first observe that by the properties of conditional
expectation we have
  E Z(τ^(a*), τ^(b*)) = sup_(τ^a ∈ T) E{ E[ Z(τ^a, τ^(b*)) | F_(τ^a) ] } = sup_(τ^a ∈ T) E J(τ^a),

where

  J(s) = E[ Z(s, τ^(b*)) | F_s ] = ess sup_(τ^b ∈ T^s) E[ Z(s, τ^b) | F_s ].   (17.10)
Therefore, to find τ^(a*) and τ^(b*), we must calculate J(s) first. The process J(s) corresponds to the value of the revenue function in the one-stopping problem when the observation starts at moment s.
The assignment of the leader in the case τ₁ = τ₂ is arbitrary but fixed. The aim is to find a pair (τ₁, τ₂) of stopping times such that for i ∈ {1, 2} the Nash equilibrium conditions hold.
In this section, we will find the solution to one stopping problem defined by (17.10).
We will first solve the problem for a fixed number of fish caught and then consider
the case with an infinite stream of fish caught. In this section we fix s, the moment
when the change took place, and m = Ms , the mass of the fish at time s. Taking into
account various models of fishing after the first stop, it is necessary to admit various
models of event streams. Assume that the moments of successive fish caught after the first stop are T_n^(3), that the times between the events are i.i.d. with continuous cumulative distribution function F(t) and density function f(t), and that the fish's value is represented by i.i.d. random variables with distribution function H (for convenience this part of the expedition is modeled by the renewal process denoted (T_n^(3), X_n^(3))).
In this subsection we are looking for the optimal stopping time τ_K^(b*) := τ_(0,K)^(b*) such that

  E[ Z(s, τ_K^(b*)) | F_s ] = ess sup_(τ_K^b ∈ T_(0,K)^s) E[ Z(s, τ_K^b) | F_s ],   (17.13)
where s ≥ 0 is a fixed time at which the position is changed and K is the maximum number of fish that can be caught. Let us define

  Γ_(n,K)^s = ess sup_(τ_(n,K)^b ∈ T_(n,K)^s) E[ Z(s, τ_(n,K)^b) | F_n^s ] = E[ Z(s, τ_(n,K)^(b*)) | F_n^s ],  n = K, …, 1, 0,   (17.14)

and observe that Γ_(K,K)^s = Z(s, T_K^(3)). In subsequent considerations we will use the representation of stopping times formulated in Lemmas 17.1 and 17.2. The exact form of the stopping strategies is given in the following corollary.
Corollary 17.1. Let i ∈ A. If τ^a ∈ T^(i) and τ^b ∈ T^s, then there exist R_n^a ∈ Mes(F_n^(i)) and R_n^b ∈ Mes(F_n^s), respectively, such that τ^a ∧ T_(n+1)^(i) = (T_n^(i) + R_n^a) ∧ T_(n+1)^(i) on {τ^a ≥ T_n^(i)} a.s. and τ^b ∧ T_(n+1)^(3) = (T_n^(3) + R_n^b) ∧ T_(n+1)^(3) on {τ^b ≥ s ∧ T_n^(3)} a.s.
Now we can derive the dynamic programming equations satisfied by Γ_(n,K)^s. To simplify the notation, we write M_t = M_t^s for t ≤ s, M̃_n^(1) = M_(T_n^(1)), M_n^s = M^s_(T_n^(3)), and F̄_i = 1 − F_i. The payoff functions are simplified here to ĝ^a(m) = g^a(m₁, m₂, i, t) I_{m₁+m₂=m}(m₁, m₂) and ĝ^b(m̃) = g^b(m₁, m₂, i, s, m, t) I_{m̃−m₁−m₂=m}.
Lemma 17.3. Let s ≥ 0 be the moment of changing the fishing place. Then Γ_(K,K)^s = Z(s, T_K^(3)), and for n = K−1, K−2, …, 0,

  Γ_(n,K)^s = ess sup_(R_n^b ∈ Mes(F_n^s)) ϑ_(n,K)(M_s, s, M_n^s, T_n^(3), R_n^b)  a.s.,   (17.15)

where

  ϑ_(n,K)(m, s, m̃, t, r) = I_{t≤t₀} [ F̄(r) ( I_{r≤t₀−t} ŵ^b(m, s, m̃, t+r) − C I_{r>t₀−t} ) + E( I_{S_(n+1)^(3) ≤ r} Γ_(n+1,K)^s | F_n^s ) ] − C I_{t>t₀},
and there exists R_n^(b*) ∈ Mes(F_n^s) such that

  Γ_(n,K)^s = ϑ_(n,K)(M_s, s, M_n^s, T_n^(3), R_n^(b*))  a.s.,   (17.16)

  τ_(n,K)^(b*) = τ_(n+1,K)^(b*)        if R_n^(b*) ≥ S_(n+1)^(3),
  τ_(n,K)^(b*) = T_n^(3) + R_n^(b*)    if R_n^(b*) < S_(n+1)^(3),   (17.17)

τ_(K,K)^(b*) = T_K^(3), and ŵ^b(m, s, m̃, t) = ŵ^a(m, s) + ĝ^b(m̃ − m) − c^b(t − s), where ŵ^a(m, t) = ĝ^a(m) − c^a(t).
Remark 17.2. Let {R_n^(b*)}_(n=1)^K, R_K^(b*) = 0, be a sequence of F_n^s-measurable random variables, n = 1, 2, …, K, and let η_(n,K)^s = K ∧ inf{i ≥ n : R_i^(b*) < S_(i+1)^(3)}. Then Γ_(n,K)^s = E[ Z(s, τ_(n,K)^(b*)) | F_n^s ] for n ≤ K − 1, where τ_(n,K)^(b*) = T^(3)_(η_(n,K)^s) + R^(b*)_(η_(n,K)^s).

Proof of Remark 17.2. It is a consequence of the optimal choice of R_n^b in (17.15).
Proof of Lemma 17.3. The form of Γ_(n,K)^s for the case T_n^(3) > t₀ is obvious from (17.4) and (17.14). Let us assume (17.15) and (17.16) hold for n + 1, n + 2, …, K. For any τ ∈ T_(n,K)^s (i.e., τ ≥ T_n^(3)) we have {τ < T_(n+1)^(3)} = {τ ∧ T_(n+1)^(3) < T_(n+1)^(3)} = {T_n^(3) + R_n^b < T_(n+1)^(3)}. This implies

  {τ < T_(n+1)^(3)} = {S_(n+1)^(3) > R_n^b},   {τ ≥ T_(n+1)^(3)} = {S_(n+1)^(3) ≤ R_n^b}.   (17.18)
Suppose that T_(K−1)^(3) ≤ t₀ and take any τ_(K−1,K)^b ∈ T_(K−1,K)^s. According to (17.18) and the properties of conditional expectation,

  E[ Z(s, τ) | F_n^s ] = E[ I_{S_(n+1)^(3) ≤ R_n^b} E( Z(s, τ ∨ T_(n+1)^(3)) | F_(n+1)^s ) | F_n^s ]
                         + E[ I_{S_(n+1)^(3) > R_n^b} Z(s, τ ∧ T_(n+1)^(3)) | F_n^s ]
                       = I_{R_n^b ≤ t₀ − T_n^(3)} F̄(R_n^b) ŵ^b(M_s, s, M_n^s, T_n^(3) + R_n^b)
                         + E[ I_{S_(n+1)^(3) ≤ R_n^b} E( Z(s, τ ∨ T_(n+1)^(3)) | F_(n+1)^s ) | F_n^s ].

Let σ ∈ T_(n+1)^s. For every τ ∈ T_n^s we have

  τ = σ                  if R_n^b ≥ S_(n+1)^(3),
  τ = T_n^(3) + R_n^b    if R_n^b < S_(n+1)^(3).
We also have

  E[ Z(s, τ) | F_n^s ] = E[ I_{S_(n+1)^(3) ≤ R_n^b} E( Z(s, σ) | F_(n+1)^s ) | F_n^s ]
                         + I_{R_n^b ≤ t₀ − T_n^(3)} F̄(R_n^b) ŵ^b(M_s, s, M_n^s, T_n^(3) + R_n^b)
                       ≤ sup_(R ∈ Mes(F_n^s)) { E[ I_{S_(n+1)^(3) ≤ R} Γ_(n+1,K)^s | F_n^s ]
                         + I_{R ≤ t₀ − T_n^(3)} F̄(R) ŵ^b(M_s, s, M_n^s, T_n^(3) + R) }
                       = E[ Z(s, τ_(n,K)^(b*)) | F_n^s ].

It follows that sup_(τ ∈ T_(n,K)^s) E[ Z(s, τ) | F_n^s ] ≤ E[ Z(s, τ_(n,K)^(b*)) | F_n^s ] ≤ sup_(τ ∈ T_(n,K)^s) E[ Z(s, τ) | F_n^s ], where the last inequality is due to the fact that τ_(n,K)^(b*) ∈ T_(n,K)^s. We apply the induction hypothesis to complete the proof.
Lemma 17.4. For n = K, K−1, …, 0, Γ_(n,K)^s = γ_(K−n)^(s,M_s)(M_n^s, T_n^(3)) a.s., where the functions γ_j^(s,m) are defined by γ₀^(s,m)(m̃, t) = I_{t≤t₀} ŵ^b(m, s, m̃, t) − C I_{t>t₀} and

  γ_j^(s,m)(m̃, t) = I_{t≤t₀} sup_(0≤r≤t₀−t) κ^b_(γ_(j−1)^(s,m))(m, s, m̃, t, r) − C I_{t>t₀},   (17.19)

where

  κ_δ^b(m, s, m̃, t, r) = F̄(r) [ I_{r≤t₀−t} ŵ^b(m, s, m̃, t+r) − C I_{r>t₀−t} ] + ∫₀^r dF(z) ∫₀^∞ δ(m̃ + x, t + z) dH(x).
Proof of Lemma 17.4. Since the case t > t₀ is obvious, let us assume that T_n^(3) ≤ t₀ for n ∈ {0, …, K−1}. According to Lemma 17.3, we obtain Γ_(K,K)^s = γ₀^(s,M_s)(M_K^s, T_K^(3)), and thus the proposition is satisfied for n = K. Let n = K−1; then Lemma 17.3 and the induction hypothesis lead to

  Γ_(K−1,K)^s = ess sup_(R_(K−1)^b ∈ Mes(F_(K−1)^s)) { F̄(R_(K−1)^b) [ I_{R_(K−1)^b ≤ t₀−T_(K−1)^(3)} ŵ^b(M_s, s, M_(K−1)^s, T_(K−1)^(3) + R_(K−1)^b) − C I_{R_(K−1)^b > t₀−T_(K−1)^(3)} ]
                + E( I_{S_K^(3) ≤ R_(K−1)^b} γ₀^(s,M_s)(M_K^s, T_K^(3)) | F_(K−1)^s ) }  a.s.
              = ess sup_(R_(K−1)^b ∈ Mes(F_(K−1)^s)) { F̄(R_(K−1)^b) [ I_{R_(K−1)^b ≤ t₀−T_(K−1)^(3)} ŵ^b(M_s, s, M_(K−1)^s, T_(K−1)^(3) + R_(K−1)^b) − C I_{R_(K−1)^b > t₀−T_(K−1)^(3)} ]
                + ∫₀^(R_(K−1)^b) dF(z) ∫₀^∞ γ₀^(s,M_s)(M_(K−1)^s + x, T_(K−1)^(3) + z) dH(x) }
              = γ₁^(s,M_s)(M_(K−1)^s, T_(K−1)^(3))  a.s.

Let n ∈ {1, …, K−1} and suppose that Γ_(n,K)^s = γ_(K−n)^(s,M_s)(M_n^s, T_n^(3)). As before, we conclude by Lemma 17.3 and the induction hypothesis that

  Γ_(n−1,K)^s = ess sup_(R_(n−1)^b ∈ Mes(F_(n−1)^s)) { F̄(R_(n−1)^b) [ I_{R_(n−1)^b ≤ t₀−T_(n−1)^(3)} ŵ^b(M_s, s, M_(n−1)^s, T_(n−1)^(3) + R_(n−1)^b) − C I_{R_(n−1)^b > t₀−T_(n−1)^(3)} ]
                + ∫₀^(R_(n−1)^b) dF(z) ∫₀^∞ γ_(K−n)^(s,M_s)(M_(n−1)^s + x, T_(n−1)^(3) + z) dH(x) }  a.s.

Therefore, Γ_(n−1,K)^s = γ_(K−(n−1))^(s,M_s)(M_(n−1)^s, T_(n−1)^(3)).
Henceforth we will use α_i to denote the hazard rate of the distribution F_i (i.e., α_i = f_i/F̄_i), and to shorten the notation we set Δ^•(a) = E[ ĝ^•(a + X^(i)) − ĝ^•(a) ], where • can be a or b.
Remark 17.3. The sequence of functions γ_j^(s,m) can be expressed as

  γ_j^(s,m)(m̃, t) = I_{t≤t₀} [ ŵ^b(m, s, m̃, t) + y_j^b(m̃ − m, t − s, t₀ − t) ] − C I_{t>t₀},
  y₀^b(a, b, c) = 0,
  y_j^b(a, b, c) = max_(0≤r≤c) φ^b_(y_(j−1)^b)(a, b, c, r),

where φ_δ^b(a, b, c, r) = ∫₀^r F̄(z) { α(z) [ Δ^b(a) + Eδ(a + X^(3), b + z, c − z) ] − c^b′(b + z) } dz, and F is the c.d.f. of S^(3) [α(t) is the hazard rate of the distribution of S^(3)].
Proof of Remark 17.3. Clearly

  ∫₀^r dF(z) ∫₀^∞ γ_(j−1)^(s,m)(m̃ + x, t + z) dH(x) = E[ I_{S^(3) ≤ r} γ_(j−1)^(s,m)(m̃ + X^(3), t + S^(3)) ],
where X^(3) has the c.d.f. H. Since F is continuous and κ^b_(γ_(j−1)^(s,m))(m, s, m̃, t, r) is bounded and continuous for t ∈ ℝ₊ \ {t₀}, the supremum in (17.19) can be changed into a maximum. Let r > t₀ − t; then

  κ^b_(γ_(j−1)^(s,m))(m, s, m̃, t, r) = E[ I_{S^(3) ≤ t₀−t} γ_(j−1)^(s,m)(m̃ + X^(3), t + S^(3)) ] − C F̄(t₀ − t)
    ≤ E[ I_{S^(3) ≤ t₀−t} γ_(j−1)^(s,m)(m̃ + X^(3), t + S^(3)) ] + F̄(t₀ − t) ŵ^b(m, s, m̃, t₀).

The preceding calculations imply that γ_j^(s,m)(m̃, t) = I_{t≤t₀} max_(0≤r≤t₀−t) ϕ_j(m, s, m̃, t, r) − C I_{t>t₀}, where ϕ_j(m, s, m̃, t, r) = F̄(r) ŵ^b(m, s, m̃, t + r) + E[ I_{S^(3) ≤ r} γ_(j−1)^(s,m)(m̃ + X^(3), t + S^(3)) ]. Obviously for S^(3) ≤ r and r ≤ t₀ − t we have S^(3) ≤ t₀; therefore, we can consider the cases t ≤ t₀ and t > t₀ separately. Let t ≤ t₀; then γ₀^(s,m)(m̃, t) = ŵ^b(m, s, m̃, t), and the hypothesis is true for j = 0. The task is now to calculate γ_(j+1)^(s,m)(m̃, t) given γ_j^(s,m)(·, ·). The induction hypothesis implies that for t ≤ t₀

  ϕ_(j+1)(m, s, m̃, t, r) = F̄(r) ŵ^b(m, s, m̃, t + r) + E[ I_{S^(3) ≤ r} γ_j^(s,m)(m̃ + X^(3), t + S^(3)) ]
    = ĝ^a(m) − c^a(s) + F̄(r) [ ĝ^b(m̃ − m) − c^b(t − s + r) ]
      + ∫₀^r f(z) { Eĝ^b(m̃ − m + X^(3)) − c^b(t − s + z) + Ey_j^b(m̃ − m + X^(3), t − s + z, t₀ − t − z) } dz;

therefore,

  ϕ_(j+1)(m, s, m̃, t, r) = ŵ^b(m, s, m̃, t) + ∫₀^r F̄(z) { α(z) [ Δ^b(m̃ − m) + Ey_j^b(m̃ − m + X^(3), t − s + z, t₀ − t − z) ] − c^b′(t − s + z) } dz,
Let B denote the space of bounded, continuous functions with the norm ‖δ‖ = sup_(a,b,c) |δ(a, b, c)|. It is easy to check that B with the supremum norm is a complete space. The operator Φ^b : B → B is defined by

  (Φ^b δ)(a, b, c) = max_(0≤r≤c) φ_δ^b(a, b, c, r).   (17.20)
Let us observe that y_j^b(a, b, c) = (Φ^b y_(j−1)^b)(a, b, c). Remark 17.3 now implies that there exists a function r_j^(b*)(a, b, c) such that y_j^b(a, b, c) = φ^b_(y_(j−1)^b)(a, b, c, r_j^(b*)(a, b, c)), and this gives

  γ_j^(s,m)(m̃, t) = I_{t≤t₀} [ ŵ^b(m, s, m̃, t) + φ^b_(y_(j−1)^b)(m̃ − m, t − s, t₀ − t, r_j^(b*)(m̃ − m, t − s, t₀ − t)) ] − C I_{t>t₀}.
where 0 ≤ C < 1. Similarly, as before, (Φ^b δ₂)(a, b, c) − (Φ^b δ₁)(a, b, c) ≤ C ‖δ₂ − δ₁‖. Finally, we conclude that ‖Φ^b δ₁ − Φ^b δ₂‖ ≤ C ‖δ₁ − δ₂‖, which completes the proof.
Applying Remark 17.3, Lemma 17.5, and the fixed-point theorem we conclude the following remark.

Remark 17.4. There exists y^b ∈ B such that y^b = Φ^b y^b and lim_(K→∞) ‖y_K^b − y^b‖ = 0.

According to the preceding remark, y^b is the uniform limit of y_K^b as K tends to infinity, which implies that y^b is measurable and γ^(s,m) = lim_(K→∞) γ_K^(s,m) is given by

  γ^(s,m)(m̃, t) = I_{t≤t₀} [ ŵ^b(m, s, m̃, t) + y^b(m̃ − m, t − s, t₀ − t) ] − C I_{t>t₀}.   (17.21)
We can now calculate the optimal strategy and the expected gain after changing locations.

Theorem 17.2. If F(t₀) < 1 and F has the density function f, then:
(i) for n ∈ ℕ the limit τ_n^(b*) = lim_(K→∞) τ_(n,K)^(b*) a.s. exists, and τ_n^(b*) ≤ t₀ is an optimal stopping rule in the set T^s ∩ {τ ≥ T_n^(3)};
(ii) E[ Z(s, τ_n^(b*)) | F_n^s ] = γ^(s,m)(M_n^s, T_n^(3)) a.s.
Proof. (i) Let us first prove the existence of τ_n^(b*). By the definition of Γ_(n,K+1)^s, we have

  Γ_(n,K+1)^s = ess sup_(τ ∈ T_(n,K+1)^s) E[ Z(s, τ) | F_n^s ]
             = ess sup_(τ ∈ T_(n,K)^s) E[ Z(s, τ) | F_n^s ] ∨ ess sup_(τ ∈ T_(K,K+1)^s) E[ Z(s, τ) | F_n^s ]
             = E[ Z(s, τ_(n,K)^(b*)) | F_n^s ] ∨ E[ Z(s, σ*) | F_n^s ],

and thus we observe that τ_(n,K+1)^(b*) is equal to τ_(n,K)^(b*) or σ*, where τ_(n,K)^(b*) ∈ T_(n,K)^s and σ* ∈ T_(K,K+1)^s, respectively. It follows that τ_(n,K+1)^(b*) ≥ τ_(n,K)^(b*), which implies that the sequence τ_(n,K)^(b*) is nondecreasing with respect to K. Moreover, R_i^(b*) ≤ t₀ − T_i^(3) for all i ∈ {0, …, K}; thus τ_(n,K)^(b*) ≤ t₀, and therefore the limit τ_n^(b*) ≤ t₀ exists.
Let us now look at the process ξ^s(t) = (t, M_t^s, V(t)), where s is fixed and V(t) = t − T^(3)_(N₃(t)). ξ^s(t) is a Markov process with the state space [s, t₀] × [m, ∞) × [0, ∞). In the general case, the infinitesimal operator for ξ^s is given by

  A p^(s,m)(t, m̃, v) = ∂p^(s,m)/∂t (t, m̃, v) + ∂p^(s,m)/∂v (t, m̃, v) + α(v) [ ∫₀^∞ p^(s,m)(t, m̃ + x, 0) dH(x) − p^(s,m)(t, m̃, v) ].   (17.23)
From (17.22) we conclude that

  ∫_(T_n^(3))^(τ_(n,K)^(b*)) (A p^(s,m))(ξ^s(z)) dz = [ Eĝ^b(M_n^s + X^(3) − m) − ĝ^b(M_n^s − m) ] ∫_(T_n^(3))^(τ_(n,K)^(b*)) α(z − T_n^(3)) dz − ∫_(T_n^(3))^(τ_(n,K)^(b*)) c^b′(z − s) dz,
where the last two inequalities result from the fact that the functions ĝ^b and c^b are bounded. On account of the preceding observation we can use the dominated convergence theorem, so that

  lim_(K→∞) E[ ∫_(T_n^(3))^(τ_(n,K)^(b*)) (A p^(s,m))(ξ^s(z)) dz | F_n^s ] = E[ ∫_(T_n^(3))^(τ_n^(b*)) (A p^(s,m))(ξ^s(z)) dz | F_n^s ].   (17.24)
Since τ_n^(b*) ≤ t₀, applying Dynkin's formula to the left-hand side of (17.24) we conclude that

  E[ ∫_(T_n^(3))^(τ_n^(b*)) (A p^(s,m))(ξ^s(z)) dz | F_n^s ] = E[ p^(s,m)(ξ^s(τ_n^(b*))) | F_n^s ] − p^(s,m)(ξ^s(T_n^(3)))  a.s.   (17.25)
Proof of Lemma 17.6. The derivative ν̄₊(c) exists because ν̄(c) = max_(0≤r≤c) φ̄^b(c, r), where φ̄^b(c, r) is differentiable with respect to c and r. Fix h ∈ (0, t₀ − c) and define δ̄₁(c) = δ̄(c + h) ∈ B and δ̄₂(c) = δ̄(c) ∈ B. Obviously, ‖Φ^b δ̄₁ − Φ^b δ̄₂‖ ≥ |Φ^b δ̄₁(c) − Φ^b δ̄₂(c)| = |Φ^b δ̄(c + h) − Φ^b δ̄(c)|, and on the other hand, using Taylor's formula for the right-hand derivatives we obtain

  ‖δ̄₁ − δ̄₂‖ = sup_c |δ̄(c + h) − δ̄(c)| ≤ h sup_c |δ̄₊(c)| + |o(h)|.
The significance of Lemma 17.6 is that the function ȳ(t₀ − s) has a bounded left-hand derivative with respect to s for s ∈ (0, t₀]. An important consequence of this fact is shown by the following remark.

Remark 17.5. The function γ^(s,m) can be expressed as γ^(s,m)(m, s) = I_{s≤t₀} u(m, s) − C I_{s>t₀}, where u(m, s) = ĝ^a(m) − c^a(s) + ĝ^b(0) − c^b(0) + ȳ^b(t₀ − s) is continuous, bounded, and measurable with bounded left-hand derivatives with respect to s.
At the end of this section, we determine the conditional value function of the second optimal stopping problem. According to (17.10), Theorem 17.2, and Remark 17.5, we have

  J(s) = E[ Z(s, τ^(b*)) | F_s ] = γ^(s,M_s)(M_s, s)  a.s.   (17.28)
In this section, we formulate the solution of the double stopping problem. In the
first epoch of the expedition, the admissible strategies (stopping times) depend on
the formulation of the problem. For the optimization problem the most natural
strategies are the stopping times from T (see the relevant problem considered in
Szajowski [22]). However, when the bilateral problem is considered, the natural
class of admissible strategies depends on who uses the strategy. It should be T {i}
for the ith player. Here the optimization problem with restriction to the strategies
from T {1} in the first epoch is investigated.
The function u(m, s) has properties similar to those of the function ŵ^b(m, s, m̃, t), and the process J(s) has a structure similar to that of the process Z(s, t). By this observation one can follow the calculations of Sect. 17.3 to obtain J(s). Let us define again Γ_(n,K) = ess sup_(τ^a ∈ T_(n,K)) E[ J(τ^a) | F_n ], n = K, …, 1, 0, which fulfills the following representation.
Lemma 17.7. Γ_(n,K) = γ_(K−n)(M̃_n^(1), T_n^(1)) for n = K, …, 0, where the sequence of functions γ_j can be expressed as

  γ_j(m, s) = I_{s≤t₀} [ u(m, s) + y_j^a(m, s, t₀ − s) ] − C I_{s>t₀},
  y₀^a(a, b, c) = 0,
  y_j^a(a, b, c) = max_(0≤r≤c) φ^a_(y_(j−1)^a)(a, b, c, r),

where

  φ_δ^a(a, b, c, r) = ∫₀^r F̄₁(z) { α₁(z) [ Δ^a(a) + Eδ(a + X^(1), b + z, c − z) ] − ( ȳ^b₋(c − z) + c^a′(b + z) ) } dz.
Lemma 17.7 corresponds to the combination of Lemma 17.4 and Remark 17.3 from Sect. 17.3.1. Let the operator Φ^a : B → B be defined by

  (Φ^a δ)(a, b, c) = max_(0≤r≤c) φ_δ^a(a, b, c, r).   (17.29)

The following results may be proved in much the same way as in Sect. 17.3.

Lemma 17.8. If F₁(t₀) < 1, then the operator Φ^a : B → B defined by (17.29) is a contraction.

Remark 17.6. There exists y^a ∈ B such that y^a = Φ^a y^a and lim_(K→∞) ‖y_K^a − y^a‖ = 0.

The preceding remark implies that γ = lim_(K→∞) γ_K is given by

  γ(m, s) = I_{s≤t₀} [ u(m, s) + y^a(m, s, t₀ − s) ] − C I_{s>t₀}.
17.5 Examples
The form of the solution makes it difficult to calculate the solution analytically. In this section, we present examples of conditions under which the solution can be calculated explicitly.
Remark 17.7. If the process ζ₂(t) = A p^(s,m)(ξ^s(t)) has decreasing paths, then the second optimal stopping time is given by τ_n^(b*) = inf{ t ∈ [T_n^(3), t₀] : A p^(s,m)(ξ^s(t)) ≤ 0 }; on the other hand, if ζ₂(t) has nondecreasing paths, then the second optimal stopping time is equal to t₀.

Similarly, if the process ζ₁(s) = A p(ξ(s)) has decreasing paths, then the first optimal stopping time is given by τ_n^(a*) = inf{ s ∈ [T_n^(1), t₀] : A p(ξ(s)) ≤ 0 }; on the other hand, if ζ₁(s) has nondecreasing paths, then the first optimal stopping time is equal to t₀.
Proof. From (17.25) we obtain $E[Z(s,\tau_n^{b*})|\mathcal{F}_n^s]=Z(s,T_n^{\{3\}})+E\big[\int_{T_n^{\{3\}}}^{\tau_n^{b*}}(Ap^{s,m})(\xi^s(z))\,dz\big]$ a.s., and applying the results of Jensen and Hsu [11] completes the proof.
Corollary 17.2. If $S^{\{3\}}$ has an exponential distribution with constant hazard rate $\alpha$, the function $\hat g_b$ is increasing and concave, the cost function $c_b$ is convex, and $t_{2,n}=T_n^{\{3\}}$, $m_n^s=M_n^s$, then
$$\tau_n^{b*}=\inf\big\{t\in[t_{2,n},t_0]:\alpha\big[E\hat g_b(m_n^s+x^{\{3\}}-m)-\hat g_b(m_n^s-m)\big]\le c_b(t-s)\big\},\qquad(17.31)$$
where $s$ is the moment when the location is changed. Moreover, if $S^{\{1\}}$ has an exponential distribution with constant hazard rate $\alpha_1$, $\hat g_a$ is increasing and concave, $c_a$ is convex, and $t_{1,n}=T_n^{\{1\}}$, $m_n=M_n^{\{1\}}$, then
$$\tau_n^{a*}=\inf\big\{s\in[t_{1,n},t_0]:\alpha_1\big[E\hat g_a(m_n+x^{\{1\}})-\hat g_a(m_n)\big]\le c_a(s)\big\}.$$
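To illustrate the threshold rule (17.31) numerically, the following sketch locates the first time at which the marginal gain from waiting for another catch drops below the cost rate. All ingredients here (square-root reward, quadratic cost, unit-mean exponential catch sizes, and the parameter values) are assumptions made for the example, not part of the chapter's model:

```python
import numpy as np

alpha = 0.8                      # hypothetical hazard rate of catch times
g = np.sqrt                      # hypothetical increasing, concave reward
cb = lambda t: 0.05 * t ** 2     # hypothetical convex cost rate
rng = np.random.default_rng(0)
catches = rng.exponential(1.0, 10_000)   # Monte Carlo sample of catch sizes

def marginal_gain(ms, m):
    """alpha * E[g(ms + x - m) - g(ms - m)], the left side of (17.31)."""
    return alpha * np.mean(g(ms + catches - m) - g(ms - m))

def tau_b(t2n, t0, ms, m, s):
    """First time in [t2n, t0] at which the gain falls below the cost rate."""
    for t in np.linspace(t2n, t0, 500):
        if marginal_gain(ms, m) <= cb(t - s):
            return t
    return t0   # never stop early: wait until the horizon

print(tau_b(t2n=0.0, t0=5.0, ms=3.0, m=1.0, s=0.0))
```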
Proof. The form of $\tau_n^{a*}$ and $\tau_n^{b*}$ is a consequence of Remark 17.7. Let us observe that by our assumptions $\zeta_2(t)=\alpha\Delta^b(M_t^s-m)-c_b(t-s)$ has decreasing paths for $t\in[T_n^{\{3\}},T_{n+1}^{\{3\}})$. It suffices to prove that $\zeta_2(T_n^{\{3\}})-\zeta_2(T_{n-1}^{\{3\}})=\alpha\big[\Delta^b(M_n^s-m)-\Delta^b(M_{n-1}^s-m)\big]<0$ for all $n\in\mathbb{N}$.
It remains to check that $\bar y_-^b(t_0-s)=0$. We can see that $\tau^{b*}=\tau^{b*}(s)$ is deterministic, which is clear from (17.31). If $s\le t_0$, then combining (17.25), (17.26), and (17.28) gives $\gamma^{s,m}(m,s)=E\big[Z(s,\tau^{b*})\,\big|\,\mathcal{F}_s\big]=Z(s,s)+E\big[\int_s^{\tau^{b*}}(Ap^{s,m})(\xi^s(z))\,dz\,\big|\,\mathcal{F}_s\big]$. By Remark 17.5, it follows that
$$\bar y^b(t_0-s)=E\int_s^{\tau^{b*}(s)}(Ap^{s,m})(\xi^s(z))\,dz=\int_s^{\tau^{b*}(s)}\big[\alpha\Delta^b(0)-c_2(z-s)\big]\,dz.$$
17.6 Conclusions
This article presents a solution of the double stopping problem in the "fishing model" for a finite horizon. The analytical properties of the reward function in the one stopping problem played a crucial role in our considerations and allowed us to obtain a solution to the extended double stopping problem. Repeating the considerations of Sect. 17.4, we can easily generalize the model and the solution to the multiple stopping problem, although the notation becomes inconvenient. The construction of the equilibrium in the two-person non-zero-sum problem formulated in Sect. 17.2 can be reduced to the two double optimal stopping problems in the case where the payoff
structure is given by (17.5), (17.6), and (17.11). The key assumptions relate to the properties of the distribution functions. Assuming general distributions and an infinite horizon, one can obtain extensions of the foregoing model.
References
1. Boshuizen, F., Gouweleeuw, J.: General optimal stopping theorems for semi-Markov
processes. Adv. Appl. Probab. 4, 825–846 (1993)
2. Boshuizen, F.A.: A general framework for optimal stopping problems associated with multi-
variate point processes, and applications. Sequential Anal. 13(4), 351–365 (1994)
3. Brémaud, P.: Point Processes and Queues. Martingale Dynamics. Springer, Berlin (1981)
4. Davis, M.H.A.: Markov Models and Optimization. Chapman & Hall, New York (1993)
5. Ferenstein, E., Pasternak-Winiarski, A.: Optimal stopping of a risk process with disruption
and interest rates. In: Breton, M., Szajowski, K. (eds.) Advances in Dynamic Games:
Differential and Stochastic Games: Theory, Application and Numerical Methods, Annals of
the International Society of Dynamic Games, vol. 11, 18 pp. Birkhäuser, Boston (2010)
6. Ferenstein, E., Sierociński, A.: Optimal stopping of a risk process. Appl. Math. 24(3), 335–342
(1997)
7. Ferguson, T.: A Poisson fishing model. In: Pollard, D., Torgersen, E., Yang, G. (eds.) Festschrift
for Lucien Le Cam: Research Papers in Probability and Statistics. Springer, Berlin (1997)
8. Haggstrom, G.: Optimal sequential procedures when more than one stop is required. Ann.
Math. Stat. 38, 1618–1626 (1967)
9. Jacobsen, M.: Point process theory and applications. Marked point and piecewise deterministic
processes. In: Prob. and its Applications, vol. 7. Birkhäuser, Boston (2006)
10. Jensen, U.: An optimal stopping problem in risk theory. Scand. Actuarial J. 2, 149–159 (1997)
11. Jensen, U., Hsu, G.: Optimal stopping by means of point process observations with applications
in reliability. Math. Oper. Res. 18(3), 645–657 (1993)
12. Karpowicz, A.: Double optimal stopping in the fishing problem. J. Appl. Prob. 46(2), 415–428
(2009). DOI 10.1239/jap/1245676097
13. Karpowicz, A., Szajowski, K.: Double optimal stopping of a risk process. Stochastics: Int. J. Prob. Stoch. Process. 79, 155–167 (2007)
14. Kramer, M., Starr, N.: Optimal stopping in a size dependent search. Sequential Anal. 9, 59–80
(1990)
15. Muciek, B.K., Szajowski, K.: Optimal stopping of a risk process when claims are covered
immediately. In: Mathematical Economics, Toru Maruyama (ed.) vol. 1557, pp. 132–139.
Research Institute for Mathematical Sciences, Kyoto University, Kyoto 606-8502 Japan
Kôkyûroku (2007)
16. Nikolaev, M.: Obobshchennye posledovatelnye procedury. Litovskiui Mat. Sb. 19, 35–44
(1979)
17. Rolski, T., Schmidli, H., Schmidt, V., Teugels, J.: Stochastic Processes for Insurance and
Finance. Wiley, Chichester (1998)
18. Shiryaev, A.: Optimal Stopping Rules. Springer, Berlin (1978)
19. Starr, N.: Optimal and adaptive stopping based on capture times. J. Appl. Prob. 11, 294–301
(1974)
20. Starr, N., Wardrop, R., Woodroofe, M.: Estimating a mean from delayed observations. Z. für
Wahr. 35, 103–113 (1976)
21. Starr, N., Woodroofe, M.: Gone fishin’: optimal stopping based on catch times. Report No. 33,
Department of Statistics, University of Michigan, Ann Arbor, MI (1974)
22. Szajowski, K.: Optimal stopping of a 2-vector risk process. In: Stability in Probability,
Jolanta K. Misiewicz (ed.), Banach Center Publ. 90, 179–191. Institute of Mathematics, Polish
Academy of Science, Warsaw (2010), doi:10.4064/bc90-0-12
Chapter 18
A Nonzero-Sum Search Game with Two
Competitive Searchers and a Target
Ryusuke Hohzaki
18.1 Introduction
Search theory has its origins in military affairs. As an early application of game theory to search problems, Morse and Kimball [20] discuss the positioning of a patrol line in straits by an anti-submarine warfare (ASW) airplane to block the passage of submarines. For several decades after that research, investigators focused on one-sided problems of optimal search under the assumption that the stochastic rule governing the behavior of the submarine is known [18].
R. Hohzaki ()
Department of Computer Science, National Defense Academy, 1-10-20 Hashirimizu,
Yokosuka 239-8686, Japan
e-mail: [email protected]
Since then, many researchers have taken up one class of game models, the "hide-and-search game", where a stationary target hides at a position. Norris [24] deals with a two-person zero-sum (TPZS) noncooperative game, where a target first hides in a box and then a searcher sequentially looks into boxes with possible overlooking; the payoff is the expected number of looks until detection of the target. Baston and Bostock [2] and Garnaev [7] study an ASW game, where an ASW airplane drops depth charges to destroy a hidden submarine. They adopt the destruction probability of the target as the payoff. We can cite Nakai [23], Kikuta [17], and Iida et al. [16] as other studies of the hide-and-search game. Their models are still TPZS noncooperative games, but they adopt a variety of payoff functions: detection probability, expected reward, expected time until detection, and others.
The hide-and-search game has been extended to games with a moving target, named "evade-and-search games". Meinardi [19] analyzes an evade-and-search game, where a target moves on a line in a diffusive fashion and a searcher looks at a point on the line sequentially as time elapses. The target is informed of the history of searched points, and the game is then modeled as a multi-stage TPZS game. Washburn [29] and Nakai [22] adopt the multi-stage model with the searcher's moving distance until detection of the target as the payoff. Their models are similar to Meinardi's. Danskin [5] discusses a one-shot game played by a submarine and an ASW airplane: the submarine chooses a point to move to, and the airplane selects a point at which to drop its dipping sonar buoys for detection of the submarine. Eagle and Washburn [6] also study a one-shot game, where a searcher moves in a search space, as in Danskin's model.
Hohzaki [11, 12], Iida et al. [15], and Dambreville and Le Cadre [4] deal with an optimal distribution of searching resources for a searcher and an optimal moving strategy for a target in a one-shot game called the "search allocation game (SAG)". For the one-shot SAG, Washburn [30] and Hohzaki [8, 13, 14] take account of practical constraints such as geographical restrictions or energy limitations on movement. Hohzaki [9] proposes a method to derive an optimal solution for a multi-stage SAG.
As the review above shows, almost all researchers handle the TPZS noncooperative game of a target versus a searcher, although we can list a small number of special game models, such as Baston and Garnaev [3], who study a nonzero-sum game in which a searcher and a protector choose distribution strategies of resources. However, we can also think of cooperative search situations, in which several searchers cooperate for an effective search for the target, or in which a person adrift after a shipwreck takes cooperative action toward a search and rescue (S&R) team, such as firing a distress signal or a flashlight so as to be easily detected. Hohzaki [10] is one of the few studies of a search problem defined as a cooperative game (refer to Owen [26] or Myerson [21]). Using the framework of the SAG, Hohzaki models a search game with some cooperative searchers against a moving target under the criterion of the detection probability of the target. The discussion includes the imputation of the obtained reward among the cooperative searchers of a team or a coalition, which is a common theme for an ordinary coalition game.
There are other types of cooperative search problems. Alpern and Gal [1] have
been studying the so-called rendezvous search problem, where players try to meet each other as soon as possible without knowing the exact position of one another. In the context of graph-theoretic problems, we can enumerate further works. Parsons [27, 28] studies how many searchers are required to find a mobile hider in a graph. O'Rourke [25] derives, theoretically or algorithmically, the minimum number of guards needed to watch over the interior of a polygon-shaped art gallery by computational geometry. Such problems of security by watchmen or guards are named "art gallery problems".
In the Hohzaki’s model [10], he proves that searchers have the incentive to
construct a grand coalition and develop his theory based on the assumption that
only the members of the coalition join the search operation for the target. However
there could be a competition between the coalition’s members and nonmembers. In
the treasure hunting from shipwreck, the nonmembers would be going to outwit the
coalition for the preemptive detection of the treasure. In this paper, we discuss a
three-person nonzero-sum noncooperative search game, where two teams or two
coalitions of searchers compete for the detection of a target and the target tries
to evade the teams, and derive a Nash-equilibrium (NE). The results of this paper
would help us step forward to an other type of cooperative game or coalition game,
where several groups of searchers compete each other, other than the Hohzaki’s
model and discuss the incentive of the groups to a larger group or a grand coalition
beyond their competition.
As a preliminary, we consider a search game for a stationary target as a three-person nonzero-sum noncooperative game model in the next section. In Sect. 18.3, we discuss a game with a moving target. Because it would be difficult to derive a NE in general, we first do so for a small problem and then propose a computational algorithm for the NE of the general game with a moving target. In Sect. 18.4, we analyze the characteristics of the NE by some numerical examples.
18.2 A Search Game with a Stationary Target
We consider a search game where two searchers compete against each other to get the value of a target while the target evades them. The problem is formulated as a three-person nonzero-sum noncooperative game.
(A1) A search space is discrete and it consists of n cells denoted by K =
{1, 2, · · · , n}. A target with value 1 chooses one cell to hide himself.
(A2) Searcher 1 has the amount Φ1 of searching resources in hand and distributes
them in the search space to detect the target while Searcher 2 has the amount
Φ2 of resources.
(A3) If the target is in cell i and the searcher scatters x resources there, the searcher
can detect the target with probability fi (x) = 1 − exp(−αi x), where parameter
αi indicates the effectiveness of unit resource for detection. The event of
detection by one searcher is independent of the detection by the other.
(A4) If only one searcher detects the target, the detector monopolizes the target's value of 1. If both of them detect, Searcher 1 gets a reward δ1 and Searcher 2 gets δ2, where δ1 + δ2 does not necessarily equal 1. The target is given 1 only if he is not detected.
We denote a mixed strategy of the target by p = {p1 , p2 , · · · , pn }, where pi is the
probability of hiding in cell i. Let us denote a pure strategy of Searcher 1 or 2 by
x = {xi , i ∈ K} or y = {yi , i ∈ K}, respectively. xi or yi is the respective amount
of resource distributed in cell i by Searcher 1 or 2. We denote feasible regions of
players' strategies p, x, and y by Π, Ψ1, and Ψ2, which are given by
$$\Pi\equiv\Big\{p\in\mathbb{R}^n\,\Big|\,p_i\ge 0,\ i\in K,\ \sum_{i\in K}p_i=1\Big\},$$
$$\Psi_1\equiv\Big\{x\in\mathbb{R}^n\,\Big|\,x_i\ge 0,\ i\in K,\ \sum_{i\in K}x_i\le\Phi_1\Big\},$$
$$\Psi_2\equiv\Big\{y\in\mathbb{R}^n\,\Big|\,y_i\ge 0,\ i\in K,\ \sum_{i\in K}y_i\le\Phi_2\Big\}.$$
The searchers should obviously use up all their resources because the detection function $f_i(x)$ is monotone increasing in $x$, as stated in (A3). Therefore, we can replace the inequality signs with equality signs in the definitions of Ψ1 and Ψ2.
From the independence of the detection events in Assumption (A3), the three players (the target, Searcher 1, and Searcher 2) have the expected rewards, or payoffs, $Q(p,x,y)$, $R_1(p,x,y)$, and $R_2(p,y,x)$, respectively.
The payoff $R_1(\cdot)$ is strictly concave in the variable $x$, and $R_2(\cdot)$ is likewise strictly concave in $y$. The feasible regions Ψ1 and Ψ2 are closed convex sets. Therefore, if there is a Nash equilibrium (NE), we can find it among pure strategies of the searchers. From here, we consider maximization problems for these expected payoffs and derive the optimal response of each player to the others.
1. Optimal response of the target
We can transform a maximization problem of the target’s payoff Q(p, x, y), which
is the non-detection probability of the target, as follows.
As seen from the transformation from the second expression to the third, an optimal target strategy $p^*\in\Pi$ is given by $p_i^*=0$ for $i\notin I^*$ and arbitrary $p_i^*$ for $i\in I^*$, using the set of cells $I^*\equiv\{i\in K\,|\,\alpha_i(x_i+y_i)=\nu\equiv\min_{s\in K}\alpha_s(x_s+y_s)\}$. In particular,
$$\nu=\frac{1}{\sum_{s\in K}1/\alpha_s}(\Phi_1+\Phi_2)\qquad(18.4)$$
$$x_i+y_i=\frac{1/\alpha_i}{\sum_{s\in K}1/\alpha_s}(\Phi_1+\Phi_2),\quad i\in K.\qquad(18.5)$$
$$\min_{x}\ \sum_{i\in K}p_i\exp(-\alpha_i x_i)\quad\text{s.t.}\quad x_i\ge 0,\ i\in K,\ \sum_{i\in K}x_i\le\Phi.\qquad(18.6)$$
Its solution has the form
$$x_i=\bigg[\frac{1}{\alpha_i}\log\frac{p_i\alpha_i}{\lambda}\bigg]^+;\qquad(18.7)$$
we apply Eq. (18.7) to the original problem with two searchers to derive an optimal response $x$ for Searcher 1 as
$$x_i=\bigg[\frac{1}{\alpha_i}\log\frac{p_i\alpha_i D_{1i}(y)}{\lambda_1}\bigg]^+=\bigg[\frac{1}{\alpha_i}\Big(\log\frac{p_i\alpha_i}{\lambda_1}+\log\big\{(1-\delta_1)\exp(-\alpha_i y_i)+\delta_1\big\}\Big)\bigg]^+,\qquad(18.8)$$
given the other strategies $p$ and $y$. Similarly, we have an optimal response $y$ for Searcher 2, given strategies $p$ and $x$, as follows:
$$y_i=\bigg[\frac{1}{\alpha_i}\Big(\log\frac{p_i\alpha_i}{\lambda_2}+\log\big\{(1-\delta_2)\exp(-\alpha_i x_i)+\delta_2\big\}\Big)\bigg]^+.\qquad(18.9)$$
Optimal Lagrangian multipliers λ1 and λ2 in Eqs. (18.8) and (18.9) are determined
by conditions ∑i xi = Φ1 and ∑i yi = Φ2 , respectively.
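To see how a multiplier is pinned down in practice, here is a minimal sketch (not from the chapter) that evaluates the best response (18.8) and finds $\lambda_1$ by bisection until the resource constraint is met. The function name and the parameters in the usage line are illustrative assumptions; the probabilities $p_i$ are assumed strictly positive:

```python
import numpy as np

def best_response(p, alpha, other, Phi, delta):
    """Searcher's best response (18.8)/(18.9): water-filling over cells.
    `other` is the opponent's allocation; the multiplier lambda is found
    by bisection so that the response uses exactly Phi resources.
    Assumes p_i > 0 for every cell."""
    D = (1 - delta) * np.exp(-alpha * other) + delta  # D_i term of (18.8)

    def alloc(lam):
        return np.maximum(np.log(p * alpha * D / lam) / alpha, 0.0)

    lo, hi = 1e-12, np.max(p * alpha * D)  # alloc(hi) sums to 0, alloc(lo) is huge
    for _ in range(100):                   # bisection on the multiplier
        lam = 0.5 * (lo + hi)
        if alloc(lam).sum() > Phi:
            lo = lam                       # allocation too large: raise lambda
        else:
            hi = lam
    return alloc(lam)

p = np.array([0.5, 0.3, 0.2]); alpha = np.array([1.0, 1.0, 2.0])
x = best_response(p, alpha, other=np.zeros(3), Phi=2.0, delta=0.8)
print(x, x.sum())   # allocation summing to (approximately) Phi = 2
```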
As a result, we organize the necessary and sufficient conditions for optimal $x$, $y$, and $p$ into the following system of equations:
$$x_i=\bigg[\frac{1}{\alpha_i}\Big(\log\frac{p_i\alpha_i}{\lambda_1}+\log\{(1-\delta_1)\exp(-\alpha_i y_i)+\delta_1\}\Big)\bigg]^+,\quad i\in K\qquad(18.10)$$
$$y_i=\bigg[\frac{1}{\alpha_i}\Big(\log\frac{p_i\alpha_i}{\lambda_2}+\log\{(1-\delta_2)\exp(-\alpha_i x_i)+\delta_2\}\Big)\bigg]^+,\quad i\in K\qquad(18.11)$$
$$x_i+y_i=\frac{1/\alpha_i}{\sum_{j\in K}1/\alpha_j}(\Phi_1+\Phi_2),\quad i\in K\qquad(18.12)$$
$$\sum_{i\in K}x_i=\Phi_1\quad\text{or}\quad\sum_{i\in K}y_i=\Phi_2\qquad(18.13)$$
$$\sum_{i\in K}p_i=1.\qquad(18.14)$$
We need only one of equations (18.13) for a full system because we can derive the
other equation of (18.13) from Eq. (18.12). The total number of variables x, y, p,
λ1 and λ2 is 3|K| + 2, which is the same as the number of equations contained in
the system. If all equations of the system are independent, optimal variables are
uniquely determined.
We can show that the following solution satisfies the conditions above:
$$x_i^*=\frac{1/\alpha_i}{\sum_{j\in K}1/\alpha_j}\Phi_1\qquad(18.15)$$
$$y_i^*=\frac{1/\alpha_i}{\sum_{j\in K}1/\alpha_j}\Phi_2\qquad(18.16)$$
$$p_i^*=\frac{1/\alpha_i}{\sum_{j\in K}1/\alpha_j}.\qquad(18.17)$$
We can easily see that the above strategies satisfy the conditions (18.12)∼(18.14).
Noting that αi x∗i , αi y∗i , and αi p∗i do not depend on the cell number i, we can derive
Lagrangian multipliers λ1 and λ2 by substituting Eqs. (18.15)∼(18.17) in (18.10)
and (18.11).
$$\lambda_1^*=\exp\Big(-\frac{\Phi_1}{\sum_j 1/\alpha_j}\Big)\Big[(1-\delta_1)\exp\Big(-\frac{\Phi_2}{\sum_j 1/\alpha_j}\Big)+\delta_1\Big]\frac{1}{\sum_j 1/\alpha_j}\qquad(18.18)$$
$$\lambda_2^*=\exp\Big(-\frac{\Phi_2}{\sum_j 1/\alpha_j}\Big)\Big[(1-\delta_2)\exp\Big(-\frac{\Phi_1}{\sum_j 1/\alpha_j}\Big)+\delta_2\Big]\frac{1}{\sum_j 1/\alpha_j}\qquad(18.19)$$
Let us note that the optimal strategies (18.15), (18.16), and (18.17) are also optimal for the TPZS game in which a searcher with $\Phi_1+\Phi_2$ resources and a target fight against each other with the non-detection probability as the payoff. We can easily verify the optimality of $x_i^*+y_i^*$ and $p_i^*$ for the TPZS game by solving the corresponding minimax or maximin optimization, where the searcher's strategy is $z_i=x_i+y_i$, generated from the original strategies $x_i$ and $y_i$ of the two searchers.
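As a quick numerical illustration, the closed-form equilibrium (18.15)-(18.17) and the multipliers (18.18)-(18.19) can be evaluated directly; the parameter values below are chosen for the sketch, not taken from the chapter. The check at the end confirms that every cell has the same exponent, so the target is indifferent among cells:

```python
import numpy as np

alpha = np.array([0.5, 1.0, 2.0])          # illustrative effectiveness per cell
Phi1, Phi2, d1, d2 = 3.0, 2.0, 0.8, 0.2    # illustrative resources and rewards

w = (1 / alpha) / (1 / alpha).sum()        # proportional weights (1/alpha_i)/sum
x_star, y_star, p_star = Phi1 * w, Phi2 * w, w

S = (1 / alpha).sum()
lam1 = np.exp(-Phi1 / S) * ((1 - d1) * np.exp(-Phi2 / S) + d1) / S   # (18.18)
lam2 = np.exp(-Phi2 / S) * ((1 - d2) * np.exp(-Phi1 / S) + d2) / S   # (18.19)

# Non-detection probability: alpha_i*(x_i + y_i) is constant across cells,
# so the target cannot improve by reshuffling its hiding probabilities.
Q = (p_star * np.exp(-alpha * (x_star + y_star))).sum()
print(x_star, y_star, p_star, lam1, lam2, Q)
```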
18.3 A Search Game with a Moving Target
Here we consider the nonzero-sum game with two searchers and a moving target. The two searchers play in a noncooperative manner for the detection of the target, and the target moves in the search space to avoid detection. The game with a moving target is modeled as follows:
(B1) A search space consists of a discrete cell space K = {1, · · · , K} and a discrete
time space T = {1, · · · , T }.
(B2) A target chooses one among a set of routes Ω to move along. His position on
a route ω ∈ Ω at time t ∈ T is represented by ω (t) ∈ K.
(B3) Two searchers distribute their searching resources to detect the target. Φk (t)
resources are available at each time t for Searcher k = 1, 2. Searchers can start
distributing resource from time τ ∈ T.
(B4) The detection of the target by the distribution of x resources in cell i occurs
with probability 1 − exp(−αi x) only if the target is in the cell i. The parameter
αi indicates the effectiveness of unit resource in the cell i. The events of
detection by two searchers are independent of each other.
If a searcher detects the target, the detector monopolizes the value of the
target 1. If both searchers detect, Searcher 1 and 2 get reward δ1 and δ2 (0 ≤
δ1 , δ2 ≤ 1), respectively. The game is terminated on detection of the target or
at the last time T . The target is given 1 only if he is not detected.
(B5) Players do not know any information about the behavior of other players and
the situation of the search in the process of the game. Therefore, all players
make their plans or strategies in advance of the game.
Let T = {τ, τ+1, ···, T} be the available time period for searching. We denote a distribution plan of Searcher k = 1, 2 by $\varphi_k=\{\varphi_k(i,t),\ i\in K,\ t\in T\}$, where $\varphi_k(i,t)\in\mathbb{R}$ is the amount of searching resource distributed in cell $i$ at time $t$, and a mixed strategy of the target by $\pi=\{\pi(\omega),\ \omega\in\Omega\}$, where $\pi(\omega)$ is the probability of taking path $\omega\in\Omega$.
When the target takes a path $\omega$ and the searchers adopt strategies $\varphi_1$ and $\varphi_2$, the non-detection probability $Q(\omega,\varphi_1,\varphi_2)$ is given by
$$Q(\omega,\varphi_1,\varphi_2)=\exp\Big(-\sum_{t=\tau}^{T}\alpha_{\omega(t)}\big(\varphi_1(\omega(t),t)+\varphi_2(\omega(t),t)\big)\Big).\qquad(18.20)$$
Because the detection events are exclusive at each time, the total expected reward of Searcher $k$, $R_k(\omega,\varphi_k,\varphi_j)$, $(k,j)\in\{(1,2),(2,1)\}$, is given by
$$R_k(\omega,\varphi_k,\varphi_j)=\sum_{t=\tau}^{T}\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\big(\varphi_k(\omega(\zeta),\zeta)+\varphi_j(\omega(\zeta),\zeta)\big)\Big)\times\big(1-\exp(-\alpha_{\omega(t)}\varphi_k(\omega(t),t))\big)\times\Big[\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))+\delta_k\big(1-\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))\big)\Big]\qquad(18.21)$$
for the target path $\omega$. As a result, we have the payoffs of the target and of Searcher $k$, $Q(\pi,\varphi_1,\varphi_2)$ and $R_k(\pi,\varphi_k,\varphi_j)$, obtained by averaging (18.20) and (18.21) over the target's mixed strategy $\pi$.
Now that we have formulated the three-person nonzero-sum game with a target and two searchers, the next step is to obtain a NE, which maximizes $Q(\pi,\varphi_1,\varphi_2)$, $R_1(\pi,\varphi_1,\varphi_2)$, and $R_2(\pi,\varphi_2,\varphi_1)$ with respect to $\pi$, $\varphi_1$, and $\varphi_2$, respectively, yielding the optimality conditions of the NE $(\pi^*,\varphi_1^*,\varphi_2^*)$. All feasible regions are closed convex sets. We can see that the payoff function $Q(\pi,\varphi_1,\varphi_2)$ is linear in $\pi$ and convex in $\varphi_1$ and $\varphi_2$. We are now going to prove the strict concavity of $R_k(\omega,\varphi_k,\varphi_j)$ in $\varphi_k$.
Using the notation
$$\beta_j(\omega,t)\equiv\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\varphi_j(\omega(\zeta),\zeta)\Big)\Big[\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))+\delta_k\big(1-\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))\big)\Big],$$
we can rewrite the reward as
$$R_k(\omega,\varphi_k,\varphi_j)=\sum_{t=\tau}^{T}\beta_j(\omega,t)\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big)\big(1-\exp(-\alpha_{\omega(t)}\varphi_k(\omega(t),t))\big)$$
$$=\sum_{t=\tau}^{T}\beta_j(\omega,t)\Big[\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big)-\exp\Big(-\sum_{\zeta=\tau}^{t}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big)\Big]$$
$$=\beta_j(\omega,\tau)-\sum_{t=\tau}^{T-1}\big(\beta_j(\omega,t)-\beta_j(\omega,t+1)\big)\exp\Big(-\sum_{\zeta=\tau}^{t}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big)-\beta_j(\omega,T)\exp\Big(-\sum_{\zeta=\tau}^{T}\alpha_{\omega(\zeta)}\varphi_k(\omega(\zeta),\zeta)\Big).\qquad(18.27)$$
Noting that
$$\beta_j(\omega,t+1)=\exp\Big(-\sum_{\zeta=\tau}^{t}\alpha_{\omega(\zeta)}\varphi_j(\omega(\zeta),\zeta)\Big)\Big[(1-\delta_k)\exp(-\alpha_{\omega(t+1)}\varphi_j(\omega(t+1),t+1))+\delta_k\Big]\le\exp\Big(-\sum_{\zeta=\tau}^{t-1}\alpha_{\omega(\zeta)}\varphi_j(\omega(\zeta),\zeta)\Big)\Big[(1-\delta_k)\exp(-\alpha_{\omega(t)}\varphi_j(\omega(t),t))+\delta_k\Big]=\beta_j(\omega,t),$$
the coefficients $\beta_j(\omega,t)-\beta_j(\omega,t+1)$ in (18.27) are nonnegative, which shows the claimed concavity of $R_k$ in $\varphi_k$.
In the small example of Sect. 18.3.1, the derivatives of the searchers' payoffs with respect to their first-period allocations $x$ and $y$ are
$$\frac{\partial R_1}{\partial x}=\big\{\pi(1)\alpha_1\exp(-\alpha_1(x+y))-\pi(2)\alpha_2\exp(-\alpha_2(\Phi_1(1)+\Phi_2(1)-x-y))\big\}\times\big\{(1-\delta_1)-(1-\exp(-\alpha_1\Phi_1(2)))\big(\exp(-\alpha_1\Phi_2(2))+\delta_1(1-\exp(-\alpha_1\Phi_2(2)))\big)\big\}+\delta_1\big\{\pi(1)\alpha_1\exp(-\alpha_1 x)-\pi(2)\alpha_2\exp(-\alpha_2(\Phi_1(1)-x))\big\}\qquad(18.30)$$
$$\frac{\partial R_2}{\partial y}=\big\{\pi(1)\alpha_1\exp(-\alpha_1(x+y))-\pi(2)\alpha_2\exp(-\alpha_2(\Phi_1(1)+\Phi_2(1)-x-y))\big\}\times\big\{(1-\delta_2)-(1-\exp(-\alpha_1\Phi_2(2)))\big(\exp(-\alpha_1\Phi_1(2))+\delta_2(1-\exp(-\alpha_1\Phi_1(2)))\big)\big\}+\delta_2\big\{\pi(1)\alpha_1\exp(-\alpha_1 y)-\pi(2)\alpha_2\exp(-\alpha_2(\Phi_2(1)-y))\big\}.\qquad(18.31)$$
Define the threshold
$$\Phi\equiv\frac{\alpha_2}{\alpha_1+\alpha_2}\big(\Phi_1(1)+\Phi_2(1)\big).$$
Similarly, the value in the braces { } in the second line of Eq. (18.31) changes its sign with a threshold for δ2.
We are going to prove that any optimal $x$ and $y$ must satisfy $x+y=\Phi$, by classifying δ1 and δ2 into four cases. In the process of the proof, we refer to Eqs. (18.29)∼(18.34).
(i) Case of $\delta_1<\delta_1^*$ and $\delta_2<\delta_2^*$: If $x+y<\Phi$, it must be that $\pi(1)=1$ from Eq. (18.32), and then $R_1(\cdot)$ monotonically increases in $x$ because $\partial R_1/\partial x>0$ from Eq. (18.30). $R_2(\cdot)$ is also monotone increasing in $y$. Therefore, $x$ and $y$ are never optimal with $x+y<\Phi$. If $x+y>\Phi$, it must be that $\pi(1)=0$, from which $\partial R_1/\partial x<0$ and $\partial R_2/\partial y<0$ hold, and then smaller $x$ and $y$ are better for the searchers. The condition $x+y>\Phi$ is thus never valid for any optimal $x$ and $y$.
(ii) Case of $\delta_1<\delta_1^*$ and $\delta_2>\delta_2^*$: If $x+y<\Phi$, we have $\pi(1)=1$ and $\partial R_1/\partial x>0$. This implies that a larger $x$ is desirable for Searcher 1. Concerning $\partial R_2/\partial y$ of Eq. (18.31), we have
$$\frac{\partial R_2(\pi,y,x)}{\partial y}\ \ge\ \frac{\partial R_2(\pi,y,x)}{\partial y}\bigg|_{x=0}=\alpha_1\exp(-\alpha_1 y)\cdots,$$
where the index $(k,j)$ is one of $(1,2)$ or $(2,1)$. The zero point $(x^*,y^*)$ of the equations $\partial R_1/\partial x=\partial R_2/\partial y=0$ gives us the NE maximizing both payoffs $R_1$ and $R_2$. To clarify the relation among the optimal $x^*$, $y^*$, and $\pi(1)$, we solve these equations with respect to $\pi(1)$ using $\pi(2)=1-\pi(1)$, and then we obtain Eqs. (18.39) and (18.40).
The functions on the right-hand sides of Eqs. (18.39) and (18.40) are monotone increasing in $x^*$ and $y^*$, and then $x^*+y^*$ is increasing in $\pi(1)$. When we draw the function $x^*+y^*$ and a horizontal line at $\Phi$ against the axis of $\pi(1)$, the crossing point between these two curves gives us the optimal $\pi^*(1)$. Figure 18.2 shows the function $x^*+y^*$ with respect to $\pi(1)$ and the optimal response of the target (18.32)−(18.34), in a general way. We recall that the function $x^*+y^*$ is derived from Eqs. (18.37) and (18.38) under the condition $x+y=\Phi$. In principle, we should have used the function $x+y$ of $\pi(1)$ directly derived from the simultaneous equations $\partial R_1/\partial x=0$ and $\partial R_2/\partial y=0$ using Eqs. (18.30) and (18.31), but this derivation would be difficult; in any case, we obtain the same NE both ways. From Eqs. (18.37) and (18.38), we can verify that $\partial R_1/\partial x=0$ and $\partial R_2/\partial y=0$ hold for variables $\pi(1)$, $\pi(2)$, $x$, and $y$ satisfying $\pi(1)\alpha_1=\pi(2)\alpha_2$, $\alpha_1 x=\alpha_2(\Phi_1(1)-x)$, and $\alpha_1 y=\alpha_2(\Phi_2(1)-y)$. The equation $x+y=\Phi$ is then also valid. Therefore, we have the following conclusion about the NE and the non-detection probability:
$$\pi^*(1)=\frac{\alpha_2}{\alpha_1+\alpha_2},\qquad\pi^*(2)=\frac{\alpha_1}{\alpha_1+\alpha_2}\qquad(18.41)$$
$$x^*=\frac{\alpha_2}{\alpha_1+\alpha_2}\,\Phi_1(1)\qquad(18.42)$$
$$y^*=\frac{\alpha_2}{\alpha_1+\alpha_2}\,\Phi_2(1)\qquad(18.43)$$
$$Q(\pi^*,x^*,y^*)=\exp\Big(-\frac{\alpha_1\alpha_2}{\alpha_1+\alpha_2}\big(\Phi_1(1)+\Phi_2(1)\big)-\alpha_1\big(\Phi_1(2)+\Phi_2(2)\big)\Big).\qquad(18.44)$$
The optimal variables $\pi^*$ and $x^*+y^*$ also give us a NE for the TPZS game with the non-detection probability $Q(\pi,x,y)$ as the payoff, where the two searchers are regarded as one player against the target. Thus we may note the equivalence between the nonzero-sum game with three persons and the zero-sum game with two persons. In the original nonzero-sum game, the two searchers need not cooperate in searching for the target, because the parameters δ1 and δ2 are not necessarily set so that δ1 + δ2 = 1. Nevertheless, they may be motivated to behave cooperatively by the selfish behavior of the target, who aims to minimize the detection probability, as we see in this special case of the moving-target problem. We present another case in Sect. 18.4, where the target can exploit the noncooperative behavior of the two searchers to steer the situation to its advantage with a lower detection probability.
$$P_k(\varphi_j;\pi):\ \max_{\varphi_k,\lambda}\ R_k(\pi,\varphi_k,\varphi_j)$$
$$\text{s.t.}\quad \varphi_k(i,t)\ge 0,\ i\in K,\ t\in T\qquad(18.46)$$
$$g(\omega,\varphi_k+\varphi_j)=\lambda,\quad\omega\in\Omega^+(\pi)\qquad(18.47)$$
$$g(\omega,\varphi_k+\varphi_j)\ge\lambda,\quad\omega\in\Omega\setminus\Omega^+(\pi).\qquad(18.48)$$
Conditions (18.47) and (18.48) are necessary to keep $\pi$ optimal against $\varphi_1$ and $\varphi_2$, as seen from Lemma 18.1. We have the following theorem for the NE.
Theorem 18.1. If a sequence of solutions converges to some $(\varphi_1^*,\varphi_2^*)$ by the repetition of solving Problem $P_k(\varphi_j;\pi)$ with fixed $\pi$ for $(k,j)=(1,2),(2,1)$, then $\pi$, $\varphi_1^*$, and $\varphi_2^*$ constitute a Nash equilibrium. There exists a Nash equilibrium for any target strategy $\pi$ if Problem $P_k(\varphi_j;\pi)$ is well defined for $(k,j)=(1,2),(2,1)$.
Proof. The strategy $\varphi_k^*$ is evidently an optimal response to the other players' strategies $\pi$ and $\varphi_j^*$. It remains to verify the optimality of $\pi$ against $\varphi_1^*$ and $\varphi_2^*$. Let $\lambda^*$ be an optimal multiplier $\lambda$ of Problem $P_k(\varphi_j;\pi)$. The non-detection probability becomes
$$Q(\pi,\varphi_1^*,\varphi_2^*)=\sum_{\omega\in\Omega^+(\pi)}\pi(\omega)\exp(-\lambda^*)=\exp(-\lambda^*).$$
This implies that the target does not have any incentive to change his current strategy $\pi$.
The problem $P_k(\varphi_j;\pi)$ has a unique solution, by strict concavity, whenever the problem is well defined, i.e., its feasible region is not empty. A sequence of the solutions is a mapping to a new point $(\varphi_1',\varphi_2')$ from an old one $(\varphi_1,\varphi_2)$, obtained by solving $\varphi_1'=\arg\max_{\varphi_1}R_1(\pi,\varphi_1,\varphi_2)$ and $\varphi_2'=\arg\max_{\varphi_2}R_2(\pi,\varphi_2,\varphi_1)$. The mapping is closed, by the continuity of the functions $R_1(\cdot)$, $R_2(\cdot)$ and the closed convexity of the feasible region defined by conditions (18.45)−(18.48), and therefore it has a fixed point by Kakutani's fixed-point theorem; that is, a point where $(\varphi_1',\varphi_2')$ coincides with $(\varphi_1,\varphi_2)$. Therefore, there exists a Nash equilibrium for any target strategy $\pi$.
This methodology sometimes fails to find the convergence point because the temporary solutions swing back and forth in the course of the calculation. To avoid such oscillation of the solutions, an objective with a penalty function can be effective. Let us substitute the function
$$\widetilde R_k(\pi,\varphi_k,\varphi_j)\equiv R_k(\pi,\varphi_k,\varphi_j)-\gamma\,\|\varphi_k-\widehat\varphi_k\|^2$$
for the original objective function in Problem $P_k(\varphi_j;\pi)$ ($k=1,2$), and denote the renewed problem by $\widetilde P_k(\varphi_j;\pi)$. Here $\widehat\varphi_k$ is the current solution for Searcher $k$'s strategy and $\gamma$ is a parameter for adjustment. If we find a convergence point as described in Theorem 18.1, that point is the NE, apart from the algorithmic details of the practical computation. We can anticipate from Theorem 18.1 that there may be many NEs. We are therefore going to propose a reasonable target strategy $\pi$, based on which we can derive a convergence point $(\varphi_1^*,\varphi_2^*)$ for the optimal searchers' strategies.
A thoughtful target would consider the worst case, in which the searchers' strategies $\varphi_1$ and $\varphi_2$ are totally against his interest and make the non-detection probability $Q(\pi,\varphi_1,\varphi_2)$ as small as possible. The target has to respond optimally to the worst case, in which the two searchers cooperate in minimizing $Q(\pi,\varphi_1,\varphi_2)$. We can regard this case as a TPZS game with the non-detection probability as the payoff. In the game, the target chooses one path $\omega\in\Omega$ as the maximizer, and a team of the two searchers makes a distribution plan $\varphi(i,t)=\varphi_1(i,t)+\varphi_2(i,t)$ as the minimizer. The non-detection probability, or the payoff, is given by
$$Q(\omega,\varphi)=\exp\Big(-\sum_{t=\tau}^{T}\alpha_{\omega(t)}\varphi(\omega(t),t)\Big),$$
which is modified from Eq. (18.20). Fortunately, we already have research on this kind of TPZS search game, by Hohzaki [10]. It says that we obtain an optimal strategy $\varphi^*$ of the searchers from the following linear programming formulation:
$$P_S:\ w=\max_{\varphi,\eta}\ \eta$$
$$\varphi(i,t)\ge 0,\ i\in K,\ t\in T,$$
and an optimal target strategy $\pi^*$ from the following problem, which is dual to Problem $(P_S)$ above:
$$D_T:\ w=\min_{\nu,\pi}\ \sum_{t\in T}\nu(t)\big(\Phi_1(t)+\Phi_2(t)\big)$$
$$\text{s.t.}\quad \sum_{\omega\in\Omega}\pi(\omega)=1$$
$$\pi(\omega)\ge 0,\ \omega\in\Omega$$
$$\alpha_i\sum_{\omega\in\Omega_{it}}\pi(\omega)\le\nu(t),\ i\in K,\ t\in T,$$
where $\Omega_{it}$ is the set of paths passing through cell $i$ at time $t$, defined by $\Omega_{it}\equiv\{\omega\in\Omega\,|\,\omega(t)=i\}$. The resulting non-detection probability is calculated as $\exp(-w)$ using the optimal value $w$ of the problem above.
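Problem $D_T$ is a standard linear program, so an off-the-shelf solver can produce the target's strategy $\pi^*$ and the value $w$. The sketch below sets up a tiny illustrative instance (two cells, two time steps, all four paths); the instance data are assumptions made for this example, not from the chapter:

```python
import numpy as np
from scipy.optimize import linprog

alpha = np.array([1.0, 0.5])              # effectiveness per cell (illustrative)
Phi = np.array([1.0, 1.0])                # Phi1(t) + Phi2(t) per time step
paths = [(0, 0), (0, 1), (1, 0), (1, 1)]  # omega(t) for t = 0, 1
K, T, W = 2, 2, len(paths)

# Decision vector z = (nu(0), nu(1), pi(0..3)); minimize sum_t nu(t)*Phi(t).
c = np.concatenate([Phi, np.zeros(W)])

# Constraints: alpha_i * sum_{omega: omega(t)=i} pi(omega) - nu(t) <= 0.
A_ub, b_ub = [], []
for t in range(T):
    for i in range(K):
        row = np.zeros(T + W)
        row[t] = -1.0
        for w_idx, om in enumerate(paths):
            if om[t] == i:
                row[T + w_idx] = alpha[i]
        A_ub.append(row)
        b_ub.append(0.0)

A_eq = [np.concatenate([np.zeros(T), np.ones(W)])]   # sum_omega pi(omega) = 1
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * (T + W))
w_opt, pi_opt = res.fun, res.x[T:]
print("non-detection prob:", np.exp(-w_opt), "target strategy:", pi_opt)
```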
At the end of this section, we incorporate the discussion so far into an algorithm to derive a NE for the original three-person nonzero-sum search game.
Algorithm AL2S
(i) Solve Problems $P_S$ and $D_T$ to obtain an initial distribution of resources $\varphi_k^0$, $k=1,2$, and the target strategy $\pi^*$.
(ii) Using $\pi=\pi^*$, repeatedly solve the convex problem $P_k(\varphi_j;\pi)$ for $(k,j)=(1,2)$ and $(2,1)$ in turn. If the solutions $\varphi_1^*$ and $\varphi_2^*$ converge, the obtained $\pi$, $\varphi_1^*$, and $\varphi_2^*$ are a Nash equilibrium. The resulting payoffs of the players are given by $Q(\pi,\varphi_1^*,\varphi_2^*)$ and $R_k(\pi,\varphi_k^*,\varphi_j^*)$, $(k,j)=(1,2),(2,1)$.
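Step (ii), together with the penalty-regularized objective $\widetilde R_k$, can be sketched as a simple alternating loop. Here `best_resp` stands for a solver of the convex problem $\widetilde P_k(\varphi_j;\pi)$ and is assumed to be supplied, so this is only a skeleton of the iteration, not the chapter's code:

```python
import numpy as np

def al2s_step2(best_resp, phi1, phi2, gamma=0.1, tol=1e-8, max_iter=1000):
    """Step (ii) of AL2S: alternate best responses until (phi1, phi2) converge.
    best_resp(k, phi_other, phi_current, gamma) should solve the penalized
    convex problem for Searcher k and return the new distribution plan."""
    for _ in range(max_iter):
        new1 = best_resp(1, phi2, phi1, gamma)   # Searcher 1 responds first
        new2 = best_resp(2, new1, phi2, gamma)   # then Searcher 2 (order k = 1, 2)
        if np.linalg.norm(new1 - phi1) + np.linalg.norm(new2 - phi2) < tol:
            return new1, new2                    # converged: a Nash equilibrium
        phi1, phi2 = new1, new2
    raise RuntimeError("no convergence; try a larger penalty gamma")
```

Reversing the two calls inside the loop reproduces the order k = 2, 1 discussed in Sect. 18.4.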
18.4 Numerical Examples
We took a small problem in Sect. 18.3.1 to derive an analytical form of the NE. Here we take a larger problem to analyze the properties of the NE numerically by applying the algorithm proposed in Sect. 18.3.2.
Table 18.3 Optimal distribution of resources (Case 2: π = π̃ = (0.4, 0.4, 0, 0, 0.2), order k = 1, 2)

              t
Cells         1      2      3      4      5      6      7      8      9      10
Searcher 1
1             0      0.254  0.200  0.200  0.149  0.149  0.150  0.150  0.150  0.150
2             0.255  0      0.200  0.199  0.149  0.149  0.149  0.150  0.150  0.150
3             0.173  0.173  0      0      0.202  0.201  0.201  0.201  0.201  0.200
4             0.072  0.073  0.100  0.101  0      0      0      0      0      0
Φ1(t)         0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5
Searcher 2
1             0.174  0.328  0.250  0.250  0.111  0.111  0.111  0.111  0.111  0.111
2             0.326  0.172  0.250  0.250  0.111  0.111  0.111  0.111  0.111  0.111
3             0      0      0      0      0.277  0.278  0.278  0.278  0.278  0.278
4             0      0      0      0      0      0      0      0      0      0
Φ2(t)         0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5
We apply Algorithm AL2S to three cases, (δ1, δ2) = (1, 1) (Case 1), (0.8, 0.2) (Case 2), and (0, 0) (Case 3), to derive the NEs. The optimal distributions of searching resources are almost the same as in Table 18.1, with some small differences across the three cases. They give the target almost the same non-detection probability, 0.513; the detailed probabilities are 0.51346, 0.51347, and 0.51348 for Cases 1, 2, and 3, respectively, reasonably reflecting the favorableness of simultaneous detection according to the δ-values. At these NEs, Searchers 1 and 2 get the rewards (0.249, 0.249) (Case 1), (0.247, 0.240) (Case 2), and (0.238, 0.238) (Case 3). The reward tends to decrease as the value δk gets smaller. However, we can say that the influence of simultaneous detection by both searchers on the reward is not as large as that of the total detection probability by either searcher. That is why, for every case, the optimal distribution of resources is close to the initial distribution ϕk0, which is derived from Problem PS under the criterion of total detection probability.
We check another target strategy, π = π̃ = (0.4, 0.4, 0, 0, 0.2), different from π*, in Case 2. After applying Algorithm AL2S to this case, we obtain Table 18.3 as the optimal distributions ϕk* of the two searchers (k = 1, 2).
In this case, the resulting non-detection probability is 0.525 and the rewards are 0.214 and 0.261 for Searchers 1 and 2. Searcher 2 can expect more reward than Searcher 1 in spite of δ1 = 0.8 and δ2 = 0.2. The results are advantageous to the target and Searcher 2 but disadvantageous to Searcher 1, compared with the results for the original target strategy π*. The advantage depends on the order of calculation in Step (ii) of Algorithm AL2S. The results above are brought about by the order k = 1, 2. If we change the order to k = 2, 1, we obtain the distribution given by exchanging the two searchers' distribution plans in Table 18.3. The result has the same non-detection probability as above but brings expected rewards of 0.266 and 0.208 to Searchers 1 and 2, respectively. The rewards become advantageous to Searcher 1 but disadvantageous to Searcher 2. These phenomena often appear in
repeated games or games with a leader and a follower, where the leading player, by declaring his intention or strategy, is usually in the favorable position. In Algorithm AL2S with the order k = 1, 2, the initial plan ϕ2^0(i,t) declared by Searcher 2 is used as a fixed parameter in the first calculation; in the case of order k = 2, 1, the first declaration is made by Searcher 1. In any case, these distributions are both NEs, so there are at least two NEs for the target strategy π̃. Both of these NEs are more favorable to the target than the NE for π*. We may thus draw the lesson that the target can lead the game to more advantageous Nash equilibrium points if it lets the searchers hold a belief about the target strategy such as π̃.
18.5 Conclusion
References
1. Alpern, S., Gal, S.: The Theory of Search Games and Rendezvous. Kluwer Academic, Boston
(2003)
2. Baston, V.J., Bostock, F.A.: A one-dimensional helicopter-submarine game. Naval Res.
Logistics 36, 479–490 (1989)
3. Baston, V.J., Garnaev, A.Y.: A search game with a protector. Naval Res. Logistics 47, 85–96
(2000)
4. Dambreville, F., Le Cadre, J.P.: Search game for a moving target with dynamically generated
informations. In: Proceedings of the 5th International Conference on Information Fusion
(FUSION’2002), pp. 243–250 (2002)
5. Danskin, J.M.: A helicopter versus submarine search game. Oper. Res. 16, 509–517 (1968)
6. Eagle, J.N., Washburn, A.R.: Cumulative search-evasion games. Naval Res. Logistics 38,
495–510 (1991)
7. Garnaev, A.Y.: A remark on a helicopter-submarine game. Naval Res. Logistics 40, 745–753
(1993)
8. Hohzaki, R.: Search allocation game. Eur. J. Oper. Res. 172, 101–119 (2006)
9. Hohzaki, R.: A multi-stage search allocation game with the payoff of detection probability.
J. Oper. Res. Soc. Jpn. 50, 178–200 (2007)
10. Hohzaki, R.: A cooperative game in search theory. Naval Res. Logistics 56, 264–278 (2009)
11. Hohzaki, R., Iida, K.: A search game with reward criterion. J. Oper. Res. Soc. Jpn. 41, 629–642
(1998)
12. Hohzaki, R., Iida, K.: A solution for a two-person zero-sum game with a concave payoff
function. In: Takahashi, W., Tanaka, T. (eds.) Nonlinear Analysis and Convex Analysis,
pp. 157–166. World Science Publishing Co., London (1999)
13. Hohzaki, R., Iida, K., Komiya, T.: Discrete search allocation game with energy constraints.
J. Oper. Res. Soc. Jpn. 45, 93–108 (2002)
14. Hohzaki, R., Washburn, A.: An approximation for a continuous datum search game with energy
constraint. J. Oper. Res. Soc. Jpn. 46, 306–318 (2003)
15. Iida, K., Hohzaki, R., Furui, S.: A search game for a mobile target with the conditionally
deterministic motion defined by paths. J. Oper. Res. Soc. Jpn. 39, 501–511 (1996)
16. Iida, K., Hohzaki, R., Sato, K.: Hide-and-search game with the risk criterion. J. Oper. Res. Soc.
Jpn. 37, 287–296 (1994)
17. Kikuta, K.: A search game with traveling cost. J. Oper. Res. Soc. Jpn. 34, 365–382 (1991)
18. Koopman, B.O.: The theory of search. III. The optimum distribution of searching effort. Oper.
Res. 5, 613–626 (1957)
19. Meinardi, J.J.: A sequentially compounded search game. In: Mensch A. (ed.) Theory of Games:
Techniques and Applications, pp. 285–299. The English Universities Press, London (1964)
20. Morse, P.M., Kimball, G.E.: Methods of Operations Research. MIT, Cambridge (1951)
21. Myerson, R.B.: Game Theory: Analysis of Conflict. Harvard University Press, Cambridge
(1991)
22. Nakai, T.: A sequential evasion-search game with a goal. J. Oper. Res. Soc. Jpn. 29, 113–122
(1986)
23. Nakai, T.: Search models with continuous effort under various criteria. J. Oper. Res. Soc. Jpn.
31, 335–351 (1988)
24. Norris, R.C.: Studies in search for a conscious evader. MIT Technical Report, No. 279 (1962)
25. O’Rourke, J.: Art Gallery Theorems and Algorithms. Oxford University Press, New York
(1987)
26. Owen, G.: Game Theory, pp. 212–224. Academic, New York (1995)
27. Parsons, T.D.: Pursuit-Evasion in a Graph. In: Alavi, Y., Lick, D.R. (eds.) Theory and
Applications of Graphs. Springer, Berlin (1976)
28. Parsons, T.D.: The search number of a connected graph. In: Proceedings of the 10th Southeast-
ern Conference on Combinatorics, Graph Theory and Computing, pp. 549–554 (1978)
29. Washburn, A.R.: Search-evasion game in a fixed region. Oper. Res. 28, 1290–1298 (1980)
30. Washburn, A.R., Hohzaki, R.: The diesel submarine flaming datum problem. Milit. Oper. Res.
4, 19–30 (2001)
Chapter 19
Advertising and Price to Sustain The Brand
Value in a Licensing Contract
Alessandra Buratto
Abstract One of the reasons that induce a brand owner to issue a licensing
contract is that of improving the value of his brand. In this paper, we look at a
fashion licensing agreement where the licensee produces and sells a product in a
complementary business. The value of a fashion brand is sustained by both the
advertising efforts of the licensor and the licensee. We assume that demand is
proportional to the brand value and decreases with the price. The licensor wants
to maximize his revenue coming from the royalties and to minimize his advertising
costs. Moreover, he does not want his brand to be devalued at the end of the selling
season. On the other hand, the licensee plans her advertising campaign in order to
invest in the brand value and maximize the sales revenue. The aim of this paper
is to analyze the different strategies the licensor can adopt to sustain his brand. To
this end, we determine the optimal advertising policies by solving a Stackelberg
differential game, where the owner of the brand acts as the leader and the licensee
as the follower. We determine the equilibrium policies of the two players assuming
that advertising varies over time and price is constant. We also determine a minimum
selling price which guarantees brand sustainability without advertising too much.
19.1 Introduction
Let’s consider a licensing contract between the owner of a brand (licensor) and a
manufacturer (licensee) who produces and sells a product with the licensor’s brand.
We focus on a particular type known as complementary business licensing which
A. Buratto ()
Department of Mathematics, Via Trieste 63, I-35131 Padova, Italy
e-mail: [email protected]
is, for instance, the case of an owner of a brand of clothes who licenses his brand to a producer of accessories or perfumes [19]. This type of contract is very common nowadays. In fact, "Brands allow larger firms to diversify from clothing into other markets, outside their core business: perfumes, accessories . . . " [17]. Licensing itself can be considered a brand extension strategy according to Keller's definition [12].
A licensing contract may last for several years, even decades. Nevertheless, at the beginning of every selling season, for each new product, the two agents involved have to come to an agreement in setting the selling price and must plan the advertising campaigns they are respectively going to carry out. Advertising coordination is quite important in any vertical channel, and it becomes crucial in a licensing agreement [18].
The importance of the brand value is well known, especially in the fashion business [1]. Several papers study brand dilution and imitation in the fashion industry; see [5, 6, 9]. The links of brand value to price have been studied since the early fifties, with the analysis of the different effects of price on demand [13]. More recently, such effects have been formalized in the game theory context. For example, in [2, 15] it is stressed that "In fashion goods price increases the brand value." In [5] the authors say that the price of prestige goods should not be too low and that licensing is a mechanism for expanding sales; however, one of its risks is brand dilution. On the other hand, advertising may increase the brand value, and a good advertising campaign can be useful in guaranteeing brand sustainability.
Here we tackle the issue in a game-theoretic context, taking production costs into account too. In this paper we determine the equilibrium policies of the licensor, assuming that advertising varies over time and the price is constant. We conduct a sensitivity analysis with respect to the price parameter in order to see whether some particular prices can guarantee brand sustainability. Imposing on the licensee a given price among these values is one of the licensor's strategies for sustaining the brand.
The aim of this paper is to provide a guideline for the owner of a brand who is concerned about the sustainability of his brand. We analyze different approaches he can follow in order to achieve this task, in view of the different dynamics between the two agents involved in the agreement. We consider the selling price as already fixed, and we tackle the problem of determining the advertising campaigns by solving a Stackelberg game with the licensor as the leader and the licensee as the follower. In fact, the licensor lays down the law in a licensing contract and can sever the agreement whenever the licensee does not meet his target [18]. Each player determines his optimal advertising strategy, maximizing the profits coming from the sale of the product and minimizing the advertising costs. Similar approaches, within differential game theory, are common for determining the advertising strategies in a vertical channel, see [8, 11], and they have been used in the franchising context too (see, e.g., [7, 14]). An attempt was made in [4] for a licensing contract without considering brand sustainability.
In the following we analyze the brand sustainability problem from the licensor's point of view. The licensor can achieve the sustainability of his brand through advertising, either by cooperating with the licensee in maximizing the sum of their profits, in which case both of them have to take into account the production costs and take care of the brand value sustainability, or in a competitive context, namely:
• By sharing with the licensee the necessary increase of the advertising effort.
• By increasing his usual advertising effort, without binding the licensee to do the same.
• By forcing the licensee to increase her advertising effort, considering the sustainability constraint in her advertising plan.
Alternatively,
• He can ask whether there exists a minimum selling price to impose on the licensee such that the brand value is sustained without spending too much on advertising.
We analyze the licensor's brand sustainability problem in Sect. 19.2. In Sect. 19.3 we calculate the advertising strategies of the two players and give an operational rule for the licensor to choose the optimal advertising effort with respect to the selling price. Moreover, we determine a minimum selling price which guarantees brand sustainability. In Sect. 19.4 we consider the fully cooperative solution and compare it to the Stackelberg solution, asking whether the former type of strategy is effectively the best option for the licensor.
19.2 The Model
Let us denote by [0, T] the selling season, during which the two players run their advertising campaigns. In view of the short-term nature of the problem, we do not discount future profits. We model the brand value using the goodwill state variable introduced by Nerlove and Arrow [16]. Let us denote by G(t) the brand value (goodwill) at time t ∈ [0, T] and by G0 > 0 the initial brand value, which we assume to be sufficiently high, because only famous brands are licensed.
We assume that the brand value is increased both by the advertising efforts, $a_L(t)\ge 0$, $a_l(t)\ge 0$, and by the price difference, as follows:
$$\dot G(t)=\gamma_L a_L(t)+\gamma_l a_l(t)-\delta G(t)+\beta(p-p_R),\qquad(19.1)$$
where γL > 0 and γl > 0 are the advertising efficiencies of the two advertising
messages and δ > 0 is the decaying rate. Observe that the index i = L stands for
the licensor, while the index i = l stands for the licensee.
The additive term $\beta(p-p_R)$ represents the snob effect, by which the brand value increases with the selling price $p$, while $p_R$ is a reference price, known as the regular price [15]; it is the price the licensor considers fair for this type of branded product. If the selling price is greater than the reference price, then the brand value is increased. On the other hand, if the selling price is too low, $p<p_R$, then the brand is undervalued. Therefore $\beta\ge 0$ is called the price sensitivity toward the brand value.
380 A. Buratto
From now on, we will refer to brand value sustainability as the request of having
a brand value level greater than or equal to the initial one at the end of the selling
season. Being
G(0) = G0 , (19.2)
G(T ) ≥ G0 . (19.3)
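A minimal simulation of the goodwill dynamics (19.1) with the sustainability check (19.3); the advertising paths and all parameter values below are hypothetical choices made for the sketch:

```python
import numpy as np

# Hypothetical parameters; a_L, a_l are arbitrary nonnegative effort paths.
T, N = 1.0, 1000
gL, gl, delta, beta, p, pR, G0 = 0.75, 0.7, 0.1, 0.5, 10.0, 8.0, 50.0
dt = T / N
t = np.linspace(0, T, N + 1)
aL = 0.5 * np.ones(N + 1)
al = 0.3 * (1 - np.exp(-delta * (T - t)))   # shape of the minimum effort (19.16)

G = np.empty(N + 1)
G[0] = G0
for k in range(N):                           # explicit Euler step of (19.1)
    G[k + 1] = G[k] + dt * (gL * aL[k] + gl * al[k] - delta * G[k]
                            + beta * (p - pR))

print("G(T) =", G[-1], "sustained:", G[-1] >= G0)
```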
For granting the use of the brand, the licensee has to pay royalties to the licensor; they generally consist of a percentage of the sales, where r ∈ (0, 1) is the royalty percentage. Observe that the royalty percentage of the revenues from the sales that the licensee has to pay to the licensor is also exogenous and constant; as a consequence, the only strategic marketing instruments are the advertising efforts of the licensor and the licensee. The licensee keeps the remaining share of the sales revenue after paying the royalties.
Production costs are linked to the demand according to a linear form with unit cost c > 0. We assume
$$(1-r)p\ge c;$$
this means requiring that marginal revenues are greater than marginal costs, and such an assumption is reasonable for any manufacturer.
We consider quadratic advertising costs
$$C_l^{pu}(a_l)=\frac{1}{2}c_l a_l^2,\quad t\in[0,T],\qquad(19.9)$$
with $c_l>0$ the licensee's cost parameter.
The licensee has to solve the following problem:
$$P_l^1:\ \max_{a_l\ge 0}\ J_l^1(a_L,a_l,p)=\int_0^T\Big[\big((1-r)p-c\big)\big(G(t)-\theta p\big)-\frac{c_l}{2}a_l^2(t)\Big]dt,\qquad(19.10)$$
under constraints (19.1)–(19.3).
Observe that we denote by $P_i^1$ the optimization problem for agent $i\in\{L,l\}$ which maximizes his/her profit and considers brand sustainability, that is, $P_i^1=\max J_i$ subject to (19.1)–(19.3). Similarly, we denote by $P_i^0$ the optimization problem for agent $i\in\{L,l\}$ which maximizes his/her profit without considering brand sustainability, that is, $P_i^0=\max J_i$ subject to (19.1) and (19.2) only. Let $J_i^{0/1}$, $i\in\{L,l\}$, be the profits associated with problems $P_i^{0/1}$, respectively.
The licensor has his own advertising costs, supposed quadratic too,
$$C_L^{pu}(a_L)=\frac{1}{2}c_L a_L^2,\quad t\in[0,T],\qquad(19.11)$$
and he obtains from the licensee the royalties given in (19.6). The licensor's problem is
$$P_L^1:\ \max_{a_L\ge 0}\ J_L^1(a_L,a_l,p)=\int_0^T\Big[rp\big(G(t)-\theta p\big)-\frac{c_L}{2}a_L^2(t)\Big]dt\qquad(19.12)$$
$$p\ \ge\ p_s=\frac{G_0+\dfrac{\beta}{\delta}p_R+c\,\dfrac{\gamma_l^2}{c_l}\,\dfrac{1-e^{-\delta T}}{2\delta^2}}{\dfrac{\beta}{\delta}+\Big(\dfrac{\gamma_L^2}{c_L}r+\dfrac{\gamma_l^2}{c_l}(1-r)\Big)\dfrac{1-e^{-\delta T}}{2\delta^2}}.\qquad(19.14)$$
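The threshold (19.14) is straightforward to evaluate numerically. The sketch below uses the parameter values that appear later in Sect. 19.4 and reproduces the value $p_s\approx 8.0959$ reported there:

```python
import numpy as np

# Parameter values taken from the numerical instance in Sect. 19.4.
T, cL, cl, c = 30.0, 0.15, 0.2, 8.1
pR, beta, gL, gl, r, delta, G0 = 8.0, 0.5, 0.75, 0.7, 0.1, 0.1, 50.0

f = (1 - np.exp(-delta * T)) / (2 * delta ** 2)  # factor (1-e^{-dT})/(2d^2)
eL, el = gL ** 2 / cL, gl ** 2 / cl              # advertising effectiveness ratios
ps = ((G0 + beta / delta * pR + c * el * f)
      / (beta / delta + (eL * r + el * (1 - r)) * f))
print(ps)   # approx. 8.0959
```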
$$a_L^{0*}(t)=\frac{\gamma_L}{c_L}\,\frac{rp}{\delta}\Big(1-e^{-\delta(T-t)}\Big)\qquad(19.15)$$
and
$$a_l^{0*}(t)=\frac{\gamma_l}{c_l}\,\frac{(1-r)p-c}{\delta}\Big(1-e^{-\delta(T-t)}\Big).\qquad(19.16)$$
cl δ
sustain the brand value. In order to do so, the licensor can choose among the
following three different options.
• He can share with the licensee the additional advertising effort necessary to
sustain the brand. The optimal advertising efforts are respectively
γL −δ (T −t)
L (t) = aL (t) + ηL
a11∗ ,
0∗
e (19.17)
cL
γl −δ (T −t)
l (t) = al (t) + ηl
a11∗ ,
0∗
e (19.18)
cl
where ηL and ηl satisfy the equation
γL2 γ2
ηL + l ηl = η0 (19.19)
cL cl
The two players join their efforts in advertising in order to sustain the
brand and infinitely many solutions may exist.
• He can increase his advertising adopting the optimal advertising effort
η0 −δ (T −t)
L (t) = aL (t) +
a1∗ ,
0∗
e (19.20)
γL
while the licensee's advertising effort is the minimum one, given in (19.16):
$$a_l^{0*}(t)=\frac{\gamma_l}{c_l}\,\frac{(1-r)p-c}{\delta}\Big(1-e^{-\delta(T-t)}\Big).$$
Observe that in this scenario the licensee does not take brand sustainability into account.
• He can bind the licensee to consider the sustainability constraint in her advertising plan, whereas he neglects it himself. The licensor's optimal advertising effort is the one given in (19.15),
$$a_L^{0*}(t)=\frac{\gamma_L}{c_L}\,\frac{rp}{\delta}\Big(1-e^{-\delta(T-t)}\Big),$$
while the licensee adopts
$$a_l^{1*}(t)=a_l^{0*}(t)+\frac{\eta_0}{\gamma_l}e^{-\delta(T-t)}.\qquad(19.21)$$
If the selling price is at least the threshold $p_s$, the minimum advertising efforts are decreasing and concave in time:
$$\frac{da_L^{0*}(t)}{dt}=-\frac{\gamma_L\,rp}{c_L}\,e^{-\delta(T-t)}<0,\qquad\frac{d^2a_L^{0*}(t)}{dt^2}=\delta\,\frac{da_L^{0*}(t)}{dt}<0,$$
$$\frac{da_l^{0*}(t)}{dt}=-\frac{\big((1-r)p-c\big)\gamma_l}{c_l}\,e^{-\delta(T-t)}<0,\qquad\frac{d^2a_l^{0*}(t)}{dt^2}=\delta\,\frac{da_l^{0*}(t)}{dt}<0.$$
On the other hand, if the selling price is lower than the threshold $p_s$, then at least one of the two actors has to increase his/her advertising effort. In the case in which only one player increases his/her advertising, the derivatives of the advertising effort of the player who considers brand sustainability are, respectively,
$$\frac{\partial a_L^{1*}(t)}{\partial t}=\delta\,e^{-\delta(T-t)}\Big(\frac{\eta_0}{\gamma_L}-\frac{\gamma_L\,rp}{c_L\delta}\Big),\qquad\frac{\partial^2 a_L^{1*}(t)}{\partial t^2}=\delta\,\frac{\partial a_L^{1*}(t)}{\partial t},$$
$$\frac{\partial a_l^{1*}(t)}{\partial t}=\delta\,e^{-\delta(T-t)}\Big(\frac{\eta_0}{\gamma_l}-\frac{\gamma_l\big((1-r)p-c\big)}{c_l\delta}\Big),\qquad\frac{\partial^2 a_l^{1*}(t)}{\partial t^2}=\delta\,\frac{\partial a_l^{1*}(t)}{\partial t}.$$
The advertising efforts turn out to be either increasing and convex, for very low prices, that is, for $p<\hat p_i$, $i\in\{L,l\}$; or decreasing and concave, for intermediate price values, that is, for $p>\hat p_i$; or, finally, constant if $p=\hat p_i$, $i\in\{L,l\}$, where
$$\hat p_L=\frac{G_0+c\,\dfrac{\gamma_l^2}{c_l}\dfrac{1-e^{-\delta T}}{2\delta^2}+\dfrac{\beta}{\delta}p_R}{\dfrac{1-e^{-\delta T}}{2\delta^2}\Big(\dfrac{\gamma_L^2}{c_L}r+\dfrac{\gamma_l^2}{c_l}(1-r)\Big)+\dfrac{\beta}{\delta}+\dfrac{\gamma_L^2 r}{c_L}\dfrac{1+e^{-\delta T}}{2\delta^2}}<p_s$$
and
$$\hat p_l=\frac{G_0+\dfrac{c}{\delta^2}\dfrac{\gamma_l^2}{c_l}+\dfrac{\beta}{\delta}p_R}{\dfrac{1-e^{-\delta T}}{2\delta^2}\Big(\dfrac{\gamma_L^2}{c_L}r+\dfrac{\gamma_l^2}{c_l}(1-r)\Big)+\dfrac{\beta}{\delta}+\dfrac{\gamma_l^2(1-r)}{c_l}\dfrac{1+e^{-\delta T}}{2\delta^2}}.$$
If the regular price is lower than the threshold
$$\bar p_R=\frac{G_0+c\,\dfrac{\gamma_l^2}{c_l}\dfrac{1-e^{-\delta T}}{2\delta^2}}{\Big(\dfrac{\gamma_L^2}{c_L}r+\dfrac{\gamma_l^2}{c_l}(1-r)\Big)\dfrac{1-e^{-\delta T}}{2\delta^2}},$$
then the optimal selling price must necessarily be greater than the regular price $p_R$ itself, whereas with a high regular price, greater than the threshold $\bar p_R$, the selling price can be lower than $p_R$.
Concerning the dependence on the royalty coefficient $r$: if the licensor's advertising effectiveness is greater than the licensee's, that is, if $\gamma_L^2/c_L>\gamma_l^2/c_l$, then the minimum selling price decreases with the royalty coefficient. The opposite happens if the licensor's advertising is less effective than the licensee's. Observe that the asymmetry of the game influences its solution; had we assumed $\gamma_L=\gamma_l$ and $c_L=c_l$, a substantial difference in the equilibria would hold: for example, the minimum price $p_s$ would not depend on the royalty percentage $r$ at all.
Other behaviors are summarized in the following table (a "+" means that $p_s$ is increasing in the parameter, a "−" that it is decreasing):

        G0   c   γL   cL
  p_s   +    +   −    +
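The signs in the table can be checked numerically by perturbing each parameter in (19.14); a small sketch, with the baseline values of Sect. 19.4 as default arguments:

```python
import numpy as np

def ps_value(G0=50.0, c=8.1, gL=0.75, cL=0.15, gl=0.7, cl=0.2,
             r=0.1, delta=0.1, T=30.0, beta=0.5, pR=8.0):
    f = (1 - np.exp(-delta * T)) / (2 * delta ** 2)
    eL, el = gL ** 2 / cL, gl ** 2 / cl
    return ((G0 + beta / delta * pR + c * el * f)
            / (beta / delta + (eL * r + el * (1 - r)) * f))

base = ps_value()
for name, kw in [("G0", dict(G0=51.0)), ("c", dict(c=8.2)),
                 ("gL", dict(gL=0.80)), ("cL", dict(cL=0.16))]:
    print(name, "+" if ps_value(**kw) > base else "-")   # expect: + + - +
```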
19.4 The Cooperative Solution
Here we consider the situation in which the licensor cooperates with the licensee in maximizing the sum of the profits, and both of them take care of the brand value sustainability. They have to solve the following optimal control problem, $P_C^1$, subject to (19.1)–(19.3).
Theorem 19.2 (Cooperative Equilibrium). The coordinated optimal advertising efforts are
$$a_{LC}^*(t)=\frac{\gamma_L}{c_L}\Big[\frac{p-c}{\delta}\big(1-e^{-\delta(T-t)}\big)+\eta_C\,e^{-\delta(T-t)}\Big],\qquad(19.22)$$
$$a_{lC}^*(t)=\frac{\gamma_l}{c_l}\Big[\frac{p-c}{\delta}\big(1-e^{-\delta(T-t)}\big)+\eta_C\,e^{-\delta(T-t)}\Big],\qquad(19.23)$$
with $\eta_C=0$ if and only if
$$p\ \ge\ p_C=\frac{G_0+\dfrac{\beta}{\delta}p_R+\Big(\dfrac{\gamma_L^2}{c_L}+\dfrac{\gamma_l^2}{c_l}\Big)c\,\dfrac{1-e^{-\delta T}}{2\delta^2}}{\dfrac{\beta}{\delta}+\Big(\dfrac{\gamma_L^2}{c_L}+\dfrac{\gamma_l^2}{c_l}\Big)\dfrac{1-e^{-\delta T}}{2\delta^2}},$$
and therefore:
• If $p\ge p_C$, then $\eta_C=0$, and the cooperative advertising efforts $a_{LC}^*(t)$ and $a_{lC}^*(t)$ reduce to their minimum, by their monotonicity in $\eta_C$. Nevertheless, they are greater than the minimum advertising efforts $a_L^{0*}(t)$ and $a_l^{0*}(t)$ obtained without considering brand sustainability.
• If $p<p_C$, then $\eta_C>0$, and the players have to increase their advertising efforts. The cooperative advertising efforts in (19.22) and (19.23) are a fortiori greater than the minimum advertising efforts $a_L^{0*}(t)$ and $a_l^{0*}(t)$. Nonetheless, it is not possible to determine a priori whether they are greater or lower than the increased advertising efforts in (19.17), (19.20), (19.18), and (19.21).
In comparing $p_s$ and $p_C$, many parameters influence their values, and generally not all the presented scenarios are practicable. It is easy to prove that the cases $c/(1-r)<p_s<p_C$ and $p_s<c/(1-r)<p_C$ never occur; in fact, from $p_s<p_C$ it follows that $p_C<c/(1-r)$. Furthermore, let $T=30$, $c_L=0.15$, $c_l=0.2$, $c=8.1$, $p_R=8$, $\beta=0.5$, $\gamma_L=0.75$, $\gamma_l=0.7$, $r=0.1$, and $\delta=0.1$; then $c/(1-r)=9$, and according to the value of $G_0$ we have the following results:
– If $G_0=50$, then $p_C=8.265239040$, $p_s=8.095855766$, and therefore $p_s<p_C<c/(1-r)$.
Observe that the situation $c/(1-r)<p_C<p_s$ occurs for high initial brand values: only the owner of a well-known brand can take into account the free-riding situation of binding the licensee to sustain the brand.
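A sketch reproducing the comparison of the two thresholds: the $G_0=50$ case matches the values quoted above, and a large $G_0$ (chosen here for illustration) exhibits the regime $c/(1-r)<p_C<p_s$:

```python
import numpy as np

T, cL, cl, c = 30.0, 0.15, 0.2, 8.1
pR, beta, gL, gl, r, delta = 8.0, 0.5, 0.75, 0.7, 0.1, 0.1
f = (1 - np.exp(-delta * T)) / (2 * delta ** 2)
eL, el = gL ** 2 / cL, gl ** 2 / cl

for G0 in (50.0, 5000.0):   # 5000 is an illustrative "high brand value" case
    ps = (G0 + beta/delta*pR + c*el*f) / (beta/delta + (eL*r + el*(1-r))*f)
    pc = (G0 + beta/delta*pR + (eL + el)*c*f) / (beta/delta + (eL + el)*f)
    print(G0, round(pc, 6), round(ps, 6), "c/(1-r) =", c / (1 - r))
# G0 = 50 prints pc = 8.265239 and ps = 8.095856, as in the text.
```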
Obviously the leader will adopt the strategy which leads him to the greater profit. To this end, let us denote by $J_L^{*kw}$, with $k,w\in\{0,1\}$, the optimal profit of the licensor if he adopts the advertising strategy $a_L^k$ and the follower adopts the advertising effort $a_l^w$. Analogously, let us denote by $J_{LC}^{*1}$ the optimal profit of the licensor when the players adopt the cooperative advertising strategies $a_{LC}^*$ and $a_{lC}^*$ in problem $P_C^1$.
A possible rational rule for dividing the players' profits in a cooperative context is the Nash bargaining solution, see [3, 11]; in any case it must be $J_{LC}^{*1}\le J_C^{*1}$.
Turning back to the licensor's decision, it is a well-known result that cooperation leads to greater profits, that is, $J_{LC}^{*1}>J_L^{*10}$. It can also be easily proved that each player earns more when the other one takes care of the brand sustainability constraint, just because the goodwill is increased by the effect of the other player's additional advertising effort. This can be formalized as follows: $J_L^{*10}<J_L^{*01}$.
It remains to check whether it turns out to be convenient for the leader to cooperate or to bind the licensee to care about brand sustainability. This cannot be determined a priori, as the comparison depends on the values of the problem's many parameters. Nevertheless, once a particular instance of the problem is considered, the licensor's best choice can easily be found by comparing the optimal profits $J_{LC}^{*1}$ and $J_L^{*01}$ evaluated at the particular parameter values which characterize that instance.
An interesting result is that the cooperative solution does not always lead to a greater profit for the licensor. In fact, let $T=30$, $c_L=0.15$, $c_l=0.2$, $c=8.1$, $p_R=8$, $\beta=0.5$, $\gamma_L=0.75$, $\gamma_l=0.7$, $r=0.1$, $\delta=0.1$, $\theta=0.5$, and $p=10$; for either $G_0=1{,}543$ or $G_0=1{,}545$ we have $9=c/(1-r)<p<p_C<p_s$, and therefore it makes sense to consider the free-riding situation. In the former case $J_C^{*1}-J_L^{*01}=28.48809>0$, while in the latter $J_C^{*1}-J_L^{*01}=-28.50707<0$. Since $J_{LC}^{*1}\le J_C^{*1}$ by definition, we have proved that there exists at least one situation in which the licensor's strategy of binding the licensee to take care of brand sustainability is, for him, more profitable than cooperating, that is, $J_{LC}^{*1}<J_L^{*01}$.
Another interesting, and non-trivial, approach consists in considering the price as a constant decision variable to be determined using the theory of optimal processes with parameters. Such an approach has been used in [10] for an optimal control problem but, to the best of my knowledge, nothing similar has ever been applied to a Stackelberg differential game. Considering the pricing problem requires analyzing the problem from the follower's point of view as well; this can be done by determining the optimal selling price which takes into account production costs and maximizes the licensee's profit.
Appendix. Proofs
The first-order condition for the licensee's Hamiltonian is

$$\frac{\partial H_l(G, a_l, \lambda_l, t)}{\partial a_l} = -c_l a_l + \gamma_l \lambda_l \qquad (19.25)$$

and the stationary point is

$$a_l(t) = \frac{\lambda_l(t)\,\gamma_l}{c_l}, \qquad t \in [0, T]. \qquad (19.26)$$

The co-state equation is

$$\dot{\lambda}_l(t) = -\frac{\partial H_l}{\partial G} = -\big((1-r)p - c\big) + \delta \lambda_l(t), \qquad (19.27)$$

which, solved backwards from the final value $\lambda_l(T)$, gives

$$\lambda_l(t) = \frac{(1-r)p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_l(T)\, e^{-\delta(T-t)}.$$
From condition (19.30) it follows that $\lambda_l(t) > 0$ for all $t \in [0, T]$; therefore the optimal advertising effort for the licensee is

$$a_l^*(t) = \max\left\{\frac{\lambda_l(t)\,\gamma_l}{c_l},\, 0\right\} = \frac{\gamma_l}{c_l}\left[\frac{(1-r)p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_l(T)\, e^{-\delta(T-t)}\right]. \qquad (19.31)$$
If condition (19.30) did not hold, then it would not be convenient for the licensee to produce at all, and hence it would not be convenient to advertise either.
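As a quick numerical illustration of the closed form (19.31), the following Python sketch evaluates the licensee's optimal effort along the horizon; it uses the parameter values of the numerical example above, while the choices $p = 10$ and $\lambda_l(T) = 0$ (a non-binding sustainability constraint) are assumptions made here purely for illustration.

```python
import math

# Illustrative evaluation of the licensee's optimal effort (19.31).
# Parameters follow the chapter's numerical example; p = 10 and
# lambda_l_T = 0 (non-binding constraint, G(T) > G0) are assumed.
T, c_l, gamma_l, r, delta, c, p = 30.0, 0.2, 0.7, 0.1, 0.1, 8.1, 10.0
lambda_l_T = 0.0  # transversality multiplier lambda_l(T)

def a_l_star(t: float) -> float:
    """Licensee's optimal advertising effort from (19.31)."""
    decay = math.exp(-delta * (T - t))
    lam = ((1 - r) * p - c) / delta * (1 - decay) + lambda_l_T * decay
    return max(gamma_l * lam / c_l, 0.0)

for t in (0.0, 15.0, 30.0):
    print(f"a_l*({t:4.1f}) = {a_l_star(t):.4f}")  # effort decays to 0 at t = T
```

Since $(1-r)p - c = 0.9 > 0$ here, the max operator never binds and the effort is strictly positive for $t < T$.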
Let’s substitute the follower optimal strategy into the state equation of the
leader’s problem
The advertising efforts $a_l^*(t)$ and $a_L^*(t)$ given in (19.31) and (19.38), with $\lambda_l(T) \geq 0$ and $\lambda_L(T) \geq 0$ such that $\lambda_l(T)(G(T) - G_0) = 0$ and $\lambda_L(T)(G(T) - G_0) = 0$, constitute a Stackelberg equilibrium, and it can be proved that this equilibrium is time consistent, as it coincides with the Markovian Nash equilibrium.
In order to determine the values of the parameters $\lambda_L(T)$ and $\lambda_l(T)$ from the transversality conditions, let us solve the motion equation with the initial condition:

$$\dot{G}(t) = \gamma_L a_L^*(t) + \gamma_l a_l^*(t) - \delta G(t) + \beta(p - p_R), \quad t \in [0, T], \qquad G(0) = G_0. \qquad (19.39)$$
Substituting the optimal efforts, the motion equation becomes

$$\dot{G}(t) = \frac{\gamma_L^2}{c_L}\left[\frac{rp}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_L(T)\, e^{-\delta(T-t)}\right] + \frac{\gamma_l^2}{c_l}\left[\frac{(1-r)p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_l(T)\, e^{-\delta(T-t)}\right] - \delta G(t) + \beta(p - p_R) = H\, e^{-\delta(T-t)} + K - \delta G(t),$$

where

$$\eta_L = \lambda_L(T), \qquad \eta_l = \lambda_l(T),$$

$$H = \frac{\gamma_L^2}{c_L}\,\eta_L + \frac{\gamma_l^2}{c_l}\,\eta_l - \left(\frac{\gamma_L^2\, rp}{c_L\,\delta} + \frac{\gamma_l^2\,\big[(1-r)p - c\big]}{c_l\,\delta}\right),$$

$$K = \frac{\gamma_L^2\, rp}{c_L\,\delta} + \frac{\gamma_l^2\,\big[(1-r)p - c\big]}{c_l\,\delta} + \beta(p - p_R).$$
Let’s observe that G(T ) is linear in ηL and ηl , furthermore G(T ) ≥ G0 if and only
γ2 γl2
if ( cLL ηL + cl ηl ) ≥ η0 , where
2
1 −δ T γL rp γl2 (1 − r)p − c
η0 = 2(δ G0 − β (p − pR)) − (1 − e ) + .
1 + e− δ T cL δ cl δ
For the cooperative problem, the first-order conditions are

$$\frac{\partial H_C(G, a_L, \lambda_C, t)}{\partial a_L} = -c_L a_L + \gamma_L \lambda_C, \qquad (19.43)$$

$$\frac{\partial H_C(G, a_l, \lambda_C, t)}{\partial a_l} = -c_l a_l + \gamma_l \lambda_C, \qquad (19.44)$$

and the stationary point is

$$(a_L(t), a_l(t)) = \left(\frac{\lambda_C(t)\,\gamma_L}{c_L},\ \frac{\lambda_C(t)\,\gamma_l}{c_l}\right), \qquad t \in [0, T]. \qquad (19.45)$$
Mangasarian's theorem holds for the licensor's solution too, since the Hamiltonian function (19.42) is concave in $(G, a_L, a_l)$.
The co-state equation is

$$\dot{\lambda}_C(t) = -\frac{\partial H_C}{\partial G} = -(p - c) + \lambda_C(t)\,\delta, \qquad (19.46)$$

which, solved, gives

$$\lambda_C(t) = \frac{p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_C(T)\, e^{-\delta(T-t)},$$

so that, in particular,

$$a_{lC}^*(t) = \frac{\gamma_l}{c_l}\left[\frac{p - c}{\delta}\left(1 - e^{-\delta(T-t)}\right) + \lambda_C(T)\, e^{-\delta(T-t)}\right]. \qquad (19.50)$$
The motion equation can be rewritten as

$$\dot{G}(t) = M\, e^{-\delta(T-t)} + N - \delta G(t),$$

where

$$\eta_C = \lambda_C(T), \qquad M = \left(\frac{\gamma_L^2}{c_L} + \frac{\gamma_l^2}{c_l}\right)\left(\eta_C - \frac{p - c}{\delta}\right), \qquad N = \left(\frac{\gamma_L^2}{c_L} + \frac{\gamma_l^2}{c_l}\right)\frac{p - c}{\delta} + \beta(p - p_R),$$

and its solution evaluated at the final time is

$$G(T) = G_0\, e^{-\delta T} + \frac{1 - e^{-2\delta T}}{2\delta}\, M + \frac{1 - e^{-\delta T}}{\delta}\, N.$$
Imposing $G(T) = G_0$ in the expression above and solving for $\eta_C$, the transversality condition $\lambda_C(T) \geq 0$, $\lambda_C(T)(G(T) - G_0) = 0$ yields

$$\eta_C = \max\left\{\frac{2\big(\delta G_0 - \beta(p - p_R)\big)}{\left(\frac{\gamma_L^2}{c_L} + \frac{\gamma_l^2}{c_l}\right)\left(1 + e^{-\delta T}\right)} - \frac{p - c}{\delta}\,\frac{1 - e^{-\delta T}}{1 + e^{-\delta T}},\ 0\right\}.$$
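For the parameter set of the profit-comparison example above ($p = 10$, $G_0 = 1{,}543$), the closed form for $\eta_C$ can be evaluated directly; the following Python sketch is illustrative only.

```python
import math

# Evaluate eta_C from the closed form above, using the chapter's example
# parameters; p and G0 are taken from the profit-comparison example.
T, c_L, c_l, c, p_R = 30.0, 0.15, 0.2, 8.1, 8.0
beta, gamma_L, gamma_l, delta = 0.5, 0.75, 0.7, 0.1
p, G0 = 10.0, 1543.0

Gamma = gamma_L**2 / c_L + gamma_l**2 / c_l  # gamma_L^2/c_L + gamma_l^2/c_l
e_T = math.exp(-delta * T)
eta_C = max(
    2 * (delta * G0 - beta * (p - p_R)) / (Gamma * (1 + e_T))
    - (p - c) / delta * (1 - e_T) / (1 + e_T),
    0.0,
)
print(f"eta_C = {eta_C:.4f}")  # positive: the sustainability constraint binds
```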
References
1. Aaker D.A.: Managing Brand Equity. The Free Press, New York (1991)
2. Amaldoss, W., Jain., S.: Pricing of conspicuous goods: a competitive analysis of social effects.
J. Marketing Res. 42(1), 30–42 (2005)
3. Binmore, K., Osborne, M.J., Rubinstein, A.: Noncooperative models of bargaining. In:
Aumann, R.J., Hart, S. (eds.) Handbook of Game Theory with Economic Applications. North-
Holland, Amsterdam (1992)
4. Buratto, A., Zaccour, G.: Coordination of advertising strategies in a fashion licensing contract.
J. Opt. Theory Appl. 142(1), 31–53 (2009)
5. Caulkins, J.P., Feichtinger, G., Kort P., Hartl, R.F.: Brand image and brand dilution in the
fashion industry. Automatica 42, 1363–1370 (2006)
6. Caulkins, J.P., Hartl, R.F., Feichtinger, G.: Explaining fashion cycles: imitators chasing
innovators in product space. J. Econ. Dyn. Control 31, 1535–1556 (2007)
7. Chintagunta, P., Sigué, J.P.: Advertising strategies in a franchise system. Eur. J. Oper. Res. 198,
655–665 (2009)
8. Dockner, E.J., Jørgensen, S., Van Long, N., Sorger, G.: Differential Games in Economics and
Management Science. Cambridge University Press, Cambridge (2000)
9. Jørgensen, S., Di Liddo, A.: Design imitation in the fashion industry. In: Annals of the
International Society of Dynamic Games, vol. 9, pp. 569–586. Birkhauser, Boston (2007)
10. Jørgensen, S., Kort P., Zaccour, G.: Optimal pricing and advertising policies for an entertain-
ment event. J. Econ. Dyn. Control 33, 583–596 (2009)
11. Jørgensen, S., Zaccour, G.: Differential Games in Marketing. Kluwer Academic, Boston (2004)
12. Keller, K.L.: Strategic Brand Management: Building, Measuring, and Managing Brand Equity,
3rd ed. Prentice-Hall, New York (2007)
13. Leibenstein, H.: Bandwagon, snob, and Veblen effects in the theory of consumers' demand. Quart. J. Econ. 64(2), 183–207 (1950)
14. Martín-Herrán, G., Sigué, S.P., Zaccour, G.: Strategic interactions in traditional franchise
systems: Are franchisors always better off? Eur. J. Oper. Res. 213(3), 526–537 (2011)
15. Martín-Herrán, G., Taboubi, S., Zaccour, G.: On myopia in a dynamic marketing channel.
G-2006-37, GERAD (2006)
16. Nerlove, M., Arrow, K.J.: Optimal advertising policy under dynamic conditions. Economica 29(114), 129–142 (1962)
17. Power, D., Hauge, A.: No man’s brand – brands, institutions, and fashion. Growth Change
39(1), 123–143 (2008)
18. Raugust, K.: The Licensing Business Handbook, 8th Ed. EPM Communications, New York
(2012)
19. White, E.P.: Licensing. A Strategy for Profits. KEW Licensing Press, Chapel Hill (1990)
Chapter 20
Cost–Revenue Sharing in a Closed-Loop
Supply Chain
P. De Giovanni ()
Department of Information, Logistics and Innovation, VU University Amsterdam,
de Boelelaan 1105, 3A-31, 1081 HV Amsterdam, The Netherlands
e-mail: [email protected]
G. Zaccour
GERAD, HEC Montréal, 3000, chemin de la Côte-Sainte-Catherine, Montréal,
QC H3T 2A7, Canada
e-mail: [email protected]
20.1 Introduction
A manufacturer can close the loop by itself, or it can involve the other members of the supply chain and share the economic benefits [43]. In the latter case, the manufacturer should design
an adequate contract, provide attractive incentives for collaborating in closing the
loop, and properly share the economic advantages of remanufacturing [8, 13].
This paper contributes to this research area by developing a dynamic CLSC
game where a cost-sharing program for green activities is introduced along with a
reverse-revenue-sharing contract (RRSC). As reported by Geng and Mallik [19], an
RRSC is a good option when the upstream player wants to involve the downstream
player in a specific activity. For instance, Savaskan et al. [43] show that, when a
retailer is involved in the product-return process, the CLSC performs better. We
confine our interest to a single-manufacturer-single-retailer case and characterize
and contrast the equilibrium strategies and outcomes in two scenarios. In the first
scenario, referred to as Benchmark scenario, the two firms choose non-cooperatively
and simultaneously their strategies. In the second scenario, referred to as CRS, the
players share the manufacturer’s sales revenues and the cost of the green activities.1
In both cases, the manufacturer controls the rate of green activities and the retailer
controls the price. By contrasting the results of the two scenarios, we will be able
to assess the impact of implementing an active approach to increasing consumers’
environmental awareness, and, by the same token, the return rate of used products.
When the retailer contributes to the manufacturer's activities, the game is played à la Stackelberg. This game structure is common in the literature on marketing channels (see, e.g., the books [31, 35]), operations (e.g., [29]), as well as environmental management (e.g., [43]).
There is a growing game-theoretic literature that deals with CLSCs, see, e.g.,
[1, 3, 10, 14, 21, 24, 39, 43]. While these contributions investigate CLSCs in static or
two-period games, here we seek to evaluate the CLSC in a dynamic setting. Guide
et al. [27] emphasize the importance of time in managing product returns, which are
subject to time-value decay. Ray et al. [42] evaluate profits and pricing policy under
time-dependent and time-independent scenarios—namely, age-dependent and age-
independent differentiation—and show that the attractiveness of remanufacturing
changes substantially. Finally, Savaskan et al. [43] advise researchers that the CLSC
should be investigated as a dynamic phenomenon, as the influence of dynamic
returns changes channel decisions. Our paper takes up this challenge and proposes
a differential game to analyze equilibrium returns and pricing strategies in the two
scenarios described above.
Our main results can be summarized as follows:
A1. A CRS alleviates the double-marginalization problem in the supply chain. The
consumer pays a lower retail price and demands more product.
A2. The investment in green activities and the return rate of used products are
higher in the CRS scenario than in the benchmark game. The environment
also benefits from the implementation of a CRS contract.
1 What we have in mind here is similar to cooperative advertising programs, where typically, a
manufacturer pays part of the cost of promotion and advertising activities conducted locally by its
retailers. Cooperative advertising programs have been studied in the marketing literature, in a static
setting (e.g., [4, 5, 36]), as well as in a dynamic context (e.g., [32–34]).
A3. The retailer always prefers the CRS scenario to the benchmark scenario. The
manufacturer does the same, under certain conditions involving the revenue-
sharing parameter, the return rate, and the level of cost reduction attributable to
remanufacturing. The conclusion is that a CRS is not always Pareto improving.
The paper is organized as follows. In Sect. 20.2 we state the differential game
model and in Sect. 20.3 we characterize the equilibria in the two scenarios.
Section 20.4 compares strategies and outcomes. Section 20.5 briefly concludes.
Consider a supply chain made up of one manufacturer, player M, and one retailer,
player R. Let time t be continuous and assume an infinite planning horizon.2 The
manufacturer can produce its single good using new materials or old materials
extracted from returned past-sold products. This second option is common practice
in many industries.3 Managing returns effectively is one of the key capabilities required for a CLSC to succeed. Guide et al. [24] present two streams of practices
for managing returns: a passive and an active approach. The passive approach to
returns (waste-stream approach) consists of waiting and hoping that the customers
return their products. An active (market-driven) approach instead implies that CLSC
participants are continuously involved in controlling and influencing the return rate
by setting appropriate strategies. An active approach makes it possible to manage
the forward activities as a further source of economic benefits [24].
The literature in CLSC can be divided into three streams in terms of modeling
the return rate of used products. The first stream adopted a passive approach and
assumed the return rate to be exogenous (see, e.g., [3, 14, 16, 21, 27]). The second
stream also adopted a passive approach, but modelled the return rate as a random
variable, e.g., an independent Poisson (see, e.g., [1]). The third group of studies
considered an active approach, with the return rate being a function of a player’s
strategy (see, e.g., [43, 44]). We follow this last view. More precisely, we suppose
that the manufacturer can increase the rate of return for previously sold products by
investing in a “green” activity program (GAP). Examples of such activities include
advertising and communications campaigns about the firm's recycling policies. The cost of the green activities $A(t)$ is assumed to be quadratic:

$$C(A(t)) = \frac{u_M\,(A(t))^2}{2}, \qquad (20.1)$$
where $u_M > 0$ is a scaling parameter.4 We suppose that the return rate $r(t)$ depends
on the whole history, and not only on the current level of green activities. A
common hypothesis in such a context is to assume that r (t) corresponds to a
continuous and weighted average of past green activities with an exponentially
decaying weighting function. This assumption is intuitive because the return rate
is related to environmental awareness, which is a “mental state” that consumers
acquire over time, not overnight. This process is captured by defining r (t) as a state
variable whose evolution is governed by the linear differential equation

$$\dot{r}(t) = A(t) - \delta r(t), \qquad r(0) = r_0, \qquad (20.2)$$

where $\delta > 0$ is the decay rate and $r_0$ is the initial rate of return.
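The weighted-average reading can be made explicit: integrating (20.2) by variation of constants gives

$$r(t) = e^{-\delta t}\, r_0 + \int_0^t e^{-\delta (t - s)}\, A(s)\, ds,$$

so that the return rate accumulates past green-activity efforts $A(s)$ with exponentially decaying weights $e^{-\delta(t-s)}$.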
The main economic benefit of the CLSC for the manufacturer is given by the
saved cost (see, e.g., [39]). Following [43], we adopt the unit-production cost
function:
$$C(r(t)) = c_n(1 - r(t)) + c_u\, r(t), \qquad (20.3)$$
where cn > 0 is the cost of producing one unit with new raw materials, and cu > 0 is
the cost to produce one unit with used material from returned products, with cu < cn .
The above equation can be rewritten as

$$C(r(t)) = c_n - (c_n - c_u)\, r(t),$$

and, therefore, the difference $c_n - c_u$ is the marginal remanufacturing efficiency (cost
saving) of returned products. The manufacturer incurs the highest unit cost cn when
r(t) = 0, and the lowest unit cost cu is achieved when all previously purchased
products are returned, i.e., for r(t) = 1. In (20.2)–(20.3), we implicitly assume that
products may be returned independently of their condition, and that a good can be
remanufactured an infinite number of times. In practice, this clearly does not hold
true. For instance, Kodak’s camera frame, metering system, and flash circuit are
designed to be used up to six times [37] and any additional use compromises the
product’s reliability. Therefore, our functional forms in (20.2)–(20.3) are meant to
be rough approximations of return dynamics and cost savings. In the conclusion we
discuss some (necessarily much more complicated) avenues that are worth exploring
in future investigations.
Denote by $p(t)$ the retail price controlled by the retailer. We suppose that the demand for the manufacturer's product is given by

$$D(p(t)) = \alpha - \beta\, p(t), \qquad (20.4)$$
where α > 0 is the market potential and β > 0 represents the marginal effect of
pricing on current sales. To have nonnegative demand, we assume that p (t) ≤ α /β .
Two comments are in order regarding this demand function. First, following a
long tradition in economics, we have chosen a linear form. In addition to being
tractable, this choice is typically justified by the fact that such a demand form is
derivable from the consumer-utility function. Second, we follow [43] and suppose
that D (·) is independent of the return rate. Put differently, we are assuming here
that the CLSC’s main purpose is as a cost-saving rather than a demand-enhancing
mechanism. Denote by ω the constant wholesale price charged by the manufacturer,
with cn < ω < p (t) ≤ α /β . The lower bound ensures that the manufacturer’s margin
is positive even when there is no recycling. The second inequality ensures that the
retailer’s margin is nonnegative.
Up to now, our formulation states that the manufacturer is taking care of the
CLSC’s operational features, and that the marketing decisions (represented by
pricing) are left to the retailer, who is not at all involved in recycling. Although the
players follow an individual profit-maximization objective, they still may attempt
to link their activities to achieve higher economic benefits for both of them. For
instance, IBM and Xerox coordinate their recovery activities with their suppliers
in order to increase their profitability [18, 24]. IBM gives the responsibility for
managing all product returns worldwide to a dedicated business unit called Global
Asset Recovery Services, that collects, inspects, and assigns a particular recovery
option (resale or remanufacturing), and that maximizes the chain’s efficiency by
coordinating its activities with IBM refurbishment centres worldwide [18].
Here we explore a setting where: (a) the retailer financially supports the
manufacturer’s GAP; and, (b) the manufacturer designs an incentive mechanism to
compensate the retailer for this participation, and to better coordinate the CLSC.
Denote by B (t) , 0 ≤ B (t) ≤ 1, the support rate, to be chosen by the retailer,
in the total cost of the GAP. Consequently, the retailer pays B (t)C (A(t)) and
the manufacturer contributes the remaining portion, i.e., (1 − B (t))C (A(t)). The
rationale for the retailer to participate in the manufacturer’s GAP is the premise that
the combined efforts of the two players would lead to a higher return rate for used
products, and consequently, to a lower production cost and wholesale price.
Denote by I (r (t)) the state-dependent incentive provided by the manufacturer
to the retailer. The incentive assumes the traditional form as presented by [7, 9, 20],
where the manufacturer transfers a share of his revenues to the retailer in order to
modify her strategies. This way of modeling the incentive differs from the traditional
scheme elaborated in the literature of CLSC. Typically, the incentive schemes
assume the form of payment, where the manufacturer pays a certain per-unit amount
when another player returns a product [43, 44]. Alternatively, rebates on new sales
can also coordinate a CLSC [21]. Other valid alternative contract schemes link the
incentive to some operational features. For instance, Guide et al. [24] characterized a quality-dependent price incentive for used products; Guide et al. [27] suggest integrating return management with inventory management (VMI) and resource management (employees). Our way of modeling the incentive is analogous to the incentive implemented by ReCellular Inc. and presented in [24], where the manufacturer offers a two-part incentive formed of a fixed (direct) per-unit component as well as of a variable (indirect) component that depends on the operational (collection) costs.
Similarly, our incentive consists of a share of the manufacturer's revenues that is transferred to the retailer and that is formed of a fixed, state-independent part as well as of a variable, state-dependent component. In this sense, instead of focusing only on its main strengths, namely the reduction of the double-marginalization effect, a lower price, and a higher demand (see, e.g., [7, 9, 20]), a two-parameter contract implemented in a CLSC also enhances collaboration in product-return management. In [7, 9, 20] the incentive depends only on the sharing parameters, the wholesale price, and the production cost, while in our model it is also a function of the remanufacturing cost and the return rate.
Assuming profit-maximization behavior, the players' objective functionals are then given by

$$J_M = \int_0^{\infty} e^{-\rho t}\left[(\alpha - \beta p(t))\big(\omega - C(r(t)) - I(r(t))\big) - (1 - B(t))\,\frac{u_M}{2}\,A(t)^2\right]dt, \qquad (20.5)$$

$$J_R = \int_0^{\infty} e^{-\rho t}\left[(\alpha - \beta p(t))\big(p(t) - \omega + I(r(t))\big) - B(t)\,\frac{u_M}{2}\,A(t)^2\right]dt. \qquad (20.6)$$
We shall characterize and compare equilibrium strategies and outcomes for two
scenarios. In both of them, the assumption is that the players use Markovian
strategies, i.e., strategies that are functions of the state variable. Further, we restrict
ourselves to stationary strategies, that is, strategies that only depend on the current
value of the state variable, and not explicitly on time.
Benchmark Scenario: The retailer does not participate in the green activities
program of the manufacturer, and the latter does not offer any incentive to
coordinate the CLSC, i.e., B (t) ≡ 0 and I (r(t)) ≡ 0, ∀t ∈ [0, ∞). The game is played
noncooperatively and a feedback-Nash equilibrium is sought. Equilibrium strategies
and outcomes will be superscripted with N (for Nash).
Cost-Revenue Sharing Scenario: We assume that the retailer is the leader and
announces its support rate for the green activities conducted by the manufacturer,
who acts as the follower. The right (subgame-perfect) equilibrium concept in
such a setting is the feedback-Stackelberg equilibrium. Equilibrium strategies and
outcomes will be superscripted with S (for Sharing or Stackelberg). Denote by φ the
percentage of revenues transferred from the manufacturer to the retailer to stimulate
green investments and coordinate the CLSC. Under CRS, we have

$$I(r(t)) = \phi\big(\omega - C(r(t))\big) = \phi(\omega - c_n) + \phi(c_n - c_u)\, r(t). \qquad (20.7)$$

As a consequence of this transfer, the manufacturer's margin ($m_M^S(r)$) and the retailer's margin ($m_R^S(r)$) become

$$m_M^S(r) = (1 - \phi)\big(\omega - C(r(t))\big), \qquad m_R^S(r) = p(t) - \omega + \phi\big(\omega - C(r(t))\big).$$
The incentive scheme in (20.7) is made of two parts, with one being independent of
the return rate (φ (ω − cn )), and the other being a positive and increasing function
in the return rate (φ (cn − cu ) r (t)). This shows that the retailer has a vested interest
in contributing to a higher return rate.
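To see the two-part structure of (20.7) at work, here is a small numeric illustration; the parameter values ($\phi$, $\omega$, $c_n$, $c_u$) are hypothetical and chosen only to make the split visible.

```python
# Illustration of the two-part incentive (20.7); all values are hypothetical.
phi, omega, c_n, c_u = 0.3, 5.0, 3.0, 1.0

def incentive(r: float) -> float:
    """I(r) = phi*(omega - C(r)) = phi*(omega - c_n) + phi*(c_n - c_u)*r."""
    return phi * (omega - c_n) + phi * (c_n - c_u) * r

for r in (0.0, 0.5, 1.0):
    print(f"r = {r:.1f}: I(r) = {incentive(r):.2f}")
# Fixed part: phi*(omega - c_n) = 0.60; slope: phi*(c_n - c_u) = 0.60,
# so the incentive grows with the return rate, as noted above.
```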
From now on, we will omit the time argument when no ambiguity may arise.
20.3 Equilibria
In the following two subsections, we characterize the equilibria in the two scenarios
described above.
Recall that in this scenario the players choose their strategies simultaneously and independently to maximize their individual profits, with $B(t) \equiv 0$ and $I(r(t)) \equiv 0$, $\forall t \in [0, \infty)$.
Proposition 20.1. The equilibrium GAP and price strategies are given by

$$A^N = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2\, u_M\, (\rho + \delta)} > 0, \qquad (20.8)$$

$$p^N = \frac{\alpha + \beta\omega}{2\beta} > 0. \qquad (20.9)$$
The retailer's value function is constant and given by

$$V_R^N(r) = \frac{(\alpha - \beta\omega)^2}{4\beta\rho}. \qquad (20.11)$$

Indeed,

$$V_R^N(r) \equiv V_R^N = \int_0^{\infty} e^{-\rho t}\,\frac{(\alpha - \beta\omega)^2}{4\beta}\, dt,$$

while the manufacturer's value function (given in the Appendix) is linear and increasing in the return rate:

$$\frac{\partial V_M^N}{\partial r} = \frac{(\alpha - \beta\omega)(c_n - c_u)}{2(\rho + \delta)} > 0.$$
This result provides the rationale for the next scenario. Indeed, as it is in the best
interest of the manufacturer to increase the level of used-product returns, it is
tempting to provide an incentive to the retailer to induce a greater contribution to the
green-activity program. It remains to be seen under which conditions this incentive
is profitable for the manufacturer.
Substituting for green expenditures in the state dynamics (20.2) and solving gives the following trajectory for the return rate:

$$r^N(t) = \frac{1 - e^{-\delta t}}{\delta}\, A^N + e^{-\delta t}\, r_0 > 0.$$

The steady-state value is strictly positive and given by

$$r_{\infty}^N = \frac{A^N}{\delta} = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2\, u_M\, (\rho + \delta)\,\delta} > 0. \qquad (20.12)$$
From now on we assume (and check for in the numerical simulations) that the
parameters are such that rN (t) ≤ 1, ∀t ∈ [0, ∞).
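A minimal Python sketch of Proposition 20.1 and of the trajectory above follows; all parameter values are hypothetical and were chosen so that the nonnegativity requirements and $r^N(t) \leq 1$ hold.

```python
import math

# Benchmark (Nash) equilibrium of Proposition 20.1 with assumed parameters
# (they satisfy c_u < c_n < omega and keep the return rate in [0, 1]).
alpha, beta, omega = 10.0, 1.0, 5.0
c_n, c_u, u_M = 3.0, 1.0, 40.0
rho, delta, r0 = 0.1, 0.5, 0.0

A_N = (c_n - c_u) * (alpha - beta * omega) / (2 * u_M * (rho + delta))  # (20.8)
p_N = (alpha + beta * omega) / (2 * beta)                               # (20.9)
r_inf = A_N / delta                                                     # (20.12)

def r_N(t: float) -> float:
    """Return-rate trajectory under the constant GAP effort A^N."""
    return (1 - math.exp(-delta * t)) / delta * A_N + math.exp(-delta * t) * r0

print(f"A^N = {A_N:.4f}, p^N = {p_N:.4f}, r_inf^N = {r_inf:.4f}")
print(f"r^N(5) = {r_N(5.0):.4f}")
```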
The RRSC was recently introduced by [19] to coordinate a supply chain in which the
upstream players have an economic incentive to coordinate. The traditional revenue
sharing contract (RSC) fits with the implementation of a coordination strategy that
is mainly driven by the retailer [7, 12]. The RSC, in fact, mitigates the double-
marginalization effect and creates efficiency along the chain. While the retailer
transfers a share of its revenues to the manufacturer, it also buys at a lower wholesale
price. Consequently, price decreases and demand increases.
In the RRSC, however, the retailer receives a share of the manufacturer’s net
revenues. The manufacturer wishes to influence the retailer’s strategies by offering
an attractive economic incentive. This type of contract fits adequately with the
CLSC’s targets where the manufacturer has the highest incentive to close the loop.
Further, in the marketing literature dealing with cooperative advertising programs,
the context is one of a manufacturer helping his retailer by paying part of the cost
of the local advertising or promotional efforts conducted by the retailer. Here, the
situation is reversed and it is the retailer who is contributing to the manufacturer’s
GAP. Therefore, the retailer plays the role of leader and the manufacturer, the role
of follower. The following proposition characterizes the equilibrium strategies.
Proposition 20.2. Assuming an interior solution, the feedback-Stackelberg equilibrium price, green-activity, and participation-rate strategies are given by

$$p^S = \frac{\alpha + \beta\big(\omega(1 - \phi) + c_n(1 - r)\phi + c_u\phi r\big)}{2\beta}, \qquad (20.13)$$

$$A^S = \frac{(2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2}{2\, u_M}, \qquad (20.14)$$

$$B^S = \frac{(2\mu_1 - \varphi_1)\, r + 2\mu_2 - \varphi_2}{(2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2}. \qquad (20.15)$$
Since the value functions cannot be obtained analytically (the six Riccati equations are highly coupled; see the Appendix), we shall verify in the numerical simulations that the GAP strategy is nonnegative and that the support rate $B^S$ and the return rate $r$ are between 0 and 1. Unlike in the previous scenario, the strategies are now state-dependent, with the price being a decreasing function of the return rate. This is intuitive because a higher rate leads to a lower production cost. Further, the higher the percentage $\phi$ of revenues transferred from M to R, the lower the retail price. Indeed, we have

$$\frac{\partial p^S}{\partial \phi} = -\frac{\omega - c_n + (c_n - c_u)\, r}{2} < 0.$$

Therefore, as in the literature on revenue-sharing contracts (see, e.g., [7]), this parameter also lessens the double-marginalization problem in RRSCs.
Table 20.1 provides the results of a sensitivity analysis of the strategies and the state
variable with respect to the model's main parameters. A positive (negative) sign
indicates that the value taken by a variable increases (decreases) when we increase
the value of the parameter. A “0” indicates that the variable is independent of that
parameter, and n.a. means not applicable. The reported results for the benchmark
game are analytical and hold true for all admissible parameter values, not only
for those shown in the table. In the S scenario, when we vary the value of a
parameter, the values of all other parameters remain at their base-case levels. Note
that the selected parameters’ values satisfy nonnegativity conditions for price, green
activities, and demand in both scenarios. They also satisfy the requirement that the
support rate and the return rate be bounded between zero and one. The results allow
for the following intuitive comments:
A1. Varying α and β yields the same qualitative impact in both scenarios for all
variables. Regarding the effect on the support rate provided by the retailer to
the manufacturer’s GAP, we obtain that a larger demand (through a higher
market potential α or a lower consumer-price sensitivity β ) induces the retailer
to increase its support.
A2. A higher uM means an upward shift in the cost of the green-activity pro-
gram. Consequently, the manufacturer reduces its effort. Although the retailer
increases its support rate to compensate for the manufacturer’s lower effort,
the final outcome is a lower return rate in the steady state. In short, the
higher the cost of green activities, the lower the environmental and economic
performance of the CLSC. The same qualitative results are obtained when
the remanufacturing cost, cu , is increased. Under such circumstances, the
manufacturer is less interested in closing the loop since the savings from
producing with used parts are lower.
A3. The higher the production cost, cn , the higher the interest of the manufacturer
in introducing used parts into production. Hence, the positive relationship
between cn and the investment in green activities. Consequently the return rate
is increasing in cn . The retailer benefits from the cost reduction and reduces the
retail price, which in turn, feeds the demand and the returns of used products.
A high cn is therefore an incentive to implement an environmental policy. In
the N scenario, the price is constant, as the production cost does not influence
the retailer’s strategy. In the S scenario, the support rate decreases in cn . The
economic incentive decreases with the production cost; and thus, the retailer’s
willingness to implement a coop program decreases accordingly.
A4. A higher wholesale price leads to a higher retail price and a lower demand.
In turn, the pool of used products is smaller and green activities become less
attractive. Consequently, the rate of return decreases.
A5. To interpret the results regarding φ , the revenue-sharing parameter in the S
scenario, we recall the margins of the two players:
mSM (r) = (1 − φ ) (ω − C (r (t))) ,
mSR (r) = p (t) + φ (ω − C (r (t))) .
Therefore, a higher φ means a higher margin for the retailer and a lower one
for the manufacturer. This incentive is achieving its goal, that is, the retailer
increases its support with φ , and consequently the manufacturer invests more
in GAP, which leads to a higher return rate.
A6. A higher decay rate leads to lower investments in GAP, and consequently, to a
lower return rate. Also, increasing ρ , which amounts to giving more weight to
short-term profits, leads to a lower investment in GAP.
20.4.2 Comparison
We turn to the analysis of the players’ strategies and outcomes. As most of the
comparisons need to be carried out numerically, we have to limit the number of
parameters that we will let vary. It seems quite reasonable to focus on the most
important parameters in our model, namely, the incentive parameter φ and the
reduction in marginal cost due to manufacturing with used parts, i.e., cn − cu . All
other parameter values are kept at their benchmark levels.5
Retail-Price Strategies: Recall that the Nash and Stackelberg equilibrium prices are given by

$$p^N = \frac{\alpha + \beta\omega}{2\beta}, \qquad p^S = \frac{\alpha + \beta\big(\omega(1 - \phi) + c_n(1 - r)\phi + c_u r\phi\big)}{2\beta}.$$

Without resorting to numerical simulations, we can make two observations. First, the Stackelberg price is decreasing in the return rate:

$$\frac{\partial p^S}{\partial r} = -\frac{(c_n - c_u)\,\phi}{2};$$

and, second, we have $p^S < p^N$ for all parameter values. To see this, we note that the two equilibrium prices are related as follows:

$$p^S(r) = p^N - \frac{I^S(r)}{2\beta}.$$
By the nonnegativity of the incentive $I^S(r)$, we then have $p^S < p^N$. Similarly to [24] and [10], the higher the remanufacturing efficiency, the higher the incentive provided by the manufacturer, the lower the retail price, and consequently, the higher the demand. Therefore, implementing a CRS contract alleviates the double-marginalization problem and is beneficial to the consumer.
GAP Strategies: The investments in green activities in the two scenarios are given by

$$A^N = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2\, u_M\, (\rho + \delta)}, \qquad A^S(r) = \frac{(2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2}{2\, u_M}.$$
5 We ran other simulations without noticing any significant qualitative changes in the results.
As for the retail price, the GAP strategy is constant in the Nash scenario, and it is
linear in the return rate in the S scenario. Figures 20.1 and 20.2 display the green-
activity strategy for different values of cn −cu and φ . The following observations can
be made: First, the higher the return rate, the higher the manufacturer’s investment
in GAP. This result partly contrasts with that of [3], who advise OEMs not to invest in increasing the return rate if it is already high, but to focus on other
activities, e.g., the collection system’s efficiency. This would, in spirit, include the
manufacturer’s GAPs, which focus on the marketing and operational aspects of the
return process. One interpretation of our result is that, when a CLSC achieves a high
return rate, GAP investments are also required to keep up in terms of operations
(e.g., logistics network, remanufacturing process, quality-control activities) and
marketing (informing, promoting and advertising to a larger customer base). This
is in line with [21] who suggest decreasing the investment in remanufacturing
activities (e.g., product durability) when the return rate is low. Second, for any given
return rate, the manufacturer invests more in GAP in the S scenario than in the
Nash equilibrium. This result has also been reached in the cooperative-advertising
literature cited previously. The fact that the manufacturer is sharing the cost of the
green activities is in itself an incentive to do more. The implication is that the steady-
state value of the return rate in the S scenario is higher than its Nash counterpart.
Therefore, from an environmental point of view, as in [43], coordination in a CLSC
is preferable to the benchmark game. Third, for any admissible value of r, shifting
up φ leads to a higher investment in green activities.
Fig. 20.2 Green-activity strategy for different values of the sharing parameter
Support-Rate Strategy: Recall that the retailer's support rate (20.15) is

$$B^S = \frac{(2\mu_1 - \varphi_1)\, r + 2\mu_2 - \varphi_2}{(2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2}.$$
Figure 20.3 shows that $B^S$ is decreasing in the return rate. When we combine this result with the previous one, namely, that the GAP is increasing in $r$, it is appealing to conjecture that the manufacturer's and retailer's control variables are strategic substitutes. This can be seen by rewriting the support rate as

$$B^S = \frac{2(\mu_1 r + \mu_2)}{u_M\, A^S} - 1.$$

Therefore, a higher $A^S$ leads the retailer to lower its support rate (but not necessarily the total amount of the subsidy given to the manufacturer). Figure 20.4 reveals
that the support rate increases with φ . This result is somehow expected, given that
the incentive provided by the manufacturer to the retailer is precisely to (hopefully)
drive up the retailer’s participation in the green-activity program [19]. Still, it is
interesting to mention the very significant effect that φ has on the support rate.
Indeed, increasing φ , for instance by less than 15 % (i.e., from 0.35 to 0.40) more
than triples the support provided by the retailer. Note that the positive impact of φ
on the investment in GAP is much more limited (see Fig. 20.2). Further, the higher
the production-cost saving resulting from recycling (higher cn − cu), the steeper the
decline in the rate of support provided by the retailer to the manufacturer. Actually,
when the manufacturer is highly efficient and the return rate of used products is also
high, the retailer’s support is simply less important. Note that the impact of varying
cn − cu on the support rate is proportionally much less visible than the impact of φ .
However, the level of GAP is very sensitive to the value of cn − cu .
Fig. 20.5 Regions where manufacturer (left) and retailer (right) S profits are higher than Nash
Figure 20.5 suggests that the CRS scenario dominates the benchmark for both players when: (a) the return rate is “sufficiently high;” (b) the cost reduction resulting from recycling is “sufficiently high;” and, (c) the incentive parameter φ is “not too high.” If these conditions
are met, then a CRS contract is Pareto payoff-improving. As the cost savings and
the return rate are expected to vary significantly across firms and industries, it is
difficult to make a general statement about the feasibility of Pareto-optimality in
practice. Based on the following data, it seems reasonable to believe that this result
is achievable. Indeed, regarding the return rate, Guide [23] reports that, if firms
collect the products themselves, or provide incentives to other CLSC participants
and adopt a market-driven approach, then the return rate can be as high as 82 %.
If this example is representative of what can be realized, then we are in the zone
of a “sufficiently high return rate.” Concerning the cost savings, [18] report that
remanufacturing costs at IBM are much lower than those for buying new parts,
sometimes 80 % lower. Similarly, Xerox saves 40–65 % of its manufacturing costs
by reusing parts and materials from returned products [43]. For these firms, the
cost reduction due to remanufacturing is clearly “sufficiently high.” However, these
examples are (good) business exceptions, and no one would expect these levels
of cost reduction to be very common. According to [13], most firms do not adopt
closed-loop practices because of the small savings and inefficient remanufacturing.
Nevertheless, other strategic motivations could still lead those firms to close the
loop. For instance, to avoid and reduce remanufacturing competition, Lexmark has
introduced the Prebate program, whereby customers who return an empty printer
cartridge obtain a discount on a new cartridge; Lexmark does not remanufacture
these used cartridges because the cost savings are low, but recycling them instead
allows it to reduce competition (www.atlex.com, 2003).
The last determinant of Pareto-optimality is the revenue-sharing parameter φ.
The literature on contracting and coordination has already established the appro-
priateness of a revenue-sharing contract in a CLSC, and highlighted the critical
role played by the sharing parameter [8]. Its actual value depends on the players’
bargaining power, and therefore, no general statement can be made.
20.5 Conclusion
To the best of our knowledge, this study is the first attempt to assess, in a dynamic setting, the impact of a CRS contract on a closed-loop supply chain. Our starting
point is that firms can influence the return rate of used products by carrying out green
activities, and that this return rate is an inherently dynamic process. The optimality
of a CRS can be assessed from the consumer’s, the environmental, and the firms’
points of view. We wrap up our main results in the following series of claims on
these different points of view:
Claim 1 Compared to the benchmark scenario, a CRS contract leads to a lower
retail price and higher demand.
The conclusion is that the consumer and the retailer will vote in favor of such a CRS contract, and that the environment is always better off with one. For the manufacturer, the results are not clear-cut.
As future research directions, the following extensions are worth considering:
A1. An analysis of the same game, but assuming a finite horizon. Indeed, our assumption of an infinite horizon is a strong one and was made mainly for tractability, i.e., to solve a system of algebraic Riccati equations instead of having to deal with a highly coupled system of differential equations. A first step could be to analyze a two-stage game where, in the second period, the manufacturer produces with used parts recycled from first-period sales.
A2. An analysis of a multi-retailer situation, in which a manufacturer cooperates
with different retailers while the retailers compete in the same market. This
type of multi-agent configuration has been shown to be extremely important
when evaluating a contract in supply-chain management. For instance, [7]
evaluate a RRSC in a one-manufacturer–one-retailer chain configuration, and
demonstrate its effectiveness for mitigating the double-marginalization effect
and for making players better off. Later, they model a multi-retailer situation,
and show that the positive effects of a two-parameter contract vanish whenever
retailer competition occurs.
A3. A competitive setting where a manufacturer and an original equipment man-
ufacturer (OEM) compete in the collection process. In this context, the
manufacturer has more reasons to collect the end-of-use products, where the
reverse flows need to be managed not only to appropriate some of the returns’
residual value, but also to deter new entrants into the industry [13]. This context
has also been described by [3] with real applications (e.g., Bosch). They report
that remanufacturing can be really effective in a competitive context because
remanufactured products may cannibalize competitors’ products. However,
the literature has overlooked competition in dynamic-CLSC settings, where
players compete while adopting an active return policy.
A4. An evaluation of the impact of a green brand image on remanufacturing. In
a CLSC, marketing and operations interface to ensure high remanufacturing
efficiency while goodwill not only plays the traditional role of increasing
sales (marketing role) but it also increases product returns (operational role).
The main assumption here is that customer returns depend on the stock of
(green) goodwill, which acts as a sustainable lever. Several companies, such
as Coca-Cola, HP, and Panasonic, are modifying their brands, changing the
colors and style to increase customers’ green consciousness. Firms know that
customers are concerned about the environment and are willing to buy and
return green products, and that an appropriate brand strategy may provide
superior performance. CLSCs seek to use goodwill not only to increase sales
but also to induce customers to adopt sustainable behavior. By returning end-
of-use products, customers contribute to conserving landfill space, reducing air
pollution and preserving the environment. Therefore, green goodwill acts as a
sustainable lever with the dual purpose of increasing both sales and returns.
A5. The integration of some quality features in our assumptions. Product remanufacturability, in fact, decreases over time, thereby also reducing its attractiveness. The quality of a return governs the disassembly, recovery, and disposal operations to be carried out after closing the loop [38]. When a return is in good condition, it possesses high residual value, and remanufacturing turns out to be an extremely appropriate operational strategy. Consequently, firms in the CLSC are committed to reducing the residence time (the time a product stays with customers) while increasing product remanufacturability (the number of times a return may be used in a remanufacturing process). One of our paper's main assumptions is that a return can be remanufactured an infinite number of times. Despite the obvious limitations, applications do exist in several industries (e.g., the glass industry). Research in CLSCs has investigated remanufacturability in terms of product durability [3, 21], highlighting the trade-off this implies. High product durability maximizes cost savings, but it considerably extends product life, thereby lowering demand [10]. Moreover, since durability is a quality feature, it directly impacts production costs, reducing the players' unit-profit margins. Product durability has also been investigated as a dynamic phenomenon [41], where the stock of durability decreases over time, influencing both operational decisions and sales in future periods. While incorporating durability substantially increases the complexity of the model, addressing this trade-off determines the CLSC's success and improves the decision-making process.
Acknowledgements We wish to thank the anonymous reviewer for his/her very helpful com-
ments. Research supported by NSERC, Canada.
Appendix
We need to establish the existence of a pair of value functions $V_M(r)$, $V_R(r)$ such that there exists a unique solution $r$ to (20.2) and such that the following Hamilton–Jacobi–Bellman (HJB) equations are satisfied:
$$\rho V_M(r) = (\alpha - \beta p)\big(\omega - c_n - r(c_u - c_n)\big) - \frac{u_M}{2}A^2 + V_M'(r)(A - \delta r), \qquad (20.16)$$

$$\rho V_R(r) = (\alpha - \beta p)(p - \omega). \qquad (20.17)$$

Maximization of the right-hand sides yields

$$A = \frac{V_M'(r)}{u_M}, \qquad (20.18)$$

$$p = \frac{\alpha + \beta\omega}{2\beta}. \qquad (20.19)$$

Substituting back into the HJB equations gives

$$\rho V_M = \frac{\alpha - \beta\omega}{2}\big(\omega - c_n - r(c_u - c_n)\big) + V_M'(r)\left(\frac{V_M'(r)}{2 u_M} - \delta r\right), \qquad (20.20)$$

$$\rho V_R = \frac{(\alpha - \beta\omega)^2}{4\beta}. \qquad (20.21)$$
We show that a linear value function satisfies (20.20) and (20.21). We define $V_M = \varsigma_1 r + \varsigma_2$, where $\varsigma_1$ and $\varsigma_2$ are constants. Substituting $V_M$ and its derivative into (20.20), we obtain:

$$\rho(\varsigma_1 r + \varsigma_2) = \frac{\alpha - \beta\omega}{2}\big(\omega - c_n - r(c_u - c_n)\big) + \varsigma_1\left(\frac{\varsigma_1}{2 u_M} - \delta r\right). \qquad (20.22)$$

By identification, we obtain

$$\varsigma_1 = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2(\rho + \delta)}, \qquad \varsigma_2 = \frac{1}{2\rho}\,\frac{u_M(\alpha - \beta\omega)(\omega - c_n) + \varsigma_1^2}{u_M},$$

so that

$$V_M = \frac{(c_n - c_u)(\alpha - \beta\omega)}{2(\rho + \delta)}\, r + \frac{1}{2\rho}\left[(\alpha - \beta\omega)(\omega - c_n) + \frac{(c_n - c_u)^2(\alpha - \beta\omega)^2}{4 u_M (\rho + \delta)^2}\right].$$
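As a spot-check of the identification, one can verify numerically that $V_M = \varsigma_1 r + \varsigma_2$ with the coefficients above satisfies (20.20) exactly; the sketch below uses assumed parameter values, and the expression coded for $\varsigma_2$ is the one obtained by matching the constant terms.

```python
# Spot-check that V_M(r) = s1*r + s2 solves the HJB equation (20.20);
# parameter values are assumptions chosen for illustration.
alpha, beta, omega = 10.0, 1.0, 5.0
c_n, c_u, u_M, rho, delta = 3.0, 1.0, 40.0, 0.1, 0.5

s1 = (c_n - c_u) * (alpha - beta * omega) / (2 * (rho + delta))
s2 = ((alpha - beta * omega) * (omega - c_n) + s1**2 / u_M) / (2 * rho)

for r in (0.0, 0.3, 0.8):
    lhs = rho * (s1 * r + s2)
    rhs = (alpha - beta * omega) / 2 * (omega - c_n - r * (c_u - c_n)) \
        + s1 * (s1 / (2 * u_M) - delta * r)
    assert abs(lhs - rhs) < 1e-9, (r, lhs, rhs)
print("V_M = s1*r + s2 satisfies (20.20) at all sampled points.")
```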
In the CRS scenario, the manufacturer's maximization yields

$$A = \frac{V_M'(r)}{u_M(1 - B)}. \qquad (20.24)$$

Substituting into the retailer's HJB equation gives

$$\rho V_R(r) = \max_{p,\, B}\left\{(\alpha - \beta p)\big(p - \omega + I(r)\big) - \frac{B\,(V_M'(r))^2}{2 u_M (1 - B)^2} + V_R'(r)\left(\frac{V_M'(r)}{u_M(1 - B)} - \delta r\right)\right\}, \qquad (20.25)$$

whose maximization yields

$$p = \frac{\alpha + \beta\big[\omega - \phi(\omega - c_n + r(c_n - c_u))\big]}{2\beta}, \qquad (20.26)$$

$$B = \frac{2 V_R'(r) - V_M'(r)}{2 V_R'(r) + V_M'(r)}. \qquad (20.27)$$
Substituting (20.26) and (20.27) into the two HJB equations, we obtain

$$\rho V_M(r) = (1 - \phi)\,\frac{\alpha - \beta\big(\omega - \phi(\omega - c_n + r(c_n - c_u))\big)}{2}\,\big(\omega - c_n + r(c_n - c_u)\big) + V_M'(r)\left(\frac{2 V_R'(r) + V_M'(r)}{4 u_M} - \delta r\right), \qquad (20.28)$$

$$\rho V_R(r) = \frac{\big(\alpha - \beta\big[\omega - \phi(\omega - c_n + r(c_n - c_u))\big]\big)^2}{4\beta} + \frac{(V_M'(r))^2}{8 u_M} + V_R'(r)\left(\frac{V_R'(r) + V_M'(r)}{2 u_M} - \delta r\right). \qquad (20.29)$$
We conjecture quadratic value functions:

$$V_M(r) = \frac{\varphi_1}{2}\, r^2 + \varphi_2\, r + \varphi_3, \qquad V_R(r) = \frac{\mu_1}{2}\, r^2 + \mu_2\, r + \mu_3,$$

where $\varphi_1$, $\varphi_2$, $\varphi_3$, $\mu_1$, $\mu_2$ and $\mu_3$ are parameters to be determined. Let
$$a_1 = 2 u_M (1 - \phi)\,\beta\phi\,(c_u - c_n)^2, \qquad a_2 = 2 u_M (\rho + 2\delta),$$
$$a_3 = u_M (1 - \phi)(c_n - c_u)\big(\alpha - \beta(\omega - 2\phi(\omega - c_n))\big), \qquad a_4 = 4 u_M (\delta + \rho),$$
$$a_5 = 2 u_M (1 - \phi)(\omega - c_n)\big[\alpha - \beta(\omega - \phi(\omega - c_n))\big], \qquad a_6 = 4 u_M \rho,$$
$$a_7 = 2 u_M \phi^2 \beta\,(c_u - c_n)^2, \qquad a_8 = 2 u_M \phi (c_n - c_u)\big(\alpha - \beta(\omega - \phi(\omega - c_n))\big),$$
$$a_9 = 2 u_M \big(\alpha - \beta\big[\omega - \phi(\omega - c_n)\big]\big)^2.$$
Inserting $V_M$ and $V_R$ and their derivatives in (20.28) and (20.29), we obtain the following six algebraic Riccati equations:

$$a_1 + \varphi_1(2\mu_1 + \varphi_1) - a_2\varphi_1 = 0, \qquad (20.30)$$
$$a_3 + \varphi_1\mu_2 + \varphi_2(\varphi_1 + \mu_1 - a_4) = 0, \qquad (20.31)$$
$$a_5 + \varphi_2(2\mu_2 + \varphi_2) - a_6\varphi_3 = 0, \qquad (20.32)$$
$$a_7 + 2(2\mu_1 - a_2)\mu_1 + (\varphi_1 + 4\mu_1)\varphi_1 = 0, \qquad (20.33)$$
$$a_8 + (\varphi_1 + 2\mu_1)\varphi_2 + 2(2\mu_1 + \varphi_1 - a_4)\mu_2 = 0, \qquad (20.34)$$
$$a_9 + \beta\varphi_2^2 + 4\beta\mu_2(\mu_2 + \varphi_2) - 2 a_6\beta\mu_3 = 0, \qquad (20.35)$$

with the first three equations corresponding to the manufacturer and the next three to the retailer.
We briefly describe the procedure used to reduce the solution of that system to the solution of one nonlinear equation, solved numerically using Maple 10. From (20.30), we can obtain $\mu_1$ as a function of $\varphi_1$:

$$\mu_1 = f_1(\varphi_1) = \Omega_1 = \frac{(a_2 - \varphi_1)\varphi_1 - a_1}{2\varphi_1}. \qquad (20.36)$$

Substituting (20.36) into (20.31) and (20.34), we can obtain both $\varphi_2$ and $\mu_2$ as functions of $\varphi_1$:

$$\varphi_2 = f_2(\varphi_1) = -\frac{a_3\Omega_3 + \varphi_1\Omega_2}{(\varphi_1 + \Omega_1 - a_4)\,\Omega_3} = \Omega_4, \qquad (20.37)$$

$$\mu_2 = f_3(\varphi_1) = \frac{\Omega_2}{\Omega_3} = \Omega_5, \qquad (20.38)$$

where $\Omega_2$ and $\Omega_3$ denote expressions depending only on $\varphi_1$. Then

$$\varphi_3 = f_4(\varphi_1) = \frac{a_5 + (2\Omega_5 + \Omega_4)\Omega_4}{a_6}, \qquad (20.39)$$

$$\mu_3 = f_5(\varphi_1) = \frac{a_9 + \beta\Omega_4^2 + 4\beta(\Omega_5 + \Omega_4)\Omega_5}{2 a_6\beta}. \qquad (20.40)$$
Finally, replacing (20.36) into (20.33) gives a nonlinear equation in $\varphi_1$ that unfortunately cannot be solved analytically. We use Maple's "fsolve" function, which uses numerical-approximation techniques to find a decimal approximation to a solution of an equation or a system of equations.
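A shortcut, sketched below in Python, is to hand the six equations (20.30)–(20.35) directly to a standard root-finder (scipy.optimize.fsolve) instead of performing the reduction; the parameter values are assumptions made for illustration, and the initial guess may need tuning to land on the economically meaningful root, i.e., the one with $\varphi_1, \varphi_2, \mu_1, \mu_2 > 0$.

```python
import numpy as np
from scipy.optimize import fsolve

# Solve the six algebraic Riccati equations (20.30)-(20.35) numerically.
# All parameter values are hypothetical; the initial guess may need tuning.
alpha, beta, omega = 10.0, 1.0, 5.0
c_n, c_u, u_M, rho, delta, phi = 3.0, 1.0, 40.0, 0.1, 0.5, 0.3

a1 = 2 * u_M * (1 - phi) * beta * phi * (c_u - c_n) ** 2
a2 = 2 * u_M * (rho + 2 * delta)
a3 = u_M * (1 - phi) * (c_n - c_u) * (alpha - beta * (omega - 2 * phi * (omega - c_n)))
a4 = 4 * u_M * (delta + rho)
a5 = 2 * u_M * (1 - phi) * (omega - c_n) * (alpha - beta * (omega - phi * (omega - c_n)))
a6 = 4 * u_M * rho
a7 = 2 * u_M * phi**2 * beta * (c_u - c_n) ** 2
a8 = 2 * u_M * phi * (c_n - c_u) * (alpha - beta * (omega - phi * (omega - c_n)))
a9 = 2 * u_M * (alpha - beta * (omega - phi * (omega - c_n))) ** 2

def riccati(x):
    f1, f2, f3, m1, m2, m3 = x  # (phi_1, phi_2, phi_3, mu_1, mu_2, mu_3)
    return [
        a1 + f1 * (2 * m1 + f1) - a2 * f1,                              # (20.30)
        a3 + f1 * m2 + f2 * (f1 + m1 - a4),                             # (20.31)
        a5 + f2 * (2 * m2 + f2) - a6 * f3,                              # (20.32)
        a7 + 2 * (2 * m1 - a2) * m1 + (f1 + 4 * m1) * f1,               # (20.33)
        a8 + (f1 + 2 * m1) * f2 + 2 * (2 * m1 + f1 - a4) * m2,          # (20.34)
        a9 + beta * f2**2 + 4 * beta * m2 * (m2 + f2) - 2 * a6 * beta * m3,  # (20.35)
    ]

sol = fsolve(riccati, x0=np.ones(6))
print("phi_1..phi_3, mu_1..mu_3 =", np.round(sol, 5))
print("max |residual| =", max(abs(v) for v in riccati(sol)))
```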
From the positivity of $A^S$ (see Fig. 20.2) and the expression of the support rate, we conclude that

$$2\mu_1 + \varphi_1 > 0 \quad \text{and} \quad 2\mu_2 + \varphi_2 > 0, \qquad (20.41)$$
$$2\mu_1 - \varphi_1 > 0 \quad \text{and} \quad 2\mu_2 - \varphi_2 > 0. \qquad (20.42)$$
Combining (20.41) and (20.42), we conclude that μ1 and μ2 are positive. The fact
that the support rate is decreasing in the return rate leads to
$$\frac{\partial B^S}{\partial r} = \frac{4(\mu_1\varphi_2 - \mu_2\varphi_1)}{\big((2\mu_1 + \varphi_1)\, r + 2\mu_2 + \varphi_2\big)^2} < 0 \ \Rightarrow\ \mu_1\varphi_2 - \mu_2\varphi_1 < 0. \qquad (20.43)$$
Further, the positivity of $A^S$ and $B^S(0) \leq 1$ imply $\varphi_2 > 0$. From the positivity of $\varphi_2$, $\mu_1$ and $\mu_2$, and the condition in (20.43), we conclude that $\varphi_1 > 0$.
Finally, it suffices to note that, since all other parameters involved in equa-
tions (20.32) and (20.35) are positive, a necessary condition for these equations
to hold is to have ϕ3 and μ3 positive.
References
1. Aras, N., Boyaci, T., Verter, V.: The effect of categorizing returned products in remanufactur-
ing. IIE Trans. 36(4), 319–331 (2004)
2. Atasu, A., Guide, V.D.R., Van Wassenhove, L.N.: Product reuse economics in closed-loop
supply chain research. Prod. Oper. Manage. 17(5), 483–497 (2008)
3. Atasu, A., Sarvary, M., Van Wassenhove, L.N.: Remanufacturing as a marketing strategy.
Manage. Sci. 54(10), 1731–1746 (2008)
4. Berger, M.: Vertical cooperative advertising ventures. J. Marketing Res. 9, 309–312 (1972)
5. Bergen, M., John, G.: Understanding cooperative advertising participation rates in conventional channels. J. Marketing Res. 34, 357–369 (1997)
6. Bhattacharya, S., Guide, V.D.R., Van Wassenhove, L.N.: Optimal order quantities with
remanufacturing across new product generations. Prod. Oper. Manage. J. 15(3), 421–431
(2006)
7. Cachon, G.P., Lariviere, M.A.: Supply chain coordination with revenue-sharing contracts: strengths and limitations. Manage. Sci. 51, 30–44 (2005)
8. Corbett, C.J., Savaskan, R.C.: Contracting and coordination in closed-loop supply chains. In:
Daniel, V., Guide, R., Van Wassenhove, L.N. (eds.) Business Aspects of Closed-Loop Supply
Chains: Exploring the Issues. Carnegie Mellon University Press, Pittsburgh (2003)
9. Dana, Jr., J.D., Spier, K.E.: Revenue sharing and vertical control in the video rental industry. J.
Ind. Econ. 49(3), 223–245 (2001)
10. Debo, L.G., Toktay, L.B., Van Wassenhove, L.N.: Market segmentation and product technology
selection for remanufacturable products. Manage. Sci. 51, 1193–1205 (2005)
11. Dekker, R., Fleischmann, M., Van Wassenhove, L.N. (eds.) Reverse Logistics: Quantitative
Models for Closed-Loop Supply Chains. Springer, Berlin (2004)
12. El Ouardighi, F., Jørgensen, S., Pasin, F.: A dynamic game of operations and marketing
management in a supply chain. Int. J. Game Theory Rev. 34, 59–77 (2008)
13. Ferguson, M.E., Toktay, L.B.: The effect of competition on recovery strategies. Prod. Oper.
Manage. 15(3), 351–368 (2006)
14. Ferrer, G., Swaminathan, J.M.: Managing new and remanufactured products. Manage. Sci.
52(1), 15–26 (2006)
15. Ferrer, G., Whybark, C.: Material planning for a remanufacturing facility. Prod. Oper. Manage.
10, 112–124 (2001)
16. Fleischmann, M., Beullens, P., Bloemhof-Ruwaard, J., Van Wassenhove, L.N.: The impact of
product recovery on logistics network design. Prod. Oper. Manage. 10(2), 156–173 (2001)
17. Fleischmann, M., Bloemhof-Ruwaard, J., Dekker, R., van der Laan, E., Van Wassenhove, L.N.:
Quantitative models for reverse logistics: a review. Eur. J. Oper. Res. 103, 1–17 (1997)
18. Fleischmann, M., van Nunen, J., Grave, B.: Integrating closed-loop supply chain and spare
parts management at IBM. ERIM Report Series Research in Management, ERS-2002-107-LIS
(2002)
19. Geng, Q., Mallik, S.: Inventory competition and allocation in a multi-channel distribution
system. Eur. J. Oper. Res. 182(2), 704–729 (2007)
20. Gerchak, Y., Wang, Y.: Revenue-sharing vs. wholesale-price contracts in assembly systems with random demand. Prod. Oper. Manage. 13(1), 23–33 (2004)
21. Geyer, R., Van Wassenhove, L.N., Atasu, A.: The economics of remanufacturing under limited
component durability and finite product life cycles. Manage. Sci. 53(1), 88–100 (2007)
22. Ginsburg, J.: Once is not enough. Business Week, April 16 (2001)
23. Guide, Jr., V.D.R.: Production planning and control for remanufacturing: industry practice and
research needs. J. Oper. Manage. 18, 467–483 (2000)
24. Guide, Jr., V.D.R., Jayaraman, V., Linton, J.D.: Building contingency planning for closed-loop
supply chains with product recovery. J. Oper. Manage. 21, 259–279 (2003)
25. Guide, Jr., V.D.R., Van Wassenhove, L.N.: The evolution of closed-loop supply chain research.
Oper. Res. 57(1), 10–18 (2009)
26. Guide, Jr., V.D.R., Van Wassenhove, L.N.: Managing product return for remanufacturing. Prod.
Oper. Manage. 10(2), 142–155 (2001)
27. Guide, Jr., V.D.R., Souza, G.C., Van Wassenhove, L.N., Blackburn, J.D.: Time of value of
commercial product returns. Manage. Sci. 52(8), 1200–1214 (2006)
28. Hauser, W.M., Lund, R.T.: The Remanufacturing Industry: Anatomy of a Giant. Boston Univer-
sity, Boston (2003)
29. He, X., Prasad, A., Sethi, S.: Cooperative advertising and pricing in a dynamic stochastic
supply chain: feedback Stackelberg strategies. Prod. Oper. Manage. 18, 78–94 (2009)
30. Hussain, S.S.: Green consumerism and ecolabelling: a strategic behavioural model. J. Agric.
Econ. 51(1), 77–89 (2000)
31. Ingene, C.A., Parry, M.E.: Mathematical Models of Distribution Channels. Kluwer Academic,
Dordrecht (2004)
32. Jørgensen, S., Sigué, S.P., Zaccour, G.: Dynamic cooperative advertising in a channel. J. Retail.
76(1), 71–92 (2000)
33. Jørgensen, S., Sigué, S.P., Zaccour, G.: Stackelberg leadership in a marketing channel. Int.
Game Theory Rev. 3(1), 13–26 (2001).
34. Jørgensen, S., Taboubi, S., Zaccour, G.: Retail promotions with negative brand image effects:
is cooperation possible? Eur. J. Oper. Res. 150, 395–405 (2003)
35. Jørgensen, S., Zaccour, G.: Differential Games in Marketing. International Series in Quantita-
tive Marketing. Kluwer Academic, Boston, MA (2004)
36. Karray, S., Zaccour, G.: Could co-op advertising be a manufacturer’s counterstrategy to store
brands? J. Bus. Res. 59, 1008–1015 (2006)
37. Kodak: Corporate Environmental Annual Report. The Kodak Corporation, Rochester (1999)
38. Krikke, H., Le Blanc, I., van de Velde, S.: Product modularity and the design of closed-loop
supply chains. Calif. Manage. Rev. 46(2), 23–39 (2004)
39. Majumder, P., Groenevelt, H.: Competition in remanufacturing. Prod. Oper. Manage. 10,
125–141 (2001)
40. Mantovalli, J.: The producer pays. Environ. Mag. 8(3), 36–42 (1997)
41. Muller, E., Peles, Y.C.: Optimal dynamic durability. J. Econ. Dyn. Control 14(3–4), 709–719
(1990)
42. Ray, S., Boyaci, T., Aras, N.: Optimal prices and trade-in rebates for durable, remanufacturable
products. Manuf. Serv. Oper. Manage. 7(3), 208–228 (2005)
43. Savaskan, R.C., Bhattacharya, S., Van Wassenhove, L.N.: Closed loop supply chain models
with product remanufacturing. Manage. Sci. 50, 239–252 (2004)
44. Savaskan, R.C., Van Wassenhove, L.N.: Reverse channel design: the case of competing
retailers. Manage. Sci. 52(1), 1–14 (2006)
45. Talbot, S., Lefebvre, E., Lefebvre, L.A.: Closed-loop supply chain activities and derived
benefits in manufacturing SMEs. J. Manuf. Technol. Manage. 18(6), 627–658 (2007)