Game Theory


Game Theory

edited by
Qiming Huang
SCIYO
Published by Sciyo
Janeza Trdine 9, 51000 Rijeka, Croatia
Copyright 2010 Sciyo
All chapters are Open Access articles distributed under the Creative Commons Non Commercial Share
Alike Attribution 3.0 license, which permits to copy, distribute, transmit, and adapt the work in any
medium, so long as the original work is properly cited. After this work has been published by Sciyo,
authors have the right to republish it, in whole or part, in any publication of which they are the author,
and to make other personal use of the work. Any republication, referencing or personal use of the work
must explicitly identify the original source.
Statements and opinions expressed in the chapters are those of the individual contributors and
not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of
information contained in the published articles. The publisher assumes no responsibility for any
damage or injury to persons or property arising out of the use of any materials, instructions, methods
or ideas contained in the book.

Publishing Process Manager Jelena Marusic
Technical Editor Teodora Smiljanic
Cover Designer Martina Sirotic
Image Copyright Slpix, 2010. Used under license from Shutterstock.com
First published September 2010
Printed in India
A free online edition of this book is available at www.sciyo.com
Additional hard copies can be obtained from [email protected]
Game Theory, Edited by Qiming Huang
p. cm.
ISBN 978-953-307-132-9
Contents
Preface
Chapter 1. Theory of Games: An Introduction
    Dr. Omar Raoof and Prof. Hamed Al-Raweshidy
Chapter 2. Auction and Game-Based Spectrum Sharing in Cognitive Radio Networks
    Dr. Omar Raoof and Prof. Hamed Al-Raweshidy
Chapter 3. Game Theory in Wireless Ad-hoc Opportunistic Radios
    Shahid Mumtaz and Atilio Gameiro
Chapter 4. Reliable Aggregation Routing for Wireless Sensor Networks based on Game Theory
    Qiming Huang, Xiao Liu and Chao Guo
Chapter 5. Inductive Game Theory: A Basic Scenario
    Mamoru Kaneko and J. Jude Kline
Chapter 6. Cooperative Logistics Games
    Juan Aparicio, Natividad Llorca, Joaquin Sanchez-Soriano, Julia Sancho and Sergio Valero
Chapter 7. Stochastic Game Theory Approach to Robust Synthetic Gene Network Design
    Bor-Sen Chen, Cheng-Wei Li and Chien-Ta Tu
Preface
Game theory is a formal framework with mathematical tools for studying the complex
interactions among interdependent rational players. The best-known concept in game
theory is the celebrated Nash equilibrium. Game-theoretic approaches are varied and
include, among others, cooperative and non-cooperative models, static and dynamic games,
single-slot and repeated games, and finite- and infinite-horizon games. Game theory has led
to revolutionary changes in economics and has found important applications in sociology,
modern communication, biological engineering, and transportation. This book presents an
introduction to game theory and a selection of its applications.
In the chapter "Theory of Games: An Introduction", the concepts and history of game theory
are introduced, and the most common types of games are discussed in detail.
The chapter "Auction and Game-Based Spectrum Sharing in Cognitive Radio Networks" introduces
an adaptive competitive second-price pay-to-bid sealed auction game as a solution to the
fairness problem of spectrum sharing among one primary user and a large number of secondary
users in a cognitive radio environment. Numerical results show that the proposed mechanism
can reach the maximum total profit for secondary users with better fairness.
In the chapter "Game Theory in Wireless Ad-hoc Opportunistic Radios", a scenario is considered
in which a UMTS TDD opportunistic cellular system with ad hoc behavior operates over a UMTS FDD
licensed cellular network. The ad hoc radio is modeled as a game, and the unique Nash
equilibrium of the game is applied to the ad hoc opportunistic radio.
The chapter "Reliable Aggregation Routing for Wireless Sensor Networks based on Game Theory"
proposes a game-theoretic model of a reliable data architecture in wireless sensor networks.
Each selected group leader uses a game-theoretic model that trades off energy dissipation
against data transmission delay to determine the degree of aggregation.
In the chapter "Cooperative Logistics Games", the concepts, theory and applications of
cooperative logistics games, which focus mainly on transportation, inventory and supply
chain games, are surveyed.
The chapter "Stochastic Game Theory Approach to Robust Synthetic Gene Network Design" observes
that synthetic biology can increase the efficiency of gene circuit design through registries of
biological parts and standard datasheets. In synthetic gene networks, there is much uncertainty
about what affects the behavior of biological circuitry and systems. The proposed robust minimax
synthetic biology design method can predict the most robust values of the genetic parameters
from the perspective of stochastic game theory. The proposed synthetic genetic network can not
only achieve the desired steady state but also tolerate the worst-case effect of these
uncertain parameter variations and external noises on the host cell.
Game theory provides a powerful mathematical framework that can accommodate the
preferences and requirements of the various stakeholders in a given process as regards its
outcome. The chapters in this book will give impetus to the application of game theory to the
modeling and analysis of modern communication, biological engineering, transportation, and
other fields.
Editor
Qiming Huang,
Beijing University of Science and Technology,
China
[email protected]
1
Theory of Games: An Introduction
Dr. Omar Raoof and Prof. Hamed Al-Raweshidy
Brunel University-West London,
UK
1. Introduction
'Game Theory' is a mathematical concept, which deals with the formulation of the correct
strategy that will enable an individual or entity (i.e., player), when confronted by a complex
challenge, to succeed in addressing that challenge. It was developed based on the premise
that for whatever circumstance, or for whatever 'game', there exists a strategy that will allow
one player to 'win'. Any business can be considered as a game played against competitors,
or even against customers. Economists have long used it as a tool for examining the actions
of economic agents such as firms in a market.
The ideas behind game theory have appeared throughout history [1], apparent in the Bible,
the Talmud, the works of Descartes and Sun Tzu, and the writings of Charles Darwin [2].
However, some argue that the first actual study of game theory started with the work of
Daniel Bernoulli, a mathematician born in 1700 [3]. Although his work, Bernoulli's
Principle, formed the basis of jet engine production and operation, he is also credited with
introducing the concepts of expected utility and diminishing returns. Others argue that the
first mathematical tool was presented in England in the 18th century by Thomas Bayes and is
known as Bayes' Theorem; his work involved using probabilities as a basis for logical
conclusions [3]. Nevertheless, the basis of modern game theory can be considered an
outgrowth of three seminal works. First, Researches into the Mathematical Principles of the
Theory of Wealth, published in 1838 by Augustin Cournot, gives an intuitive explanation of what
would eventually be formalized as the Nash equilibrium and gives a dynamic idea of players'
best responses to the actions of others in the game. Second, in 1881, Francis Y. Edgeworth
expressed the idea of competitive equilibrium in a two-person economy. Finally, Emile Borel
suggested the existence of mixed strategies, or probability distributions over one's actions,
that may lead to stable play. It is also widely accepted that the modern analysis of game theory
and its modern methodological framework began with John von Neumann and Oskar
Morgenstern's book [4].
We can therefore say that game theory is not a recent concept, having been invented
by John von Neumann and Oskar Morgenstern in 1944 [4]. At that time, the mathematical
framework behind the concept had not yet been fully established, limiting the concept's
application to special circumstances only [5]. Over the past 60 years, however, the
framework has gradually been strengthened and solidified, with refinements ongoing until
today [6]. Game theory is now an important tool in any strategist's toolbox, especially
when dealing with a situation that involves several entities whose decisions are influenced
by the decisions they expect from other entities.
In [4], John von Neumann and Oskar Morgenstern conceived a groundbreaking
mathematical theory of economic and social organization, based on a theory of games of
strategy. Not only would this reform economics, but the entirely new field of scientific
inquiry it yielded has since been widely used to analyze a host of real-world phenomena
from arms races to optimal policy choices of presidential candidates, from vaccination
policy to major league baseball salary negotiations [6]. In addition, it is today established
throughout both the social sciences and a wide range of other sciences.
Game Theory can be also defined as the study of how the final outcome of a competitive
situation is dictated by interactions among the people involved in the game (also referred to
as 'players' or 'agents'), based on the goals and preferences of these players, and on the
strategy that each player employs. A strategy is simply a predetermined 'way of play' that
guides an agent as to what actions to take in response to past and expected actions from
other agents (i.e., players in the game).
In any game, several important elements exist. The first is the agent, which
represents a person or an entity having its own goals and preferences. The second
element, the utility (also called the agent's payoff), is a concept that refers to the amount of
satisfaction that an agent derives from an object or an event. The third is the game itself,
which is a formal description of a strategic situation. Finally, the Nash equilibrium, also
called strategic equilibrium, is a list of strategies, one for each agent, which has the
property that no agent can change his strategy unilaterally and get a better payoff.
Normally, any game $G$ has three components: a set of players, a set of possible actions for
each player, and a set of utility functions mapping action profiles into the real numbers. In
this chapter, the set of players is denoted by $I$, where $I$ is finite, with
$I = \{1, 2, 3, \ldots, |I|\}$. For each player $i \in I$, the set of possible actions that
player $i$ can take is denoted by $A_i$, and $A$, the space of all action profiles, is equal to:
$$A = A_1 \times A_2 \times A_3 \times \cdots \times A_{|I|} \tag{1}$$
Finally, for each $i \in I$, we have $U_i : A \to \mathbb{R}$, which denotes player $i$'s
utility function. One more piece of notation is needed before carrying on: suppose that
$a \in A$ is a strategy profile and $i \in I$ is a player; then $a_i \in A_i$ denotes player
$i$'s action in $a$, and $a_{-i}$ denotes the actions of the other $|I| - 1$ players.
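As a minimal sketch of this notation, the three components of a game can be written down directly in Python. The names `players`, `actions`, `payoffs`, `utility` and `others` are illustrative assumptions, not part of the chapter, and the payoff numbers are those of the Prisoner's Dilemma of Section 2.1.

```python
from itertools import product

# Illustrative two-player game (payoff numbers taken from Section 2.1).
players = [0, 1]                               # the set I
actions = {0: ["Cooperate", "Defect"],         # A_1
           1: ["Cooperate", "Defect"]}         # A_2

# Action space A = A_1 x A_2: one action per player, in every combination.
action_space = list(product(*(actions[i] for i in players)))

# Utility functions U_i : A -> R, stored as profile -> (U_1, U_2).
payoffs = {
    ("Cooperate", "Cooperate"): (-2, -2),
    ("Cooperate", "Defect"): (-10, 0),
    ("Defect", "Cooperate"): (0, -10),
    ("Defect", "Defect"): (-6, -6),
}

def utility(i, a):
    """Utility of player i under action profile a."""
    return payoffs[a][i]

def others(i, a):
    """a_{-i}: the actions in profile a of every player except i."""
    return tuple(a[j] for j in players if j != i)

print(len(action_space))
print(utility(0, ("Defect", "Cooperate")))
print(others(0, ("Defect", "Cooperate")))
```

Storing utilities as a dictionary keyed by action profile keeps the sketch close to the definition of $U_i$ as a function on $A$; it is only practical for small finite games.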
In this chapter, some famous examples of games, some important definitions used in games
and classifications of games are presented. Throughout this chapter, a mathematical proof is
presented to show when mixed strategy games can be valid and invalid in different
scenarios.
2. Examples of games
2.1 Prisoner's dilemma
In 1950, Professor Albert W. Tucker of Princeton University invented the Prisoner's
Dilemma [7], [8], an imaginary scenario that is without doubt one of the most famous
representations of game theory. In this game, two prisoners are arrested and accused of a
crime; the police do not have enough evidence to convict either of them unless at least one
suspect confesses. The police keep the criminals in separate cells, so they are not able to
communicate during the process. Eventually, each suspect is presented with three possible
outcomes:
1. If one confesses and the other does not, the confessor will be released (payoff 0) and the
other will stay behind bars for ten years (payoff -10);
2. If neither confesses, both will be jailed for a short period of time (two years in prison,
payoff -2 each); and
3. If both confess, both will be jailed for an intermediate period of time (six years in
prison, payoff -6 each).
The possible actions and corresponding sentences of the criminals are given in Table 1.
                              2nd Criminal
                          Cooperate    Defect
1st Criminal  Cooperate    -2, -2      -10, 0
              Defect        0, -10     -6, -6
Table 1. Prisoner's Dilemma game.
To solve this game, we must find the dominant strategy of each player, which is the best
response of that player regardless of what the other player plays. From player one's
point of view, if player two cooperates (i.e. does not confess), then player one is better off
defecting (i.e. blaming his partner), since 0 > -2. If player two defects, then player one will
choose to defect as well, since -6 > -10. The same reasoning applies to player two. In the end,
both prisoners conclude that the best decision is to defect, and both are sent to prison for
the intermediate term.
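The dominance argument above can be checked mechanically. The following is a small sketch, assuming the row player's payoffs from Table 1; the helper name `strictly_dominant` and the nested-dictionary layout are illustrative choices, not something the chapter prescribes.

```python
ACTIONS = ["Cooperate", "Defect"]

# Row player's payoffs from Table 1, indexed [own action][opponent's action].
row_payoff = {
    "Cooperate": {"Cooperate": -2, "Defect": -10},
    "Defect": {"Cooperate": 0, "Defect": -6},
}

def strictly_dominant(payoff, actions):
    """Return the action that does strictly better than every other action
    against every opponent choice, or None if there is no such action."""
    for s in actions:
        if all(payoff[s][t] > payoff[r][t]
               for r in actions if r != s
               for t in actions):
            return s
    return None

print(strictly_dominant(row_payoff, ACTIONS))
```

Because the game is symmetric, the same check applied to player two's payoffs gives the same answer.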
2.2 Battle of the sexes
Another well-known game is the Battle of the Sexes, in which a couple argues over where to
spend the night out. In this example, she would rather attend a performance of Swan Lake at
the opera and he would rather watch a football match. However, neither of them would prefer
to spend the night alone. The possible actions and corresponding payoffs of the couple are
given in Table 2.
                      Female
                 Ballet    Football
Male  Ballet      2, 4      0, 0
      Football    0, 0      4, 2
Table 2. Battle of the Sexes game.
It is easy to see that both of them will decide to go either to the ballet or to the football
match, as they are much better off together than spending the evening alone.
3. Nash Equilibrium
Definition: A Nash Equilibrium exists in a game if there is a set of strategies with the
property that no player can increase her payoff by changing her strategy while the other
players keep their strategies unchanged. This set of strategies and the corresponding
payoffs represent the Nash Equilibrium. More formally, a Nash equilibrium is a strategy
profile $a$ such that for every player $i \in I$ and all $a_i' \in A_i$,
$$U_i(a_i, a_{-i}) \geq U_i(a_i', a_{-i}) \tag{2}$$
where $a_i'$ denotes another action for player $i$ [1-3]. We can simply see that the action
profile (defect, defect) is the Nash Equilibrium in the prisoner's dilemma game, and the
action profiles (ballet, ballet) and (football, football) are the ones for the battle of the
sexes game.
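For small games, the definition can be applied by brute force: test every action profile and keep those from which neither player gains by deviating alone. The sketch below assumes two-player games stored as dictionaries keyed by (row action, column action); the function name `pure_nash` is illustrative.

```python
from itertools import product

def pure_nash(payoffs, row_actions, col_actions):
    """List the pure action profiles from which neither player can
    increase their own payoff by changing only their own action."""
    equilibria = []
    for r, c in product(row_actions, col_actions):
        u_row, u_col = payoffs[(r, c)]
        row_ok = all(u_row >= payoffs[(r2, c)][0] for r2 in row_actions)
        col_ok = all(u_col >= payoffs[(r, c2)][1] for c2 in col_actions)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

# Payoffs from Table 1 (1st criminal, 2nd criminal).
prisoners = {
    ("Cooperate", "Cooperate"): (-2, -2), ("Cooperate", "Defect"): (-10, 0),
    ("Defect", "Cooperate"): (0, -10), ("Defect", "Defect"): (-6, -6),
}
# Payoffs from Table 2 (male, female).
sexes = {
    ("Ballet", "Ballet"): (2, 4), ("Ballet", "Football"): (0, 0),
    ("Football", "Ballet"): (0, 0), ("Football", "Football"): (4, 2),
}

print(pure_nash(prisoners, ["Cooperate", "Defect"], ["Cooperate", "Defect"]))
print(pure_nash(sexes, ["Ballet", "Football"], ["Ballet", "Football"]))
```

This enumeration only finds pure-strategy equilibria; mixed strategies need the indifference argument of Section 5.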
4. Pareto efficiency
Definition: Pareto efficiency is another important concept of game theory. The term is
named after Vilfredo Pareto, an Italian economist, who used this concept in his studies and
defined it as follows: "A situation is said to be Pareto efficient if there is no way to
rearrange things to make at least one person better off without making anyone worse off" [9].
More formally, an action profile $a \in A$ is said to be Pareto efficient if there is no
action profile $a' \in A$ such that for all $i$,
$$U_i(a') \geq U_i(a) \tag{3}$$
with strict inequality for at least one player. In other words, an action profile is said to
be Pareto efficient if and only if it is impossible to improve the utility of any player
without harming another player.
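A short sketch of this test, applied to the Prisoner's Dilemma payoffs of Table 1, is given below. It uses the standard reading of the definition (nobody worse off and at least one player strictly better off), and the names `dominates` and `pareto_efficient` are illustrative.

```python
def dominates(u, v):
    """u Pareto-dominates v: no player is worse off under u and at
    least one player is strictly better off."""
    return (all(x >= y for x, y in zip(u, v))
            and any(x > y for x, y in zip(u, v)))

def pareto_efficient(payoffs):
    """Return the action profiles that no other profile Pareto-dominates."""
    return [a for a, u in payoffs.items()
            if not any(dominates(v, u) for v in payoffs.values())]

# Payoffs from Table 1 (1st criminal, 2nd criminal).
prisoners = {
    ("Cooperate", "Cooperate"): (-2, -2), ("Cooperate", "Defect"): (-10, 0),
    ("Defect", "Cooperate"): (0, -10), ("Defect", "Defect"): (-6, -6),
}

print(pareto_efficient(prisoners))
```

With these numbers, (Defect, Defect) is the only profile that fails the test, because (Cooperate, Cooperate) makes both prisoners better off.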
To see the importance of Pareto efficiency, assume that someone walking along the shore of
an isolated beach finds a 20 bill on the sand. If the bill is picked up and kept, then
that person is better off and no one else is harmed. Leaving the bill on the sand to be washed
away would be an unwise decision. Someone might argue that the original owner of the bill is
worse off. This is not true: the owner is definitely worse off from the moment he loses the
bill, but once the bill is gone he is in the same position whether someone finds it or it is
washed out to sea. This leads to another argument: assume there are two people walking on the
beach and they both see the bill on the sand. Either one of them picks up the bill and the
other gets nothing, or they decide to split the bill between themselves. Who gains from
finding the bill is quite different in these scenarios, but they all avoid the inefficiency
of leaving it sitting on the beach.
5. Pure and mixed strategy Nash Equilibrium
Any game may have pure and mixed strategies. A pure strategy is an action chosen with
probability one, so it is always played. A mixed strategy, on the other hand, assigns
probabilities to several pure strategies. A player would only use a mixed strategy
when she is indifferent between several pure strategies and when keeping the opponent
guessing is desirable, that is, when the opponent can benefit from knowing the next move.
Another reason why a player might decide to play a mixed strategy is when a pure strategy
is not dominated by other pure strategies but is dominated by a mixed strategy. Finally, in a
game without a pure strategy Nash Equilibrium, a mixed strategy may result in a Nash
Equilibrium.
In the battle of the sexes game, the pure strategy Nash equilibria are the action profiles
(ballet, ballet) and (football, football), but the game also has a mixed strategy Nash
equilibrium. In order to derive it, we first assume that the woman goes to the ballet and the
man plays some mixed strategy in which he goes to the ballet with probability $\sigma_B$. Her
utility from this action is then a function of $\sigma_B$:
$$U_B = \sigma_B(4) + (1 - \sigma_B)(0)$$
In other words, the woman gets 4 some percentage of the time and 0 the rest of the time.
Assuming instead that the woman goes with her partner to the football match, then
$$U_F = \sigma_B(0) + (1 - \sigma_B)(2)$$
so she gets 0 some percentage of the time and 2 the rest of the time. Setting the two
equations equal to each other and solving for $\sigma_B$ gives $\sigma_B = 1/3$. This means
that in this mixed strategy Nash equilibrium, the man goes to the ballet a third of the time
and to the football match two-thirds of the time. Taking another look at Table 2, we can see
that the game is symmetric in the strategies, which means that the woman will go to the ballet
two-thirds of the time and to the football match a third of the time.
In order to calculate the utility of each player in this game, we need to weight the payoffs
of each outcome by the probability that the outcome occurs, as shown in Table 3. We can
simply see that the expected utility of each player is 4/3 (for the man, for example,
(2/9)(2) + (2/9)(4) = 4/3), which is what each of them can expect from using mixed
strategies if they do not communicate with each other to decide where to go.
                                        Female
                         Ballet (2/3)         Football (1/3)
Male  Ballet (1/3)       p = 2/9: 2, 4        p = 1/9: 0, 0
      Football (2/3)     p = 4/9: 0, 0        p = 2/9: 4, 2
Table 3. Pure and Mixed Strategies, Battle of the Sexes example.
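The numbers in Table 3 can be reproduced with exact fractions. The sketch below assumes the payoffs of Table 2; the helper `indifference_prob` is an illustrative name for the indifference condition solved in the text, and is not taken from the chapter.

```python
from fractions import Fraction

# Payoffs (male, female) from Table 2, keyed by (male action, female action).
payoffs = {
    ("Ballet", "Ballet"): (2, 4), ("Ballet", "Football"): (0, 0),
    ("Football", "Ballet"): (0, 0), ("Football", "Football"): (4, 2),
}

def indifference_prob(u_bb, u_ff, u_miss):
    """Probability p of Ballet that the *other* player must use so that a
    player earning u_bb at (Ballet, Ballet), u_ff at (Football, Football)
    and u_miss otherwise is indifferent between Ballet and Football.
    Solves p*u_bb + (1-p)*u_miss = p*u_miss + (1-p)*u_ff for p."""
    return Fraction(u_ff - u_miss, u_bb + u_ff - 2 * u_miss)

p_male_ballet = indifference_prob(4, 2, 0)     # keeps the woman indifferent
p_female_ballet = indifference_prob(2, 4, 0)   # keeps the man indifferent

male_mix = {"Ballet": p_male_ballet, "Football": 1 - p_male_ballet}
female_mix = {"Ballet": p_female_ballet, "Football": 1 - p_female_ballet}

# Expected utility of each player: payoff weighted by outcome probability.
expected = [sum(male_mix[m] * female_mix[f] * payoffs[(m, f)][i]
                for (m, f) in payoffs)
            for i in (0, 1)]

print(p_male_ballet, p_female_ballet)
print(expected[0], expected[1])
```

Using `Fraction` rather than floating point keeps the results directly comparable with the thirds and ninths in the table.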
6. Valid and invalid mixed strategy Nash Equilibrium
This section shows how mixed strategies can be invalid in games in general form.
Recalling the prisoner's dilemma game from the previous sections, we are going to solve
the general class of the game by removing the numbers from the table and using the following
variables:
                              2nd Criminal
                          Cooperate    Defect
1st Criminal  Cooperate     B, b        D, a
              Defect        A, d        C, c
Table 4. Valid and Invalid Mixed Strategy Nash Equilibrium, Prisoner's Dilemma example.
Here we have A > B > C > D and a > b > c > d. We start solving this game the same way we
did before, by looking for the dominant strategies. From player one's point of view, if
player two cooperates then player one will defect, as A > B. If player two defects, then
player one will defect as well, as C > D. Doing the same for player two: if player one
cooperates, then player two will defect, as a > b. If player one defects, then player two
will defect as well, as c > d. Thus, the only sensible equilibrium is (Defect, Defect).
To make sure that there is no mixed strategy Nash equilibrium in this scenario, we need to
find the utility of player two cooperating as a function of some mixed strategy of player one,
in which player one cooperates with probability $\sigma_C$. That is, some percentage of the
time player two will get b and the rest of the time d. Mathematically,
$U_C = \sigma_C(b) + (1 - \sigma_C)(d)$. Then, we do the same to find the utility of player
two defecting as a function of player one's mixed strategy:
$U_D = \sigma_C(a) + (1 - \sigma_C)(c)$. To find the mixed strategy, $U_C$ must be equal to
$U_D$, which leads to the following equation:
$$\sigma_C = \frac{c - d}{b - d - a + c} \tag{4}$$
In order to prove that this is a valid mixed strategy Nash equilibrium, the following
condition must be satisfied: $\Pr(i) \in [0, 1]$ (i.e. no event can occur with negative
probability and no event can occur with probability greater than one). That is, the
probability that this strategy is played must be no less than zero and no greater than one.
For the first case, $\sigma_C \geq 0$, the numerator and the denominator must be both positive
or both negative; otherwise, this mixed strategy is invalid. Recalling our assumption
a > b > c > d, the numerator $c - d$ is greater than zero, so the denominator must be greater
than zero as well. That is, $b + c - a - d > 0$, which can be rearranged as $b + c > a + d$.
At this point we cannot be sure whether this is a valid mixed strategy, as there will be
cases where b + c is greater than a + d and cases where it is not. For the mixed strategy
Nash equilibrium of this game to exist, $\sigma_C$ must also be less than or equal to one.
This leads to the following inequality:
$$\frac{c - d}{b - d - a + c} \leq 1 \tag{5}$$
That is, $c - d \leq b - d - a + c$, which simplifies to $a \leq b$. This violates our rule
that a > b, so this is an invalid mixed strategy. Thus, we have proved that there is no mixed
strategy Nash equilibrium in this game and the two players will defect.
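The conclusion can be spot-checked numerically. The sketch below evaluates the candidate probability of equation (4) for the Table 1 payoffs of player two and for a small grid of integer payoffs with a > b > c > d; the function names and the grid are illustrative assumptions, and a finite check of this kind illustrates the argument rather than replacing it.

```python
from fractions import Fraction
from itertools import combinations

def sigma_c(a, b, c, d):
    """Candidate mixing probability (c - d) / (b - d - a + c) from
    equation (4); None when the denominator is zero."""
    denom = b - d - a + c
    return None if denom == 0 else Fraction(c - d, denom)

def is_valid_probability(p):
    """A probability must exist and lie in [0, 1]."""
    return p is not None and 0 <= p <= 1

# Player two's payoffs in Table 1: a = 0, b = -2, c = -6, d = -10.
print(sigma_c(0, -2, -6, -10))

# combinations() of an increasing range yields increasing tuples,
# so every (d, c, b, a) below satisfies a > b > c > d.
any_valid = any(is_valid_probability(sigma_c(a, b, c, d))
                for d, c, b, a in combinations(range(-3, 5), 4))
print(any_valid)
```

For the Table 1 numbers the candidate value is 2, which is not a probability, in line with the derivation above.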
                      Female
                 Ballet    Football
Male  Ballet      A, b      C, c
      Football    C, c      B, a
Table 5. Valid and Invalid Mixed Strategy Nash Equilibrium, Battle of the Sexes example.
On the other hand, consider the example of the Battle of the Sexes game. Table 5 shows
the game in general form, where we removed the numbers again and used the following
variables: $A \geq B \geq C \geq 0$ and $a \geq b \geq c \geq 0$, with $a > c$. Following the
same procedure used in the previous example, we can solve for the man's mixed strategy, in
which he goes to the ballet with probability $\sigma_B$. If his partner goes to the ballet,
we get the following equality: $U_B = \sigma_B(b) + (1 - \sigma_B)(c)$, as the woman gets b
some percentage of the time and c the rest of the time. If she decides to go to the match,
the equality becomes $U_F = \sigma_B(c) + (1 - \sigma_B)(a)$. Now, setting these two
equations equal and solving for the man's mixed strategy, we finally get
$\sigma_B = (a - c)/(a + b - 2c)$.
In order to prove that this mixed strategy is valid, the same condition used before must be
satisfied, $\Pr(i) \in [0, 1]$. First, $\sigma_B \geq 0$: we already have a > c, so the
numerator is greater than zero. For the denominator to be positive, $a + b - 2c$ must be
positive. That is, $a + b - 2c > 0$, which can be rearranged as $a - c > c - b$; this is
always true, since $a - c > 0$ and $c - b \leq 0$, which proves that the denominator is
positive.
We must also prove that $\sigma_B \leq 1$ to establish the validity of this mixed strategy.
That means we must prove that $a - c \leq a + b - 2c$, which can be rearranged to $c \leq b$,
which is true as we already mentioned that $b \geq c \geq 0$.
Thus, we have proved that there exist three equilibria in this game: the two players can
go to the ballet together, go to the match together, or play mixed strategies, in which the
man goes to the ballet with a probability of $(a - c)/(a + b - 2c)$ and the woman mixes in
the analogous way.
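As a hedged numerical check of this result, the sketch below computes p = (a - c)/(a + b - 2c) for a grid of integer payoffs and confirms two things: that p lies in [0, 1], and that it makes the woman's two options, worth p*b + (1 - p)*c and p*c + (1 - p)*a, equally attractive. The ordering a >= b >= c >= 0 with a > c is an assumption about the intended constraints, and the function names are illustrative.

```python
from fractions import Fraction

def mixing_probability(a, b, c):
    """The mixing probability (a - c) / (a + b - 2c) derived above.
    Assumes a > c and b >= c, so the denominator is positive."""
    return Fraction(a - c, a + b - 2 * c)

def is_indifferent(p, a, b, c):
    """True if the two options pay the same against mixing probability p."""
    return p * b + (1 - p) * c == p * c + (1 - p) * a

checked = 0
all_ok = True
for a in range(0, 6):
    for b in range(0, a + 1):          # b <= a
        for c in range(0, b + 1):      # c <= b
            if a > c:
                p = mixing_probability(a, b, c)
                all_ok = all_ok and 0 <= p <= 1 and is_indifferent(p, a, b, c)
                checked += 1

print(mixing_probability(4, 2, 0))
print(checked, all_ok)
```

Unlike the Prisoner's Dilemma case, every payoff triple on this grid yields a legitimate probability, which is what makes the mixed strategy valid here.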
7. Classification of game theory
Games can be classified into different categories according to certain significant features.
The terminology used in game theory is inconsistent, so different terms can be used for
the same concept in different sources. A game can be classified according to the number of
players: it can be designated as a one-player game, a two-player game or an n-player game
(where n is greater than 2). In addition, a player need not be an individual
person; it may be a nation, a corporation, or a team comprising many people with shared
interests.
7.1 Non-cooperative and cooperative (coalition) games
A game is called non-cooperative when each agent (player) in the game, who acts in her own
self-interest, is the unit of analysis. A cooperative (coalition) game, in contrast, treats
groups or subgroups of players as the unit of analysis and assumes that they can achieve
certain payoffs among themselves through the necessary cooperative agreements [10].
In non-cooperative games, the actions of each individual player are considered and each
player is assumed to be selfish, looking to improve its own payoff without taking into
account the others involved in the game. So, non-cooperative game theory studies the strategic
choices resulting from the interactions among competing players, where each player chooses
its strategy independently to improve its own performance (utility) or reduce its losses
(costs). On the other hand, cooperative game theory was developed as a tool for assessing
the allocation of costs or benefits in a situation where the individual or group contribution
depends on other agents' actions in the game [11]. The main branch of cooperative games
describes the formation of cooperating groups of players, referred to as coalitions, which can
strengthen the players' positions in a game.
In telecommunications systems, most game-theoretic research has been conducted using
non-cooperative games, but there are also approaches using coalition games [12]. Studying
the selfishness level of wireless nodes in heterogeneous ad hoc networks is one of the
applications of coalition games. It may be beneficial to exclude the very selfish nodes from
the network if the remaining nodes get better quality of service (QoS) that way [13].
7.2 Strategic and extensive games
One way of presenting a game is the strategic form, sometimes called the static or normal
form. In this form the players make their decisions simultaneously at the beginning of
the game, and they have no information about the actions of the other players in the
game. The prisoner's dilemma and the battle of the sexes are both strategic games.
Alternatively, if players have some information about the choices of other players, the game
is usually presented in extensive form, sometimes called a game tree. In this case, the
players can make decisions during the game and they can react to other players' actions.
Such games can be finite (one-shot) games or infinite (repeated) games [14]. In
repeated games, the game is played several times and the players can observe the actions
and payoffs of the previous game before proceeding to the next stage.
7.3 Zero-sum games
Another way to categorize games is according to their payoff structure. Generally speaking,
a game is called a zero-sum game (sometimes called an "if one gains, another loses" game, or
a strictly competitive game) if each player's gain or loss is exactly balanced by those of the
other players in the game. For example, if two people are playing chess, one person will lose
(with payoff -1) and the other will win (with payoff +1). The win added to the loss equals
zero. Given that sometimes a loss can be a gain, real-life examples of zero-sum games can be
very difficult to find. Going back to the chess example, a loser in such a game may gain as
much from his loss as he would have gained if he had won: the player may become a better
player and gain experience as a result of losing in the first place.
In telecommunications systems, it is quite hard to describe a scenario as a zero-sum game.
However, in a bandwidth usage scenario of a single link, the game may be described as a
zero-sum game.
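The zero-sum property is easy to test once the payoffs of every outcome are written down. The following sketch uses the chess payoffs mentioned above and, for contrast, the Prisoner's Dilemma payoffs of Table 1; the function name `is_zero_sum` and the outcome labels are illustrative.

```python
def is_zero_sum(outcomes):
    """True if the players' payoffs add up to zero in every outcome."""
    return all(sum(payoffs) == 0 for payoffs in outcomes.values())

# Chess with the payoffs used in the text: +1 for the winner, -1 for the loser.
chess = {"player one wins": (1, -1), "player two wins": (-1, 1)}

# Prisoner's Dilemma payoffs from Table 1.
prisoners = {
    ("Cooperate", "Cooperate"): (-2, -2), ("Cooperate", "Defect"): (-10, 0),
    ("Defect", "Cooperate"): (0, -10), ("Defect", "Defect"): (-6, -6),
}

print(is_zero_sum(chess))
print(is_zero_sum(prisoners))
```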
7.4 Games with perfect and imperfect information
A game is said to be a perfect information game if each player, when it is her turn to choose
an action, knows exactly all the previous decisions of the other players in the game.
Conversely, if a player has no information about other players' actions when it is her turn
to decide, the game is called an imperfect information game. As a user of a network hardly
ever knows the exact actions of the other users in the network, the imperfect information
game is a very good framework for telecommunications systems. Nevertheless, assuming a
perfect information game in such scenarios is often easier to deal with.
7.5 Games with complete and incomplete information
In games with complete information, all factors of the game are common knowledge to all
players. That is, each individual player is fully aware of the other players in the game,
their strategies and decisions, and the payoff of each player. As a result, a complete
information game can be represented as an efficient, perfectly competitive game. On the other
hand, in incomplete information games, the players do not have all the information about
the other players in the game, which makes them unable to predict the effect of their actions
on others.
One very well-known type of such games is the sealed-bid auction, in which a
player knows his own valuation of the good but does not know the other bidders'
valuations. A combination of incomplete but perfect information can exist in a chess
game, if one player knows that the other player will be paid some amount of money if a
particular event happens, but the first player does not know what the event is. They both
know each other's actions (perfect information), but the first player does not know the
payoff function of the other player (incomplete information).
7.6 Rationality in games
The most fundamental assumption in game theory is rationality [15]. It implies that every
player is motivated by increasing his own payoff, i.e. every player is looking to maximize
his own utility. John von Neumann and Oskar Morgenstern justified the idea of maximizing the
expected payoff in their 1944 work [4]. However, previous studies have shown that
humans do not always act rationally [16]. In fact, humans use a propositional calculus in
reasoning; the propositional calculus concerns truth functions of propositions, which are
logical truths (statements that are true in virtue of their form) [17]. For this reason, the
assumption of rational behaviour is more justified for players in telecommunications
systems, as the players are usually devices programmed to operate in certain ways.
7.7 Evolutionary games
Evolutionary game theory started its development slightly after other games had been
developed [18]. This type of game originated with John Maynard Smith's formalization of
evolutionarily stable strategies as an application of the mathematical theory of games in the
context of biology in 1973 [19]. The objective of evolutionary games is to apply the concepts
of non-cooperative games to explain phenomena which are often thought to be the
result of cooperation or human design, for example market information, social rules of
conduct, and money and credit. Recently, this type of game has attracted increasing interest
from scientists of different backgrounds: economists, sociologists, anthropologists and also
philosophers. One of the main reasons behind the interest among social scientists in
evolutionary games rather than traditional games is that the rationality assumptions
underlying evolutionary game theory are, in many cases, more appropriate for the
modelling of social systems than the assumptions underlying the traditional theory of
games [20].
8. Applications of game theory in telecommunications
Communications systems are often built around standards, mostly open ones, such as
TCP/IP (Transmission Control Protocol/Internet Protocol [21]), on which the
Internet is based. The devices that we use to access these systems are designed and built
by a variety of manufacturers. In many cases, these manufacturers may have an
incentive to develop products that behave selfishly by seeking a performance
advantage over other network users at the cost of overall network performance [22]. End
users, in turn, may be able to force these devices to operate in a selfish manner.
Generally speaking, the maximizing of a player's payoff is often referred to
as selfishness in a game. This is true in the sense that all the players try to gain the
highest possible utility from their actions. However, a player gaining a high utility does not
necessarily mean that the player acts selfishly. As a result, systems need to be designed
that are prepared to cope with users who behave selfishly. If such a design is possible,
designers should make sure that selfish behaviour within the system is unprofitable for
individuals. When such a design is not possible, they should at least be aware of
the impact of such behaviour on the operation of the system.
One important thrust in these efforts focuses on designing high-level protocols that prevent
users from misbehaving and/or provide incentives for cooperation. To prevent
misbehaviour, several protocols based on reputation propagation have been proposed in the
literature, e.g., [23], [24]. The mainstream of existing research in telecommunications
networks focused on using non-cooperative games in various applications such as
distributed resource allocation [25], congestion control [26], power control [27], and
spectrum sharing in cognitive radio, among others. This need for non-cooperative games led
to numerous tutorials and books outlining its concepts and usage in communication, such as
[28], [29]. Another thrust of research analyzes the impact of user selfishness from a game
theoretic perspective, e.g., [22], [30]. Since the problem is typically too involved, several
simplifications to the network model are usually made to facilitate analysis and allow for
extracting insights. For example, in [22], the wireless nodes are assumed to be interested in
maximizing energy efficiency. At each time slot, a certain number of nodes are randomly
chosen and assigned to serve as relay nodes on the source-destination route. The authors
derive a Pareto optimal operating point and show that a certain variant of the well known
TIT-FOR-TAT algorithm converges to this point. In [22], the authors assume that the
transmission of each packet costs the same energy and each session uses the same number of
relay nodes. Another example is [30], which studies the Nash equilibrium of packet
forwarding in a static network by taking the network topology into consideration. More
specifically, the authors assume that the transmitter/receiver pairs in the network are
always fixed and derive the equilibrium conditions for both cooperative and non-
cooperative strategies. Similar to [22], the cost of transmitting each packet is assumed fixed.
It is worth noting that most, if not all, of the works in this thrust utilize the repeated game
formulation, where cooperation among users is sustainable by credible punishment for
deviating from the cooperation point.
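The role of credible punishment in a repeated game can be illustrated with a minimal sketch. It is not the model of [22] or [30]: the two-node setting, the payoff values BENEFIT and COST, and all function names are assumptions made only for illustration. A node playing TIT-FOR-TAT forwards in the first round and then copies the other node's previous action, so a node that never forwards is punished from the second round onwards.

```python
# Illustrative repeated packet-forwarding game between two nodes.
# BENEFIT and COST are assumed values, not taken from the cited works.
BENEFIT = 3.0  # assumed gain when the other node forwards my packet
COST = 1.0     # assumed energy cost of forwarding the other node's packet


def stage_payoff(my_action, other_action):
    """Payoff of one node in one round; True means 'forward'."""
    return (BENEFIT if other_action else 0.0) - (COST if my_action else 0.0)


def tit_for_tat(history_of_other):
    """Forward in the first round, then copy the other node's last action."""
    return True if not history_of_other else history_of_other[-1]


def always_defect(history_of_other):
    """Never forward, regardless of what the other node did."""
    return False


def play(strategy_a, strategy_b, rounds=50):
    """Return the total payoffs of nodes A and B over the repeated game."""
    hist_a, hist_b = [], []
    total_a = total_b = 0.0
    for _ in range(rounds):
        a = strategy_a(hist_b)  # A reacts to B's past actions
        b = strategy_b(hist_a)  # B reacts to A's past actions
        total_a += stage_payoff(a, b)
        total_b += stage_payoff(b, a)
        hist_a.append(a)
        hist_b.append(b)
    return total_a, total_b


print("TFT vs TFT:   ", play(tit_for_tat, tit_for_tat))
print("TFT vs defect:", play(tit_for_tat, always_defect))
```

With these assumed payoffs, the selfish node gains only in the first round and then loses all future benefit, which is why deviating from cooperation is unprofitable once the game is repeated often enough.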
Cooperative games have also been widely explored in different disciplines such as
economics or political science. Recently, cooperation has emerged as a new networking
concept that can dramatically improve performance from the physical layer
[23], [24] up to the networking layers [25]. However, implementing cooperation in large
scale communication networks faces several challenges such as adequate modelling,
efficiency, complexity, and fairness, among others. In fact, several recent works have shown
that user cooperation plays a fundamental role in wireless networks. From an information
theoretic perspective, the idea of cooperative communications can be traced back to the
relay channel [31]. More recent works have generalized the proposed cooperation strategies
and established the utility of cooperative communications in many relevant practical
scenarios, such as [25], [26] and [32]. In another line of work, in [27], the authors have shown
that the simplest form of physical layer cooperation, namely multi-hop forwarding, is an
indispensable element in achieving the optimal capacity scaling law in networks with
asymptotically large numbers of nodes. Multi-hop forwarding has also been shown to offer
significant gains in the efficiency of energy limited wireless networks [28], [29]. These
physical layer studies assume that each user is willing to expend energy in forwarding
packets for other users. This assumption is reasonable in a network with a central controller
with the ability to enforce the optimal cooperation strategy on the different wireless users.
The popularity of ad-hoc networks and the increased programmability of wireless devices,
however, raise serious doubts about the validity of this assumption, and hence, motivate
investigations on the impact of user selfishness on the performance of wireless networks.
The following chapters give more details about the applications of game theory in
wireless telecommunications systems, including interface selection mechanisms, Mobile
IPv6 protocol extensions, resource allocation and routing in ad hoc wireless networks, and
spectrum sharing in cognitive radio networks.
9. Summary
This chapter gives a detailed insight into the definition of game theory, the classification
of games and their applications in telecommunications. The Prisoner's Dilemma and the
Battle of the Sexes games have been discussed in detail, showing the different strategies
available to the players and discussing the expected outcome of such games. The terms
Nash equilibrium and Pareto efficiency have been explained with detailed examples.
Moreover, we have discussed mixed strategies in games and proved mathematically that a
mixed strategy does not exist in the Prisoner's Dilemma example, whereas one exists in the
Battle of the Sexes game. Finally, after classifying games into different categories, we
introduced the applications of game theory in telecommunications.
10. References
[1] E.R. Weintraub, Toward a History of Game Theory, Duke University Press, 1992.
[2] M. Shor, Brief Game Theory History, available online at:
http://www.gametheory.net/Dictionary/Game_theory_history.html [Accessed 14th
February 2010].
[3] P. Dittmar, Practical Poker Math, ECW Press, November 2008.
[4] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior,
Princeton University Press, 1944.
[5] Game Theory, SiliconFarEast.com, available online at:
http://www.siliconfareast.com/game-theory.htm [Accessed 20th February 2010].
[6] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior
(Commemorative Edition, 60th-Anniversary Edition), With an introduction by
Harold Kuhn and Ariel Rubinstein, 2007.
[7] Prisoner's Dilemma, Stanford Encyclopedia of Philosophy, available online at:
http://plato.stanford.edu/entries/prisoner-dilemma/ [Accessed 20th February
2010].
[8] A. Rapoport and M. Chammah, Prisoner's Dilemma: A Study in Conflict and
Cooperation, The University of Michigan Press, Second edition 1970.
[9] D. Fudenberg and J. Tirole, Game Theory, MIT Press, 1983.
[10] J. Nash, Non-Cooperative Games, Annals of Mathematics, Second series, vol. 54, No. 2, pp. 286-295, 1951.
[11] W. David, K. Yeung, and L. A. Petrosyan, Cooperative Stochastic Differential Games,
Springer Series in Operations Research and Financial Engineering, 2004.
[12] A.B. MacKenzie, S.B. Wicker, Game Theory in Communications: Motivation,
Explanation, and Application to Power Control, IEEE GLOBECOM 2001, vol. 2,
pp. 821-826, 2001.
[13] J. Leino, Applications of Game Theory in Ad Hoc Networks, Helsinki University of
Technology, Master Thesis, October 30, 2003.
[14] J. Ratliff, Repeated Games, University of Arizona Press, Graduate-level course in
Game Theory, Chapter 5, 1996.
[15] American Mathematical Society, Rationality and Game Theory, available online at:
http://www.ams.org/featurecolumn/archive/rationality.html [Accessed 1st
March 2010].
[16] J. Friedman (Ed.), The Rational Choice Controversy, Yale University Press, 1996.
[17] A. Lacey, A Dictionary of Philosophy, London: Routledge, 3rd ed, 1996.
[18] J.W. Weibull, Evolutionary Game Theory, MIT Press, First edition 1997.
[19] J.M. Smith, Evolution and the Theory of Games, Cambridge University Press, 1982.
[20] Evolutionary Game Theory, Stanford Encyclopedia of Philosophy, available online at:
http://plato.stanford.edu/entries/game-evolutionary/ [Accessed 26th February
2010].
[21] T. Socolofsky and C. Kale, TCP/IP tutorial, RFC 1180, Network Working Group,
January 1991. Available online at:
http://www.faqs.org/rfcs/rfc1180.html [Accessed 14th March 2010].
[22] R.J. Aumann and B. Peleg, Von Neumann-Morgenstern solutions to cooperative games
without side payments, Bulletin of American Mathematical Society, vol. 6, pp.
173-179, 1960.
[23] R. La and V. Anantharam, A game-theoretic look at the Gaussian multiaccess channel,
in Proceedings of the DIMACS Workshop on Network Information Theory, New
Jersey, NY, USA, Mar. 2003.
[24] S. Mathur, L. Sankaranarayanan, and N. Mandayam, Coalitions in cooperative wireless
networks, IEEE Journal on Selected Areas in Communications, vol. 26, pp. 1104-1115,
Sep. 2008.
[25] Z. Han and K.J. Liu, Resource Allocation for Wireless Networks: Basics, Techniques,
and Applications, New York, USA: Cambridge University Press, 2008.
[26] T. Alpcan and T. Basar, A Globally Stable Adaptive Congestion Control Scheme for
Internet-Style Networks with Delay, IEEE/ACM Trans. on Networking, vol. 13,
pp. 1261-1274, Dec. 2005.
[27] T. Alpcan, T. Basar, R. Srikant, and E. Altman, CDMA Uplink Power Control as a
Noncooperative Game, Wireless Networks, vol. 8, pp. 659-670, 2002.
[28] A. MacKenzie, L. DaSilva, and W. Tranter, Game Theory for Wireless Engineers,
Morgan&Claypool Publishers, March 2006.
[29] T. Basar, Control and Game Theoretic Tools for Communication Networks
(overview), Application of Computer and Mathematics, vol. 6, pp. 104-125, 2007.
[30] R. Thrall and W. Lucas, N-person Games in Partition Function Form, Naval Research
Logistics Quarterly, vol. 10, pp. 281-298, 1963.
[31] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory, Philadelphia, PA,
USA: SIAM Series in Classics in Applied Mathematics, Jan. 1999.
[32] G. Owen, Game Theory, London, UK: Academic Press, 3rd edition, October 1995.
2
Auction and Game-Based Spectrum Sharing
in Cognitive Radio Networks
Dr. Omar Raoof and Prof. Hamed Al-Raweshidy
Brunel University-West London,
UK
1. Introduction
One of the main reasons behind the concurrent increase in the demand for and congestion of
Radio Frequency (RF) spectrum is the rapid development of radio networks of all kinds,
which has definitely changed public attitudes towards radio. Nowadays, almost everybody
has a mobile phone and radio stations are literally everywhere. One could argue that our
world is becoming a radio world where waves are weaving everywhere around the Earth.
What's more, this congestion has created a battle between the public, private and military
sectors over frequency ownership and has put a premium on the cost of spectrum.
According to recent research by the FCC (Federal Communications Commission) and
Ofcom, most of the frequency spectrum is inefficiently utilized [1-2]. The existing spectrum
allocation process, denoted as Fixed Spectrum Access (FSA), grants static long-term
exclusive rights of spectrum usage [3] and has been shown to be inflexible [4]. Studies have
shown, however, that spectral utilization is relatively low when examined not just in the
frequency domain, but also across the spatial and temporal domains [5]. Thus, an intelligent
device aware of its surroundings and able to adapt to the existing RF environment in
consideration of all three domains may be able to utilize spectrum more efficiently by
dynamically sharing spectral resources [6 and 7]. Since the 19th century, when the laws of
electromagnetism were discovered and described by Maxwell's equations and technical
devices were invented to produce and use the electromagnetic waves predicted by theory,
man has added his own man-made waves to the natural ones [7].
It is fair to say that, from the very beginning of wireless telephony, maritime radio systems
have always used shared channels [7-8]. For example, 2,182 kHz is used as a calling
frequency as well as an emergency signalling frequency, and other frequencies are used as
working frequencies. If two ships want to communicate, one should identify a working
frequency and make a call. By specifying a channel or channels that ships keep watch on,
both emergency signalling and the establishment of connections between ships can be
facilitated. In fact, channel sharing was necessary and effective because there were not
enough channels to offer one to every single ship and because the typical ship requires far
less than a full channel of capacity [7-8]. Around the mid-1970s, the FCC permitted land
mobile operation on some of the lower UHF channels in several large cities, in order to
expand land mobile services. One group of channels was made available to Radio Common
Carriers (RCCs) to provide mobile service on a common carrier basis. The FCC adopted rules
permitting open entry for these channels and requiring carriers to monitor the channels and
select an unused channel to carry each conversation. In essence, exclusivity was provided on
a first-come, first-served basis, one conversation at a time [7-9].
Another example of spectrum sharing is the second generation of cordless telephone (CT2),
developed by British industry and government in the mid-1980s. CT2 was designed to be
used both at home and in public and uses a pool of 40 channels. To establish a call, the
equipment automatically identifies a vacant channel, or the channel with the minimum
interference, and begins operation on that channel [7-8]. One of the main advantages of
radio is that it can be used anywhere, at any time, and is capable of building links over very
short distances as well as on a cosmic scale. Radio is a unique tool for connecting people and
things without any material medium. It is a wonderful tool for social progress. Given all
these facts about spectrum sharing, spectrum management can now be seen as a major goal
for telecommunications efficiency. It is necessary that this natural and public resource be
utilized for the benefit of as many users as possible, taking care of the largest variety of
needs.
Any discussion of Cognitive Radio (CR) must mention Software Defined Radio (SDR),
which is a transmitter in which operating parameters including transmission frequency,
modulation type and maximum radiated or conducted output power can be altered without
making any hardware changes. The sophistication possible in an SDR has now reached the
level where a radio can perform beneficial tasks that help the user and the network and help
to minimize spectral congestion [7]. For an SDR's capabilities to be raised to the level of a
CR, it must support three major applications [7]:
• Spectrum management and optimization.
• Interfacing with a wide range of wireless networks, leading to management and
optimization of network resources.
• Interfacing with a human, providing electromagnetic resources to aid the human in his or
her activities.
To truly recognize how many technologies have come together to drive CR, we must begin
with a few of the major contributions that have led to today's CR developments. The
development of Digital Signal Processing (DSP) techniques arose from the efforts of
research leaders [10-14], who taught an entire industry how to convert analog signal
processes to digital ones. In the meantime, simulation in the radio industry proved not only
practical, but also resulted in improved radio communication performance, reliability,
flexibility and increased value to the user [15-18].
The concept of CR emerged as an extension of SDR technology. Although the definitions of
the two technologies are different, most radio experts agree that a CR device must have the
following characteristics in order to be distinguished from an SDR one:
1. The device should be aware of its environment.
2. The device must be able to change its physical behaviour in order to adapt to changes
in its current environment.
3. The device must be able to learn from its previous experience.
4. Finally, the device should be able to deal with situations unknown at the time of the
device design. In other words, the device should be able to deal with any unexpected
situations.
That being said, to the best of the authors' knowledge, the idea of CR was first discussed
officially in 1999 by [19]. It was a novel approach in wireless communications, and the author describes
it as "The point in which wireless personal digital assistants (PDAs) and the related
networks are sufficiently computationally intelligent about radio resources and related
computer-to-computer communications to detect user communications needs as a function
of use context, and to provide radio resources and wireless services most appropriate to
those needs." [19]. What's more, the work introduced in [19] can be considered one of the
first to discuss CR technology. The work was based on the situation in which wireless
nodes and the related networks are sufficiently computationally intelligent about radio
resources and related computer-to-computer communication to detect the user's
communication needs as a function of use context and to provide the resources and wireless
services most required. In other words, a CR is a radio that has the ability to sense and
adapt to its radio environment. This work defined two basic characteristics of any CR
device, namely cognitive capability and re-configurability. In order to detect the spectrum
parameters, the device should be able to interact with its environment. The spectrum needs
to be analysed for spectrum concentration, power level, extent and nature of temporal and
spatial variations, modulation scheme and the existence of any other network operating in
the neighbourhood. The CR device should be capable of adapting itself to meet the
spectrum needs in the most optimal way. Recent developments in software radio, DSP
techniques and antenna technology have helped provide this flexibility in CR device design.
Finally, the intelligent support that CRs offer the user arises from the sophisticated
networking of many radios to achieve the end behaviour, which provides added capability
and other benefits to the user.
2. Game theory and spectrum sharing
Players in cooperative games try to maximize the overall profit function of everyone in the
game in a fair fashion. This type of game has the advantages of higher total profit and better
fairness. On the other hand, in non-cooperative or competitive games players try to
maximize their own individual payoff functions. If such a game has a designer with
preferences on the outcomes, it may be possible for the designer to decide on strategy spaces
and the corresponding outcomes (i.e. the mechanism) so that the players' strategic behavior
will not lead to an outcome that is far from desirable [20 and 21]. Recent studies have shown
that despite claims of spectral insufficiency, the actual licensed spectrum remains
unoccupied for long periods of time [8]. Thus, cognitive radio systems have been proposed
[22] in order to efficiently exploit these spectral holes.
Previous studies have tackled different aspects of spectrum sensing and spectrum access. In
[23], the performance of spectrum sensing, in terms of throughput, is investigated when the
secondary users (SUs) share their instantaneous knowledge of the channel. The work in [24]
studies the performance of different detectors for spectrum sensing, while in [25] spatial
diversity methods are proposed for improving the probability of detecting the Primary User
(PU) by the SUs. Other aspects of spectrum sensing are discussed in [26-27]. Furthermore,
spectrum access has also received increased attention, e.g. [28-34]. In [28], a dynamic
programming approach is proposed to allow the SUs to maximize their channel access time
while taking into account a penalty factor from any collision with the PU. The work in [30]
and [35-44] establishes that, in practice, the sensing time of CR networks is large and affects
the access performance of the SUs. In [29], the authors model the spectrum access problem
as a non-cooperative game, and propose learning algorithms to find the correlated equilibria
of the game. Non-cooperative solutions for dynamic spectrum access are also proposed in
[30] while taking into account changes in the SUs' environment such as the arrival of new
PUs, among others.
Auctions of divisible goods have also received much attention [32] and [45-50], where the
authors address the problem of allocating a divisible resource to buyers who value the
quantity they receive, but strategize to maximize their net payoff (i.e. value minus payment).
An allocation mechanism is used to allocate the resource based on bids declared by the
buyers. The bids are equal to the payments, and the buyers are assumed to be in Nash
equilibrium. When multiple SUs compete for spectral opportunities, the issues of fairness and
efficiency arise. On one hand, it is desirable for an SU to access a channel with high
availability. On the other hand, the effective achievable rate of an SU decreases when
contending with many SUs over the most available channel. Consequently, the efficiency of
spectrum utilization in the system is reduced. Therefore, an SU should explore transmission
opportunities in other channels if available and refrain from transmission in the same
channel all the time. Intuitively, diversifying spectrum access in both frequency (exploring
more channels) and time (refraining from continuous transmission attempts) would be
beneficial to achieving fairness among multiple SUs, in that SUs experiencing poorer
channel conditions are not starved in the long run.
The objective of the work in this chapter is to design a mechanism that enables fair and
efficient sharing of spectral resources among SUs. Firstly, we model spectrum access in
cognitive radio networks as a repeated cooperative game. The theory and realization of
cooperative spectrum sharing is presented in detail, where we assume that there is one PU
and several SUs. We also consider the case of dynamic games, where the number of SUs
changes. The advantages of cooperative sharing are demonstrated by simulation. Secondly,
we discuss the case of a large number of SUs competing to share the offered spectrum and
how the cooperative game will reduce the seller's and bidders' revenue. Finally, we
introduce a competitive auction and game-based mechanism to improve the overall system
efficiency in terms of better fairness in accessing the spectrum.
Throughout this chapter, an adaptive competitive second-price pay-to-bid sealed auction
game is adopted as a solution to the fairness problem of spectrum sharing between one
primary user and a large number of secondary users in a cognitive radio environment. Three
main spectrum sharing game models are compared, namely the optimal, cooperative and
competitive game models, introduced as solutions to this problem. In addition, this chapter
shows that the cooperative game model is built on achieving Nash equilibrium between
players and provides better revenue to the sellers and bidders in the game. Furthermore,
the cooperative game is the best model to choose when the number of secondary users
changes dynamically, but only when the number of competitors is low. In practical
situations, the number of secondary users might increase dramatically, and the cooperative
game loses its advantage once that number increases. The proposed mechanism therefore
creates a competition between the bidders and offers better revenue to the players in terms
of fairness. Combining the second-price pay-to-bid sealed auction with the competitive
game model ensures that the user with better channel quality, higher traffic priority and a
fair bid gets a better chance to share the offered spectrum. Numerical results show that the
proposed mechanism can reach the maximum total profit for SUs with better fairness.
Another solution introduced in this chapter is a reputation-based game between SUs. The
game aims to elect one of the SUs to be a secondary-PU and arrange the access of the other
SUs. It is shown by numerical
results that the proposed game managed to give a better chance to SUs to use the spectrum
more efficiently and improve the PU revenue.
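The second-price sealed-bid rule referred to above can be sketched in a few lines. This is only the generic rule, in which the highest sealed bid wins and the winner pays the second-highest bid. The pay-to-bid fee, channel quality and traffic priority that the proposed mechanism also takes into account are not modelled here, and the function name and bid values are hypothetical.

```python
def second_price_auction(bids):
    """Generic second-price sealed-bid rule.

    bids maps a bidder name to its sealed bid. The highest bidder wins and
    pays the second-highest bid. Ties are broken by insertion order.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


# Hypothetical sealed bids from three secondary users.
print(second_price_auction({"SU1": 4.0, "SU2": 6.5, "SU3": 5.0}))
```

Because the price paid does not depend on the winner's own bid, a bidder cannot lower its payment by shading its bid, which is the usual reason for choosing the second-price rule.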
3. Assumptions and system model
3.1 PUs and SUs and allocation function
In the following sections, we consider a spectrum overlay-based cognitive radio wireless
system with one PU and N SUs (see Figure 1). The PU is willing to share some portion
($b_i$) of the free spectrum ($F$) with SU $i$. The PU asks each SU for a payment of $c$
per unit bandwidth for the spectrum share, where $c$ is a function of the total size of the
spectrum available for sharing by the SUs. The revenue of SU $i$ is denoted by $r_i$ per
unit of achievable transmission rate. Figure 1 shows a simple example.
Fig. 1. System model for spectrum sharing.
Both centralized and distributed decision-making scenarios are considered in this work. In
the former case, each SU is assumed to be able to observe the strategies adopted by other
users (i.e., either the users have the ability to discuss their shares among themselves, or the
PU sends updates of each SU's share). In the latter case, the adaptation for spectrum sharing
is performed in a distributed fashion based on communication between each of the SUs and
the PU only (i.e., the secondary users are unable to observe the strategies and payoffs of
each other).
3.2 Cost function, and wireless system model
A wireless transmission model based on adaptive modulation and coding (AMC), in which
the transmission rate can be dynamically adjusted based on channel quality, is assumed in
this chapter. With AMC, the signal-to-interference-plus-noise ratio (SINR) at the receiver of
user $i$ is denoted by $\gamma$ and is given by

$$\gamma = \frac{p_i h_{ii}}{n_0 + \sum_{j \neq i} p_j h_{ij}} \qquad (1)$$
where $h_{ij}$ is the channel gain from user $j$'s transmitter to user $i$'s receiver, $p_i$ is
the transmitting power of user $i$, and $n_0$ is the thermal noise level. The rate for user $i$
(in bits/sec/Hz) is given by

$$R_i = \log_2(1 + \gamma) \qquad (2)$$

The spectral efficiency $I_s$ of transmission by a secondary user can be obtained from [16]:

$$I_s = \log_2(1 + K\gamma) \qquad (3)$$

where $K = 1.5 / \ln(0.2 / \mathrm{BER}_{tar})$ and $\mathrm{BER}_{tar}$ is the target
bit-error rate of the system. The pricing function [17] which the SUs pay is given by

$$c(F) = y\,(b_1 + b_2 + \cdots + b_n)^{z} \qquad (4)$$
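Equations (3) and (4) can be evaluated directly, as in the following minimal sketch. The function names are hypothetical, the SINR is assumed to be given on a linear scale (not in dB), and the numerical values are chosen only for illustration.

```python
import math


def spectral_efficiency(sinr, ber_target):
    """Equation (3): I_s = log2(1 + K * sinr), K = 1.5 / ln(0.2 / BER_tar).

    sinr is assumed to be a linear ratio, not a value in dB.
    """
    k = 1.5 / math.log(0.2 / ber_target)
    return math.log2(1.0 + k * sinr)


def price_per_unit(shares, y, z):
    """Equation (4): c(F) = y * (b_1 + ... + b_n) ** z."""
    return y * sum(shares) ** z


# Illustrative values only: SINR of 10 (i.e. 10 dB) and a target BER of 1e-4.
print(spectral_efficiency(10.0, 1e-4))
# Three SUs requesting 1, 2 and 3 units of spectrum with y = z = 2.
print(price_per_unit([1.0, 2.0, 3.0], y=2.0, z=2.0))
```

Since the price per unit grows with the total requested spectrum, every SU's request raises the price paid by all the others, which is what couples the SUs' decisions in the game.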
Here $y$ and $z$ are assumed to be positive constants greater than one, so that the pricing
function is convex (and also continuous and differentiable), and $B$ is the set of bids of all
SUs (i.e., $B = \{b_1, b_2, \ldots, b_n\}$). Let $w$ denote the worth of the spectrum to the
PU. Then the condition $c(F) > w \sum_{b_j \in F} b_j$ must be satisfied in order to ensure
that the PU is willing to share spectrum of size $b = \sum_{b_j \in F} b_j$ with the SUs (if
the two sides are equal, the PU will not gain any profit).
The overall revenue of any SU is the product of the user's revenue per unit of achievable
transmission rate, the spectral efficiency and the shared portion of the spectrum (i.e.,
$r_i I_s b_i$), while the cost the user must pay is $b_i c(F)$. The profit of every SU can then
be represented as

$$\pi_i = r_i I_s b_i - b_i c(F) \qquad (5)$$

The marginal profit of SU $i$ can be obtained from

$$\frac{\partial \pi_i(F)}{\partial b_i} = r_i I_s - y \Big( \sum_{b_j \in F} b_j \Big)^{z} - y z b_i \Big( \sum_{b_j \in F} b_j \Big)^{z-1} \qquad (6)$$
The optimal size of the spectrum allocated to one SU depends on the strategies the other
SUs are using. Nash equilibrium is considered as the solution of the game to ensure that all
SUs are satisfied with it. By definition, a Nash equilibrium of a game is a strategy profile
with the property that no player can increase his payoff by choosing a different action,
given the other players' actions. In this case, the Nash equilibrium is obtained by using the
best response function, which is the best strategy of one player given the others' strategies.
Let $ST_{-i}$ denote the set of strategies adopted by all SUs except SU $i$ (i.e.,
$ST_{-i} = \{st_j \mid j = 1, 2, \ldots, N;\ j \neq i\}$ and $ST = ST_{-i} \cup \{st_i\}$). The
best response function of SU $i$, given the sizes $b_j$ ($j \neq i$) of the spectrum shared by
the other SUs, is defined as follows:

$$BR_i(ST_{-i}) = \arg\max_{b_i} \pi_i(ST_{-i} \cup \{b_i\}) \qquad (7)$$

The game is then in Nash equilibrium if and only if

$$b_i = BR_i(ST_{-i}), \quad \forall i \qquad (8)$$
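A minimal numerical sketch of the best response (7) and the fixed-point condition (8) is given below. It assumes the profit of equation (5) with the pricing function (4), writes $a_i$ for the product $r_i I_s$, finds each best response by bisection on the marginal profit (6), and repeats the best responses one SU at a time until the shares stop changing. The function names, the iteration scheme and the parameter values are assumptions made for illustration, not the procedure used in this chapter.

```python
def marginal_profit(i, b, a, y, z):
    """Equation (6) with a[i] standing for r_i * I_s."""
    total = sum(b)
    return a[i] - y * total ** z - y * z * b[i] * total ** (z - 1)


def best_response(i, b, a, y, z, iters=200):
    """Share b_i that maximizes SU i's profit for fixed shares of the others."""
    others = sum(b) - b[i]

    def g(x):  # marginal profit of SU i when it requests x
        total = others + x
        return a[i] - y * total ** z - y * z * x * total ** (z - 1)

    if g(0.0) <= 0.0:
        return 0.0  # even the first unit of spectrum is unprofitable
    lo, hi = 0.0, (a[i] / y) ** (1.0 / z)  # g(hi) <= 0, so the root is bracketed
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nash_equilibrium(a, y, z, sweeps=1000, tol=1e-12):
    """Repeat best responses, one SU at a time, until condition (8) holds."""
    b = [0.0] * len(a)
    for _ in range(sweeps):
        change = 0.0
        for i in range(len(a)):
            new = best_response(i, b, a, y, z)
            change = max(change, abs(new - b[i]))
            b[i] = new
        if change < tol:
            break
    return b


# Illustrative values: two SUs with a = r_i * I_s of 10 and 8, and y = z = 2.
a = [10.0, 8.0]
b_star = nash_equilibrium(a, y=2.0, z=2.0)
print(b_star, [marginal_profit(i, b_star, a, 2.0, 2.0) for i in range(2)])
```

At the returned point, every SU with a positive share has a marginal profit of approximately zero, so no SU can gain by changing its own share alone.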
4. Spectrum sharing strategies
Cognitive radio is an intelligent wireless communication system that is aware of its
surrounding environment and can be used to improve the efficiency of frequency spectrum
by exploiting the existence of spectrum holes [22]. Spectrum management in cognitive radio
aims at meeting the requirements from both the primary user and the secondary users.
There are three strategies in spectrum sharing: the optimal, competitive and cooperative models.
4.1 Optimal spectrum sharing model
The objective of the optimal model is to maximize the sum of the profits, which may leave
some secondary users with no spectrum to share [28, 32 and 51]. It is therefore unfair to
some secondary users. From equation (6), the total marginal profit function for all the SUs
can be denoted as follows:

$$\sum_{j=1}^{N} \frac{\partial \pi_j(F(t))}{\partial b_i(t)}$$

In order to obtain the largest total profit for all the secondary users, an optimization
problem is formulated as in (9):

Maximize: $\sum_{j=1}^{N} \pi_j(F)$ (9)
Subject to: $b_i \geq 0$, $b_i \in F$
Our assumption works as follows. The initial sharing spectrum $b_i(0)$ of SU $i$ is sent to
the primary user. The PU adjusts the pricing function $c$, which is then sent back to the SU.
Since all secondary users are rational and maximize their profits, they can adjust the size of
the requested spectrum $b_i$ based on the marginal profit function. In this case, each
secondary user can communicate with the primary user to obtain the differentiated pricing
function for different strategies. The adjustment of the requested/allocated spectrum size
can be modelled as a dynamic game [49] as follows:

$$b_i(t+1) = f(b_i(t)) = b_i(t) + \alpha_i b_i(t) \frac{\partial \pi_i(F)}{\partial b_i(t)} \qquad (10)$$
i
(t) is the allocated spectrum size at time t to SU i and
i
is the adjustment speed
parameter (i.e., which can be expressed as the learning rate) of SU i. f(.) denotes the self-
mapping function. The SU can estimates the marginal profit function in the actual system by
asking the price for share a spectrum from the PU of size b
i
(t) , where is a small number
(i.e., is 0.0001). Simply after that the SU observes the response price from the PU c
-
(.) and
c
+
(.) for b
i
(t)- and b
i
(t)+ , respectively. Then, the marginal profits for the two cases
i

(t)
and
i
+
(t)are compared and the marginal profit can be estimated from;
(.) (.) (.)
2
i i i
i
d
db

t
+

= (11)
The overall optimal profit can be estimated using equation (9).
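As a minimal sketch of the optimal model, the update $b_i(t+1) = b_i(t) + \alpha_i b_i(t)\, g_i$ can be driven by the total marginal profit $g_i$, i.e. the derivative of the sum of all SUs' profits with respect to $b_i$. With profits of the form $a_j b_j - b_j\, y (\sum_k b_k)^z$, where $a_j$ stands for $r_j I_s$, this derivative works out to $g_i = a_i - y(1+z)(\sum_k b_k)^z$; this closed form is derived here for the sketch and is not quoted from the chapter. The function names, step size and parameter values are likewise assumptions.

```python
def total_marginal_profit(i, b, a, y, z):
    """Derivative of the summed profit of all SUs with respect to b_i.

    Assumes pi_j = a_j * b_j - b_j * y * S**z with S = sum(b), which gives
    a_i - y * (1 + z) * S**z.
    """
    s = sum(b)
    return a[i] - y * (1.0 + z) * s ** z


def optimal_shares(b0, a, y, z, alpha=0.05, steps=2000):
    """Iterate b_i <- b_i + alpha * b_i * g_i for all SUs simultaneously."""
    b = list(b0)
    for _ in range(steps):
        grads = [total_marginal_profit(i, b, a, y, z) for i in range(len(b))]
        b = [b[i] + alpha * b[i] * grads[i] for i in range(len(b))]
    return b


# Illustrative values: two SUs with a = r_i * I_s of 10 and 8, and y = z = 2.
shares = optimal_shares([0.5, 0.5], [10.0, 8.0], y=2.0, z=2.0)
print(shares)
```

In this example the SU with the smaller $r_i I_s$ ends up with a share that shrinks towards zero, which illustrates why maximizing the total profit can be unfair to some secondary users.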
4.2 Competitive spectrum sharing model
The main objective of the competitive model is to maximize the profits of individual SUs
through a game whose outcome is a Nash equilibrium. In the distributed dynamic game, SUs may only be able
to observe the pricing information from the PU; they cannot observe the strategies and
profits of other SUs. The Nash equilibrium for each SU is reached through interaction with
the PU, similar to the case of the optimal sharing model. Since all SUs are rational and
maximize their own profits, they can adjust the size of the requested spectrum $b_i$ based
on the marginal profit function (i.e., equation (6)). In this case, each SU can communicate
with the primary user to obtain a different pricing function for different strategies. The
adjustment of the requested/allocated spectrum size in competitive games shows only a
slight difference from that in optimal games, as each individual user seeks to improve
his/her own profit. So equation (9) can be rewritten as

Maximize: $\pi_i(F)$ (12)
Subject to: $b_i \geq 0$, $b_i \in F$
In a similar way to the optimal game, an SU can estimate its marginal profit using the
following equation:
$$\frac{\partial \pi_i(F(t))}{\partial b_i(t)} \approx \frac{1}{2\varepsilon}\left\{ \pi_i(F(t) + \varepsilon) - \pi_i(F(t) - \varepsilon) \right\} \qquad (13)$$
When $b_i(t+1) = b_i(t)$ is satisfied, the Nash equilibrium points $(b_0, b_1, b_2, \ldots, b_N)$
can be obtained.
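The fixed-point condition above can be illustrated with a small Python sketch in which every SU updates its own request from price queries alone and the iteration stops once no request changes. The pricing rule `pu_price` (price per unit equal to the total requested spectrum) and the per-unit revenues 9 and 12 are hypothetical choices made only so that the equilibrium is easy to check by hand; they are not the chapter's pricing or revenue functions.

```python
def pu_price(requests):
    """Hypothetical PU pricing: the unit price grows with total demand."""
    return sum(requests)


def profit(i, requests, revenue):
    """Profit of SU i: revenue from its share minus what it pays the PU."""
    return revenue[i] * requests[i] - requests[i] * pu_price(requests)


def estimated_marginal(i, requests, revenue, eps=1e-4):
    """SU i perturbs only its own request and compares the two resulting profits."""
    up = list(requests)
    up[i] += eps
    down = list(requests)
    down[i] -= eps
    return (profit(i, up, revenue) - profit(i, down, revenue)) / (2.0 * eps)


def competitive_equilibrium(revenue, start, alpha=0.09, tol=1e-9, max_iter=10_000):
    """Iterate the per-SU update until b_i(t+1) = b_i(t) (within tol) for every SU."""
    b = list(start)
    for _ in range(max_iter):
        nxt = [b[i] + alpha * b[i] * estimated_marginal(i, b, revenue)
               for i in range(len(b))]
        if max(abs(x - y) for x, y in zip(nxt, b)) < tol:
            return nxt
        b = nxt
    return b


nash = competitive_equilibrium(revenue=[9.0, 12.0], start=[2.0, 2.0])
```

For this toy game the best-response conditions 9 - 2*b1 - b2 = 0 and 12 - 2*b2 - b1 = 0 give (b1, b2) = (2, 5), which is the point the iteration approaches.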
4.3 Cooperative spectrum sharing model
As explained in the previous section, in the competitive spectrum sharing model the Nash
equilibrium is obtained when each SU maximizes its individual profit. The result is not the
best achievable, because the SUs do not consider the effect of their actions on other users. In
cooperative spectrum sharing, the SUs can communicate and take the behaviour of other users
into account.
In this chapter, we assume that players can reach an agreement by communicating with each
other. The size of the shared spectrum is decreased slightly for all the SUs relative to the
Nash equilibrium (i.e., each SU's Nash equilibrium strategy is multiplied by a factor
$\sigma_i$, $0 < \sigma_i < 1$). Although the size of the shared spectrum decreases, the price
that the PU charges the SUs decreases too, which increases the overall profit of all SUs, but
it might reduce the PU revenue.
The SUs' Nash equilibrium strategy can be obtained from equation (10). All SUs then negotiate
and multiply their strategies by $\sigma_i$ to obtain the cooperative strategy (i.e.,
$\sigma_1 b_1, \sigma_2 b_2, \ldots, \sigma_N b_N$). $\sigma_i$ is chosen in such a way that
both the overall and the individual profits are maximized, which we call the cooperative state:
Maximize: $\sum_{j=1}^{N} \pi_j(F)$ and $\pi_i(F)$ (14)
Subject to: $b_i \ge 0$, $b_i \in F$
However, this model raises a stability problem: one or more SUs may deviate from the
cooperative state. For example, suppose u1 is the first SU to share the spectrum and wants to
deviate; it can increase its profit by setting its marginal profit function of equation (6) to
zero. If another SU u2 does not change its strategy, the profit of u2 will decrease. Therefore,
any SU has a motive to deviate from the cooperative state. In order to solve this problem, a
mechanism needs to be applied that encourages the SUs not to deviate from the cooperative state
(referred to below as the Nash state) by having each SU compute its long-term profit. Suppose
SU i is looking to deviate from the Nash state, while SU j ($j \ne i$) remains in that state.
Before SU i deviates, it computes its long-term profit. The mechanism multiplies the future
profit of SU i (if it decides to deviate) by a weight $\delta_i$ ($0 < \delta_i < 1$), so that
the profit in future stages is worth no more than that of earlier stages; that is, the current
profit is more valuable than future profit.
For any SU i, $\pi_i^{Ns}$, $\pi_i^{N}$ and $\pi_i^{d}$ denote the profits in the Nash state, at
the Nash equilibrium and under deviation, respectively. There are two cases. In the first, all
SUs remain in the Nash state at all stages and no SU deviates from the optimal solution; the
long-term profit of any SU i is then given by equation (15). In the second, SU i deviates from
the optimal solution at the first stage and the Nash equilibrium state holds in the following
stages; the long-term profit of SU i is then given by equation (16).
$$\pi_i^{Ns} + \delta_i \pi_i^{Ns} + \delta_i^{2} \pi_i^{Ns} + \cdots = \frac{1}{1 - \delta_i}\, \pi_i^{Ns} \qquad (15)$$
$$\pi_i^{d} + \delta_i \pi_i^{N} + \delta_i^{2} \pi_i^{N} + \cdots = \pi_i^{d} + \frac{\delta_i}{1 - \delta_i}\, \pi_i^{N} \qquad (16)$$
The Nash state will be maintained if the long-term profit of staying in that state is at least
as high as the long-term profit obtained by deviating:
$$\frac{1}{1 - \delta_i}\, \pi_i^{Ns} \ge \pi_i^{d} + \frac{\delta_i}{1 - \delta_i}\, \pi_i^{N}, \quad \text{i.e.,} \quad \delta_i \ge \frac{\pi_i^{d} - \pi_i^{Ns}}{\pi_i^{d} - \pi_i^{N}} \qquad (17)$$
From equation (17), we know that the Nash state will be kept because the long-term profit of an
SU that deviates is lower. The weights $\delta_i$ act as vindictive (punishment) factors that
inhibit the motive to leave the cooperative state.
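The comparison behind equations (15) to (17) is easy to check numerically. The Python sketch below sums the two geometric series in closed form and computes the smallest weight for which staying in the Nash state is worthwhile. The per-stage profits 120, 110 and 125 are made-up illustrative numbers, not simulation results from this chapter, and the function names are hypothetical.

```python
def long_term_profit_stay(pi_ns, delta):
    """Closed form of equation (15): the SU never leaves the Nash state."""
    return pi_ns / (1.0 - delta)


def long_term_profit_deviate(pi_d, pi_n, delta):
    """Closed form of equation (16): deviate once, then Nash equilibrium forever."""
    return pi_d + delta * pi_n / (1.0 - delta)


def min_weight(pi_ns, pi_n, pi_d):
    """Smallest weight satisfying equation (17); assumes pi_d > pi_n."""
    return (pi_d - pi_ns) / (pi_d - pi_n)


# Illustrative per-stage profits with pi_d > pi_ns > pi_n (made-up numbers).
pi_ns, pi_n, pi_d = 120.0, 110.0, 125.0
threshold = min_weight(pi_ns, pi_n, pi_d)  # (125 - 120) / (125 - 110)
```

With these numbers the threshold is 1/3: a weight of 0.5 makes staying more profitable (240 against 235), whereas a weight of 0.2 makes deviation more profitable (152.5 against 150).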
5. Dynamic cooperative model
In reality, the number of SUs may change. Sometimes more secondary users apply for the spectrum
offered by the primary user, and sometimes secondary users finish their communication and
release the spectrum they have occupied. For example, suppose that there are two SUs in the
Nash state and that a new SU applies for the offered spectrum. We assume that the PU has no
more spectrum to share. The only solution is then for the two existing SUs to release part of
their spectrum to the newcomer.
During the reallocation, an adaptive method is applied with the following requirements: the
total profit of all the SUs should be as large as possible, and the reallocation should be
fair. It is rational for earlier users to have priority in spectrum allocation over those who
come later. In order to keep the total profit at its maximum, SUs with better channel quality
can occupy more spectrum. Therefore, the SUs with better channel quality can stop releasing
spectrum earlier than those with worse channel quality. When the SUs reach the optimal
solution, the fairness will not be as good as if the three SUs had entered the Nash state
directly, because SUs arriving at different times do not have the same priorities.
When SUs have finished their communication and exited the spectrum they had shared, an adaptive
method is applied again: a fixed part of the released spectrum is allocated to the remaining
SUs at each step. SUs with better channel quality may acquire more spectrum in order to
increase the total profit.
6. Simulation results
6.1 Static game (two SUs only in the game)
In this section, we consider a CR environment with one PU and two SUs sharing a frequency
spectrum of 20MHz to 40MHz. The system has the following settings. For the pricing function
c(F), we use y=1 and z=1. The worth of spectrum for the PU is assumed to be one (i.e., w=1).
The revenue of an SU per unit transmission rate is $r_i = 10$, $\forall i$. The target average
BER is $BER_{tar} = 10^{-4}$. The initial value is $b_i(0) = 2$. The adjustment speed parameter
is $\alpha_i = 0.09$. The SNRs of SUs $u_1$ and $u_2$ are denoted by $\gamma_1$ and $\gamma_2$,
where $\gamma_1 = 11$dB and $\gamma_2 = 12$dB.
6.1.1 Optimal and competitive models
As explained in the previous section, the total profit is represented by
$\pi(B) = \pi_1(B) + \pi_2(B)$. In Figure 2, the total profit in the optimal model reaches its
largest value, 228.7333, when $(b_1, b_2) = (4.1, 15.6)$.
Fig. 2. Total profit and spectrum share using optimal game.
The trajectories of the optimal model and the competitive model are shown in Figure 3 (with
$\gamma_1 = 11$dB, $\gamma_2 = 12$dB); the initial value is (2, 2) for the two models. In the
competitive model, the shared spectrum is determined by a game in which the two SUs reach a
Nash equilibrium. In our simulation, the Nash equilibrium is at (14.2591, 24.1302). The sum of
spectrum sharing is 11.3893 with a total profit of 228.2378.
Fig. 3. Optimal and Competitive games
It can be seen that the total profit of the optimal model is higher than that of the
competitive model. However, one SU has no spectrum share in the optimal model, which means a
lack of fairness. The advantage of the competitive model is that it is fair, at the cost of a
lower total profit.
6.1.2 Cooperative spectrum sharing game
Based on the Nash equilibrium, we set the weight $\sigma_i$ in the range [0.5, 1]. In order to
keep the allocation fair, we assume $|\sigma_1 - \sigma_2| \le 1$ to guarantee that the size of
the shared spectrum is similar for the two SUs. The two SUs reach their Nash equilibrium at
(18.2591, 19.1302). At $\sigma_1 = 0.70$, $\sigma_2 = 0.80$, the total profit is 234.4963.
Compared with the competitive model, the shared spectrum in the cooperative model is smaller,
while the total profit is larger than that at the Nash equilibrium, as shown in Figure 3.
The reason is that, with $(\sigma_1 b_1, \sigma_2 b_2)$ as the strategies to share the
spectrum, the price is lower and the total profit increases. Now, let us suppose that SU $u_1$
deviates from the optimal solution while the strategy of SU $u_2$ does not change. SU $u_1$
adopts the strategy based on the marginal profit function. The profits of the two SUs change
when SU $u_1$ deviates. The comparison of the individual profit in the cooperative model, the
competitive model and under deviation is shown in Figure 4. The total profit of the SUs is
shown in Figure 5. $\gamma_1$ is a variable that changes in the range of 8~11dB, and
$\gamma_2 = 12$dB.
It can be seen that $\pi_1$ and $\pi_2$ are larger in the cooperative model than in the
competitive model. Therefore, the total profit is larger too in the cooperative model. When SU
$u_1$ deviates from the cooperative state, $\pi_1$ is higher, $\pi_2$ is lower, and the total
profit is lower as well (i.e., the increase in $\pi_1$ is smaller than the decrease in
$\pi_2$).
Fig. 4. Total profit with different modes.
Fig. 5. User Profit with different modes.
6.1.3 Dynamic spectrum sharing game
The previous results were based on two SUs. The analysis is similar for more SUs. In practice,
the number of SUs may change. For example, suppose another secondary user, denoted by $u_3$,
applies for the offered spectrum. We assume that the channel quality of $u_3$ is the same as
that of secondary user $u_2$ ($\gamma_1$ is a variable, $\gamma_2 = \gamma_3 = 12$dB). There is
no more free spectrum for the primary user to share with others. The previously mentioned
adaptive method is applied in the allocation of spectrum. First, $u_1$ and $u_2$ release a
fixed ratio of their spectrum to $u_3$, and the total profit is computed. If the total profit
increases, the process continues. If the total profit decreases, the SU with the better channel
state stops releasing spectrum. The trajectory of the process is shown in Figure 6, and the
corresponding total profit is shown in Figure 7. When a new SU applies for spectrum sharing,
the allocation converges to the point (3.418948, 5.4642, 0.4936). The total profit is 62.3421,
which is a little bigger than the case with two SUs. When the third SU exits the spectrum, an
adaptive method is applied to reallocate the spectrum. The remaining two SUs converge to
(2.2148, 5.9393) with a total profit of 73.9867, as shown in Figure 8.
Fig. 6. Spectrum sharing in dynamic game.
Fig. 7. Dynamic game and user profit.
Fig. 8. Spectrum Share when user retreats.
7. Is the cooperative game viable?
So far we have discussed three game models to solve the problem of spectrum sharing in CR
systems. We showed that the optimal game improves the overall profit of the players in the
game, but might lead to an unfair distribution of the offered spectrum. The competitive game
shows a lower overall profit, but gives a better share to users with better channel quality who
ask for a share earlier and stay active for a longer period (i.e., users with a higher priority
than newcomers). Finally, the cooperative game gives the best overall individual profit and is
the best way to ensure a fair share between multiple users in any CR system. However, does the
cooperative game model work in an actual CR system?
In a practical CR environment, communication between competitors (i.e., players) is very hard
to achieve. Individual users tend to contact the PU and ask for service [49]; users can only
observe the pricing function from the PU, but not the strategies and profits of other users.
Nevertheless, achieving a cooperative scheme between the SUs (either by having the PU force the
SUs to take a fair share or by using the model mentioned earlier) would improve both the
seller's and the users' revenue. Let us use the same assumption as in the previous section,
where a PU has 30MHz of free spectrum to offer to a group of users. The cooperative mode works
when the number of players is relatively small, so that each player can negotiate a fair share
with the rest of the players. However, when the number of SUs increases, say to 20 or more, the
cooperative mode is no longer useful. If the PU or the users in such a scenario decided to use
the cooperative mode, the individual profit and share would be very low compared with the
competitive game, taking into account the channel quality, user need and priority.
In order to solve this problem, two solutions are proposed in the following sections. First, a
second-price pay-to-bid (sometimes called pay-as-bid) sealed auction mechanism is introduced to
ensure a fair competitive game between SUs. Second, a reputation-based auction game is
introduced as a non-cooperative game that assigns one SU to act as a secondary PU among the
other SUs. More details are given in the following sections.
7.1 Pay-to-Bid competitive auction
The allocation mechanism works as follows. Let $W = [w_1, w_2, \ldots, w_n]$ be the
non-negative bids (i.e., user valuations) that the SUs will pay in order to get a share of the
offered spectrum, and let $X = [x_1, x_2, \ldots, x_n]$ be the amount of spectrum per unit
bandwidth they are allocated as a result. We assume that the PU announces the auction per unit
bandwidth; for example, the SUs offer a bid for every 1MHz they will be allocated.
This allocation is made according to a cost-based allocation mechanism $\tau$, so that with the
given payment w, the allocation to SU i is given by $x_i = \tau_i(w)$, as shown in Figure 9. c
is assumed to be the reserve price of the PU; any SU bidding less than that is withdrawn from
the auction.
In order to reflect user i's valuation of the offered spectrum, a simple valuation function is
proposed:
$$v_i = I_s\, up_i \qquad (18)$$
where $v_i$ is user i's valuation of the offered spectrum per unit bandwidth, and $up_i$
defines how much the user needs to get the desired share of the spectrum, which is a function
of the user traffic priority ($tp_i$) and the channel SNR ($\gamma_i$):
$$up_i = tp_i\, \gamma_i \qquad (19)$$
Fig. 9. Pay-to-bid allocation mechanism.
The user valuation can be interpreted as follows: user i uses the importance of his traffic and
the channel quality (already known to all users) as a ruler to set his bid in the auction. This
valuation measures the capability of the SU (if he wins the auction) to bid more for the
offered spectrum, keeping in mind the capacity of his channel. When the channel condition is
good (according to equation (3)), the user is more willing to increase his bid. As a result, a
higher bid would be expected from him/her, and vice versa.
We must mention that the auction mechanism is designed in such a way that $v_i$ does not
represent the real price that an SU has to pay during the auction. It is simply an
interpretation of the strategic situation that a node is facing. In fact, $v_i$ reflects the
relationship between the user valuation and the channel condition. Additionally, since the
channel coefficient k is a random variable with a distribution known to each user, the
distribution of the valuation $v_i$ is also known (according to the relationship shown in
equations (18) and (19)). This means that $v_i$ lies in the interval $[v_{min}, v_{max}]$. We
define Bid as the bid space of the auction, $\{bid_1, bid_2, \ldots, bid_N\}$, which represents
the set of possible bids submitted to the PU. We can assign $bid_0$ to zero without loss of
generality, as it represents the null bid. Accordingly, $bid_1$ is the lowest acceptable bid
and $bid_N$ is the highest bid. The bid increment between two adjacent bids is taken to be the
same in the typical case. In the event of ties (i.e., two bidders offer the same final price),
the object is allocated randomly to one of the tied bidders.
To find the winner of the first-price sealed-bid pay-to-bid auction, a theoretical model is
defined based on the work of [52]. The probability of observing the bid $bid_i$ is denoted as
$\pi_i$, and the probability of not participating in the auction is denoted as $\pi_0$. The
vector $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ then denotes the probability distribution over
Bid, where $\sum_{i=0}^{N} \pi_i = 1$. We now introduce the cumulative distribution function,
which gives the probability that a user bids $bid_i$ or less, $\Pi_i = \sum_{j=0}^{i} \pi_j$;
all of these values are collected in the vector $\Pi$. Then, any rational potential bidder with
a known valuation $v_i$ faces the decision problem of maximizing his expected profit from
winning the auction, i.e.:
$$\max_{bid_i \in Bid}\ (v_i - bid_i)\, \Pr(\text{winning} \mid bid_i) \qquad (20)$$
The equilibrium probability of winning with a particular bid $bid_i$ is denoted as $\theta_i$,
and these probabilities are collected in the vector
$\Theta = (\theta_0, \theta_1, \theta_2, \ldots, \theta_N)$. Using $\Pi$, the elements of the
vector $\Theta$ can be calculated. $\theta_0$ is known to be zero, since a bidder who submits a
null bid is not going to win. The remaining elements of $\Theta$ can be calculated as follows,
and it can be directly verified that they constitute a symmetric Bayes-Nash equilibrium [53] of
the auction game:
$$\theta_i = \frac{\Pi_i^{\,n} - \Pi_{i-1}^{\,n}}{n\, \pi_i}, \qquad i = 1, 2, \ldots, N \qquad (21)$$
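Equation (21) can be cross-checked numerically. The Python sketch below assumes the usual reading of the formula (difference of n-th powers of the cumulative bid probabilities, divided by n times the probability of the bid level) and compares it with a direct count in which ties are split uniformly at random, as stated earlier for tied bidders. The four-level bid distribution and n = 3 are illustrative values only, and the function names are hypothetical.

```python
from math import comb


def win_probabilities(bid_probs, n):
    """Winning probability per bid level, following the reading of equation (21).

    bid_probs[0] is the probability of the null bid; every other entry is
    assumed to be strictly positive. n is the number of bidders.
    """
    theta = [0.0]  # a null bid never wins
    cum_prev = bid_probs[0]
    for p in bid_probs[1:]:
        cum = cum_prev + p
        theta.append((cum ** n - cum_prev ** n) / (n * p))
        cum_prev = cum
    return theta


def win_probability_direct(bid_probs, n, i):
    """Direct count: k of the n-1 rivals tie at level i, the rest bid lower,
    and the object goes to one of the k+1 tied bidders at random."""
    below = sum(bid_probs[:i])
    return sum(comb(n - 1, k) * bid_probs[i] ** k * below ** (n - 1 - k) / (k + 1)
               for k in range(n))


bid_probs = [0.1, 0.2, 0.3, 0.4]  # illustrative distribution over four bid levels
theta = win_probabilities(bid_probs, n=3)
```

The agreement between the two functions is the binomial identity behind the formula: the numerator is the probability that the highest of n bids equals the given level, and dividing by the expected number of bidders at that level shares the win among them.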
We use the notion of Bayes-Nash equilibrium as defined in [53], whose approach is to transform
a game of incomplete information into one of imperfect information: any buyer who has
incomplete information about other buyers' values is treated as if he were uncertain about
their types. In equation (21), the numerator is the probability that the highest bid is exactly
equal to $bid_i$, while the denominator is the expected number of users who submit the same bid
(i.e., $bid_i$). For any user in the game, the best response is to submit a bid that satisfies
the following inequality:
$$(v_i - bid_i)\, \theta_i \ge (v_j - bid_j)\, \theta_j \qquad \forall j \ne i$$
The above inequality states that user i's expected profit is at least as large as any other
user j's. It is the discrete analogue of the equilibrium first-order condition for
expected-profit maximization in the continuous-variation model [52], which takes the form of
the following ordinary differential equation in the strategy function $\beta(v_i)$:
$$\beta'(v_i) + \beta(v_i)\, \frac{(n-1)\, f(v_i)}{F(v_i)} = v_i\, \frac{(n-1)\, f(v_i)}{F(v_i)} \qquad (22)$$
where $f(v_i)$ and $F(v_i)$ are the probability density and cumulative distribution functions
of each bidder's valuation, respectively. We assume that they are common knowledge to bidders,
along with n, the number of bidders in the system. The reserve price is denoted by c (in many
instances, sellers reserve the right not to sell the object if the price determined in the
auction is lower than some threshold amount [53], say c > 0), and the above differential
equation has the following solution:
$$\beta(v_i) = v_i - \frac{\int_{c}^{v_i} F(u)^{n-1}\, du}{F(v_i)^{n-1}} \qquad (23)$$
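As a numerical illustration of equation (23), the Python sketch below evaluates the equilibrium bid with the trapezoidal rule and compares it with the closed form that follows when valuations are uniform on [0, 1]. The uniform distribution, the reserve price 0.2 and n = 3 bidders are assumptions made only for this example; the chapter does not fix them, and `equilibrium_bid` is a hypothetical helper name.

```python
def equilibrium_bid(v, n, cdf, reserve, steps=10_000):
    """First-price equilibrium bid: v minus the integral of F(u)^(n-1) from the
    reserve price to v, divided by F(v)^(n-1). Trapezoidal integration."""
    if v < reserve:
        raise ValueError("a bidder valuing the object below the reserve does not bid")

    def g(u):
        return cdf(u) ** (n - 1)

    h = (v - reserve) / steps
    inner = sum(g(reserve + k * h) for k in range(1, steps))
    integral = h * (0.5 * g(reserve) + inner + 0.5 * g(v))
    return v - integral / g(v)


def uniform_cdf(u):
    """Assumed valuation distribution for the example: uniform on [0, 1]."""
    return min(max(u, 0.0), 1.0)


n, reserve, v = 3, 0.2, 0.8
numeric = equilibrium_bid(v, n, uniform_cdf, reserve)
# Closed form for the uniform case: v - (v^n - c^n) / (n * v^(n-1))
closed_form = v - (v ** n - reserve ** n) / (n * v ** (n - 1))
```

For these values the bid is 0.5375, below the valuation of 0.8: in a first-price auction the bidder shades the bid, and the amount of shading shrinks as the number of bidders grows.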
In the case of the first-price sealed-bid auction, bidder i submits a bid of
$bid_i = \beta(v_i)$ in equilibrium and pays a price proportional to his bid if he wins. In the
second-price sealed-bid auction, on the other hand, user i submits his valuation truthfully.
This is because the price a user has to pay if he wins the auction is not the winning bid but
the second-highest one. Therefore, there is nothing to drive a user to bid higher or lower than
his true valuation of the spectrum offered by the PU. In this case $bid_i = v_i$, as given in
equation (18), and the payment process is the same as in the first-price auction. Once the
winner has been announced, the PU sends an update message to all the SUs with the
second-highest price, which they need to pay in order to gain access. All SUs must pay the
winning bid per unit bandwidth. To ensure that the winner gets a higher priority than the rest
of the competitors, the PU sends the winning bid to everyone and treats their replies according
to the bids originally offered by the SUs.
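A minimal Python sketch of the winner determination step is given below. It follows the textbook second-price rule (the highest bidder wins and is charged the second-highest bid, or the reserve price if no other valid bid exists), discards bids below the reserve price and breaks ties at random. This is an assumption-laden illustration rather than the chapter's exact payment procedure, which is described only informally above; the function and SU names are hypothetical.

```python
import random


def second_price_auction(bids, reserve, rng=random):
    """bids maps an SU name to its bid per unit bandwidth.

    Returns (winner, price), or (None, None) if no bid reaches the reserve.
    """
    valid = {su: bid for su, bid in bids.items() if bid >= reserve}
    if not valid:
        return None, None
    top = max(valid.values())
    tied = sorted(su for su, bid in valid.items() if bid == top)
    winner = rng.choice(tied)  # ties are resolved at random
    others = [bid for su, bid in valid.items() if su != winner]
    price = max(others) if others else reserve
    return winner, price


winner, price = second_price_auction(
    {"u1": 5.0, "u2": 7.5, "u3": 6.0, "u4": 0.5}, reserve=1.0)
```

In this example u4 is withdrawn for bidding below the reserve price, u2 wins, and the price charged is u3's bid of 6.0 rather than u2's own bid, which is why bidding one's true valuation costs nothing.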
This mechanism offers a fairer competition between players: the user with a better channel
quality, higher-priority traffic and an honest valuation gets a much better chance than other
users of gaining access to his/her desired share. Moreover, the mechanism improves the seller's
and the winner's revenue compared with the optimal and cooperative game models.
Next, we test the mechanism with scenario assumptions similar to those of the previous section.
We compare three models: first, the spectrum is offered to the users using a cooperative game;
second, a similar setting but with a competitive game; and finally, a competitive second-price
pay-to-bid sealed auction. We study the effects in two simple scenarios. In the first, an SU
(named $u_1$) competes with other bidders to get a share of the spectrum from the moment the PU
announces the auction. In the second, a newcomer joins the game (as the eleventh user onward),
and we examine how the introduced mechanism improves his/her revenue, taking into account that
the newcomer has an excellent channel quality and a fair bid.
Figure 10 confirms what we discussed in section 6.1.3 in terms of individual user revenue.
Although the cooperative game shows a better start (i.e., when the number of bidders is low),
it tries to improve the players' revenue and keep a fair share between all bidders. This causes
a sharp decrease in the seller revenue when the number of bidders increases. On the other hand,
the competitive game takes into account the channel condition and the user's ability to grab
his/her share before the others, which is why it shows better revenue than the cooperative
model.
For the second scenario, Figure 11 shows the dramatic improvement in the newcomer's revenue,
keeping in mind that his/her priority is rather high. Clearly, the introduced mechanism helps
improve the spectrum share in terms of fairness, substantially improves the players' revenue
compared with the other models, and gives the PU a better deal by using the second-price sealed
auction.
Fig. 10. SU revenue vs. number of users with different models.
Fig. 11. Newcomer revenue vs. number of users.
7.2 Reputation-based non-cooperative auction games
In this game, the PU assigns the spectrum to the winner of the second-price sealed auction. The
revenue of the PU does not change, as using the second-price auction ensures that all bidders
bid around the real value of the offered spectrum. The winner of the auction becomes a new PU
among the rest of the SUs, and has the right to decide whether or not to share the spectrum
with the rest. However, a penalty factor is introduced so that winning a share of the spectrum
depends not only on paying more but also on reputation, which is combined with each bid. This
factor is forwarded to the PU and shows whether the winner of the last auction was popular or
not, where popularity is earned by helping other SUs to share the offered spectrum.
In this section we denote the infinitely repeated version of game G by $G^{\infty}$ (i.e., the
case in which G is played over and over again in successive time periods). We assume that the
PU is offering a single frequency band to be shared by the SUs. If the PU plans to offer more
bands, the proposed mechanism must be repeated for the other bands among the secondary users.
We define the user reputation as R, which depends on the user's performance during the current
time period t as well as in prior time periods. The reputation of player i in time period t is
denoted by $R_t^i$. Formally, we define node reputation as follows:
$$R_t^i = (1 - \alpha)\, R_{t-1}^i + \alpha\, w, \qquad 0 \le \alpha \le 1,\ t \ge 2 \qquad (24)$$
where $\alpha$ weights the current period against the history of the user, that is, the user's
reputation in the previous periods according to its behaviour. w is equal to 1 when player i is
interested in sharing the offered spectrum at time t, and 0 otherwise. Therefore
$0 \le R_t^i \le 1$, i.e., the reputation value of each player varies between 0 and 1 inclusive
($R_t^i \in [0, 1]$). Moreover, the reputation value of all players is equal to 0 when t = 0. A
high value of $\alpha$ means that more importance is assigned to a player's need to share the
spectrum with the PU (higher priority) during the current period than to its previous record,
and vice versa. Thus, when $\alpha$ is high, a user with even a low reputation value in the
current time period t can significantly improve his/her reputation when it realises that it
needs a better share of the spectrum.
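The reputation recursion of equation (24) amounts to an exponentially weighted average of the share indicator w. The short Python sketch below applies it step by step; the weighting value 0.3 and the sequence of decisions are illustrative assumptions, and `update_reputation` is a hypothetical helper name.

```python
def update_reputation(previous, wants_to_share, alpha):
    """One application of equation (24): blend the previous reputation with w."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("the weighting factor must lie in [0, 1]")
    w = 1.0 if wants_to_share else 0.0
    return (1.0 - alpha) * previous + alpha * w


reputation = 0.0  # every player starts from a reputation of 0
trace = []
for decision in (True, True, True, False):  # share three times, then decline
    reputation = update_reputation(reputation, decision, alpha=0.3)
    trace.append(reputation)
```

The trace rises to 0.3, 0.51 and 0.657 while the player keeps asking to share and drops to about 0.46 after a single refusal, so a larger weighting factor makes the reputation react faster to the most recent decision.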
As with the Nash equilibrium defined earlier, we now evaluate the Nash equilibrium of the
repeated game $G^{\infty}$. Finding the Nash equilibrium of $G^{\infty}$ leads to the deduction
of the Nash equilibria of G. The proposed incentive mechanism is based on a player's reputation
R. The benefit that a player draws from the system depends on its contribution: the benefit is
a monotonically increasing function of the player's contribution. Thus, this is a
non-cooperative game among the players, where each player with high-priority traffic wants to
maximize his/her utility. The classical concept of Nash equilibrium points a way out of the
endless cycle of speculation and counter-speculation as to what strategies the players should
use. The intent is to deduce a symmetric Nash equilibrium, because all the players belong to
the same population/network (i.e., assume the same role) and it is therefore easier (i.e., it
requires no coordination among players) to achieve such an equilibrium. If the players in a
game either do not differ significantly or are not aware of any differences among themselves
(i.e., if they are drawn from a single homogeneous population), then it is difficult for them
to coordinate, and a symmetric equilibrium, in which every player uses the same strategy, is
more compelling.
The argument of a single homogeneous population implies that all the peers in a CR network have
the same responsibilities and capabilities. We assume that a player chooses the action {want to
share} with probability p and the action {does not want to share} with probability 1 - p.
It must be mentioned that a time- and money-saving Nash equilibrium arises if all players
choose the action {does not want to share}. This means that players are not interested in
sharing the spectrum for the entire communication time; that is to say, users have low-priority
traffic and access the spectrum only by chance, and players neither compete to send their data
nor offer more money to the PU to get the spectrum. If any player i decided to switch to the
action {want to share}, its payoff would be -C, which is less than the payoff of 0 that the
node gets when it decides not to share the spectrum. An undesirable Nash equilibrium arises if
all the players choose the action {want to share}. This is easy to see: all nodes have to
compete against each other again, which wastes time, and the winner is the PU, as one of the
SUs has to pay more to share the offered spectrum.
The expected payoff of any player in period t when it selects the action {want to share} is:
$$p\, (-C + R_t^{share}\, U) \qquad (25)$$
This payoff is denoted as $payoff_{share}$; U is the node's utility. Similarly, the payoff of
any player that selects the action {does not want to share} is:
$$(1 - p)\, (R_t^{dontshare}\, U) \qquad (26)$$
This is denoted as $payoff_{dontshare}$. The term $R_t^{share}\, U$ captures the notion that
the probability of an SU becoming a secondary PU by sharing the offered spectrum is directly
proportional to the node's reputation.
$R_t^{share}$ is player i's reputation when he/she wants to share the offered spectrum at time
t (i.e., w = 1 in equation (24)), and $R_t^{dontshare}$ is player i's reputation when he/she
takes the action {does not want to share} in the same time period t (i.e., w = 0 in equation
(24)). From equation (24), we get:
$$R_t^{share} = (1 - \alpha)\, R_{t-1} + \alpha \quad \text{and} \quad R_t^{dontshare} = (1 - \alpha)\, R_{t-1} \qquad (27)$$
Generally, each player's expected payoff in equilibrium is his/her expected payoff to any of
the actions that he/she uses with positive probability. This characterization of
mixed-strategy Nash equilibrium yields:
$$payoff_{share} = payoff_{dontshare} \qquad (28)$$
Using equations (25), (26) and (27):
$$p\, \bigl(-C + ((1 - \alpha)\, R_{t-1} + \alpha)\, U\bigr) = (1 - p)\, \bigl((1 - \alpha)\, R_{t-1}\, U\bigr) \qquad (29)$$
Solving equation (29) for p gives:
$$p = \frac{(1 - \alpha)\, R_{t-1}\, U}{-C + 2(1 - \alpha)\, R_{t-1}\, U + \alpha\, U} \qquad (30)$$
It must be mentioned that the value of p obtained above is not a constant, but varies in each
time interval depending upon a node's reputation at the end of the previous time interval
t - 1.
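The indifference condition of equations (28) and (29) and its solution in equation (30) can be verified with a few lines of Python. The previous reputation 0.6, weighting factor 0.3, utility 10 and cost 1 are illustrative values only; the chapter does not specify them, and `share_probability` is a hypothetical helper name.

```python
def share_probability(r_prev, alpha, utility, cost):
    """Mixed-strategy probability of {want to share}, following equation (30)."""
    kept = (1.0 - alpha) * r_prev  # reputation carried over from period t-1
    denominator = -cost + 2.0 * kept * utility + alpha * utility
    if denominator <= 0.0:
        raise ValueError("cost too high relative to utility for this formula")
    return kept * utility / denominator


r_prev, alpha, utility, cost = 0.6, 0.3, 10.0, 1.0
p = share_probability(r_prev, alpha, utility, cost)

# Both sides of the indifference condition in equation (29).
payoff_share = p * (-cost + ((1.0 - alpha) * r_prev + alpha) * utility)
payoff_dontshare = (1.0 - p) * ((1.0 - alpha) * r_prev * utility)

# Limit in which the cost is negligible compared with the utility.
p_no_cost = share_probability(r_prev, alpha, utility, cost=0.0)
```

For these numbers p is 4.2/10.4 (about 0.40) and the two payoffs coincide, as the mixed-strategy equilibrium requires; with the cost set to zero the probability falls to about 0.37, below one half.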
Finally, the mixed strategy pair (p, 1 - p) for the actions {want to share, does not want to
share}, respectively, is a mixed-strategy Nash equilibrium for the players (i.e., nodes in the
network). Assuming no collusion among nodes, if all the other nodes follow the above strategy,
then the best strategy for any node is also to follow it. This is a symmetric mixed-strategy
Nash equilibrium for G as well as for $G^{\infty}$. In fact, it is a more stable equilibrium
than the one in which no node is interested in sharing the offered spectrum, for two reasons.
First, when none of the SUs is interested in sharing the spectrum, the network is not useful to
any user. Second, in real-time scenarios, users that derive finite utility from altruism would
always send some messages irrespective of how much they obtain in return. Therefore, it is
unlikely to have a scenario in which no node is looking to contact the PU to share the
spectrum.
7.3 Properties of the proposed Nash Equilibrium
In this section, we present some of the interesting properties of the Nash equilibrium derived
in the section above.
7.3.1 Simplicity of calculating the Nash Equilibrium
In Section 7.2, we calculated the probability of achieving the equilibrium point between the
SUs. This was based on which node decides to share the spectrum with the PU and become a
secondary PU. In each round of the game (or time period t), players decide whether or not they
should ask to share the offered spectrum, based on their reputation at the end of the prior
time period. This probability does not remain constant from one period to another; it depends
on a player's reputation at the end of the last time period. Players can calculate their
reputation using equation (24), since they know precisely their actions in each round of the
game. Thus, determining the Nash equilibrium strategy is fairly straightforward for any player.
However, it must be noted that there is an inherent assumption that nodes are serviced based on
their current reputation.
Figure 12 shows how the players' reputations change in every time interval depending on their
Nash strategy. At the beginning of the communication time, players 1 and 2 compete with each
other to guarantee access to the offered spectrum. Player 1 uses the spectrum but at the same
time manages to help player 2 (i.e. player 1 is the secondary PU and manages the access of
players 2 and 3 to the offered spectrum). Player 3 shows interest in the offered spectrum
after the third time interval, and manages to use the spectrum once players 1 and 2 have
finished using it or are no longer interested in sharing it. The figure shows the players'
(nodes') reputation values, $0 \le R_t^i \le 1$, over ten time intervals.
Figure 13 shows the same result over a longer time period of around nine hundred time
intervals. Again, three nodes compete with each other, player 1 having the highest reputation
and player 3 the lowest. Player 1 acts as the secondary PU over the other two users (i.e.
players 2 and 3). For this figure we used a random matrix generator to show the different
reputations that result when player 1 is interested in sharing the spectrum for 80% of the
time, player 2 for 50% of the time and player 3 for only 8% of the time.
Fig. 12. Change in players' reputation controlled by their Nash equilibrium strategies.
Fig. 13. Changing player reputation over a longer time period.
7.3.2 Addressing the spectrum to the right user
The simple game-theoretic model presented in the previous sections, wherein node
reputation is used as the basis for deciding who will share the offered spectrum, predicts
that it is in every peer's best interest to serve others. This includes the nodes that are not
interested in sharing the spectrum in the current time period. Our simulations support this
behaviour: we found that the total service received by a node is balanced by the total
service that it has to offer to others, as shown in Figure 12.
7.3.3 Addressing the problem of competitive sharing
An important property of the equilibrium emerges from equation (30), which predicts the
probability with which a node will be a secondary PU and should serve others. If we set
the value of C such that C <<< U (i.e. C can be ignored in equation (30)), then
equation (30) becomes:
1
1
(1 )
2 (1 )
t
t
R
p
R


=
+
(31)
This leads to the conclusion that p < 0.5. The Nash equilibrium of the proposed game
therefore predicts that players should help each other less than 50 percent of the time when
the PU offers the spectrum. Although this appears to be very restrictive, it is a consequence
of the fact that all nodes are selfish and are better off trying to share the spectrum than
serving others. Intuitively, if a node knows that everyone else in the network behaves
selfishly, i.e., provides as little service as possible, then its best strategy cannot be to
serve others most of the time (i.e., with probability greater than 0.5).
7.3.4 Fairness and equal sharing of cost and spectrum
Although we concluded in the previous section that serving with a probability of less than
50 percent (i.e. when C <<< U) is an optimal point, the overall system efficiency is severely
reduced. This is because most of the nodes in the network act selfishly and at least half of
the service requests from other nodes are not fulfilled. On the other hand, this equilibrium
strategy has one positive side: it provides fairness in the sense that the cost of system
inefficiency is not borne by a single node but is shared among all nodes. This is because
each node's request is likely to be turned down by the serving node (i.e. the selfish
secondary PU). In this work, we assume that if a node's request is turned down at one node,
the node tries some other candidate node capable of serving the request. On average, the
probability that a node's request is successfully served in a time period is proportional to
its current reputation.
7.3.5 Decreasing the weighting parameter for a better share of the spectrum
Figure 14 shows the effect of the reputation weighting parameter on the reputation
probability of a node that is not interested in sharing the spectrum. The node in Figure 15,
on the other hand, is looking to keep its share of the spectrum (derived from equation (27)).
As can be seen from Figures 14 and 15, a lower value of the weighting parameter shifts the
reputation probability curve upwards. However, the effect depends on whether the node is
interested in using the offered spectrum or not. If the node is looking to give its share of
the spectrum to other nodes, a low value of the parameter makes the node lose its share
gradually, whereas a high value guarantees a faster release of the spectrum. This is true for
Figure 15 as well, which
Fig. 14. Players' reputation with respect to the weighting parameter when the node is not
interested in sharing the offered spectrum.
Fig. 15. Players' reputation with respect to the weighting parameter when the node is
definitely interested in sharing the offered spectrum from the PU.
is to be expected since the weighting parameter determines how much importance is given to a
node's current performance as compared to its past service record. A low value (i.e., giving
more importance to the node's past actions up to the current time period t) means that nodes
need to continually provide service to be able to maintain a high reputation and access to
the spectrum offered by the PU. If, however, the value is high, nodes can easily increase
their reputation in any period in which they provide service to other nodes, irrespective of
how cooperative they have been in the past with regard to providing service to others.
Therefore, a simple way to improve the system efficiency is to set the weighting parameter as
low as possible.
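The trade-off between past record and current behaviour can be illustrated with a minimal Python sketch. It assumes an exponentially weighted reputation update; this form, the starting reputation of 0.5, and the names `update_reputation`, `simulate` and `weight` are illustrative assumptions and may differ from the chapter's equation (24).

```python
def update_reputation(prev, served, weight):
    """Assumed exponentially weighted update (not necessarily equation (24)).

    weight close to 0 favours the past record, weight close to 1 favours
    the action taken in the current period.
    """
    return (1.0 - weight) * prev + weight * (1.0 if served else 0.0)


def simulate(weight, service_history, start=0.5):
    """Return the reputation after each period of a service history."""
    reputation = start
    trace = []
    for served in service_history:
        reputation = update_reputation(reputation, served, weight)
        trace.append(reputation)
    return trace


# A node serves others for two periods, then stops serving.
history = [True, True, False, False, False]
low = simulate(0.1, history)    # past record dominates
high = simulate(0.9, history)   # current action dominates
print([round(r, 3) for r in low])
print([round(r, 3) for r in high])
```

Under this assumed update, a high weight lets a single period of service lift the reputation almost to 1, and lets it collapse just as quickly once the node stops serving, whereas a low weight rewards only sustained service.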
8. Summary
Cognitive radio is regarded as the key technology for the next generation of wireless
networks. Dynamic spectrum sharing is one of the most important problems in cognitive
radio networks. Building on game-theoretic models of competitive spectrum sharing, an
adaptive competitive game and auction-based spectrum sharing mechanism is presented in this
chapter. Its advantages over the optimal, cooperative and competitive modes have been
demonstrated by simulation. A general solution to the instability problem has been proposed,
in which an adaptive method handles the changing number of secondary users by using a
cooperative game model when the number of users is small. Another solution to this problem
uses a non-cooperative game model combined with a second-price auction to choose a secondary
primary user. The decision is based on user reputation and the user's valuation of the
offered spectrum. The resulting solution achieves maximum total profit and better fairness
in spectrum sharing. We have discussed how an increase in the number of competitors affects
the fairness of spectrum sharing and shown that the proposed mechanism offers better revenue
to the seller and to the bidders in terms of fairness.
9. References
[1] FCC, Spectrum Policy Task Force Report, ET Docket No. 02-135, pp. 35-53, November
7, 2002.
[2] S.S. Brave, S.B. Deosarkar, and S.A. Bhople, A Cognitive approach to spectrum sensing
in Virtual Unlicensed wireless network, International Conference on Advances in
Computing, Communication and Control (ICAC309), pp. 668-673, 2009.
[3] M. Buddhikot, Understanding Dynamic Spectrum Allocation: Models, Taxonomy and
Challenges, Proceeding of IEEE DySPAN, 2007.
[4] T. Kamakaris, M. Buddhikot, and R. Iyer, A Case of Coordinated Dynamic Spectrum
Access in Cellular Networks, Proceeding of IEEE DySPAN, 2005.
[5] R. Axelrod, The Evolution of Cooperation, Basic Books, reprint edition, New York,
1984.
[6] C. Ianculescu, and A. Mudra, Cognitive Radio and Dynamic Spectrum Sharing,
Proceeding of the SDR 05 Technical Conferences and Product Exposition, 2005.
[7] C. Jackson, Dynamic Sharing of Radio Spectrum: A Brief History, Book chapter, July
2002. Available on line at:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber =01542659 [Accessed
March 2010].
[8] Federal Communications Commission, Spectrum Policy Task Force Report, Report ET
Docket, no. 02-135, Nov. 2002.
Game Theory 38
[9] 47 CFR 22.501 Scope. Code of Federal Regulations-Title 47: Telecommunications
(December 2005), Available online at:
http://cfr.vlex.com/vid/22-501-scope-19849829 [Accessed March 2010].
[10] A.V. Oppenheim, W.R. Schafer, and R.J. Buck, Discrete-Time Signal Processing,
Prentice-Hall Signal Processing Series, 2
nd
edition, 1999.
[11] R.V. Cox, C.A. Kamm, L.R. Rabiner, J. Schroeter, and J. G.Wilpon, Speech and
Language Processing for Next-Millennium Communications Services, Proceedings
of the IEEE, vol. 88, no. 8, pp. 1314-1337, August 2000.
[12] L. Rabiner, B.H. Juang and M.M. Sondhi, Digital Speech Processing, Encyclopaedia of
Physical Science and Technology, Third Edition, vol. 4, pp. 485-500, 2002.
[13] J.H. McClellan and C.M. Rader, Number Theory in Digital Signal Processing,
Prentice-Hall, Inc., Englewood Cliffs, N.J., 1979.
[14] J.H. McClellan, R.W. Schafer and M.A. Yoder, Signal Processing First: A Multimedia
Approach, Upper Saddle River, NJ: Prentice-Hall, 1998.
[15] B. Schaller, The Origin, Nature, and Implications of "MOORE'S LAW", the Benchmark
of Progress in Semiconductor Electronics, Spring 1996, available online at:
http://research.microsoft.com/en-us/um/people/gray/Moore_Law.html
[Accessed March 2010].
[16] D. Angel, Restructuring for Innovation: The Remaking of the U.S. Semiconductor
Industry, (New York: The Guilford Press), 1994.
[17] E. Braun, and S. Macdonald, Revolution in Miniature: The History and Impact of
Semiconductor Electronics, (Cambridge: Cambridge University Press), 1982.
[18] A. Dorfman, and S. Nancy, Innovation and Market Structure: Lessons from the
Computer and Semiconductor Industries, (Cambridge: Ballinger), 1987.
[19] J. Mitola III, Cognitive Radio: An Integrated Agent Architecture for Software Defined
Radio, PhD Thesis, KTH Royal Institute of Technology, 2000.
[20] R. Axelrod, The Evolution of Cooperation, Basic Books, reprint edition, New York,
1984.
[21] P.K. Dutta, Strategies and Games: Theory and Practice, MIT press, 2001.
[22] S. Haykin, Cognitive Radio: Brain-empowered Wireless Communications, IEEE
Journal in Selected Areas in Communications, vol. 23, pp. 201220, Feb. 2005.
[23] K. Lee and A. Yener, Throughput Enhancing Cooperative Spectrum Sensing Strategies
for Cognitive Radios, in Proceeding of Asilomar Conference on Signals, Systems
and Computers, Pacific Grove, CA, Nov. 2007.
[24] D. Cabric, M. S. Mishra, and R. W. Brodersen, Implementation Issues in Spectrum
Sensing for Cognitive Radios, in Proceeding Asilomar Conference on Signals,
Systems, and Computers, Pacific Grove, CA, Nov. 2004.
[25] W. Zhang and K.B. Letaief, Cooperative Spectrum Sensing With Transmit and Relay
Diversity in Cognitive Networks, IEEE Transactions in Wireless Communications,
vol. 7, pp. 47614766, Dec. 2008.
[26] J. Unnikrishnan and V. Veeravalli, Cooperative Sensing for Primary Detection in
Cognitive Radio, IEEE Journal in Selected Topics in Signal Processing, vol. 2, no. 1,
pp. 1827, Feb. 2008.
[27] B. Wang, K.J.R. Liu, and T. Clancy, Evolutionary Game Framework for Behavior
Dynamics in Cooperative Spectrum Sensing, in Proceeding IEEE Global
Communications Conference, New Orleans, LA, Dec. 2008.
Auction and Game-Based Spectrum Sharing in Cognitive Radio Networks 39
[28] S. Huang, X. Liu, and Z. Ding, Optimal Sensing-Transmission Structure for Dynamic
Spectrum Access, in Proceedings International Conference on Computer
Communications (INFOCOM), Rio de Janeiro, Brazil, Apr. 2009.
[29] M. Maskery, V. Krishnamurthy, and Q. Zhao, Decentralized Dynamic Spectrum
Access for Cognitive Radios: Cooperative Design of a Noncooperative Game, IEEE
Transaction in Communications, vol. 57, no. 2, pp. 459469, Feb. 2009.
[30] S. Subranami, T. Basar, S. Armour, D. Kaleshi, and Z. Fan, Noncooperative
Equilibrium Solutions for Spectrum Access in Distributed Cognitive Radio
Networks, in Proceeding to IEEE DySPAN, Chicago, IL, Oct. 2008.
[31] M. Bloem, T. Alpcan, and T. Basar, A Stackelberg Game for Power Control and
Channel Allocation in Cognitive Radio Networks, in Proceedings ICST/ACM
Game communication Workshop, Nantes, France, Oct. 2007.
[32] S. Sanghavi and B. Hajek, Optimal Allocation of Divisible Good to Strategic Buyers,
The 43rd IEEE Conference on Decision and Control, Paradise Island, Bahamas.
December 2004.
[33] O. Raoof, Z. Al-Banna, and H.S. Al-Raweshidy, Competitive Spectrum Sharing in
Wireless Networks: A Dynamic Non-cooperative Game Approach, Springer.
Boston, vol. 308/2009, pp. 197-207, August 2009.
[34] O. Raoof, Z. Al-Banna, and H.S. Al-Raweshidy, A Dynamic Non-cooperative Game
Approach for Competitive Spectrum Sharing in Wireless Networks, 15th EUNICE
International Workshop - The Internet of the Future. Barcelona, Spain, September
2009.
[35] FCC, Facilitating Opportunities for Flexible, Efficient, and Reliable Spectrum Use
Employing Cognitive Radio Technologies, Notice of Proposed Rulemaking, 03-
322A1, Dec 2003. Available online at:
www.cs.ucdavis.edu/~liu/289I/Material/FCC-03-322A1.pdf
[Accessed March 2010].
[36] Q. Zhao, L. Tong, A. Swami, and Y. Chen, Decentralized Cognitive MAC for
Opportunistic Spectrum Access in Ad Hoc Networks: A POMDP Framework,
IEEE Journal on Selected Areas in Communication (JSAC), April 2007.
[37] C. Peng, H. Zheng, and B. Zhao, Utilization and Fairness in Spectrum Assignment for
Opportunistic Spectrum Access, ACM Journal of Mobile Networks and
Applications (MONET), August 2006.
[38] R. Etkin, A. Parekh, and D. Tse, Spectrum Sharing for Unlicensed Bands, IEEE Journal
on Selected Areas in Communication (JSAC), vol. 25, Iss. 3, pp. 517, April 2007.
[39] T. Alpcan, T. Basar, R. Srikant, and E. Altman, CDMA Uplink Power Control as a
Noncooperative Game, Wireless Networks, vol. 8, pp. 659-670, 2002
[40] J. Huang, R. Berry, and M. Honig, Distributed Interference Compensation for Wireless
Networks, IEEE Journal on Selected Areas in Communication (JSAC), May 2006.
[41] N. Nie, and C. Comaniciu, Adaptive Channel Allocation Spectrum Etiquette for
Congnitive Radio Networks, 1st IEEE Conference on Dynamic Spectrum Access
Network (DySPAN), Nov. 2005.
[42] M. Felegyhazi, M. Cagali, S. Bidokhti, and J. Hubaux, Non-Cooperative Multi-Radio
Channel Allocation in Wireless Networks, 26th IEEE conference on Computer
Communications (INFOCOM), May 2007.
Game Theory 40
[43] I. Hoang, and Y. Liang, Maximizing Spectrum Utilization of Cognitive Radio
Networks Using Channel Allocation and Power Control, IEEE Vehicular
Technology Conference (VTC), Sep. 2006.
[44] T. Basar, and G. Olsder, Dynamic Noncooperative Game Theory, Series in Classics in
Applied Mathematics, SIAM, Philadelphia, 2nd Ed, 1999.
[45] B. Platt, J. Price, and H. Tappen, Pay-to-Bid Auctions, (July 9, 2009), Available at
SSRN: http://ssrn.com/abstract=1432169 [Accessed April 2010].
[46] S. H. Chun, and R.J. La, "Auction Mechanism for Spectrum Allocation and Profit
Sharing, Game Theory for Networks, GameNets '09., pp. 498-507, 13-15, May 2009.
[47] W.W. Sharkey, F. Beltran, and M. Bykowsky, "Computational analysis of an auction for
licensed and unlicensed use of spectrum", Game Theory for Networks, 2009.
GameNets '09, pp. 488-497, 13-15, May 2009.
[48] D. Niyato, and E. Hossain, "Market-Equilibrium, Competitive, and Cooperative Pricing
for Spectrum Sharing in Cognitive Radio Networks: Analysis and Comparison",
IEEE Transactions on Wireless Communications, vol. 7, no. 11, pp. 4273-4283,
November 2008.
[49] D. Niyato, and E. Hossain, "Competitive spectrum sharing in cognitive radio networks:
a dynamic game approach", IEEE Transactions on Wireless Communications, vol. 7,
no. 7, pp. 2651-2660, July 2008.
[50] D. Niyato, E. Hossain, and L. Long, "Competitive Spectrum Sharing and Pricing in
Cognitive Wireless Mesh Networks", in Proceedings of IEEE WCNC 2008, Las
Vegas, NV, 31 March-3, April 2008.
[51] S.L. Chen, W.A. Cui, Y.F. Luo, and X.H. Yang, "Optimal Design of Online Auctions with
Shill Bidding and Open Reserve Price", WiCOM '08. 4th International Conference
on Wireless Communications Networking and Mobile Computing, pp. 1-4, 12-14,
October 2008.
[52] H.J. Parrsch and J. Robert, Testing Equilibrium Behaivour At First-Price, Sealed-Bid
Auctions With Discrete Bid Increments, Scientific series, Montreal, June 2003.
[53] F.M. Menezes and P.K. Monteiro, An Introduction to Auction Theory, Oxford
Scholarship Online, November 2004.
3
Game Theory in Wireless
Ad-hoc Opportunistic Radios
Shahid Mumtaz and Atilio Gameiro,
Institute of Telecommunication
Portugal
1. Introduction
In this chapter we explain how game theory can be applied to wireless ad hoc communication
networks. The application of mathematical analysis to the study of wireless ad hoc networks
has met with limited success due to the complexity of mobility, traffic models and the
dynamic topology. We consider a scenario-based UMTS TDD opportunistic cellular system with
ad hoc behaviour that operates over a UMTS FDD licensed cellular network. We describe how
ad hoc opportunistic radio can be modeled as a game and how we apply game-theory-based power
control in ad hoc opportunistic radio.
2. Game theory
Game theory is a field of applied mathematics that describes and analyzes interactive
decision situations. It provides analytical tools to predict the outcome of complex
interactions among rational entities, where rationality demands strict adherence to a
strategy based on perceived or measured results. The main areas of application of game
theory are economics, political science, biology and sociology. From the early 1990s,
engineering and computer science have been added to this list. We limit our discussion to
non-cooperative models that address the interaction among individual rational decision
makers. Such models are called games and the rational decision makers are referred to as
players. In the most straightforward approach, players select a single action from a set of
feasible actions. Interaction between the players is represented by the influence that each
player has on the resulting outcome after all players have selected their actions. Each player
evaluates the resulting outcome through a payoff or utility function representing her
objectives.
There are two ways of representing different components (players, actions and payoffs) of a
game: normal or strategic form, and extensive form. Here we will focus on the normal form
representation.
Formally, a normal form of a game G is given by
$$G = \{N, A, \{u_i\}\} \qquad (1)$$
where $N = \{1, 2, \ldots, n\}$ is the set of players (decision makers), $A_i$ is the action
set for player $i$, $A = A_1 \times A_2 \times \cdots \times A_n$ is the Cartesian product of
the sets of actions available to each player, and $\{u_i\} = \{u_1, \ldots, u_n\}$ is the set
of utility functions that each player $i$ wishes to maximize, where $u_i : A \to \mathbb{R}$.
For every player $i$, the utility function is a function of the action chosen by player $i$,
$a_i$, and the actions chosen by all the players in the game other than player $i$, denoted
as $a_{-i}$.
Together, $a_i$ and $a_{-i}$ make up the action tuple $a$. An action tuple is a unique choice
of actions by each player. From this model, steady-state conditions known as Nash equilibria
can be identified. Before describing the Nash equilibrium we define the best response of a
player as an action that maximizes her utility function for a given action tuple of the other
players. Mathematically, $\bar{a}$ is a best response by player $i$ to $a_{-i}$ if
$$\bar{a} \in \arg\max_{a_i \in A_i} u_i(a_i, a_{-i}) \qquad (2)$$
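As a minimal sketch of condition (2), the following Python snippet computes the best-response set of one player in a small two-player game. The payoff numbers and the names `actions`, `payoffs` and `best_responses` are illustrative assumptions, not part of the chapter.

```python
# Hypothetical two-player game: payoffs[(a1, a2)] = (u1, u2).
# The numbers are illustrative only.
actions = {0: ["x", "y"], 1: ["x", "y"]}
payoffs = {
    ("x", "x"): (2, 1),
    ("x", "y"): (0, 0),
    ("y", "x"): (0, 0),
    ("y", "y"): (1, 2),
}


def best_responses(player, other_action):
    """Return the actions of `player` that maximize its utility
    against the fixed action of the other player (condition (2))."""
    def utility(own_action):
        if player == 0:
            profile = (own_action, other_action)
        else:
            profile = (other_action, own_action)
        return payoffs[profile][player]

    best = max(utility(a) for a in actions[player])
    return {a for a in actions[player] if utility(a) == best}


print(best_responses(0, "x"))  # best reply of player 0 when player 1 plays "x"
print(best_responses(1, "y"))  # best reply of player 1 when player 0 plays "y"
```

The function returns a set because the maximizer need not be unique, which is why condition (2) is written as membership in the arg max.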
A Nash equilibrium (NE) is an action tuple that corresponds to the mutual best response: for
each player $i$, the action selected is a best response to the actions of all others.
Equivalently, a NE is an action tuple where no individual player can benefit from unilateral
deviation. Formally, the action tuple $a^* = (a_1^*, a_2^*, a_3^*, \ldots, a_n^*)$ is a NE if
$$u_i(a_i^*, a_{-i}^*) \ge u_i(a_i, a_{-i}^*) \quad \text{for all } a_i \in A_i \text{ and for all } i \in N. \qquad (3)$$
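Condition (3) can be checked by brute force in small finite games. The sketch below enumerates all action tuples and keeps those from which no player gains by a unilateral deviation. The three-player channel-selection example and the names `is_nash` and `pure_nash_equilibria` are hypothetical and chosen only for illustration.

```python
from itertools import product


def is_nash(profile, action_sets, utilities):
    """True if no player can gain by deviating unilaterally (condition (3))."""
    for i, u in enumerate(utilities):
        current = u(profile)
        for alternative in action_sets[i]:
            deviated = profile[:i] + (alternative,) + profile[i + 1:]
            if u(deviated) > current:
                return False
    return True


def pure_nash_equilibria(action_sets, utilities):
    """Enumerate every action tuple and keep the pure-strategy equilibria."""
    return [p for p in product(*action_sets) if is_nash(p, action_sets, utilities)]


# Hypothetical example: three players each pick channel 0 or 1.
# A player earns 1 if it is alone on its channel and 0 otherwise.
action_sets = [(0, 1)] * 3
utilities = [lambda a, i=i: 1 if a.count(a[i]) == 1 else 0 for i in range(3)]

equilibria = pure_nash_equilibria(action_sets, utilities)
print(equilibria)
```

In this toy game every two-to-one split of the players over the channels is an equilibrium, which also illustrates the multiplicity issue: the definition alone does not say which equilibrium will be played.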
The action tuples corresponding to the Nash equilibria are a consistent prediction of the
outcome of the game, in the sense that if all players predict that a Nash equilibrium will
occur then no player has any incentive to choose a different strategy. There are issues with
using the Nash equilibrium as a prediction of likely outcomes (for instance, what happens
when multiple such equilibria exist?). There are also refinements to the concept of Nash
equilibrium tailored to certain classes of games. A detailed discussion of these is outside
the scope of this chapter. There is no guarantee that a Nash equilibrium, when one exists,
will correspond to an efficient or desirable outcome for a game (indeed, sometimes the
opposite is true). Pareto optimality is often used as a measure of the efficiency of an
outcome. An outcome is Pareto optimal if there is no other outcome that makes every player
at least as well off while making at least one player better off.
Mathematically, an action tuple $a = (a_1, a_2, a_3, \ldots, a_n)$ is Pareto optimal if and
only if there exists no other action tuple $b = (b_1, b_2, b_3, \ldots, b_n)$ such that
$u_i(b) \ge u_i(a)$ for all $i \in N$, and $u_k(b) > u_k(a)$ for some $k \in N$.
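The Pareto condition translates directly into a dominance test on payoff vectors. The following sketch filters out dominated outcomes in a small game. The payoff table and the names `pareto_dominates` and `pareto_optimal` are assumptions made for illustration.

```python
def pareto_dominates(u_b, u_a):
    """True if payoff vector u_b makes nobody worse off than u_a
    and at least one player strictly better off."""
    no_one_worse = all(b >= a for b, a in zip(u_b, u_a))
    someone_better = any(b > a for b, a in zip(u_b, u_a))
    return no_one_worse and someone_better


def pareto_optimal(payoffs):
    """Return the action tuples whose payoff vectors are not dominated."""
    return [
        profile
        for profile, u_a in payoffs.items()
        if not any(pareto_dominates(u_b, u_a) for u_b in payoffs.values())
    ]


# Hypothetical two-player payoff table (illustrative numbers only).
payoffs = {
    ("x", "x"): (2, 2),
    ("x", "y"): (0, 3),
    ("y", "x"): (3, 0),
    ("y", "y"): (1, 1),
}

print(pareto_optimal(payoffs))
```

In this example the outcome ("y", "y") is the only one excluded, because ("x", "x") gives both players strictly more.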
3. Game theory in wireless communication
There is a significant amount of work in wired and wireless networking that makes use of
game theory. In many strategic situations in wireless networking, the players have to agree
on sharing or providing a common resource in a distributed way. Our approach focuses on the
theory of non-cooperative games.
Cooperative games require additional signalling or agreements between the decision
makers and hence a solution based on them might be more difficult to realize. In a
non-cooperative game, there exists a number of decision makers, called players, who have
potentially conflicting interests. In the wireless networking context, the players are the
users or network operators controlling their devices. In compliance with the practice of game
theory, we assume that the players are rational, which means that they try to maximize their
payoffs (or utilities). This assumption of rationality is often questionable, given, for example,
the altruistic behaviour of some animals. Herbert A. Simon was the first to question
this assumption and introduced the notion of bounded rationality. However, we believe that
in computer networks, most of the interactions can be captured using the concept of
rationality, with the appropriate adjustment of the payoff function. In order to maximize
their payoff, the players act according to their strategies. The strategy of a player can be
a single move or a set of moves during the game.
We take an intuitive top-down approach in the protocol stack to select the examples in
wireless networking as follows. Let us first assume that time is split into time steps and
each device can make one move in each time step.
In the first game, called the Forwarder's Dilemma, we assume that there exist two devices as
players, $p_1$ and $p_2$. Each of them wants to send a packet to his destination, $dst_1$ and
$dst_2$ respectively, in each time step using the other player as a forwarder. We assume that
the communication between a player and his receiver is possible only if the other player
forwards the packet. We show the Forwarder's Dilemma scenario in Figure 1. If player $p_1$
forwards the packet of $p_2$, it costs player $p_1$ a fixed cost $0 < C \ll 1$, which
represents the energy and computation spent for the forwarding action. By doing so, he
enables the communication between $p_2$ and $dst_2$, which gives $p_2$ a benefit of 1. The
payoff is the difference of the benefit and the cost. We assume that the game is symmetric
and the same reasoning applies to the forwarding move of player $p_2$. The dilemma is the
following: each player is tempted to drop the packet he should forward, as this would save
some of his resources; but if the other player reasons in the same way, then the packet that
the first player wanted to send will also be dropped. They could, however, do better by
mutually forwarding each other's packet. Hence the dilemma.
Fig. 1. The network scenario in the Forwarder's Dilemma game.
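The payoffs of the Forwarder's Dilemma follow from the description above: a player gains 1 when the other forwards and pays C when it forwards itself. A short Python sketch, assuming the illustrative value C = 0.1 and using hypothetical names (`payoff`, `table`, `drop_dominates`), builds the payoff table and checks that dropping is a strictly dominant move.

```python
C = 0.1  # assumed forwarding cost; any value with 0 < C << 1 behaves the same way

F, D = "forward", "drop"


def payoff(own, other):
    """Payoff of one player: benefit 1 if the other player forwards,
    minus the cost C if the player itself forwards."""
    benefit = 1.0 if other == F else 0.0
    cost = C if own == F else 0.0
    return benefit - cost


# Payoff table indexed by (move of p1, move of p2) -> (payoff of p1, payoff of p2).
table = {(a1, a2): (payoff(a1, a2), payoff(a2, a1)) for a1 in (F, D) for a2 in (F, D)}

# Dropping is strictly better than forwarding whatever the other player does.
drop_dominates = all(payoff(D, other) > payoff(F, other) for other in (F, D))

for moves, values in table.items():
    print(moves, values)
print("drop strictly dominates forward:", drop_dominates)
```

Since dropping dominates for both players, mutual dropping with payoffs (0, 0) is the equilibrium, although mutual forwarding would give each player 1 - C. This is the same structure as the Prisoner's Dilemma.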
In the second example, called the Joint Packet Forwarding Game, we present a scenario in
which a source src wants to send a packet to his destination dst in each time step. To this
end, he needs both devices $p_1$ and $p_2$ to forward for him. As in the previous example,
there is a forwarding cost $0 < C \ll 1$ if a player forwards the packet of the sender. If
both players forward, then they each receive a benefit of 1 (e.g., from the sender or the
receiver). We show this packet forwarding scenario in Figure 2.
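A small Python sketch can make the asymmetry of this game concrete. It assumes a chain topology src, $p_1$, $p_2$, dst, so that $p_2$ has nothing to forward (and pays no cost) when $p_1$ drops; this assumption, the value C = 0.1 and the names `payoffs` and `is_equilibrium` are illustrative and not stated in the text.

```python
C = 0.1  # assumed forwarding cost, 0 < C << 1

F, D = "forward", "drop"


def payoffs(a1, a2):
    """Payoffs (u1, u2), assuming the packet travels src -> p1 -> p2 -> dst."""
    if a1 == D:
        return (0.0, 0.0)      # the packet never reaches p2
    if a2 == D:
        return (-C, 0.0)       # p1 paid the cost but the packet was not delivered
    return (1.0 - C, 1.0 - C)  # both forwarded, both receive the benefit


def is_equilibrium(a1, a2):
    """True if neither player gains by changing only its own move."""
    u1, u2 = payoffs(a1, a2)
    p1_cannot_gain = all(payoffs(alt, a2)[0] <= u1 for alt in (F, D))
    p2_cannot_gain = all(payoffs(a1, alt)[1] <= u2 for alt in (F, D))
    return p1_cannot_gain and p2_cannot_gain


equilibria = [(a1, a2) for a1 in (F, D) for a2 in (F, D) if is_equilibrium(a1, a2)]
print(equilibria)
```

Under these assumptions the game has two pure-strategy equilibria, mutual forwarding and mutual dropping, so unlike the Forwarder's Dilemma the cooperative outcome is itself stable.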
Fig. 2. The Joint Packet Forwarding Game.
The third example, called the Multiple Access Game, introduces the problem of medium access.
Suppose that there are two players $p_1$ and $p_2$ who want to access a shared communication
channel to send some packets to their receivers $re_1$ and $re_2$. We assume that each player
has one packet to send in each time step and he can decide to access the channel to transmit
it or to wait. Furthermore, let us assume that $p_1$, $p_2$, $re_1$ and $re_2$ are in the
power range of each other, hence their transmissions mutually interfere. If player $p_1$
transmits his packet, it incurs a sending cost of $0 < C \ll 1$. The packet is successfully
transmitted if $p_2$ waits in that given time step (i.e., he does not transmit), otherwise
there is a collision. If there is no collision, player $p_1$ gets a benefit of 1 from the
successful packet transmission. The framework presented by Cagalj et al. is a generalized
version of the Multiple Access Game.
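The following Python sketch works through the Multiple Access Game numerically. It assumes that waiting yields a payoff of 0 and uses C = 0.1; both are illustrative assumptions, as are the names `payoff`, `expected_transmit_payoff` and `q_star`. It checks that one player transmitting while the other waits is stable, and that transmitting with probability 1 - C makes the opponent indifferent between transmitting and waiting.

```python
C = 0.1  # assumed sending cost, 0 < C << 1

T, W = "transmit", "wait"


def payoff(own, other):
    """Payoff of one player; waiting is assumed to yield 0."""
    if own == W:
        return 0.0
    return (1.0 - C) if other == W else -C  # success, or collision


def expected_transmit_payoff(q_other):
    """Expected payoff of transmitting when the other player
    transmits with probability q_other."""
    return (1.0 - q_other) * payoff(T, W) + q_other * payoff(T, T)


# One player transmits and the other waits: neither wants to switch.
asymmetric_is_stable = payoff(T, W) >= payoff(W, W) and payoff(W, T) >= payoff(T, T)

# Mixed strategy: if the other transmits with probability 1 - C, transmitting
# and waiting give the same expected payoff (zero), so the player is indifferent.
q_star = 1.0 - C
print(asymmetric_is_stable, expected_transmit_payoff(q_star))
```

The indifference condition comes from setting the expected payoff of transmitting, $1 - C - q$, equal to the waiting payoff of zero.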
In the last example, called the Jamming Game, we assume that player $p_1$ wants to send a
packet in each time step to a receiver $re_1$. In this example, we assume that the wireless
medium is split into two channels x and y according to the Frequency Division Multiple
Access (FDMA) principle. The objective of the malicious player $p_2$ is to prevent player
$p_1$ from a successful transmission by transmitting on the same channel in the given time
step. In wireless communication, this is called jamming. Clearly, the objective of $p_1$ is
to succeed in spite of the presence of $p_2$. Accordingly, he receives a payoff of 1 if the
attacker cannot jam his transmission and a payoff of -1 if the attacker jams his packet. The
payoffs for the attacker $p_2$ are the opposite of those of player $p_1$. We assume that
$p_1$ and $re_1$ are synchronized, which means that $re_1$ can always receive the packet,
unless it is destroyed by the malicious player $p_2$. Note that we neglect the transmission
cost C, since it applies to each payoff (i.e., the payoffs would be 1-C and -1-C) and does
not change the conclusions drawn from this game.
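The Jamming Game has the structure of a matching-pennies game, which a short Python sketch can verify: no pair of channel choices is stable, while uniform random channel selection gives the sender an expected payoff of zero whatever the jammer does. The function names (`u1`, `u2`, `is_pure_equilibrium`, `expected_u1`) are hypothetical.

```python
from itertools import product

channels = ("x", "y")


def u1(c1, c2):
    """Sender's payoff: 1 if the jammer chose the other channel, -1 if jammed."""
    return 1 if c1 != c2 else -1


def u2(c1, c2):
    """Jammer's payoff: the opposite of the sender's (zero-sum game)."""
    return -u1(c1, c2)


def is_pure_equilibrium(c1, c2):
    """True if neither player gains by switching channel unilaterally."""
    sender_ok = all(u1(alt, c2) <= u1(c1, c2) for alt in channels)
    jammer_ok = all(u2(c1, alt) <= u2(c1, c2) for alt in channels)
    return sender_ok and jammer_ok


pure_equilibria = [p for p in product(channels, repeat=2) if is_pure_equilibrium(*p)]


def expected_u1(prob_x_sender, prob_x_jammer):
    """Sender's expected payoff when both players randomize over the channels."""
    sender = {"x": prob_x_sender, "y": 1.0 - prob_x_sender}
    jammer = {"x": prob_x_jammer, "y": 1.0 - prob_x_jammer}
    return sum(sender[a] * jammer[b] * u1(a, b) for a, b in product(channels, repeat=2))


print(pure_equilibria)
print(expected_u1(0.5, 0.5))
```

Because one player always wants to match and the other to mismatch, some player can improve at every pure outcome, so only randomized strategies can be in equilibrium.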
The Jamming Game models a simplified version of a game-theoretic problem presented by
Zander. We deliberately chose these examples to represent a wide range of problems over
different protocol layers (as shown in Figure 3). There are indeed fundamental differences
between these games. The Forwarder's Dilemma is a symmetric nonzero-sum game, because the
players can mutually increase their payoffs by cooperating (i.e., from zero to 1-C). The
conflict of interest is that they have to provide the packet forwarding service for each
other. Similarly, the players have to establish the packet forwarding service in the Joint
Packet Forwarding Game, but they are no longer in a symmetric situation. The Multiple Access
Game is also a nonzero-sum game, but the players have to share a common resource, the
wireless medium, instead of providing it. Finally, the Jamming Game is a zero-sum game
because the gain of one player represents the loss of the other player. These properties
lead to different games and hence to different strategic analyses.
3.1 Cognitive radio
In the information age, the growing number of wireless devices makes spectrum one of the most
essential and important resources. Today, wireless networks are regulated by a fixed
spectrum assignment policy. However, according to the Federal Communications Commission
(FCC), a large portion of the assigned spectrum is used only sporadically, and usage varies
geographically, so the serious problem is inefficient usage. This limitation of the
traditional spectrum policy necessitates a new technology to exploit the available spectrum
opportunities, which is called cognitive radio.
Fig. 3. The classification of the examples according to protocol layers.
A cognitive radio is a radio that can change its transmitter parameters based on interaction
with the environment in which it operates. It is characterized by cognitive capability and
reconfigurability. The cognitive capability refers to capturing and sensing information
from the radio environment by monitoring the power and capturing the temporal and spatial
variations. The reconfigurability enables the radio to be dynamically programmed by the
radio knowledge representation language (RKRL) to select the best spectrum and appropriate
operating parameters. Therefore, the cognitive radio can enhance flexibility through the
cognitive cycle, which has three main steps: radio-scene analysis; channel state estimation
and predictive modeling; and transmit power control and spectrum management. The cognitive
cycle is pictured in Figure 4.
Fig. 4. Basic cognitive cycle
Transmit-power control is necessary for the cognitive radio system to broaden the scope of
its applications and enhance the performance. It would have to operate under two
limitations on network resources: the interference temperature limit imposed by regulatory
agencies, and the availability of a limited number of spectrum holes depending on usage. In
a multiuser cognitive radio environment, all the users operate in a decentralized manner;
they are characterized by cooperation and competition. In such a case, information theory
and game theory could be applied to exercise control over the transmit power.
4. Game theory in wireless ad-hoc opportunistic radios
Wireless communications play a very important role in military networks and networks for
crisis management, which are characterised by their ad hoc heterogeneous structure. An
example of a future network, illustrating a range of future wireless ad hoc applications,
can be seen in Figure 5. In a heterogeneous ad hoc network, it is difficult to develop
plans that will cope with every eventuality, particularly hostile threats, due to its
temporary nature. Dynamic management of such networks is therefore an ideal setting in which
the emerging fields of cognitive networking and cognitive radio can play a part. Here we
assume a cognitive radio is a radio that can change its transmitter parameters based on
interaction with the environment where it operates. Additionally relevant here is the
radio's ability to look for, and intelligently assign, spectrum holes on a dynamic basis
from within primarily assigned spectral allocations. The detection of holes and the
subsequent use of the unoccupied spectrum is referred to as opportunistic use of the
spectrum. An Opportunistic Radio (OR) is the term used to describe a radio that is capable
of such operation. We use a previously proposed opportunistic radio system that shares
the spectrum with a UMTS cellular network. This is motivated by the fact that UMTS radio
frequency spectrum has become, in a significant number of countries, a very expensive
commodity, and therefore the opportunistic use of these bands could be one way for the
owners of the licenses to make extra revenue.
The OR system exploits the UMTS UL bands; therefore, the victim device is the UMTS base
station, which is likely to be far from the opportunistic radio, and this creates local
opportunities. These potential opportunities in UMTS FDD UL bands are in line with the
interference temperature metric proposed by the FCC's Spectrum Policy Task Force. The
interference temperature model manages interference at the receiver through the interference
temperature limit, which is represented by the amount of new interference that the receiver
could tolerate. As long as OR users do not exceed this limit by their transmissions, they can
use this spectrum band. However, handling interference is the main challenge in CDMA
networks; therefore, the interference temperature concept should be applied in UMTS
licensed bands in a very careful way.
UMTS is a DS-CDMA system, so all users transmit their information spread over a 5
MHz bandwidth at the same time and therefore interfere with one another. Figure 6
shows typical UMTS FDD paired frequencies. The asymmetric load creates spectrum
opportunities in UL bands since the interference temperature (the amount of new interference
that the UMTS BS can tolerate) is not reached.
In order to fully exploit the unused radio resources in UMTS, the OR network should be
able to detect the vacant channelization codes using a classification technique. The OR
network could then communicate using the remaining spreading codes, which are orthogonal to
those used by the UMTS network. However, classifying and identifying CDMA codes is a very
computationally intensive task for real-time applications.
Fig. 5. Ad-hoc future network
[Figure labels: UMTS UL band (1920-1925 MHz) and UMTS DL band (2110-2115 MHz); original noise floor; interference temperature]
Fig. 6. UMTS FDD spectrum bands with asymmetric load
Moreover, synchronization between the UMTS UL signals and the OR signals, which is needed to keep the
codes orthogonal, would be a difficult problem. Our approach is to fill part of the
available interference temperature, raising the noise level above the original noise floor. This
rise is caused by the OR network activity, whose aggregated signal is considered to be AWGN
(e.g. CDMA, MC-CDMA, OFDM). We consider a scenario where the regulator allows a
secondary cellular system over primary cellular networks. We therefore consider
opportunistic radio entities as secondary users. The secondary opportunistic radio system
can use the licensed spectrum provided it does not cause harmful interference to the
owners of the licensed bands, i.e., the cellular operators. Specifically, we consider a UMTS
system as the primary cellular network and, as the secondary network, an ad hoc network with extra
sensing features that is able to switch its carrier frequency to UMTS FDD frequencies. Figure 7
illustrates the scenario where an opportunistic radio network operates within a UMTS
cellular system.
We consider an ad hoc OR network of M nodes overlapping the UMTS FDD cell.
The OR network acts as a secondary system that exploits opportunities in UMTS UL bands.
The OR network has an opportunity management entity which computes the maximum
allowable transmit power for each OR node so as not to disturb the UMTS BS.
[Figure labels: UMTS FDD Node B; three OR nodes; UL band; interference (UL); sensing (DL) of CPICH + CCPCH]
Fig. 7. Ad hoc OR networks operating in a licensed UMTS UL band
4.1 The opportunistic network with ad hoc topology
The opportunistic network, shown in Figure 8, interfaces with the link level simulator
through look-up tables (LUTs).
Fig. 8. Block diagram of the system level platform
The propagation models developed for the UMTS FDD network are reused, and the
entire channel losses (slow and fast fading) are computed. The outputs are the parameters
that usually characterize packet transmissions: throughput, BLER and packet delay. The
LUT sensing algorithm characterization block contains the cyclostationary detector's
performance, i.e. the output detection statistic, d, as a function of the SNR measured at the
sensing antenna for different observation times [6]. The sensing OR-UMTS path loss block
estimates the path loss between the UMTS BS and the OR location as the difference
between the transmitted power and the estimated power given by the cyclostationary detector
(the output of the LUT sensing algorithm characterization block). The OR traffic generation block
contains real-time and non-real-time service traffic models. The OR QoS block defines the minimum
data rate, the maximum bit error rate and the maximum transmission delay for each service
class. The non-interference rule block computes the maximum allowable transmit power
that does not disturb the UMTS BS by applying a simple non-interference rule (according to
policy requirements). In the following, we briefly explain the opportunistic network blocks
that were designed and implemented using a C++ design methodology.
First of all, we assume that the OR knows a priori the UMTS carrier frequencies and
bandwidths, which have been isolated and brought to baseband. To obtain the
maximum allowable power for OR communications, each OR node needs to estimate the path
loss from its location to the UMTS BS, i.e., the victim device. The opportunistic user is
interested in predefined services, which should be available at all times. This motivates the
proposal to define a set of usable radio front-end parameters in order to support the
demanded service classes under different channel conditions. Basically, at the beginning of
each time step the opportunistic radio requires certain QoS guarantees, including a certain
rate, delay and minimum interference to the primary user (non-interference rule policy).
The opportunistic network has an opportunity management entity which computes the
maximum allowable transmit power for each opportunistic node so that the aggregated
interference does not disturb the UMTS BS. The aggregated transmit power allowed to the
opportunistic network can be computed using a simple non-interference rule:
$$10\log_{10}\left(\sum_{k=1}^{K} 10^{\frac{P_{OR}(k) + G_{OR} + G_{BS} - L_p(k)}{10}}\right) \le 10\log_{10}\left(10^{\frac{Nth + \mu}{10}} - 10^{\frac{Nth}{10}}\right) - \Gamma \qquad (4)$$
where G_OR is the OR antenna gain, G_BS is the UMTS BS antenna gain, L_p is the estimated
path loss between the OR node and the UMTS BS, obtained by a
sensing algorithm, K is the number of ORs, and Nth is the thermal noise floor. μ is the margin of tolerable extra
interference that, by a policy decision, the UMTS BS can bear; it is defined according to
policy requirements. Γ is a safety factor (e.g. 6-10 dB) that compensates for the mismatch
between downlink and uplink shadow fading and for other sensing impairments. Note that if
the margin of tolerable interference is μ = 0, the OR must be silent.
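The non-interference rule can be checked numerically. The following is a minimal Python sketch, not the chapter's C++ implementation: the function names and the numeric values in the usage note are hypothetical, and it assumes all powers are in dBm, gains in dBi and losses in dB.

```python
import math


def allowed_aggregate_dbm(nth_dbm, mu_db, gamma_db):
    """Tolerable aggregated OR interference at the UMTS BS (right-hand side of the rule)."""
    if mu_db <= 0:
        return -math.inf  # no tolerable margin: the ORs must stay silent
    # Headroom between the raised floor (Nth + mu) and the original floor Nth, in mW.
    headroom_mw = 10 ** ((nth_dbm + mu_db) / 10) - 10 ** (nth_dbm / 10)
    return 10 * math.log10(headroom_mw) - gamma_db


def aggregate_at_bs_dbm(p_or_dbm, path_loss_db, g_or_dbi, g_bs_dbi):
    """Aggregated OR power received at the UMTS BS (left-hand side of the rule)."""
    total_mw = sum(
        10 ** ((p + g_or_dbi + g_bs_dbi - lp) / 10)
        for p, lp in zip(p_or_dbm, path_loss_db)
    )
    return 10 * math.log10(total_mw) if total_mw > 0 else -math.inf


def rule_satisfied(p_or_dbm, path_loss_db, g_or_dbi, g_bs_dbi, nth_dbm, mu_db, gamma_db):
    """True if the OR transmit powers respect the non-interference rule."""
    received = aggregate_at_bs_dbm(p_or_dbm, path_loss_db, g_or_dbi, g_bs_dbi)
    return received <= allowed_aggregate_dbm(nth_dbm, mu_db, gamma_db)
```

As an illustration with assumed values, two ORs transmitting at -44 dBm over a 120 dB path loss, with G_OR = 0 dBi, G_BS = 16 dBi, Nth = -107 dBm, μ = 3 dB and Γ = 6 dB, satisfy the rule, whereas any transmission fails it when μ = 0.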
Employing scheduling algorithms, we can provide a good tradeoff between maximizing
capacity, satisfying delay constraint, achieving fairness and mitigating interference to the
primary user. In order to satisfy the individual QoS constraints of the opportunistic radios,
scheduling algorithms that allow the best user to access the channel based on the individual
priorities of the opportunistic radios, including interference mitigation, have to be
considered. The objective of the scheduling rules is to achieve the following goals:
- Maximize the capacity;
- Satisfy the time delay guarantees;
- Achieve fairness;
- Minimize the interference caused by the opportunistic radios to the primary user.
A power control solution is required to maximize the energy efficiency of the opportunistic
radio network, which operates simultaneously in the same frequency band as a UMTS
UL system. Power control is applied only to avoid intruding on the services of the
primary users, not to guarantee the QoS of the opportunistic users.
A distributed power control implementation that uses only local information to make a
control decision is of particular interest to us. Note that each opportunistic user only needs to
know its own received SINR at its designated receiver to update its transmission power. The
fundamental concept of the interference temperature model is to avoid raising the average
interference power for some frequency range over some limit. However, if either the current
interference environment or the transmitted underlay signal is particularly non-uniform, the
maximum interference power could be particularly high.
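A local update of this kind can be sketched as follows. This is a generic target-tracking step, assumed here purely for illustration and not taken from the chapter: the target SINR and the function name are hypothetical, and the upper bound stands in for the maximum allowable power given by the non-interference rule. The default bounds reuse the 10/-44 dBm range of Table 1.

```python
def update_power_dbm(p_dbm, sinr_db, target_sinr_db, p_min_dbm=-44.0, p_max_dbm=10.0):
    """One distributed power-control step that uses only the node's own measured SINR.

    The node moves its power by the gap between the target and the measured SINR,
    then clips the result to [p_min_dbm, p_max_dbm]. The upper bound represents the
    cap imposed to protect the primary user.
    """
    proposed = p_dbm + (target_sinr_db - sinr_db)
    return max(p_min_dbm, min(p_max_dbm, proposed))
```

Because every node needs only its own SINR measurement, no exchange of information between ORs is required, which is the property the text highlights.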
In the following we explain why we consider an ad hoc topology for the opportunistic
radio system in a cellular scenario. A mobile ad hoc network (MANET) is an autonomous system of mobile
nodes connected by wireless links; each node operates as an end system and as a router for all
other nodes in the network. A mobile ad hoc network suits opportunistic radio because of the
following features:
Infrastructure
MANETs can operate in the absence of any fixed infrastructure. They offer quick and easy
network deployment in situations where it would otherwise not be possible. Nodes in a mobile ad hoc network
are free to move and organize themselves in an arbitrary fashion. This fits the
opportunities in UMTS bands, which are local and may change with OR node movement
and UMTS terminal activity.
Dynamic Topologies
Ad hoc networks have a limited wireless transmission range. The network topology, which
is typically multi-hop, may change randomly and rapidly at unpredictable times, and may
consist of both bidirectional and unidirectional links. This fits the typical short-range
opportunities, which operate on different links in UMTS UL bands.
Energy-constrained operation
Some or all of the nodes in a MANET may rely on batteries or other exhaustible means for
their energy. For these nodes, the most important system design criterion for optimization is
energy conservation. The power control mechanism used for energy conservation (battery power)
also helps to avoid harmful interference with the UMTS BS.
Reconfiguration
Mobile ad hoc networks can turn the dream of getting connected "anywhere and at any
time" into reality. Typical application examples include disaster recovery or a military
operation. As an example, we can imagine a group of people with laptops in a business
meeting at a place where no network service is present. They can easily network their
machines by forming an ad hoc network. In our scenario the OR network reconfigures itself as
the interference coming from licensed users (PUs) causes some links to be dropped. Ad hoc
multi-hop transmission decreases the ORs' transmitted power and
simultaneously decreases the interference with the UMTS BS.
Bandwidth-constrained, variable capacity links
Wireless links will continue to have significantly lower capacity than their hardwired
counterparts. In addition, the realized
throughput of wireless communications, after accounting for the effects of multiple access,
fading, noise, and interference conditions, etc., is often much less than a radio's maximum
transmission rate. This constraint also fits our scenario, where the maximum transmission rate
of the ORs is less than that of the UMTS base station after the effects of multiple access, fading, noise
and interference conditions.
Security
Mobile wireless networks are generally more prone to physical security threats than are
fixed cable nets. The increased possibility of eavesdropping, spoofing, and denial-of-service
attacks should be carefully considered. Existing link security techniques are often applied
within wireless networks to reduce security threats. As a benefit, the decentralized nature of
network control in MANETs provides additional robustness against the single points of
failure of more centralized approaches. By using this property of MANETs, we avoid a single
point of failure in the OR network.
4.2 Coexistence analysis of a single opportunistic radio link
We consider the simplest case where a single OR link operates within a UMTS FDD cell.
Simulations were carried out to analyze the coexistence between the OR link and
the UMTS network. The main parameters used for the simulations are summarized in Table
1. We consider an omnidirectional cell with a radius of 2000 meters. Each available
frequency, of a maximum of 12, contains 64 primary user terminals. Each of these primary
users receives the same power from the UMTS base station (perfect power control). We
assume a primary user data rate of 12.2 kbps (voice call); the E_b/N_o target for 12.2
kbps is 9 dB. Since the UMTS receiver bandwidth is 3840 kHz, the signal to
interference ratio required for the primary users is approximately -16 dB. There is at least one
opportunistic radio in the cell coverage area, with a transmit power range from -44
to 10 dBm. The opportunistic radio call duration is 90 seconds. We furthermore
consider load characteristics.
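The -16 dB figure follows from subtracting the processing gain (receiver bandwidth divided by bit rate, in dB) from the E_b/N_o target. A short Python check, with a hypothetical function name:

```python
import math


def sir_target_db(eb_no_db, bandwidth_hz, bit_rate_bps):
    """Required SIR = Eb/No target minus the processing gain, all in dB."""
    processing_gain_db = 10 * math.log10(bandwidth_hz / bit_rate_bps)
    return eb_no_db - processing_gain_db


# 9 dB target, 3840 kHz receiver bandwidth, 12.2 kbps voice call
sir = sir_target_db(9.0, 3.84e6, 12.2e3)
```

The processing gain is about 25 dB, so the required SIR comes out very close to -16 dB, matching the value in Table 1.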
Simulation results for a single UMTS frequency
In order to calculate the Cumulative Distribution Function (CDF) of the interference at the UMTS
BS, we consider 64 licensed UMTS terminals in each cell (with radius R = 2000
m), as shown in Figure 9. The OR receiver gets interference from the PUs
located in the central UMTS cell and in 6 adjacent cells. The ORs are within an ad hoc
network service area (with radius R = 100 m); the OR receiver is 10 m away from the
OR transmitter. The OR transmitter is constrained by the non-interference rule.
Based on Shannon's capacity formula, the OR link capacity that can be achieved
between two OR nodes is given by:
$$C_{Mbps} = B \log_2\left(1 + \frac{L_2\, P_{OR\_Tx}}{Nth + I_{UMTS}}\right), \qquad B = 5\ \text{MHz}, \quad Nth = -107\ \text{dBm} \qquad (4)$$
where B = 5 MHz, L_2 is the path loss between the OR_Tx and the OR_Rx, Nth is the average
thermal noise power and I_UMTS is the amount of interference that the UMTS terminals cause
at the OR_Rx. On the other hand, the total interference at the UMTS BS caused by the OR
activity cannot be higher than the UMTS BS interference limit, -116 dBm.
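The capacity expression can be evaluated with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, powers are given in dBm and converted to mW before they are added, and the path loss L_2 is applied as an attenuation in dB. The thermal noise helper reproduces Nth from the spectral noise density of Table 1.

```python
import math


def dbm_to_mw(power_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def thermal_noise_dbm(n0_dbm_per_hz, bandwidth_hz):
    """Thermal noise power over the given bandwidth."""
    return n0_dbm_per_hz + 10 * math.log10(bandwidth_hz)


def or_link_capacity_mbps(p_tx_dbm, path_loss_db, nth_dbm, i_umts_dbm, bandwidth_hz=5e6):
    """Shannon capacity of the OR link in Mbps."""
    signal_mw = dbm_to_mw(p_tx_dbm - path_loss_db)
    # Noise and UMTS interference add in linear units, not in dB.
    noise_mw = dbm_to_mw(nth_dbm) + dbm_to_mw(i_umts_dbm)
    return bandwidth_hz * math.log2(1 + signal_mw / noise_mw) / 1e6
```

With -174 dBm/Hz over 5 MHz the helper gives roughly -107 dBm. When the received signal equals the noise-plus-interference power, the capacity is B itself, i.e. 5 Mbps.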
Parameter Name                                                    Value

UMTS system
  Time transmission interval (T_ti)                               2 ms
  Cell type                                                       Omni
  Cell radius                                                     2000 m

Radio Resource Management
  Nominal bandwidth (W)                                           5 MHz
  Maximum number of available frequencies (N[max])                12
  Data rate (R_b)                                                 12.2 kbps
  E_b/N_o target                                                  9 dB
  SIR target                                                      -16 dB
  Spreading factor                                                16
  Spectral noise density (N_o)                                    -174 dBm/Hz
  Step size PC                                                    Perf. power ctrl

Channel Model                                                     Urban
  Carrier frequency                                               2 GHz
  Shadowing standard deviation (σ)                                8 dB
  Decorrelation length (D)                                        50 m
  Channel model                                                   ITU vehicular A
  Mobile terminals velocity                                       30 km/h

Primary User (PU)
  Number of primary user terminals per cell/frequency (K)         64
  Sensibility/Power received                                      -117 dBm
  UMTS BS antenna gain                                            16 dBi
  Noise figure                                                    9 dB
  Orthogonality factor                                            0

Opportunistic Radio (OR)
  Number of opportunistic radios in the cell coverage area        2
  Maximum/Minimum power transmitted (P_o [max/min])               10/-44 dBm
  Antenna gain                                                    0 dBi
  Duration call                                                   90 s
Table 1. Main parameters used for the simulations
[Figure labels: cell limit (2000 m); ad hoc network service area (100 m); OR Tx and OR Rx (10 m apart); UMTS BS; UMTS PU1, PU2, ..., PUi, ..., PU64; UL band; distance d]
Fig. 9. Ad-hoc Single Link scenario
Figure 10 shows the CDF of the interference computed at the UMTS BS due to
the OR network activity. The results show that an 8 Mbps OR link capacity is guaranteed
for approximately 98% of the time without exceeding the UMTS BS interference limit (-116
dBm). However, this percentage decreases to 60% when an OR link with 32 Mbps is
established. The load characteristics are assumed to be identical in every UMTS cellular
system, and the frequencies are close enough that the same statistical models apply.
Fig. 10. Interference at UMTS BS
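The percentages quoted above are values of the empirical CDF at the interference limit. A minimal sketch of that computation follows; the function name and the sample values are made up for illustration and are not simulation results.

```python
def fraction_within_limit(interference_dbm, limit_dbm=-116.0):
    """Empirical CDF at the limit: share of samples that do not exceed it."""
    samples = list(interference_dbm)
    if not samples:
        raise ValueError("at least one interference sample is required")
    return sum(1 for x in samples if x <= limit_dbm) / len(samples)


# Hypothetical interference samples at the UMTS BS, in dBm
illustrative_samples = [-130.0, -125.0, -120.0, -117.0, -115.0]
share = fraction_within_limit(illustrative_samples)  # 4 of 5 samples are within the limit
```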
5. Game theory in opportunistic radio
A wireless ad hoc network is characterized by a distributed, dynamic, self-organizing
architecture. Each node in the network is capable of independently adapting its operation
based on the current environment according to predetermined algorithms and protocols.
Analytical models to evaluate the performance of ad hoc networks with
opportunistic radio access have been scarce due to the distributed and dynamic nature of
such networks. Game theory offers a suite of tools that may be used effectively in modeling
the interaction among independent OR nodes in an ad hoc network.
For over a decade, game theory has been used as a tool to study different aspects of
computer and telecommunication networks, primarily as applied to problems in traditional
wired networks. In the past three to four years there has been renewed interest in
developing networking games, this time to analyze the performance of wireless ad hoc
networks (ORs). Since the game theoretic models developed for ad hoc networks focus on
distributed systems, results and conclusions generalize well as the number of players (ORs)
is increased. It is also of interest to investigate how selfish behavior by individual nodes
(ORs) may affect the performance of the UMTS system as a whole. In a game, players (ORs)
are independent decision makers whose payoffs depend on other players' (ORs') actions.
Nodes (ORs) in an ad hoc network are characterized by the same feature. This similarity
leads to a strong mapping between traditional game theory components and elements of an
ad hoc network. Table 2 shows typical components of an ad hoc networking game. Game
theory can be applied to the modeling of an ad hoc network at the physical layer
(distributed power control), link layer (medium access control) and network layer (packet
forwarding). Applications at the transport layer and above also exist, although they are less
pervasive in the literature. A question of interest in all those cases is how to provide
the appropriate incentives to discourage selfish behavior. Selfishness is generally
detrimental to overall network performance; examples include a node increasing its power
without regard for the interference it may cause to its neighbors (layer 1), a node immediately
retransmitting a frame after a collision without going through a backoff phase (layer 2),
or a node refusing to forward packets for its neighbours (layer 3).
Components of a game    Elements of an ad hoc network
Players                 Nodes in the network
Strategy                Action related to the functionality being studied (e.g. the decision
                        to forward packets or not, the setting of power level, the selection of
                        waveform/modulation scheme)
Utility function        Performance metrics (e.g. throughput, delay, target signal-to-noise
                        ratio)
Table 2. Typical mapping of ad hoc network components to a game
5.1 Using game theory for power control
Transmit-power control is necessary for the opportunistic radio system to broaden the scope
of its applications and enhance the performance. It would have to operate under two
limitations on network resources: the interference temperature limit imposed by regulatory
agencies, and the availability of a limited number of spectrum holes depending on usage. In
a multiuser opportunistic radio (ORs) environment, all the users operate in a decentralized
manner; they are characterized by cooperation and competition. In such a case, game theory
could be applied to exercise control over the transmit power. Distributed power control may
be adopted by a node (OR). From a physical layer perspective, performance is generally a
function of the effective signal-to-interference-plus-noise ratio (SINR) at the node(s) of
interest. When the nodes in a network respond to changes in perceived SINR by adapting
their signal, a physical layer interactive decision making process occurs. This signal
adaptation can occur in the transmit power level and the signaling waveform (modulation,
frequency, and bandwidth). The exact structure of this adaptation is also impacted by a
variety of factors not directly controllable at the physical layer, including environmental
path losses and the processing capabilities of the node(s) of interest. A game theoretic model
for physical layer adaptations can be formed using the parameters listed in Table 3.
Using the notation of Table 3, the stage game for interactive physical layer adaptations can be modeled as
$$G = \langle N, \{P_j \times \Omega_j\}, \{u_j(p, \omega, H)\} \rangle \qquad (5)$$
Symbol          Meaning
N               The set of decision making nodes in the network; {1, 2, ..., n}
h_ij            The link gain from node i to node j. Note this may be a function of the
                waveforms selected.
Ω_j             The set of waveforms known by node j.
H               The network link gain matrix:
                $$H = \begin{bmatrix} 1 & h_{12} & h_{13} & \cdots & h_{1n} \\ h_{21} & 1 & & & \vdots \\ h_{31} & & 1 & & \vdots \\ \vdots & & & \ddots & \vdots \\ h_{n1} & h_{n2} & \cdots & & 1 \end{bmatrix}$$
P_j             The set of power levels available to node j. This is presumed to be a
                subset of the real number line.
p_j             A power level chosen by j from P_j.
P               The power space (in R^n) formed from the Cartesian product of all P_j:
                P = P_1 × P_2 × ... × P_n
p               A power profile (vector) from P formed as p = (p_1, p_2, ..., p_n)
ω_j             A waveform chosen by j from Ω_j.
Ω               The waveform space formed by the Cartesian product of all Ω_j:
                $$\Omega = \times_{j \in N}\, \Omega_j$$
ω               A waveform profile (vector) from Ω formed as ω = (ω_1, ω_2, ..., ω_n)
u_j(p, ω, H)    The utility derived by j.
Table 3. Game theoretic model for OR ad hoc networks
For a general game, each OR node j selects a power level p_j and a waveform ω_j based on
its current observations and decision making process. Distributed power control systems
permit each OR radio to select p_j, but restrict Ω_j to a singleton set; distributed waveform
adaptation systems (interference avoidance) restrict the choice of p_j, but allow ω_j to be
chosen by the physical layer.
Power control, though closely associated with cellular networks, is also implemented in an OR
ad hoc network that operates in the same bands as the primary-user UMTS system. We
now model the suggested power control algorithm as a normal form game. Note that a
similar approach can be followed to model the other distributed algorithms as games, with
each game involving a different utility function. We adopt the notation in Table 3. For most
game models, the game theoretic equivalent of a distributed algorithm's steady state is a
Nash equilibrium (NE). An action vector (or alternative vector) a is said to be a NE if the
following condition is satisfied:
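$$u_i(a) \ge u_i(b_i, a_{-i}) \quad \forall i \in N,\ \forall b_i \in A_i \qquad (6)$$
where A_i is the action set of player i and a_{-i} denotes the actions of all players other than i.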
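For a finite game, the Nash equilibrium condition can be verified by brute force: for each player, try every unilateral deviation and confirm that none improves that player's utility. The Python sketch below illustrates this; the helper names and the two-player toy utility (own power divided by the other player's power plus one) are assumptions chosen for the example, not the chapter's model.

```python
from itertools import product


def is_nash_equilibrium(profile, action_sets, utility):
    """True if no player can gain by deviating alone from `profile` (a tuple)."""
    for i, actions in enumerate(action_sets):
        current = utility(i, profile)
        for b in actions:
            deviation = profile[:i] + (b,) + profile[i + 1:]
            if utility(i, deviation) > current:
                return False
    return True


def pure_nash_equilibria(action_sets, utility):
    """All pure-strategy Nash equilibria, found by exhaustive search."""
    return [
        a for a in product(*action_sets)
        if is_nash_equilibrium(a, action_sets, utility)
    ]


def toy_utility(i, profile):
    """Hypothetical SINR-like utility: own power over (others' power + noise of 1)."""
    others = sum(p for k, p in enumerate(profile) if k != i)
    return profile[i] / (others + 1.0)


equilibria = pure_nash_equilibria([(1, 2), (1, 2)], toy_utility)
```

In this toy game the only equilibrium is the profile in which both players use the higher power level, which anticipates the maximum-power outcome discussed for the power control game.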
Consider a DS-CDMA system with a centralized receiver where all OR nodes other than the
centralized receiver adjust their transmitted power levels in an attempt to maximize
their signal-to-interference-plus-noise ratio (SINR) as measured at the receiver. Here the set
of players is the set of OR nodes (other than the centralized receiver), the action sets are the
available power levels (presumably a finite number of power levels), and the utility
functions of all OR players are given by equation (7):
$$u_i(p) = \frac{h_i\, p_i}{\dfrac{1}{K} \sum_{k \in N \setminus i} h_k\, p_k + \sigma} \qquad (7)$$
where p_i is the transmitted power of node i, K is the statistical estimate of the spreading
factor, h_i is the gain from node i to the receiver, and σ is the noise at the receiver.
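The utility of equation (7) is straightforward to evaluate. The following Python sketch uses hypothetical argument names and linear units (for example mW) for powers and noise:

```python
def sinr_utility(i, powers, gains, spreading_factor, noise):
    """Utility of node i per equation (7): received SINR at the central receiver."""
    interference = sum(
        h * p for k, (h, p) in enumerate(zip(gains, powers)) if k != i
    )
    return gains[i] * powers[i] / (interference / spreading_factor + noise)


# Two nodes with assumed gains and powers, K = 16, noise = 0.125
u0 = sinr_utility(0, [2.0, 4.0], [1.0, 0.5], 16, 0.125)  # 2.0 / (2.0/16 + 0.125) = 8.0
```

Each node's utility grows with its own power and shrinks with everyone else's, which is what drives the equilibrium behaviour of the game.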
[Figure panels: utility function value vs. iteration and power (dBm) vs. iteration, over 0-500 iterations, with curves for OR1, OR2 and OR3]
Fig. 11. Three OR nodes closer to the UMTS system
As intuition suggests, the unique Nash equilibrium for this game is the power
vector in which all OR nodes transmit at maximum power. This is an undesirable outcome because
(1) capacity is greatly diminished by near-far problems (unless the nodes are all at the
same radius from the receiver; Figures 11 and 12 show OR nodes that are
closer to and far away from the UMTS system, respectively), (2) the resulting SINRs are unfairly
distributed (the closest node has a far superior SINR to the
furthest node, as shown in Figure 11), and (3) battery life is greatly shortened.
However, this outcome is Pareto optimal, as any more equitable power allocation will reduce
the utility of the closest node, and any less equitable allocation will reduce the utility of the
disadvantaged nodes. In this scenario Pareto optimality actually misleads the analyst with
respect to the desirability of the outcome.
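The maximum-power equilibrium can be reproduced with a small best-response iteration over a finite set of power levels. This is an illustrative Python sketch, not the simulation behind Figures 11 and 12; the gains, noise level and power levels in the usage lines are hypothetical.

```python
def sinr_utility(i, powers, gains, spreading_factor, noise):
    """Received SINR of node i, as in equation (7)."""
    interference = sum(
        h * p for k, (h, p) in enumerate(zip(gains, powers)) if k != i
    )
    return gains[i] * powers[i] / (interference / spreading_factor + noise)


def best_response(i, powers, levels, gains, spreading_factor, noise):
    """Power level that maximizes node i's utility, all other powers held fixed."""
    def utility_if(p):
        trial = powers[:i] + [p] + powers[i + 1:]
        return sinr_utility(i, trial, gains, spreading_factor, noise)
    return max(levels, key=utility_if)


def best_response_dynamics(start, levels, gains, spreading_factor, noise, max_rounds=100):
    """Let nodes update in turn until no node changes its power."""
    powers = list(start)
    for _ in range(max_rounds):
        updated = list(powers)
        for i in range(len(updated)):
            updated[i] = best_response(i, updated, levels, gains, spreading_factor, noise)
        if updated == powers:
            break
        powers = updated
    return powers


levels_mw = [0.001, 0.01, 0.1, 1.0, 10.0]
gains = [1e-9, 1e-10, 1e-11]  # nearest node first
final = best_response_dynamics([0.001] * 3, levels_mw, gains, 16, 2e-11)
```

Because each node's utility increases with its own power, every node settles at the highest level, and the nearest node ends up with a much larger SINR than the furthest one, in line with the unfairness noted above.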
[Figure panels: utility function value vs. iteration and power (dBm) vs. iteration, over 0-500 iterations, with curves for OR1 to OR5]
Fig. 12. Five OR nodes far away from the UMTS system
6. Conclusion
Emerging research in game-theory-based power control applied to ad hoc opportunistic
networks shows much promise in helping to understand the complex interactions between OR
nodes in this highly dynamic and distributed environment. Also, the use of game
theory in modeling dynamic situations for opportunistic ad hoc networks where OR nodes
have incomplete information has led to the application of largely unexplored games such as
games of imperfect monitoring. Ad hoc security using game theory is a future area of
research in ORs. We have considered ad hoc behavior in opportunistic radios (ORs) and
suggested that implementing ad hoc features in the ORs will improve the overall
performance of the system.
7. References
H. A. Simon, The Sciences of the Artificial. MIT Press, 1969.
R. Axelrod, The Evolution of Cooperation. New York: Basic Books, 1982
R. Gibbons, A Primer in Game Theory. Prentice Hall, 1992.
D. Fudenberg and J. Tirole, Game Theory. MIT Press, 1991
M. J. Osborne and A. Rubinstein, A Course in Game Theory. Cambridge, MA: The MIT Press,
1994.
M. Cagalj, S. Ganeriwal, I. Aad, and J.-P. Hubaux, On selfish behaviour in CSMA/CA
networks, in Proceedings of the IEEE Conference on Computer Communications
(INFOCOM 05), Miami, USA, Mar. 13-17 2005.
T. S. Rappaport, Wireless Communications: Principles and Practice (2nd Edition). Prentice Hall,
2002.
M. Schwartz, Mobile Wireless Communications. Cambridge Univ.Press, 2005.
J. Zander, Jamming in slotted ALOHA multihop packet radio networks, IEEE Transactions
on Communications (ToC), vol. 39, no. 10, pp. 1525-1531, Oct. 1991.
Mitola J. Cognitive radio: An integrated agent architecture for software defined radio, In:
the Dissertation for Doctor of Technology, Royal Institute
Haykin S. Cognitive Radio: Brain-Empowered Wireless Communications. IEEE Journal
FCC, "ET Docket No 03-222 Notice of proposed rule making and order, December 2003.
Mitola J. Cognitive Radio for Flexible Multimedia Communications, MoMuC 99, pp. 3-10,
1999.
Marques, P. Opportunistic use of 3G uplink Licensed Bands Communications, 2008. ICC
IEEE International Conference on May 2008.
FCC, ET Docket No 03-237 Notice of inquiry and notice of proposed Rulemaking, ET Docket
No. 03- 237, 2003.
C. Huang and A. Polydoros, Likelihood methods for MPSK modulation classification , IEEE
Transaction on Communications, vol 43, 1995
P. Marques et. al., Procedures and performance results for interference computation and
sensing, IST-ORACLE project report D2.5, May 2008
S. Agarwal, S. Krishnamurthy, R. Katz, and S. Dao, Distributed power control in ad-
hoc wireless networks, Intl. Symposium Personal, Indoor and Mobile Radio
Communications, 2001, pp. F-59-F-66
4
Reliable Aggregation Routing for Wireless
Sensor Networks based on Game Theory
Qiming Huang, Xiao Liu and Chao Guo
University of Science and Technology Beijing
Communication Engineering Department
China
1. Introduction
Wireless integrated sensor networks, which combine data collection, data management and
communication, are used more and more widely because of their low cost and convenient
deployment. Research into every aspect of sensor networks is currently very
active. The data aggregation mechanism is one of the key problems in sensor networks. By
considering the data transmission delay and overall network energy efficiency, this chapter
develops a game-theoretic model of a real-time reliable aggregation (RA-G) mechanism for
wireless sensor networks.
Based on a study of the related literature, this chapter first summarizes the research status of
WSNs, their system architecture, characteristics and critical technologies, and then
classifies and introduces the current typical WSN routing algorithms one
by one. Taking into account the implicit collaborative imperative for sensors to achieve overall network
objectives (accomplishing real-time collection tasks effectively) subject to individual resource
consumption, the chapter proposes a game-theoretic model of a reliable data
aggregation architecture for wireless sensor networks. It defines a multi-tier data aggregation
architecture in which semantic-based aggregation and average-computation aggregation are
performed at the sensor level and the node level respectively. All nodes that detect the
same target join the same logical group. Each selected group leader uses a game-theoretic
model that trades off energy dissipation against data transmission delay to
determine the degree of aggregation. To meet the real-time constraints and balance the
energy consumption between nodes, a decision-making model based on game theory, which
takes delay compensation into account, is proposed for the data-relaying stage.
The simulation results show that the reliable data aggregation architecture can reduce
the total transmission overhead of the WSN, make the network more energy-efficient and
prolong the lifetime of the sensor network. In addition, the game-theoretic model used in
group-level aggregation and in the data-relaying stage balances the tradeoff between energy
dissipation and the timeliness of data transmission; the RA-G data aggregation
mechanism is therefore also reliable.
2. Wireless sensor networks
A wireless sensor network is a data-centric wireless self-organizing network [1] consisting of a
large number of integrated sensors, data processing units and short-distance wireless
communication modules. Since the beginning of the 21st century, sensor networks have attracted
great interest from academia, the military and industry. The United States and Europe have launched many
research programs on wireless sensor networks and have made corresponding
progress. The development of specific communication protocols and routing algorithms is
the first issue that the field of wireless sensor networks currently needs to resolve.
2.1 Wireless sensor network architecture
The architecture of a wireless sensor network is shown in Figure 1.1 [2]. Wireless sensor
network systems usually include sensor nodes, sink (gateway) nodes and management
nodes. A large number of sensor nodes are deployed randomly inside or near the monitoring
area (sensor field) and are able to form a network through self-organization. The data
collected by sensor nodes are transmitted hop by hop along other sensor nodes. During
transmission, the monitored data may be handled by multiple nodes, reach the sink
node after multi-hop routing, and finally reach the management node through
the Internet or satellite. The user configures and manages the wireless sensor network through
the management node, publishing monitoring missions and collecting monitoring data.
Fig. 1.1 Wireless Sensor Network Architecture
A sensor node is usually a tiny embedded system. Its processing power, storage capacity and
communication capability are relatively weak, and its energy is limited by the batteries it carries.
A sensor node consists of four parts [3]: the sensor module, the processor module, the
wireless communication module and the power supply module. The sensor module is responsible
for collecting information in the monitoring area and converting the data; the
processor module is responsible for controlling the operation of the sensor node and for storing and
processing the data it has collected and the data sent by other nodes; the wireless
communication module is responsible for communicating wirelessly with other sensor nodes,
exchanging control information, and sending and receiving collected data; the energy
supply module provides the energy that the sensor node needs to run, usually from a
miniature battery.
Sensor nodes are constrained by their limited energy supply, communication capacity, and
computing and storage capacity when implementing network protocols and applications.
The features of sensor networks are as follows:
1. Large-scale network [1, 2];
2. Self-organizing network [4];
3. Dynamic nature of networks;
4. Reliable network;
5. Application-specific network;
6. Data-centric network [1, 2].
As a current research hot spot, wireless sensor networks span several disciplines, and many
key technologies remain to be studied. Only some of the key technologies are listed
below [1, 3, 5].
1. Network topology control. A good network topology generated automatically by
topology control improves the efficiency of the routing protocol and the MAC protocol
and lays the foundation for data fusion, time synchronization, target localization and
other functions, which helps to save node energy and extend the lifetime of the
network. Topology control is therefore one of the core research topics in wireless
sensor networks.
2. Network protocol. Sensor network protocols are responsible for making the
independent nodes form a multi-hop data transmission network. Current research
focuses on network-layer and data-link-layer protocols. Network-layer routing
protocols determine the transmission path of monitoring information; media access
control at the data link layer builds the underlying infrastructure and controls the
communication process and working mode of sensor nodes.
3. Network security. Security in wireless sensor networks has to take full account of the
confidentiality of task execution, the reliability of data generation, the efficiency of
data fusion and the security of data transmission.
4. Time synchronization. Time synchronization is a key mechanism for sensor network
systems whose nodes need to work together.
5. Location technology. The location of a sensor node is an integral part of the collected
data. Determining where an event occurred or where the collecting node is located is
one of the most basic functions of a sensor network. The positioning mechanism must
be self-organizing, robust, energy-efficient and distributed.
6. Data fusion. Sensor networks are constrained by energy, and reducing the amount of
data transmitted saves energy effectively. While collecting data from the various sensor
nodes, the network can therefore use the computing and storage capacity of local nodes
to fuse the data and remove redundant information, thereby saving energy.
7. Data management. From the point of view of data storage, a sensor network can be
regarded as a distributed database. Managing the data with database methods separates
the logical view of the data stored in the network from the implementation of the
network, so that users only need to care about the logical structure of a data query and
not about implementation details.
2.2 Comparative analysis of routing protocols of Wireless sensor network
After many years of effort by researchers, a considerable number of routing protocols for
sensor networks have been proposed. According to the network structure [10], routing
protocols can be divided into three categories: flat routing, hierarchical routing and
location-based routing. According to protocol operation, they can be divided into
negotiation-based (consultation) routing, multi-path routing, QoS routing, query-based
routing, etc. (Table 1.1 below). Each category is introduced in turn below.
Classification basis    Category                  Protocols
Network structure       Flat routing              Directed Diffusion, SPIN, Rumor routing
Network structure       Hierarchical routing      LEACH, PEGASIS, TEEN & APTEEN
Network structure       Location-based routing    GAF, GEAR
Protocol operation      Consultation routing      SPIN, Directed Diffusion
Protocol operation      Multi-path routing        Directed Diffusion, SPIN, SPEED
Protocol operation      QoS routing               SPEED
Protocol operation      Query routing             Directed Diffusion, Rumor routing
Table 1.1 Classification of routing protocols of wireless sensor network
2.2.1 Protocol based on network structure
1. Flat routing protocols
In flat multi-hop wireless sensor networks, flat routing protocols generally require every
node to play the same role, and multiple sensor nodes acquire data cooperatively. Studies of
data-centric routing have shown that energy can be saved through multi-node collaboration
and the elimination of redundant data, as in SPIN [7-8] and Directed Diffusion [9-10]. Both
protocols have inspired other protocols designed along similar lines (i.e. the data-centric
routing method).
SPIN (Sensor Protocols for Information via Negotiation) [7-8]: W. Heinzelman and others
proposed the SPIN family of adaptive routing protocols. The protocol assumes that all nodes
in the network are potential Sink nodes and that each node can disseminate information to
the other nodes in the network. A node only needs to send data that other nodes do not
have. In addition, the SPIN protocols use data negotiation and resource-adaptive algorithms.
A node running SPIN assigns a high-level meta-data descriptor to each piece of data it
collects, which describes that data completely. Meta-data negotiation is carried out before
any data is sent, to ensure that no redundant information is transmitted in the network.
SPIN protocols also have access to the current energy level of the node and adjust the
operating mode of the protocol according to the node's residual energy. The meta-data
negotiation of SPIN solves the typical problems of flooding, thus improving energy
efficiency. However, the data advertisement mechanism of SPIN cannot guarantee that data
is delivered to the destination node.
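The meta-data negotiation described above can be illustrated with a minimal sketch. It assumes the three-message exchange usually associated with SPIN (ADV, REQ, DATA); the class `SpinNode`, the descriptor strings and the byte counter are hypothetical names chosen for illustration, not part of the protocol specification.

```python
from dataclasses import dataclass, field


@dataclass
class SpinNode:
    """Hypothetical node holding data keyed by a meta-data descriptor."""
    name: str
    store: dict = field(default_factory=dict)  # descriptor -> payload bytes
    sent_bytes: int = 0

    def advertise(self):
        # ADV: announce only the descriptors of the data this node holds.
        return set(self.store)

    def request(self, advertised):
        # REQ: ask only for the descriptors this node does not hold yet.
        return advertised - set(self.store)

    def send(self, wanted):
        # DATA: ship the requested payloads and count the bytes sent.
        payloads = {meta: self.store[meta] for meta in wanted}
        self.sent_bytes += sum(len(p) for p in payloads.values())
        return payloads


def negotiate(sender, receiver):
    """Run one ADV/REQ/DATA round and return the descriptors transferred."""
    wanted = receiver.request(sender.advertise())
    receiver.store.update(sender.send(wanted))
    return wanted


a = SpinNode("a", {"temp@zone1": b"21.5C", "hum@zone1": b"40%"})
b = SpinNode("b", {"temp@zone1": b"21.5C"})
transferred = negotiate(a, b)
```

In this example only the humidity reading crosses the link, because node `b` already holds the temperature reading; the payload that would have been redundant is never sent.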
Directed Diffusion [9-10]: C. Intanagonwiwat and others proposed a new data-gathering
communication model for sensor networks, called directed diffusion. As a data-centric (DC)
and application-aware communication model, directed diffusion requires all data generated
by sensor nodes to be named with attribute-value pairs. The main idea of the DC model is to
eliminate redundancy and minimize the amount of data transferred by fusing data from
different source nodes along the route, thus saving energy and extending the lifetime of the
network. The DC routing policy can find paths from multiple source nodes to a single
destination node and fuse redundant data inside the network. Compared with SPIN,
directed diffusion adapts poorly to the environment in mobile applications. In addition, the
DC communication model may not suit applications that require continuous data
transmission to the Sink node, and matching data against queries may incur additional
overhead.
2. Hierarchical routing protocols
Hierarchical or cluster-based routing, first proposed for wired networks, offers better
scalability and communication efficiency. Hierarchical routing reduces the amount of data
transmitted to the Sink node by performing data fusion and reduces the energy consumption
of each node within a cluster, so it is an effective way to improve energy efficiency.
Hierarchical routing mainly consists of two levels: one level creates clusters and selects the
cluster head nodes, and the other fuses, processes and routes the collected data.
LEACH (Low Energy Adaptive Clustering Hierarchy) [11] [12-13]: W. Heinzelman and
others proposed a hierarchical clustering routing algorithm for sensor networks. It is a
clustering routing protocol that uses distributed cluster formation. LEACH randomly selects
a number of sensor nodes to act as cluster heads (CHs), and the role rotates so that all nodes
take turns as cluster head and share the energy cost evenly. In LEACH, the cluster head
fuses the data collected by all non-cluster-head nodes (non-CHs) that belong to it and then
sends the fused data packets to the Sink node, reducing the volume of data transmitted.
Table 1.2 compares SPIN, LEACH and Directed Diffusion according to several parameters.
The table shows that directed diffusion is an energy-efficient compromise because it uses
in-network processing and path optimization.
Parameter               SPIN protocol    LEACH protocol    Directed Diffusion protocol
Optimal path            No               No                Yes
Network lifetime        Good             Good              Good
Resource-aware          Yes              Yes               Yes
Use of meta-data        Yes              No                Yes
Table 1.2 SPIN, LEACH and Directed Diffusion protocol comparisons
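The random rotation of cluster heads in LEACH can be sketched as follows. The chapter does not state the election rule; the sketch assumes the threshold commonly given for LEACH, T = p / (1 - p (r mod 1/p)) for nodes that have not yet served in the current cycle, where p is the desired fraction of cluster heads and r is the round number. Function names and the seed are hypothetical.

```python
import random


def leach_threshold(p, round_no):
    """Election threshold for nodes that have not served in this cycle."""
    period = round(1 / p)
    return p / (1 - p * (round_no % period))


def elect_cluster_heads(node_ids, p, n_rounds, seed=0):
    """Return, for each round, the set of nodes elected as cluster heads."""
    rng = random.Random(seed)
    period = round(1 / p)
    eligible = set(node_ids)
    history = []
    for r in range(n_rounds):
        if r % period == 0:
            eligible = set(node_ids)  # new cycle: every node is eligible again
        t = leach_threshold(p, r)
        heads = {n for n in sorted(eligible) if rng.random() < t}
        eligible -= heads  # a node serves at most once per cycle
        history.append(heads)
    return history


# One full cycle with p = 0.2 lasts 5 rounds.
rounds = elect_cluster_heads(range(10), p=0.2, n_rounds=5)
```

Because the threshold reaches 1 in the last round of a cycle, every node that has not yet served becomes a cluster head then, so each node carries the role exactly once per cycle. This is what spreads the energy cost evenly.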
TEEN (Threshold-sensitive Energy Efficient sensor Network Protocol) [14] and APTEEN
(Adaptive Periodic Threshold-sensitive Energy Efficient sensor Network protocol) [15]:
these two hierarchical routing protocols were proposed for time-critical data acquisition
applications. In TEEN, sensor nodes sense the environment continuously but transmit data
infrequently. A cluster head sends its members a hard threshold (a threshold value of the
sensed attribute) and a soft threshold (a small change in the value of the sensed attribute
that triggers the node to switch on its transmitter and send data). A node is allowed to
transmit only when the sensed attribute value is in the range of interest.
Simulation results for TEEN and APTEEN show that both protocols outperform LEACH.
Experiments show that, in terms of energy consumption and network lifetime, the
performance of APTEEN lies between that of LEACH and TEEN. TEEN provides the best
performance because it reduces the number of transmissions. The major shortcomings of
these two protocols are the additional cost and complexity of forming multi-level clusters,
of implementing the threshold-based functions, and of handling attribute-based naming of
queries.
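The threshold rule that TEEN applies at each member node can be sketched as below. This is a simplified reading of the protocol: a node transmits only when the sensed value has reached the hard threshold and, after the first report, has changed by at least the soft threshold since the last transmitted value. The function names and the sample readings are hypothetical.

```python
def teen_should_transmit(sensed, last_sent, hard_threshold, soft_threshold):
    """Decide whether a member node switches on its transmitter."""
    if sensed < hard_threshold:
        return False  # value is outside the range of interest
    if last_sent is None:
        return True  # first time the hard threshold is reached
    return abs(sensed - last_sent) >= soft_threshold


def reports(readings, hard_threshold, soft_threshold):
    """Return the subset of readings that would actually be transmitted."""
    last_sent = None
    sent = []
    for value in readings:
        if teen_should_transmit(value, last_sent, hard_threshold, soft_threshold):
            sent.append(value)
            last_sent = value
    return sent


sent = reports([20, 31, 31.5, 33, 29, 35], hard_threshold=30, soft_threshold=2)
```

Of the six readings only three are transmitted, which illustrates why TEEN reduces the number of transmissions.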
3. Location-based routing protocol
In this type of routing protocol, sensor nodes are addressed by their location. The distance
between neighboring nodes can be estimated from the received signal strength, and the
relative coordinates of neighboring nodes are obtained by exchanging information between
the nodes [16-17, 18]. Alternatively, if a node is equipped with a small low-power GPS
receiver [19], it can obtain its location directly from satellites using GPS. To conserve
energy, some location-based schemes require nodes to go to sleep when there is no activity.
The more nodes are asleep, the more energy the network saves. The problem of designing a
fixed sleep-cycle schedule for each node is discussed in [19-20].
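As an illustration of estimating distance from received signal strength, the sketch below uses the log-distance path-loss model, a common choice that the chapter itself does not prescribe. The reference power at 1 m (-40 dBm) and the path-loss exponent (2.0) are assumed values for illustration; in practice they must be calibrated for the radio and the environment.

```python
def rssi_to_distance(rssi_dbm, rssi_at_d0_dbm=-40.0, d0_m=1.0, path_loss_exp=2.0):
    """Estimate the distance in meters from a received signal strength in dBm.

    Inverts the log-distance model
        rssi = rssi_at_d0 - 10 * path_loss_exp * log10(d / d0).
    """
    exponent = (rssi_at_d0_dbm - rssi_dbm) / (10.0 * path_loss_exp)
    return d0_m * 10.0 ** exponent


near = rssi_to_distance(-40.0)  # same power as the reference: about 1 m
far = rssi_to_distance(-60.0)   # 20 dB weaker with exponent 2: about 10 m
```

Real measurements fluctuate because of fading and obstacles, so such an estimate is usually averaged over several packets before it is used for positioning.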
2.2.2 Protocols based on protocol operation
1. Negotiation-based routing protocol
These protocols use high-level data descriptors and negotiate before transmission in order
to eliminate redundant data and reduce the amount of data sent. Communication decisions
are also based on the resources available to the nodes. The SPIN protocol suite [11-12] is an
example of negotiation-based routing. The motivation for negotiation is to avoid the defects
of flooding, namely implosion and overlap, in which a node receives multiple copies of the
same data. Sending the same data to different nodes consumes more energy, bandwidth and
processing time. The key idea of negotiation-based routing is to eliminate duplicate
information and avoid sending redundant data to the next node or the Sink node by
exchanging a series of negotiation messages before the actual data is sent.
2. Multi-path routing protocols
To improve network performance, these protocols route data over multiple paths rather
than a single path. The fault tolerance of a protocol is measured by the likelihood that an
alternative path exists when the primary path between the source node and the destination
node fails. Fault tolerance can be increased by maintaining multiple paths between the
source and destination nodes, at the cost of increased energy consumption and traffic. The
alternative paths are kept alive by sending messages periodically, so greater network
reliability is obtained at the price of the overhead of maintaining them.
3. QoS-based routing protocol
When QoS is considered in delivering data, the network has to strike a balance between
energy consumption and data quality. In particular, when a node sends data to the Sink
node, the network has to meet certain QoS criteria such as delay, data accuracy and
bandwidth utilization.
4. Routing protocol based on query
These routing protocols work as follows: the destination node propagates a query through
the network for the data it needs to complete a task, and a node that holds data matching
the query sends it back to the node that initiated the query, i.e. the destination node. The
queries are usually described in natural language or a high-level query language. Every
node keeps a table of the queries it has received and, after receiving a query, sends the data
that matches it. Directed diffusion [7] is an example of this kind of routing. In the directed
diffusion communication model, the Sink node sends interest messages to all nodes. As an
interest spreads through the network, gradients from the source nodes to the Sink node are
established. When a source node has data matching the interest, it sends the data along the
interest gradient path. To reduce energy consumption, data is fused before it is routed.
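The interest and gradient mechanism just described can be sketched in a few lines. This is a deliberately simplified model: the interest is flooded breadth-first from the Sink, each node keeps a single gradient pointing to the neighbor from which it first heard the interest, and data follows those gradients back. The full protocol keeps several gradients per node and reinforces preferred paths, which is omitted here. The topology and node names are hypothetical.

```python
from collections import deque


def flood_interest(neighbors, sink):
    """Flood an interest from the sink; return node -> next hop toward the sink."""
    gradient = {sink: None}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in gradient:  # first copy of the interest sets the gradient
                gradient[nb] = node
                queue.append(nb)
    return gradient


def path_to_sink(gradient, source):
    """Follow the gradients from a source node back to the sink."""
    path = [source]
    while gradient[path[-1]] is not None:
        path.append(gradient[path[-1]])
    return path


topology = {
    "sink": ["a", "b"],
    "a": ["sink", "c"],
    "b": ["sink", "c"],
    "c": ["a", "b", "src"],
    "src": ["c"],
}
gradient = flood_interest(topology, "sink")
route = path_to_sink(gradient, "src")
```

Here the data from `src` travels over `c` and `a` to the Sink; any node on that path could fuse reports from several sources before forwarding them.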
The overview above covers a variety of routing algorithms under different classifications,
compares similar algorithms and points out their advantages and disadvantages.
3. An overview of game theory
Strictly speaking, game theory is not a branch of economics. It is a methodology whose
scope of application is not limited to economics: political science, military affairs,
diplomacy, international relations, public choice and criminology all make use of it. Many
scholars have already introduced game theory into the field of communication, including
flow control, routing algorithms and power control. Game theory [21] studies the decisions
made when the behavior of decision-makers interacts directly, as well as the equilibrium of
these decisions.
A complete description of a game requires at least three basic elements: the players, the
strategy sets and the payoff functions.
1. Player
A player is a party directly involved in the game, who makes the decisions and chooses the
strategies. In different games a player can be an individual, a group or a collective, but a
group must take part in the game for a common goal and common interests. Players should
know their own goals and interests clearly and always choose the best strategy to maximize
their utility in the game.
2. Strategy set
In a game, a practical, feasible and complete plan of action available to a player is called a
strategy. A player's strategy set is the set of all strategies the player can choose; each
strategy set should contain at least two different strategies. One strategy chosen from each
player's strategy set forms a situation (strategy profile) of the game.
3. Payoff function
Once the strategies adopted by all players are determined, each player obtains a value of
his own payoff function (or profit function). The payoff function expresses the level of
income or utility the player obtains from the game; it is a function of the strategies of all
players. Different strategies may lead to different payoffs, and the payoff is what each
player really cares about.
In game theory, one of the main bases on which a player makes a rational decision is the
profit he may obtain, so each player needs to evaluate the profit function carefully. The
structure and values of the profit function affect the players' behavior and thus the final
outcome of the game, so determining the profit function is a very important matter in the
study of game theory. Depending on the point of view taken, a player's profit function is
not unique and can take different forms.
3.1 Nash equilibrium
Game theory is a mathematical tool for studying the decisions made when the behavior of
decision-makers interacts directly, as well as the equilibrium of these decisions. In other
words, it addresses decision and equilibrium problems in which the choice of one subject is
affected by the choices of other subjects and in turn influences their choices. The most basic
component of game theory is the game itself, expressed as $G = (N, A, \{u_i\})$, where $G$
is a specific game, $N = \{1, 2, \ldots, n\}$ is a finite set of participants (decision makers),
$A_i$ is the set of actions available to participant $i$,
$A = A_1 \times A_2 \times \cdots \times A_n$ is the action space, and
$\{u_i\} = \{u_1, u_2, \ldots, u_n\}$ is the set of utility (objective) functions that the
participants wish to maximize. The objective function $u_i$ of participant $i$ is a function
of the action $a_i$ selected by participant $i$ and also of the actions $a_{-i}$ chosen by all
the other players in the game. That is, a participant's objective function depends not only on
its own choice but also on the other participants' choices. A game may include additional
components, such as the information and communication mechanisms [21] available to
each participant.
For a game, the basic concept of a steady state is the Nash equilibrium. In a Nash
equilibrium, no node can improve the value of its objective function by unilaterally
deviating from that state. That is, $a^*$ is a steady state if and only if:
$$u_i(a_i^*, a_{-i}^*) \geq u_i(a_i, a_{-i}^*), \quad \forall a_i \in A_i, \ \forall i \in N. \qquad (1.1)$$
These steady states can be used to predict the outcome of distributed algorithms. The
strategy $a_i^*$ is a best strategy for participant $i$ given the strategies of its opponents,
and this holds for all participants. The outcome is stable in the sense that no participant has
an incentive to deviate from this choice unilaterally; in a sense, a Nash equilibrium is a "no
regrets" solution of the game.
Another expression of the Nash equilibrium is sometimes very useful. For any
$a_{-i} \in A_{-i}$, we define the best-response set of participant $i$:
$$B_i(a_{-i}) = \{a_i \in A_i : u_i(a_i, a_{-i}) \geq u_i(a_i', a_{-i}) \ \text{for all} \ a_i' \in A_i\} \qquad (1.2)$$
In general, $B_i$ is called the "best response function" of participant $i$, so a Nash
equilibrium can be defined as a strategy vector $(a_1^*, \ldots, a_n^*)$ where
$a_i^* \in B_i(a_{-i}^*), \ \forall i \in N$.
An important question for any solution concept is whether a solution actually exists for a
given game. The Nash equilibrium is widely used precisely because it exists in many
games.
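The Nash equilibrium condition can be checked mechanically for a small finite game. The sketch below enumerates the pure-strategy profiles of a two-player game and keeps those in which each action is a best response to the other. The example payoffs describe a hypothetical packet-forwarding game between two nodes (forward "F" or drop "D", with a forwarding cost of 0.4 and a benefit of 1 when the other node forwards); they are illustrative values, not taken from this chapter.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs maps an action profile (a1, a2) to the utilities (u1, u2).
    """
    actions1 = sorted({a1 for a1, _ in payoffs})
    actions2 = sorted({a2 for _, a2 in payoffs})
    equilibria = []
    for a1, a2 in product(actions1, actions2):
        u1, u2 = payoffs[(a1, a2)]
        # Player 1 cannot gain by deviating while player 2 keeps a2.
        best1 = all(u1 >= payoffs[(d, a2)][0] for d in actions1)
        # Player 2 cannot gain by deviating while player 1 keeps a1.
        best2 = all(u2 >= payoffs[(a1, d)][1] for d in actions2)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


forwarding_game = {
    ("F", "F"): (0.6, 0.6),
    ("F", "D"): (-0.4, 1.0),
    ("D", "F"): (1.0, -0.4),
    ("D", "D"): (0.0, 0.0),
}
equilibria = pure_nash_equilibria(forwarding_game)
```

With these payoffs the only equilibrium is mutual dropping, even though both nodes would be better off if both forwarded. This is the kind of conflict between individual and network interest that motivates incentive mechanisms.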
3.2 Incentive theory
Incentive theory [22-23] is one of the most important applications of game theory in
economics and is widely applied in many fields. It reveals the important role that
asymmetric information plays in economics. The main analytical framework of incentive
theory is the principal-agent model. In this relationship there is a principal and one or more
agents. Because the agents have expertise or unique information that the principal lacks, or
simply because the principal does not have the time and energy to deal with certain things,
the principal delegates to an agent certain matters that originally fall within the principal's
own power or responsibility.
4. The model of data fusion based on game theory
In this section, game theory is introduced into wireless sensor networks to build the RA-G
(Reliable Aggregation based on Game theory) model, a data fusion mechanism that
accounts for both delay and the energy efficiency of the fusing nodes. As the introduction to
wireless sensor networks showed, network nodes are severely restricted in bandwidth,
energy, storage capacity and computing power. In the fusion phase, each intermediate node
wants to fuse enough data packets before sending, to minimize the energy it spends on
transmission. The more data packets a fusion node collects, the more accurate the
description of the monitored target, i.e. the more accurate the information. On the other
hand, collecting more data packets means waiting longer before fusing, which greatly
increases the delay of the final information received by network users. This is intolerable
for a real-time target tracking system. These factors therefore conflict with one another. A
node wants to save as much of its own energy and bandwidth as possible, while for the
network delay is the key issue; that is, the interests of the nodes and of the network are in
conflict.
When the fusion node transmits the fused data packets to the sink node, there is another
issue to consider. Because each node plays a different role and has a different status in each
period, in the data transmission phase a node has to weigh its own need to send data
against forwarding data for other nodes. On the one hand, when the node needs to send
data, other nodes can provide forwarding services; on the other hand, each node tries to
forward as little data as possible for other nodes in order to reduce its power consumption.
But if no node is willing to forward data for others, the connectivity of the network declines
sharply, reliable real-time transmission of data packets cannot be guaranteed, and the
overall performance of the network is seriously affected. This is another conflict between
the interests of the nodes and of the network.
Game theory is a good mathematical tool for dealing with such conflicts of interest. The
following sections build a game-theoretic model of the fusion decision at intermediate
nodes for a real-time target / event monitoring system and make some preliminary
attempts at a node incentive mechanism.
4.1 Real-time target / event monitoring system
A real-time target / event monitoring system [24] consists of hundreds of tiny sensor nodes
that monitor and track targets within the monitored region efficiently and in real time and
distinguish between targets. The results are reported by the sink node to end users via
satellite or a wired network. This section uses a hierarchical fusion model [25] to achieve
efficient use of energy. If the granularity of fusion is too small, much useful information in
the collected raw data may be lost prematurely; if it is too large, the wireless sensor network
consumes excessive energy transmitting data and may suffer serious congestion and loss of
information. Therefore the real-time target / event monitoring system in this section uses a
hierarchical fusion mechanism to solve these problems; Figure 1.2 shows the schematic of
hierarchical integration.
[Figure 1.2: Sensor1 to Sensor3 produce Signal1 to Signal3. First layer: the fusion of
original data. Second layer: the fusion in nodes (Node1 to Node3). Third layer: group
integration (Group). Fourth layer: Sink node integration.]
Fig. 1.2 The schematic of hierarchical integration
The first layer is the fusion of original data. The data collected by the sensors are the
original input of the entire network. Fusion at this layer provides the basic processing for
the information about tracked targets / monitored events in the network. Data fusion at this
level must meet two requirements: 1) satisfy the real-time constraints; 2) be able to handle a
large amount of input data. To enhance the energy-saving effect, data fusion at this layer is
semantics-based: higher fusion efficiency is achieved by extracting the semantics of the raw
data collected from the sensors.
The second layer is fusion at the node level. Each sensor node integrates several different
types of sensors. After collecting the confidence vectors of its different sensors, the node
fuses them further: it calculates the average of all the confidence vectors to form a single
node-level confidence vector. The semantics of the sensor data should be extracted and
fused at the node level, and the classification module of the perception algorithm at the
node level needs to cache and process the selected data. The processing time here must
stay within a reasonable range.
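The node-level averaging step can be written down directly. The sketch below assumes that each sensor on a node reports a confidence vector of equal length, for example one confidence value per candidate target class; the chapter does not specify the contents of the vector, so this layout and the sample values are assumptions.

```python
def node_confidence(sensor_vectors):
    """Average per-sensor confidence vectors into one node-level vector."""
    if not sensor_vectors:
        raise ValueError("at least one sensor confidence vector is required")
    count = len(sensor_vectors)
    # zip(*...) groups the i-th component of every sensor's vector together.
    return [sum(component) / count for component in zip(*sensor_vectors)]


# Hypothetical example: an acoustic and a magnetic sensor rate two target classes.
fused = node_confidence([[0.8, 0.2], [0.6, 0.4]])
```

A plain average treats all sensors as equally reliable; weighting by sensor quality would be a natural variation but is not described in the text.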
The third layer is group integration. Once node-level fusion has produced monitoring
results, the network begins to estimate the related information of the current target and
should logically identify the monitored target uniquely. In the preliminary estimate, the
target location reported by each node is averaged over all nodes, with each node's
confidence vector used as the weight. This raises the question of when and where such a
collective estimate should be computed. Representing the target is a classic problem, and a
number of centralized or distributed temporal and spatial correlation algorithms already
exist for it. In this system, two related mechanisms are used in this layer.
1. The fusion method based on logic group
In the target / event monitoring system there are two main tasks: collecting the relevant
information about targets and representing the targets. A simple solution is to send the
monitoring results, locations and other information of all nodes to a central base station,
which estimates the current location and other information of the target from the location
information [26-27] of the reporting nodes and the other information collected, and in the
process uses a spatio-temporal correlation algorithm to assign and maintain a consistent
unique identity for each target. But this centralized mechanism is inefficient in both energy
consumption and delay. Sending a large number of data reports to the base station causes
excessive energy consumption, and if the target is far from the base station the delay
increases greatly. A distributed mechanism avoids these shortcomings: the data is
processed near the monitored target / event, and the fused information is then sent to the
base station for further operations.
2. Balance of energy and delay based on Game Theory
In the group fusion layer, the management node needs to wait for some time to gather the
data reports of the group members, fuse these reports and then forward the result to the
Sink node through other nodes. In this process there is a variable parameter to consider, the
degree of aggregation (DOA), which directly expresses whether the management node has
received a sufficient number of reports from group members. That is, the management
node does not perform fusion until it has received DOA data reports from members of the
group. The trade-off at the management node is as follows. For the management node, a
larger DOA value means that more member data reports are collected and fused before a
single fused packet is sent. Compared with a smaller DOA, the management node clearly
saves more energy on sending data and reduces the transmission load of the nodes. For
network users, however, real-time monitoring is the ultimate goal of the network. If the
DOA value is so large that the resulting delay exceeds the limits of the real-time system, it
inevitably harms the interests of network users, and real-time monitoring of the target
becomes impossible. The interests of the nodes and of the network thus conflict, and some
mechanism is needed to guide the behavior of nodes in order to balance the two.
From the above description, the participants of this game model are the nodes and the
network, and the game should be a two-player game with incomplete information. Suppose
the energy saved through data fusion at the management node is $E_P$, and the waiting
time for fusion, which is the additional delay imposed on the network, is $T_{aggr}$. We
now analyze quantitatively the impact of DOA on $E_P$ and $T_{aggr}$.
1. Energy savings in fusion
In a real system, the analysis is difficult because of factors such as the sensing range, the
target movement model and the node density. Here we make some simplifying assumptions
for an approximate analysis. Suppose the sensing range of a sensor node is a circular area
with radius R, the target moves at uniform speed along a straight line, and the nodes of an
unbounded sensor network are uniformly distributed.
Figure 1.3 shows the schematic diagram of the target and the monitored region. The red
star represents the position of the target. The sensor nodes inside the circle can sense the
target and form a logical group. The sensor nodes with the dark mark are the management
nodes of the logical group. Suppose the number of member nodes in the group is $n_g$. If
the value of DOA is 1, the management node performs no fusion. The energy the
management node consumes to send the data reports of the group members is then:
$$E_{T\text{-}woaggr} = n_g \left( l E_{elec} + l \varepsilon d^r \right) \qquad (1.3)$$
Fig. 1.3 Monitored region
where $l$ is the length of a data packet and $E_{elec}$ is the energy consumed per bit by
the transmitting or receiving circuit. The constant $\varepsilon$ depends on the transmission
channel model used: $\varepsilon_{fs}$ is for free-space transmission, for which $r$ is 2, and
$\varepsilon_{amp}$ is for multi-path fading transmission, for which $r$ is 4. When the
distance $d$ between the transmitter and receiver is less than the threshold value $d_0$, we
use the free-space model; if $d$ is greater than or equal to $d_0$, we use the multi-path
fading model. When the management node fuses data, the value of DOA is a positive
integer greater than 1 and not greater than $n_g$. In this case, DOA data reports from the
member nodes of the group are received and fused by the management node. The energy
the management node consumes to send the members' data reports is then:
$$E_{T\text{-}aggr} = (n_g - DOA + 1) \left( l E_{elec} + l \varepsilon d^r \right) \qquad (1.4)$$
From the above two equations, when $2 \leq DOA \leq n_g$, the fraction of energy saved by
the management node is:
$$E_P = 1 - \frac{E_{T\text{-}aggr}}{E_{T\text{-}woaggr}} = 1 - \frac{n_g - DOA + 1}{n_g} \qquad (1.5)$$
The above equation reveals the relationship between DOA and the energy saved by data fusion
in the management node. In the game model discussed in this chapter, we define $E_P$ as the
benefit the management node obtains through fusion.
Definition 1.1 The payoff of the management node in the game model of group-level fusion is:
$$X_I = E_P = 1 - \frac{n_g - DOA + 1}{n_g} = \frac{DOA - 1}{n_g} \qquad (1.6)$$
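The energy relations in Eqs. (1.3) to (1.6) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names are ours, and the radio constants (packet length, E_elec, the two amplifier constants and the threshold distance) are assumed illustrative values that do not come from this chapter.

```python
def packet_energy(l, d, e_elec, eps_fs, eps_amp, d0):
    """Energy to send one packet of length l over distance d: l*E_elec + l*eps*d^r."""
    if d < d0:
        eps, r = eps_fs, 2    # free-space model
    else:
        eps, r = eps_amp, 4   # multi-path fading model
    return l * e_elec + l * eps * d ** r


def energy_without_fusion(n_g, per_packet):
    """Eq. (1.3): DOA = 1, every member report is sent separately."""
    return n_g * per_packet


def energy_with_fusion(n_g, doa, per_packet):
    """Eq. (1.4): DOA reports are fused into a single report before sending."""
    if not 2 <= doa <= n_g:
        raise ValueError("DOA must satisfy 2 <= DOA <= n_g")
    return (n_g - doa + 1) * per_packet


def energy_saving(n_g, doa):
    """Eqs. (1.5)/(1.6): fraction of energy saved, X_I = (DOA - 1) / n_g."""
    return (doa - 1) / n_g


# Assumed illustrative constants (hypothetical, not from the chapter).
per_packet = packet_energy(l=4000, d=50.0, e_elec=50e-9,
                           eps_fs=10e-12, eps_amp=0.0013e-12, d0=87.0)
e_plain = energy_without_fusion(10, per_packet)
e_fused = energy_with_fusion(10, 4, per_packet)
# Both printed values should agree up to floating-point rounding.
print(1 - e_fused / e_plain, energy_saving(10, 4))
```

Because the per-packet energy cancels in the ratio, the saving depends only on DOA and $n_g$, which is why Eq. (1.6) contains no radio parameters.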
2. The impact of aggregation on the network delay
After a management node generates its own data or receives the data reports of group
members, it does not transmit them immediately but waits for a while to collect enough data
reports, then fuses the data and transmits the fused data packet. The management nodes in
this chapter fuse the data reports they receive into a single new report of the same length,
and the computing time of fusion is much smaller than the data transmission time. Therefore
we ignore this data-processing delay.
Fig. 1.4 Schematic diagram of the moving target trajectory
In Figure 1.4, the target moves with speed TS for some time T, and the sensing range around
the target is SR. The white and gray circular areas represent the region in which the target
can be sensed before and after the move, respectively. Nodes in the vertically shaded area are
the sensor nodes that newly sense the target after it begins to move. The management node in
the shaded area needs to collect DOA data packets from member nodes before it starts data
fusion. The delay for that is:
$$T_{aggr} = \frac{DOA}{2 \cdot SR \cdot TS \cdot D} \qquad (1.7)$$
If the node density D and the sensing range SR in the network are fixed, the delay depends on
DOA and on the moving speed TS of the target. In the game model of this chapter, the longer a
management node waits before fusing, the worse the network meets its real-time goals. So we
define the delay caused by fusion as the penalty that the network imposes on the management
node: while obtaining energy gains, the management node must pay a price. Network users can
guide the behavior of management nodes through the definition of this penalty so that they
operate within a reasonable range.
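To make the dependence of the waiting time on DOA and target speed concrete, here is a small sketch. It assumes that Eq. (1.7) reads T_aggr = DOA / (2 · SR · TS · D), i.e. that the moving target sweeps a strip of width 2·SR; the function name and the sample values are hypothetical.

```python
def aggregation_delay(doa, sr, ts, density):
    """Waiting time until DOA new nodes sense the target (assumed form of Eq. 1.7).

    doa:     number of reports to collect before fusing
    sr:      sensing range (m)
    ts:      target speed (m/s)
    density: node density D (nodes per square metre)
    """
    if sr <= 0 or ts <= 0 or density <= 0:
        raise ValueError("sr, ts and density must be positive")
    return doa / (2.0 * sr * ts * density)


# Hypothetical values: the delay shrinks as the target moves faster.
for speed in (5.0, 10.0, 20.0):
    print(speed, aggregation_delay(doa=4, sr=10.0, ts=speed, density=0.01))
```

For a fixed DOA the delay is inversely proportional to the target speed, and for a fixed speed it grows linearly with DOA, which is the behavior the penalty term is meant to discourage.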
Definition 1.2 The delay cost of the management node in group-level fusion is:
$$C_I = DOA^{f(TS)} \qquad (1.8)$$
Where f(TS) is a function of the target's moving speed whose output is a positive number
between 0 and 1. In a real-time monitoring system, the faster the target moves, the less time
is available for the monitored information to reach the Sink node. Here f(TS) expresses the
urgency of sending the data report to the Sink node, and it increases with the moving speed
of the target. So f(TS) is expressed as:
$$f(TS) = \frac{\text{monitored target speed}}{\text{greatest possible speed of target}} \qquad (1.9)$$
From equation 1.9 we can see that the output increases as the target moves faster, that is,
the monitored information becomes more urgent.
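A short sketch of the urgency function and the delay cost of Definition 1.2 follows. The function names and sample speeds are our own assumptions; only the two formulas, f(TS) as the ratio of the monitored speed to the greatest possible speed and C_I = DOA^f(TS), come from the text.

```python
def urgency(ts, ts_max):
    """Eq. (1.9): monitored target speed over the greatest possible target speed."""
    if not 0 < ts <= ts_max:
        raise ValueError("speed must satisfy 0 < ts <= ts_max")
    return ts / ts_max


def delay_cost(doa, f_ts):
    """Eq. (1.8): penalty C_I = DOA ** f(TS) charged to the management node."""
    return doa ** f_ts


# Hypothetical speeds in m/s: a faster target makes the same DOA more expensive.
for ts in (2.0, 5.0, 20.0):
    f = urgency(ts, ts_max=20.0)
    print(ts, f, delay_cost(4, f))
```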
3. The definition of game model
Definition 1.3 From the above analysis, we define the utility function of the management
node in the game that balances energy and delay as:
$$U_I = X_I - C_I = \frac{DOA - 1}{n_g} - DOA^{f(TS)} \qquad (1.10)$$
At this point, GA-G (Group Aggregation based on Game theory) can be described as follows.
1. Participants
The two sides with conflicting interests in the game are the management nodes and the
network users.
2. Strategy
The management node evaluates the urgency of the monitored information from the target
information sensed by itself and by the member nodes of its group; this urgency is the output
value of the function f(TS). It increases with the moving speed of the target, which indicates
more urgent information. In the game of this chapter, the management node can choose the
value of DOA that best favors its own energy savings when it performs fusion. The network, in
order to avoid excessive delay, uses the delay penalty to constrain the behavior of the
management node. The penalty grows as the urgency of the target information increases, so
that highly urgent information is transmitted to the Sink node with a smaller delay.
3. The utility function is expressed as follows:
$$\max U_I = \max\left( \frac{DOA - 1}{n_g} - DOA^{f(TS)} \right) \qquad (1.11)$$
The constraint is that the value of DOA cannot exceed the number of member nodes in the
group, $n_g$, and cannot be less than 2. In a real network, if DOA were set above $n_g$, the
management node would never perform fusion. The value a node chooses should be the value of
DOA that maximizes its utility.
Therefore, the optimal value of DOA is:
$$DOA_{opt} = \underset{2 \le DOA \le n_g}{\arg\max} \left\{ \frac{DOA - 1}{n_g} - DOA^{f(TS)} \right\} \qquad (1.12)$$
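Since DOA is an integer between 2 and $n_g$, the optimum in Eq. (1.12) can be found by direct enumeration. The sketch below is one possible way to do this, not the authors' procedure; the function names and the sample parameters are assumptions.

```python
def ga_g_utility(doa, n_g, f_ts):
    """Eq. (1.10): U_I = (DOA - 1) / n_g - DOA ** f(TS)."""
    return (doa - 1) / n_g - doa ** f_ts


def optimal_doa(n_g, f_ts):
    """Eq. (1.12): integer DOA in [2, n_g] with the largest utility.

    Ties are resolved in favor of the smallest DOA (behavior of max()).
    """
    if n_g < 2:
        raise ValueError("group-level fusion needs n_g >= 2")
    return max(range(2, n_g + 1), key=lambda doa: ga_g_utility(doa, n_g, f_ts))


# Hypothetical group of 10 members: slow target versus fast target.
print(optimal_doa(10, 0.01))  # slow target, small urgency
print(optimal_doa(10, 0.5))   # fast target, large urgency
```

With these sample values the slow target leads to the largest admissible DOA and the fast target to the smallest, which matches the qualitative statement that urgent information should be fused less and sent sooner.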
4. Qualitative Analysis of Game Model
In the above model, the constraint is $2 \le DOA \le n_g$. When DOA takes the value 1, no
group-level fusion mechanism is in effect, so the management node performs no data
integration. That is, every node that senses the target transmits its data to the Sink node
over multiple hops after collecting the required data. There is then no trade-off between
energy consumption and delay, and therefore no balanced solution. When $DOA \ge 2$, the
group-level fusion mechanism takes effect and the management node must balance energy
consumption against delay.
In this game model, the benefit of the management node is $(DOA - 1)/n_g$. During a
target/event monitoring process, the sensor nodes that sense the same target/event form a
logical group. In the initial stage of the group, the nodes learn about the other neighbor
nodes in the group through interaction, and within a short period the set of nodes monitoring
the target/event is determined, that is, $n_g$ is fixed. In this context, the benefit of the
management node increases with the value of DOA. In network terms, the more data reports of
member nodes the management node collects, the more energy it saves in transmitting data.
This also implies a network parameter, the quality of information: if the management node
collects more data reports from the member nodes in the group, it obtains a more accurate
description of the monitored target/event. When the number of group members increases, that
is, $n_g$ increases, the management node correspondingly increases the value of DOA in order
to obtain a substantial benefit. This is good for both energy savings and the accuracy of the
information, and it benefits the management node.
To prevent a management node from being excessively selfish and setting DOA too large for its
own benefit, which would lead to a large transmission delay, the network needs a penalty
factor to constrain the behavior of the management node. It is expressed as the second term,
$DOA^{f(TS)}$, in this model. While obtaining the benefit of fusion, the management node has
to pay the appropriate price. The greater the value of DOA, the greater the delay, which means
that while gaining more revenue, the management node also suffers more punishment from the
network. In a real-time monitoring system, the moving speed of the target/event must also be
considered. If the target moves fast, the propagation delay of the information should be
small. In the model, the exponent f(TS) of DOA is an adjustment factor for the corresponding
speed, and it increases with the moving speed of the monitored target. When monitoring a
fast-moving target, the cost paid by the management node is higher than when monitoring a
slow-moving target. In that case, if the management node takes a greater value of DOA, the
punishment grows faster, which works against the management node. For the management node
and the network, the balanced utility is $\max\left((DOA - 1)/n_g - DOA^{f(TS)}\right)$. From
the above discussion, we can see that the game model lets the management node adjust the
value of DOA according to the actual situation in the network to reach a balance between the
interests of the two sides, thereby improving overall effectiveness.
4.2 Game model of data packet forwarding
After fusing the data reports collected from the members of its group, a management node
needs to forward packets to the Sink node through other nodes. Traditional routing in
wireless sensor networks assumes that all nodes are selfless, that is, when a sensor node
receives a forwarding request, it accepts the request and forwards the received data packets.
In order to extend the life cycle of sensor networks, this chapter describes an approach
which uses the self-serving nature of the nodes to balance the energy consumption of the
network, so that the energy consumption of the network nodes stays balanced and the network
does not partition quickly.
We use game theory to resolve the following conflicts of interest. The nodes in a wireless
sensor network are rational, which means they are selfish to some degree and their actions
are driven by self-interest. On the one hand, each node hopes that other nodes will provide
forwarding services when it sends data; on the other hand, each node wants to forward as
little data as possible for other nodes to reduce its energy consumption. However, if no node
is willing to forward data for other nodes, the connectivity of the network declines sharply
and the network may even become disconnected. Moreover, the application background of this
section is real-time monitoring, so how to balance energy without causing too large a delay
is also a problem to be solved.
The game model of the data forwarding stage is described as follows:
1. Game participants
The game in the data forwarding stage is defined as an extensive-form two-person game with
incomplete information. The participants are the node and the network. For each node i in the
network, $T_i(t)$ denotes the number of node i's own data packets that other nodes in the
network have successfully forwarded up to time t, and $R_i(t)$ denotes the number of data
packets that node i has forwarded for other nodes in the network up to time t. f(TS) is the
delay compensation available for agreeing to forward data packets.
2. The strategy set
This phase of the game is an extensive-form game. In an extensive-form game, the participants
cannot predetermine a complete plan of action; each participant chooses every step based on
the earlier behavior of the other participants. In the game of this chapter, a node can take
one of two actions. It can accept the forwarding request of the network and forward the data
packets, in which case it receives the delay compensation from the network; to a certain
extent, this mechanism encourages nodes to accept forwarding requests and so reduces the
forwarding delay of data packets. Or it can deny the forwarding request of the network, for
which it must pay a certain price, because refusing to forward brings a certain amount of
delay to the network. The network, in turn, can accept the forwarding request of the node and
forward its data packets, or refuse the request. For both the node and the network, the
decision whether to accept the other's forwarding request depends on whether the other side
has forwarded a sufficient number of data packets for it and on the corresponding delay
compensation.
3. Utility function
From the perspective of each node, when the network successfully forwards packets for the
node, the node obtains a benefit from the network. When the node accepts a forwarding request
and forwards data packets for the network, the node pays a cost for the network. Since the
average number of hops $\alpha$ traversed when data is exchanged between a node and the Sink
node is no less than 1, the benefit received for each successfully sent packet of its own is
$\alpha$ times the loss for forwarding one data packet for the network. This encourages the
nodes in the network to take part in data forwarding. In addition, even when the node's
utility is below zero, a node that agrees to forward the data packet receives a reward from
the network, the delay compensation, which encourages it to forward data. If the node refuses
to forward data packets, it does not receive the delay compensation, which acts as a
punishment from the network.
As a result, the mathematical expression of the utility function in the DR-G (Data Relaying
based on Game Theory) model is as follows:
$$U(T_i(t), R_i(t)) = \alpha T_i(t) - R_i(t) + f(TS)$$
From the above equation, we can introduce a forwarding decision function for the node, which
is used to determine whether to forward data for other nodes:
$$\Delta(T_i(t), R_i(t)) = \begin{cases} 1, & \alpha T_i(t) - R_i(t) + f(TS) \ge 0 \\ 0, & \alpha T_i(t) - R_i(t) + f(TS) < 0 \end{cases}$$
Where $\alpha$ is the average number of hops traversed when transmitting a data packet to the
Sink node, and f(TS) is the delay compensation available for agreeing to forward data packets.
When the value of $\Delta(T_i(t), R_i(t))$ is 1, the intermediate node i agrees to forward;
when the value of $\Delta(T_i(t), R_i(t))$ is 0, node i refuses to forward.
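The DR-G utility and the forwarding decision can be written as two small functions. This is a sketch under our own naming; in particular, treating a utility of exactly zero as "forward" is an assumption about the boundary case, and `alpha` stands for the average hop count to the Sink node.

```python
def drg_utility(t_i, r_i, alpha, f_ts):
    """Utility of node i: alpha * T_i(t) - R_i(t) + f(TS).

    t_i:   own packets successfully forwarded by the network so far
    r_i:   packets node i has forwarded for the network so far
    alpha: average hop count to the Sink node
    f_ts:  delay compensation for agreeing to forward
    """
    return alpha * t_i - r_i + f_ts


def forward_decision(t_i, r_i, alpha, f_ts):
    """Return 1 if node i agrees to forward, 0 if it refuses.

    Assumption: a utility of exactly zero counts as agreement.
    """
    return 1 if drg_utility(t_i, r_i, alpha, f_ts) >= 0 else 0


# Hypothetical counters with alpha = 3 and f(TS) = 0.5.
print(forward_decision(t_i=2, r_i=5, alpha=3, f_ts=0.5))  # utility 1.5
print(forward_decision(t_i=1, r_i=4, alpha=3, f_ts=0.5))  # utility -0.5
```

The second call shows the effect of the delay compensation: it can only tip the decision when the node's balance is within f(TS) of zero, so a node that has relayed far more than it received still refuses.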
4.3 Nash equilibrium of Game Theory model
The previous section defined the game model for forwarding data packets in a wireless sensor
network; this section analyzes that model. The main point of the analysis is that, as the
network operates over time, the game model proposed above balances the energy consumption
among the nodes. The delay compensation differs from target to target. Targets are random and
independent of each other, so an earlier target does not influence the characteristics of the
next one. Therefore, the discussion below does not involve the delay compensation of the
model.
In a wireless sensor network whose sensor nodes use the forwarding decision function, for
every network node i we have
$$\limsup_{t \to \infty} \rho_i(t) \le \frac{1}{1 + \alpha} \qquad (1.13)$$
where $\rho_i(t)$ is, up to time t, the number of node i's own data packets that were sent
successfully as a proportion of the total number of data packets that node i has sent (its
own plus those forwarded for others), that is,
$$\rho_i(t) = \frac{T_i(t)}{T_i(t) + R_i(t)} \qquad (1.14)$$
When the node's utility function value is zero, that is, $\alpha T_i(t) - R_i(t) = 0$, the
utility function value of the corresponding network participant is also zero because it is a
zero-sum game. At this point, if the network receives a forwarding request from the node, it
will refuse to forward. As $t \to \infty$, only after node i has forwarded at least $\alpha$
data packets for the network will the network forward data for node i again. Before this,
$\alpha T_i(t) \le R_i(t)$. Adding $T_i(t)$ to both sides of this inequality gives
$$\alpha T_i(t) + T_i(t) \le R_i(t) + T_i(t) \qquad (1.15)$$
which can be rewritten as
$$\frac{T_i(t)}{T_i(t) + R_i(t)} \le \frac{1}{\alpha + 1}, \quad \text{that is,} \quad \rho_i(t) \le \frac{1}{1 + \alpha}$$
When $\alpha T_i(t) \ge R_i(t)$, node i forwards data for other nodes, so
$(T_i(t) + 1)\alpha \ge R_i(t)$. From this inequality we can derive
$\alpha T_i(t) + \alpha + T_i(t) \ge R_i(t) + T_i(t)$. Dividing both sides by
$T_i(t) + R_i(t)$ gives
$$\frac{\alpha T_i(t)}{T_i(t) + R_i(t)} + \frac{\alpha}{T_i(t) + R_i(t)} + \frac{T_i(t)}{T_i(t) + R_i(t)} \ge 1 \qquad (1.16)$$
Merging the first and third terms on the left side of the inequality gives
$$(\alpha + 1)\rho_i(t) + \frac{\alpha}{T_i(t) + R_i(t)} \ge 1 \qquad (1.17)$$
Then
$$\rho_i(t) \ge \frac{1}{\alpha + 1} - \frac{\alpha}{(\alpha + 1)(T_i(t) + R_i(t))}$$
As $t \to \infty$, $\lim_{t \to \infty} \frac{1}{T_i(t) + R_i(t)} = 0$, and $\alpha$ is a
finite integer, so
$$\lim_{t \to \infty} \rho_i(t) = \frac{1}{1 + \alpha}$$
As can be seen from the above analysis, as the network operates over time, the network and
the nodes gradually converge to the Nash equilibrium point, where the payoffs of the two
sides are in balance. As $t \to \infty$, even as the network gradually approaches the point
of best overall performance, this does not affect the balance of payoffs among the
participants.
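The limit of the delivery ratio can be illustrated with a toy simulation. The sketch below ignores the delay compensation, as the analysis above does, and assumes a simple alternation rule of our own: at every step node i relays one packet if its decision function returns 1, and otherwise the network delivers one of node i's own packets. This rule is an illustrative assumption, not a description of the authors' simulator.

```python
def simulate_delivery_ratio(alpha, steps):
    """Toy run of the DR-G counters; returns T_i / (T_i + R_i) after `steps` events.

    alpha: average hop count (positive integer)
    steps: number of forwarding events to simulate (>= 1)
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    t_i = 0  # own packets delivered by the network
    r_i = 0  # packets relayed for the network
    for _ in range(steps):
        if alpha * t_i - r_i >= 0:
            r_i += 1   # node agrees to relay a packet for the network
        else:
            t_i += 1   # node refuses; the network delivers one of its packets
    return t_i / (t_i + r_i)


for alpha in (1, 3, 5):
    # Simulated ratio next to the predicted limit 1 / (1 + alpha).
    print(alpha, simulate_delivery_ratio(alpha, 10_000), 1 / (1 + alpha))
```

Under this rule the counters settle into a cycle in which the node relays about `alpha` packets for each of its own packets delivered, so the ratio approaches 1/(1 + alpha).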
4.4 The application of model in forwarding process
This section describes how a node uses the game model to decide whether to forward data
packets. By taking delay into account, we can better balance the energy consumption of
wireless sensor networks.
Previous routing algorithms for wireless sensor networks assume that when a node receives
data packets from other nodes together with a forwarding request, it unconditionally accepts
the request and forwards the data packets. In the DR-G model, however, the node gives
priority to its own interest and determines whether to forward packets through the forwarding
decision function.
To ensure that a node's data packets are transmitted toward the Sink node, in the network
initialization phase each sensor node determines its distance from the Sink node according to
the initialization message sent by the Sink node and sets its level accordingly, with the
Sink node in the "shallowest" layer of the network (i.e., hop-count = 0). This mechanism has
the following advantages:
1. It guarantees that a source node sends sensor data toward the Sink node.
2. It adapts to rapid changes in the wireless sensor network topology. When a node fails,
its child nodes can rapidly select another node in the same layer as the failed node to
be their parent, without additional routing overhead.
3. The selected routing paths avoid routing loops.
4. The network topology is more stable, as shown in Figure 1.5.
Fig. 1.5 Schematic diagram of layered wireless sensor network
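The level assignment in the initialization phase can be sketched as a breadth-first traversal from the Sink node, which gives every node its hop count. In a real network this would happen through flooding of the initialization message; the centralized version below, the adjacency-list representation and the node names are assumptions made only for illustration.

```python
from collections import deque


def assign_levels(adjacency, sink):
    """Return a dict mapping each reachable node to its hop count from the sink.

    adjacency: dict mapping a node to the list of its radio neighbors
    sink:      identifier of the Sink node (level 0)
    """
    levels = {sink: 0}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in levels:       # first message received wins
                levels[neighbor] = levels[node] + 1
                queue.append(neighbor)
    return levels


# Hypothetical five-node topology.
topology = {
    "sink": ["a", "b"],
    "a": ["sink", "c"],
    "b": ["sink", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
print(assign_levels(topology, "sink"))
```

In this example node c has two candidate parents in the layer above it (a and b), which is what lets it switch parent without extra routing overhead if one of them fails.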
Any node in the network is asked to send two kinds of data packets: packets from the upper
layer of the routing protocol, which the node needs to send to other nodes in the network,
and packets that other nodes in the network ask this node to forward.
1. Send own data
When a node needs to send data, it first sends a request message to its forward neighbor
nodes. The forward neighbors are the nodes in the wireless sensor network whose level is
shallower than the node's own; deeper nodes are farther from the Sink node and are therefore
not suitable for transmitting data packets toward it. After a forward neighbor receives the
request message, it makes its forwarding decision according to the DR-G game model. Each
neighbor that agrees to forward returns feedback containing the value of its utility function
to the requesting node. The requesting node chooses the neighbor with the largest utility for
data transmission. After the data transmission, the requesting node adds one to its
$T_i(t)$, while the neighbor that forwarded the data adds one to its $R_i(t)$.
2. Other nodes request data forwarding
When node i receives a request to forward a data packet, it first evaluates the forwarding
decision function, including the delay compensation. If the output is 1, node i sends back a
message agreeing to forward the data packets to the requesting node and includes the value of
$U(T_i(t), R_i(t))$ in this message. After receiving the data packets and forwarding them
successfully, it adds 1 to its value of $R_i(t)$, while the node that requested the
forwarding adds 1 to its value of $T_i(t)$. If the output of the forwarding decision function
is 0, the node refuses to forward packets for the network.
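The two procedures above can be combined into one next-hop selection step. The following sketch assumes a minimal `Node` record of our own design (name, level and the two counters) and a shared average hop count `alpha`; it is an illustration of the described message exchange, not the authors' protocol code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    level: int   # hop count to the Sink node
    t: int = 0   # T_i(t): own packets forwarded by others
    r: int = 0   # R_i(t): packets forwarded for others


def utility(node, alpha, f_ts):
    """DR-G utility alpha * T_i - R_i + f(TS) of a candidate relay."""
    return alpha * node.t - node.r + f_ts


def send_packet(sender, neighbors, alpha, f_ts):
    """Pick the agreeing forward neighbor with the largest utility and update counters.

    Returns the chosen relay, or None if every forward neighbor refuses.
    """
    candidates = [n for n in neighbors
                  if n.level < sender.level          # only shallower nodes
                  and utility(n, alpha, f_ts) >= 0]  # decision function is 1
    if not candidates:
        return None
    relay = max(candidates, key=lambda n: utility(n, alpha, f_ts))
    sender.t += 1   # sender had one more packet forwarded
    relay.r += 1    # relay forwarded one more packet
    return relay


# Hypothetical neighborhood: b refuses (negative utility), c is deeper than s.
s = Node("s", level=3)
a = Node("a", level=2, t=2, r=1)
b = Node("b", level=2, t=1, r=5)
c = Node("c", level=4, t=9, r=0)
chosen = send_packet(s, [a, b, c], alpha=3, f_ts=0.5)
print(chosen.name if chosen else None, s.t, a.r)
```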
We can see from the above procedure that a node using the DR-G model makes its forwarding
decisions with full autonomy. When a node on the path becomes aware that it has forwarded too
many data packets for the network, so that the cost term in its utility function is too
large, the node refuses to forward data packets. This prevents nodes from leaving the network
prematurely because of their own high energy consumption, which would also affect normal
data packet forwarding. At the same time, the delay compensation encourages the node to
forward data for the network during its decision-making, thus helping data packets to be
transmitted in real time.
5. Simulation and performance comparison and analysis
From the preceding discussion, we know that a wireless sensor network consists of a large
number of tiny sensor nodes deployed in the monitoring region, which form a multi-hop,
self-organizing network system through wireless communication. As the system is relatively
complex, wireless sensor networks are not easy to study by experimental analysis. TinyOS
provides a powerful development language, nesC, a comprehensive component library and a
network protocol stack. It has a component-based architecture, can quickly realize a variety
of applications, and is used mainly in wireless sensor networks. In this chapter, we use the
simulation tool TOSSIM, which is included with TinyOS, and compare performance mainly in
terms of energy consumption and delay.
We use the open-source simulation platform TOSSIM, which is based on TinyOS, to compare the
reliable data fusion model RA-G with the classical data fusion routing protocols DD and TEEN
for wireless sensor networks. The experiments run in Cygwin, a UNIX-like environment on the
Windows platform.
Figure 1.6 compares the average energy consumption of the three methods in a network of 100
nodes over 2000 s. As can be seen, at the beginning the energy consumption of the
game-theoretic fusion model RA-G is almost the same as that of DD and TEEN. However, as the
network operates, the energy consumed by DD and TEEN gradually exceeds that of RA-G, and this
advantage becomes more apparent as the size of the network increases. This is mainly because,
as the network size increases, the interest propagation and multi-path reinforcement of the
DD algorithm, and the cluster reconstruction of the TEEN algorithm, which requires all nodes
in the whole network to participate, consume a large amount of energy. In RA-G, by contrast,
energy is consumed mainly by the nodes that take part in sensing the target and need to
collect and fuse data, so the average energy consumption rises only marginally. Thus, RA-G
also adapts well to changes in network size.
Figure 1.7 compares the number of surviving nodes for the three methods over a simulation
time of 1000 s. After the simulation reaches about 450 seconds, the nodes of the TEEN
algorithm die quickly. As can be seen, the energy balancing method of the TEEN algorithm
plays a certain role in energy balance, but at a rather high price. In DD, once the
shortest-delay path has been reinforced, the collected data is forwarded along this path to
the Sink node, which makes the energy consumption among the nodes in the network extremely
unbalanced, so nodes die faster. In RA-G, the problem of energy balance is fully taken into
account. The comparison shows that the RA-G fusion model can effectively extend the network's
normal working time and achieve energy balance.
Figure 1.8 compares the real-time performance of the RA-G model with that of DD and TEEN.
Each curve shows the average data transfer delay of one of the three methods under different
network sizes when the running time is 1000 s. As the figure shows, the delay of DD and RA-G
rises with increasing network size, for the same reason as the average energy consumption:
the larger the network, the longer the hop-by-hop path of a data packet back to the Sink
node, and the delay in the transfer process naturally increases correspondingly.
However, since TEEN uses a hierarchical network structure for data fusion, its delay consists
mainly of the time spent waiting for fusion at the cluster heads, which is determined by the
number of member nodes in the cluster. The DD algorithm forwards data along the reinforced
shortest-delay path, so the network has good real-time performance early on: data packets
forwarded along the reinforced path reach the sink node with the shortest delay. However, as
the network operates, the nodes on the reinforced path consume their energy so fast that the
lowest-delay path can no longer carry the forwarding load, and the network has to choose a
sub-optimal path to transfer data. The number of nodes that can take part in data packet
transmission is reduced, so the forwarding delay becomes longer after the network has
operated for a period of time. So, considering the long-term stable operation of the network,
the DD algorithm does not stand out in real-time performance. Among the three methods, the DD
algorithm has high power consumption and no mechanism for balancing energy consumption, so
its network life cycle is shorter than that of TEEN and RA-G.
Fig. 1.6 Average energy consumption comparisons (x-axis: Time (s); y-axis: Average energy consumption (MJ))
Fig. 1.7 Comparison of the number of survival nodes (x-axis: Time (s); y-axis: Survival nodes)
Fig. 1.8 Comparison of three methods delay (x-axis: Network nodes; y-axis: Average delay (s))
The TEEN curve rises rapidly as the network grows. Observation and analysis show that in the
TEEN algorithm the delay for the cluster head to forward a data packet to the sink node is
small. In the RA-G fusion mechanism, data packets are forwarded to the sink node over
multiple hops: the game model determines the forwarding process, and each node uses the
utility function to select the best next hop. This introduces some transmission delay,
whereas the TEEN algorithm does not involve multi-hop forwarding, so the packet transfer
delay of TEEN is less than that of RA-G. But this comes at the expense of the energy
consumption of the cluster head nodes. In TEEN, the time spent waiting for fusion at the
cluster heads is always longer, because once the cluster head has allocated time slots to the
cluster members, the other nodes must wait for their own time slots to send data whether or
not a given member has data to send. This gives the system much unnecessary delay, and the
trend becomes more evident as the number of network nodes increases. As the network operates,
TEEN must periodically reorganize the clusters and rotate the cluster heads across the whole
network, and each reorganization requires broadcasting the new thresholds, which costs the
network a lot of energy and places an especially heavy burden on the cluster head nodes. From
Figure 1.7 we can see that the nodes of TEEN die faster than those of RA-G in the later part
of the network's operation, which harms the reliability of the network. The accelerated death
of nodes eventually stops the network from working, and the real-time and reliability
performance of the whole network degrades. RA-G, by contrast, uses the GA-G fusion model in
the group management node to dynamically regulate the waiting time, and the data packet
forwarding game model DR-G balances the energy consumption of each node in the network while
taking delay into account. The multi-layer fusion mechanism also greatly reduces the traffic
load of the network and effectively extends its life cycle, so the data packet transmission
delay remains stable over a long period.
From the above analysis we can see that energy and latency are two interdependent and
mutually constraining factors in the network; considering only one of them is not enough. The
RA-G fusion model considers both aspects at the same time and uses the idea of game theory to
build a balanced model, effectively improving the network's overall performance.
6. Summary
This chapter focuses on RA-G, a reliable data fusion structure for wireless sensor networks.
Wireless sensor networks are a major form of mobile computing and processing, and their
position cannot be taken by other networks. The reliability of routing protocols for wireless
sensor networks is the key to ensuring network robustness and reliability, so its study has
high practical and research value.
The RA-G fusion model is built on game theory and fuses data layer by layer. Nodes and the
network are treated as rational actors on the two sides of a conflict in a game. Reasoning
rationally from their utility functions, they use the game to balance the various constrained
network parameters and reach a balanced state, which ultimately balances the real-time
performance of the network against the energy expenditure of the nodes. This not only
improves the energy efficiency of the network but also meets the real-time reliability
requirements of target/event monitoring systems.
7. References
[1] Ren Fengyuan and Huang Haining, Wireless sensor networks. Software transactions.
Vol. 14 (No. 2): 1148-1157, 2003.
[2] Estrin D., Govindan R., Heidemann J., Kumar S. Next century challenges: Scalable
coordinate in sensor network. In Proc. of the 5th ACM/IEEE International
Conference on Mobile Computing and Networking, 1999, 263-270.
[3] C.Y. Chong and S. Kumar, "Sensor networks evolution, opportunities, and challenge."
Proceedings of the IEEE. vol. 91, 1247-1256, 2003.
[4] Ma Zuchang, Sun Yining and Mei Tao, Summary of wireless sensor networks.
Communications transactions. 35 No. 4, 2004.
[5] Akyildiz I.F, Su W, Sankarasubramaniam Y and Cayirci E, A survey on sensor
networks. IEEE Communications Magazine.40(8):102-114, 2002
[6] L. Hester, Y. Huang, O. Andric, A. Allen, P. Chen. A Self-Organizing Wireless
Network. Computer Communications and Networks, IEEE. 364-369, 2002.
[7] C. Intanagonwiwat, R. Govindan, D. Estrin. Directed diffusion: a scalable and robust
communication paradigm for sensor networks. Proceedings of ACM MobiCom'00,
Boston, MA, 2000, 56-67.
[8] N. Bulusu, J. Heidemann, D. Estrin, "GPS-less low cost outdoor localization for very
small devices", Technical report 00-729, Computer science department, University
of Southern California, Apr. 2000.
[9] A. Savvides, C.-C. Han, and M. Srivastava, "Dynamic fine-grained localization in Ad-Hoc
networks of sensors," Proceedings of the Seventh ACM Annual International
Conference on Mobile Computing and Networking (MobiCom), pp. 166-179, July 2001.
[10] Jamal N. Al-Karaki and Ahmed E. Kamal, "Routing Techniques in Wireless Sensor
Networks: A Survey." IEEE Personal Communications. Vol. 11, Issue: 6, pp. 6-28,
Dec. 2004.
[11] Kulik J, Heinzelman W R. Negotiation-based protocols for disseminating information in
wireless sensor networks. Wireless Networks. 2002, 8 (8) :169-185.
[12] W. Heinzelman, J.Kulik, and H. Balakrishnan, "Adaptive Protocols for Information
Dissemination in Wireless Sensor Networks," Proc. 5th ACM/IEEE Mobicom
Conference (MobiCom'99), Seattle, WA, pp. 174-85, August, 1999.
[13] Heinzelman W, Chandrakasan A, Balakrishnan H. An application specific protocol
architecture for wireless microsensor networks. IEEE Transactions on Wireless
Communications. 2002, (10): 660-670.
[14] A. Manjeshwar and D. P. Agrawal, "TEEN: a routing protocol for enhanced efficiency in
wireless sensor networks," 1st International Workshop on Parallel and Distributed
Computing Issues in Wireless Networks and Mobile Computing, April 2001.
[15] W. Heinzelman, A. Chandrakasan, and H.
Balakrishnan, "Energy-Efficient Communication Protocol for Wireless Microsensor
Networks," Proceedings of the 33rd Hawaii International Conference on System
Sciences (HICSS '00), January 2000.
[16] W. Heinzelman, "Application-Specific Protocol Architectures for Wireless Networks,"
Ph.D. Dissertation, Massachusetts Institute of Technology, June 2000.
[17] A. Manjeshwar and D. P. Agrawal, "APTEEN: A hybrid protocol for efficient routing
and comprehensive information retrieval in wireless sensor networks," Proceedings
International Parallel and Distributed Processing Symposium (IPDPS 2002), pp. 195-
202, 2002.
[18] S. Capkun, M. Hamdi, J. Hubaux, "GPS-free positioning in mobile ad-hoc networks",
Proceedings of the 34th Annual Hawaii International Conference on System
Sciences, pp. 3481-3490, 2001.
[19] Y. Xu, J. Heidemann, D. Estrin, Geography-informed energy conservation for ad hoc
routing, Proceedings of ACM MobiCom'2001, Rome, Italy, July 2001.
[20] B. Chen, K. Jamieson, H. Balakrishnan, R. Morris, "SPAN: an energy-efficient
coordination algorithm for topology maintenance in ad hoc wireless networks",
Wireless Networks, Vol. 8, No. 5, pp. 481-494, September 2002.
[21] Zhang Weiying, Game Theory and Information Economics. Shanghai People's
Publishing House. 2001:55~78.
[22] Jean-Jacques Laffont and David Martimort, The Theory of Incentives. China
Renmin University Press. 2002, 6.
[23] Hart, O. and B. Holmström, Theory of Contracts, in Advances in Economic Theory: Fifth
World Congress, edited by T. Bewley. Cambridge University Press. 1987.
[24] R. R. Brooks, P. Ramanathan, and A. Sayeed. Distributed Target Tracking and
Classification in Sensor Networks. Proceedings of the IEEE, 2002.
[25] T. He, S. Krishnamurthy, J. A. Stankovic, and T. Abdelzaher. An Energy-Efficient
Surveillance System Using Wireless Sensor Networks. In MobiSys'04, June 2004.
[26] T. He, C. Huang, B. M. Blum, J. A. Stankovic, and T. Abdelzaher. Range-Free Localization
Schemes in Large-Scale Sensor Networks. In MOBICOM'03, September 2003.
[27] R. Stoleru, T. He, J. A. Stankovic, and D. Luebke. A High-Accuracy, Low-Cost
Localization System for Wireless Sensor Networks. In SenSys'05, November 2005.
5
Inductive Game Theory: A Basic Scenario
Mamoru Kaneko¹ and J. Jude Kline²
¹ Institute of Policy and Planning Sciences, University of Tsukuba, Ibaraki 305-8573, Japan
² School of Economics, University of Queensland, Brisbane, QLD 4072, Australia
1. Introduction
1.1 General motivations
In game theory and economics it is customary to assume, often implicitly and sometimes
explicitly, that each player has well formed beliefs/knowledge of the game he plays.
Various frameworks have been prepared for explicit analyses of this subject. However, the
more basic question of where a personal understanding of the game comes from is left
unexplored. In some situations such as parlour games, it might not be important to ask the
source of a player's understanding. The rules of parlour games are often described clearly in
a rule book. However, in social and economic situations, which are the main target areas for
game theory, the rules of the game are not clearly specified anywhere. In those cases,
players need some other sources for their beliefs/knowledge. One ultimate source for a
player's understanding is his individual experience of playing the game. The purpose of
this paper is to develop and to present a theory about the origin and emergence of
individual beliefs/knowledge from the individual experiences of players with bounded
cognitive abilities.
People often behave naturally and effectively without much conscious effort to understand
the world in which they live. For example, we may work, socialize, exercise, eat, sleep,
without consciously thinking about the structure of our social situation. Nevertheless,
experiences of these activities may influence our understanding and thoughts about society.
We regard these experiences as important sources for the formation of an individual
understanding of society.
Treating particular experiences as the ultimate source of general beliefs/knowledge is an
inductive process. Induction is differentiated from deduction in the way that induction is a
process of deriving a general statement from a finite number of observations, while
deduction is a process of deriving conclusions with the same or less logical content with
well-formed inference rules from given premises. Formation of beliefs/knowledge about
social games from individual experiences is typically an inductive process. Thus, we will
call our theory inductive game theory, as was done in Kaneko-Matsui [18]. In fact, economic
theory has had a long tradition of using arguments about learning by experiences to explain
how players come to know the structure of their economy. Even in introductory
microeconomics textbooks, the scientific method of analysis is discussed: collecting data,
formulating hypotheses, predicting, behaving, checking, and updating. Strictly speaking,
these steps are applied to economics as a science, but also sometimes, less scientifically, to
ordinary people's activities.
Our theory formalizes some part of an inductive process of an individual decision maker. In
particular, we describe how a player might use his experiences to form a hypothesis about
the rules and structure of the game. At the starting point of our theory, a player has little in the
way of a priori beliefs/knowledge about the structure of the particular game. Almost all
beliefs/knowledge about the structure of the particular game are derived from his
experiences and memories.
A player is assumed to follow some regular behavior, but he occasionally experiments by
taking some trials in order to learn about the game he plays. One may wonder how a player
can act regularly or conduct experiments initially without any beliefs or knowledge. As
mentioned above, many of our activities do not involve highbrow analytical thoughts; we
simply act. In our theory, some well-defined default action is known to a player, and
whenever he faces a situation he has not thought about, he chooses this action. Initially, the
default action describes his regular behavior, which may be interpreted as a norm in society.
The experimental trials are not well-developed experiments, but rather trials taken to see
what happens. By taking these trials and observing the resulting outcomes, a player
will start to learn more about the other possibilities and the game overall.
[Figure 1.1: diagram of behavioral-mental activities in three stages. (Early) experimental stage: regular behavior, experiments, recording. Inductive derivation stage: construction (revision) of a personal view. Analysis stage: use of a personal view, decision making.]
Fig. 1.1. Three stages of inductive game theory
The theory we propose has three main stages illustrated in Fig.1.1: the (early) experimentation
stage; the inductive derivation stage; and the analysis stage. This division is made for conceptual
clarity and should not be confused with the rules of the dynamics. In the experimentation
stage, a player accumulates experiences by choosing his regular behavior and occasionally
some alternatives. This stage may take quite some time and involve many repetitions before
a player moves on to the inductive stage. In the inductive derivation stage he constructs a
view of the game based on the accumulated experiences. In the analysis stage, he uses his
derived view to analyze and optimize his behavior. If a player successfully passes through
these three stages, then he brings back his optimizing behavior to the objective situation in
the form of a strategy and behaves accordingly.
In this paper, there are various points at which we could stop to discuss details of each of the
above stages. Since, however, our intention is to give an entire scenario, we move on through
the stages at the cost of a detailed study of each such point. After passing through all three stages, the
player may start to experiment again with other behaviors and the experimentation stage
starts again. Experimentation is no longer early since the player now has some beliefs about
the game being played. Having his beliefs, a player may now potentially learn more from
his experiments. Thus, the end of our entire scenario is connected to its start.
While we will take one player through all the stages in our theory, we emphasize that other
players will experiment and move through the stages also at different times or even at the
same time. The precise timing of this movement is not given rigorously. In Section 7.2 we
give an example of how this process of moving through these stages might occur. We
emphasize that experiments are still infrequent occurrences, and the regular behavior is
crucial for a player to gain some information from his experiments. Indeed, if all players
experiment too frequently, little would be learned.
We should distinguish our theory from some approaches in the extant game theory
literature. First, we take up the type-space approach of Harsanyi [10], which has been
further developed by Mertens-Zamir [24] and Brandenburger-Dekel [4]. In this approach,
one starts with a set of parameter values describing the possible games and a description of
each player's probabilistic beliefs about those parameters. In contrast, we do not express
beliefs/knowledge either by parameters or by probabilities on them. In our approach,
players' beliefs/knowledge are taken as structural expressions. Our main question is how a
player derives such structural expressions from his accumulated experiences. In this sense,
our approach is very different.
Our theory is also distinguished from the fields with the titles of evolution/learning/
experiment (cf., Weibull [31], Fudenberg-Levine [7], Kalai-Lehrer [12], and more generally,
Camerer [5]) and the case-based decision theory of Gilboa-Schmeidler [8]. Those theories are
typically interested in adjustment/convergence of actions to some equilibrium; they do not
address questions on how a player learns the rules/structure of the game. Some of them
extend payoff functions to fit predictions by the theory to observed experimental results.
Case-based decision theory looks more similar to ours. This theory focuses on how a player
uses his past experiences to predict the consequences of an action in similar games. Unlike
our theory, it does not discuss the emergence of beliefs/knowledge on social structures.
Rather than the above mentioned literature, our theory is reminiscent of some philosophical
tradition on induction. Both Francis Bacon [2] and Hume [11] regard individual experience
as the ultimate source of our understanding of nature, rather than society. Our theory is closer
to Bacon than Hume in that the target of understanding is a structure of nature in Bacon,
while Hume focussed on similarity. In this sense, the case-based decision theory of Gilboa-
Schmeidler [8] is closer to Hume. Another point relevant to the philosophy literature is that
in our theory, some falsities are inevitably involved in a view constructed by a player from
experiences and each of them may be difficult to remove. Thus, our discourse does not give
a simple progressive view for induction. This is close to Thomas Kuhn's [22] discourse of
scientific revolution (cf. also Harper-Schulte [9] for a concise survey of related works).
1.2 Treatments of memories and inductive processes
Here, we discuss our treatment of memory and induction in more detail. A player may,
from time to time, construct a personal view to better understand the structure of some
objective game. His view depends on his past interactions. The entire dynamics of a player's
interactions in various objective games is conceptually illustrated in the upper diagram of
Fig.1.2. Here, each particular game is assumed to be described by a pair $(\Gamma, m)$ of an n-person
objective extensive game $\Gamma$ and objective memory functions $m = (m_1, \ldots, m_n)$. Different
superscripts here denote different objective games that a player might face, and the arrows
represent the passing of time. This diagram expresses the fact that a player interacts in
different games with different players and sometimes repeats the same games.
We assume that a player focuses on a particular game situation such as $(\Gamma^1, m^1)$, but he does
not try to understand the entire dynamics depicted in the upper diagram of Fig.1.2. The
situation $(\Gamma^1, m^1)$ occurs occasionally, and we assume that the player's behavior depends only
upon the situation and he notices its occurrence when it occurs. By these assumptions, the
dynamics are effectively reduced into those of the lower diagram of Fig.1.2. His target is the
particular situation $(\Gamma^1, m^1)$. In the remainder of the paper, we denote a particular situation
$(\Gamma^1, m^1)$ under our scrutiny by $(\Gamma^o, m^o)$, where the superscript "o" means "objective". We use
the superscript i to denote the inductively derived personal view $(\Gamma^i, m^i)$ of player i about the
objective situation $(\Gamma^o, m^o)$.
[Figure 1.2: upper diagram, a time sequence of situations $(\Gamma^k, m^k)$ for $k = 1, 2, 3$, some of them recurring; lower diagram, the recurring occurrences of the target situation $(\Gamma^1, m^1)$.]
Fig. 1.2. Various social situations
The objective memory function $m^o_i$ of player i describes how the raw experiences of playing
$\Gamma^o$ are perceived in his mind. We refer to these memories as short-term memories and
presume that they are based on his observations of information pieces and actions while he
repeatedly plays $\Gamma^o$. The information pieces here correspond to what in game theory are
typically called "information sets", and they convey information to the player about the set
of available actions at the current move and perhaps some other details about the current
environment. Our use of the term "piece" rather than "set" is crucial for inductive game
theory and it is elaborated on in Section 2.
An objective short-term memory $m^o_i(x)$ for player i at his node (move) x consists of
sequences of pairs of information pieces and actions as depicted in Fig.1.3. In this figure, a
single short-term memory consists of three sequences and describes what, player i thinks,
might have happened prior to the node x in the current play of $\Gamma^o$. In his mind, any of these
sequences could have happened and the multiplicity may be due to forgetfulness. We will
use the term memory thread for a single sequence, and memory yarn for the value (set of
memory threads) of the memory function at a point of time.
$$m_i(x) = \{\langle (u_1,b_1), (u_2,b_2), \ldots, (u_k,b_k), w \rangle,\ \langle (v_1,c_1), (v_2,c_2), (v_3,c_3), w \rangle,\ \langle (w_1,d_1), (w_2,d_2), w \rangle\}$$
(each bracketed sequence is a memory thread; the whole set is the memory yarn)
Fig. 1.3. Local memory - short-term memory
One role of each short-term memory value $m^o_i(x)$ is for player i to specify an action
depending upon the value while playing $\Gamma^o$. The other role is the source for a long-term
memory, which is used by player i to inductively derive a personal view $(\Gamma^i, m^i)$.
The objective record of short-term memories for player i in the past is a long sequence of
memory yarns. A player cannot keep such an entire record; instead, he keeps short-term
memories only for some time. If some occur frequently enough, they change into long-term
memories; otherwise, they disappear from his mind. These long-term memories remain in
his mind as accumulated memories, and become the source for an inductive derivation of a
view on the game. This process will be discussed in Section 3.
The induction process of player i starts with a memory kit, which consists of the set of
accumulated threads and the set of accumulated yarns. The accumulated threads are used to
inductively derive a subjective game $\Gamma^i$, and the yarns may be used to construct his
subjective memory function $m^i$. This inductive process of deriving a personal view is
illustrated in Fig.1.4.
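[Figure 1.4: a memory kit (1. memory threads, 2. memory yarns), built from threads over the pieces $v$, $z$ and the actions $a$, $b$, is turned by INDUCTION into a personal view, drawn as a small tree with node $x$ (piece $v$), actions $a$ and $b$, and endnodes $z_1$ and $z_2$.]
Fig. 1.4. Inductive derivation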
In this paper, we consider one specific procedure for the inductive process, which we call
the initial-segment procedure. This procedure will be discussed formally in Section 4.
1.3 The structure of the present paper
This paper is divided into three parts:
Part I: Background, and basic concepts of inductive game theory. Sections 1 - 3. Section 1
describes the motivation, background, and a rough sketch of our new theory. We
attempt, in this paper, to give a basic scenario of our entire theory. The mathematical
structure of our theory is based on extensive games. Section 2 gives the definition of an
extensive game in two senses: strong and weak. This distinction will be used to separate the
objective description of a game from a player's subjective view, which is derived inductively
from his experiences. Section 3 gives an informal theory of accumulating long-term
memories, and a formal description of the long-term memories as a memory kit.
Part II: Inductive derivation of a personal view. Sections 4 - 6. In Section 4, we define an
inductively derived personal view. We do not describe the induction process entirely.
Rather, we give conditions that determine whether or not a personal view might be
inductively derived from a memory kit. Because we have so many potential views, we
define a direct view in Section 5, which turns out to be a representative of all the views a
player might inductively derive (Section 6).
Part III: Decision making using an inductively derived view. Sections 7 - 9. In this part, we
consider each player's use of his derived view for his decision making. We consider a
specific memory kit which allows each player to formulate his decision problem as a 1-
person game. Nevertheless, this situation serves as an experiential foundation of Nash
equilibrium. This Nash equilibrium result, and more general issues of decision making, are
discussed in Sections 7 and 8.
Before proceeding to the formal theory in Section 2, we mention a brief history of this paper
and the present state of inductive game theory. The original version was submitted to this
journal in January 2006. We are writing the final version now two and a half years later in
July 2008. During this period, we have made several advancements in inductive game
theory, which have resulted in other papers. The results of the present paper stand alone as
crucial developments in inductive game theory. Nevertheless, the connection between the
newer developments and this paper needs some attention. Rather than interrupt the flow
of this paper, we have chosen to give summaries and comments on the newer developments
in a postscript presented as Section 9.3.
2. Extensive games, memory, views, and behaviour
To describe a basic situation like $(\Gamma^1, m^1)$ in Fig.1.2, we will use an n-person extensive game
$\Gamma^1$ and memory functions $m^1 = (m^1_1, \ldots, m^1_n)$. We follow Kuhn's [21] formulation of an
extensive game to represent $\Gamma^1$, except for the replacement of information sets by
information pieces.¹ This replacement is essential for inductive game theory. We use
extensive games in the strong and weak senses to model the objective game situation and
the inductively derived view of a player, which are given in Section 2.1. The memory
functions $m^1_1, \ldots, m^1_n$ will be described in Section 2.2. Then, we formally define an objective
description $(\Gamma^1, m^1)$ and a personal view $(\Gamma^i, m^i)$ of player i in Section 2.2. In Section 2.3 we
give a formal definition of a behavior pattern (strategy configuration) for the players.
¹ There are various formulations of extensive games such as in von Neumann-Morgenstern [32], Selten
[30], Dubey-Kaneko [6], Osborne-Rubinstein [27] and Ritzberger [29]. Those are essentially the same
formulations, while Dubey-Kaneko [6] give a simultaneous move form.
2.1 Extensive games
Our definition of an extensive game in the strong sense differs from that of Kuhn [21]
mainly in that the information sets of Kuhn are replaced by information pieces. This
difference is essential from the subjective point of view, though it is less essential from the
objective point of view. An extensive game in the weak sense differs more substantially
from an extensive game of Kuhn.
For notational simplicity, we sometimes make use of a function with the empty domain,
which we call an empty function. When the empty domain and some (possibly nonempty)
region are given, the empty function is uniquely determined.
Definition 2.1 (Extensive games). An extensive game in the strong sense
$\Gamma = ((X,<), (\lambda,W), \{(\varphi_x, A_x)\}_{x \in X}, (\pi,N), h)$ is defined as follows:
K1 (Game Tree): $(X,<)$ is a finite forest (in fact, a tree by K14);
K11: $X$ is a finite non-empty set of nodes, and $<$ is a partial ordering over $X$;
K12: the set $\{x \in X : x < y\}$ is totally ordered with $<$ for any $y \in X$;²
K13: $X$ is partitioned into the set $X^D$ of decision nodes and the set $X^E$ of endnodes so that every
node in $X^D$ has at least one successor, and every node in $X^E$ has no successors;³
K14: $X$ has the smallest element $x_0$, called the root.⁴
K2 (Information Function): $W$ is a finite set of information pieces and $\lambda : X \to W$ is a surjection
with $\lambda(x) \neq \lambda(z)$ for any $x \in X^D$ and $z \in X^E$;
K3 (Available Action Sets): $A_x$ is a finite set of available actions for each $x \in X$;
K31: $A_x = \emptyset$ for all $x \in X^E$;
K32: for all $x, y \in X^D$, $\lambda(x) = \lambda(y)$ implies $A_x = A_y$;
K33: for any $x \in X$, $\varphi_x$ is a bijection from the set of immediate successors⁵ of $x$ to $A_x$;
K4 (Player Assignment): $N$ is a finite set of players and $\pi : W \to 2^N$ is a player assignment with
two conditions;
K41: $|\pi(w)| = 1$ if $w \in \{\lambda(x) : x \in X^D\}$ and $\pi(w) = N$ if $w \in \{\lambda(x) : x \in X^E\}$;
K42: for all $j \in N$, $j \in \pi(w)$ for some $w \in \{\lambda(x) : x \in X^D\}$;
K5 (Payoff functions): $h = \{h_i\}_{i \in N}$, where $h_i : \{\lambda(x) : x \in X^E\} \to R$ is a payoff function for player
$i \in N$.
Bijection $\varphi_x$ associates an action with an immediate successor of x. Game theoretically, it
names each branch at each node in the tree. When x is an endnode, $\varphi_x$ is the empty function.
Since $A_x$ is empty, too, by K31, $\varphi_x$ is a bijection.
² The binary relation $<$ is called a partial ordering on $X$ iff it satisfies (i) (irreflexivity): $x \not< x$; and
(ii) (transitivity): $x < y$ and $y < z$ imply $x < z$. It is a total ordering iff it is a partial ordering and satisfies
(iii) (totality): $x < y$, $x = y$ or $y < x$ for all $x, y \in X$.
³ We say that y is a successor of x iff $x < y$, and that y is an immediate successor of x, denoted by $x <^I y$,
iff $x < y$ and there is no $z \in X$ such that $x < z$ and $z < y$.
⁴ A node x is called the smallest element in $X$ iff $x < y$ or $x = y$ for all $y \in X$.
⁵ The reason for the bijection from immediate successors to actions, rather than from actions to
immediate successors, will be found in K33$^0$ below.
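[Figure 2.1: a tree with nodes $x_1$ and $x_2$, branches labelled $a$ and $b$, and endnodes $z_1, \ldots, z_5$, in which an action does not determine a unique immediate successor.]
Fig. 2.1. Violation of condition K33.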
When K14 (root) is dropped, and K33 (bijection) and K5 (payoffs) are replaced by the
following weaker requirements, we say that $\Gamma$ is an extensive game in the weak sense:
K33$^0$: for any $x \in X$, $\varphi_x$ is a function from the set of immediate successors of x to $A_x$.
K5$^0$: $h : \{\lambda(x) : x \in X^E\} \to R$ is a payoff function for player i.
Since $X$ may not have the smallest element, $(X,<)$ is not necessarily a tree. However, $(X,<)$ is
divided into several connected parts. We can prove that each maximal connected subset of
$(X,<)$ is a tree. Thus, $(X,<)$ is a class of trees, i.e., a forest. For any $x \in X$, there is a unique path
to x, i.e., a maximal set $\{x_1, \ldots, x_{m+1}\}$ with $x_t < x_{t+1}$ for $t = 1, \ldots, m$ and $x_{m+1} = x$. When x is an
endnode, we will call the path to x a play.
In an extensive game in the weak sense, an action a at a node x may not uniquely determine
an immediate successor. See Fig.2.1, which will be discussed as a derived view in Section
4.1. The converse, however, that an immediate successor determines a unique action, does
hold by K33$^0$. Thus, we can define: $x <^I_a y$ iff $x <^I y$ and $\varphi_x(y) = a$, which means that y is an
immediate successor of x via action a. Then, we define $x <_a y$ iff there is some $y'$ such that
$x <^I_a y'$ and ($y' = y$ or $y' < y$).
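The two relations just defined can be sketched in a few lines. This is an illustration under assumptions: the successor-map encoding and the toy weak-sense game below are hypothetical (they are not a transcription of Fig.2.1), and the function names are invented here.

```python
# Hypothetical weak-sense game: succ[x] maps each immediate successor y of x
# to the action phi_x(y). Action "a" at x1 leads to two different successors,
# which K33^0 allows.
succ = {
    "x1": {"z1": "a", "x2": "a", "z3": "b"},
    "x2": {"z4": "a", "z5": "b"},
}


def immediate_via(succ, x, a):
    """All y with x <^I_a y: immediate successors of x reached via action a."""
    return {y for y, act in succ.get(x, {}).items() if act == a}


def descendants(succ, x):
    """All y with x < y."""
    found, stack = set(), list(succ.get(x, {}))
    while stack:
        y = stack.pop()
        if y not in found:
            found.add(y)
            stack.extend(succ.get(y, {}))
    return found


def follows_via(succ, x, a, y):
    """x <_a y: some y' with x <^I_a y' and (y' = y or y' < y)."""
    return any(yp == y or y in descendants(succ, yp)
               for yp in immediate_via(succ, x, a))
```

Because `immediate_via` returns a set rather than a single node, the sketch works unchanged whether or not an action pins down a unique successor.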
We will use an extensive game in the strong sense as an objective description of a social
situation we target, e.g., $\Gamma^o = \Gamma^1$ in Fig.1.2. An extensive game in the weak sense will be used
for a personal view inductively derived from experiences. The latter differs from the former
in several respects, besides the one mentioned above. First, we take the payoffs as personal
and assume that a player's personal view does not include the payoffs of other players.
Hence, condition K5 is weakened to K5$^0$ for a personal view. Dropping the root assumption
and weakening K33 are more substantial changes. We will see in Section 4 why such
changes are needed when we derive a personal view.
For an extensive game in the weak or strong sense, condition K32 implies that the set of
available actions at a node x is determined by the information piece $w = \lambda(x)$. Thus, we may
write $A_w$ or $A_{\lambda(x)}$ rather than $A_x$.
An extensive game in the strong sense is the same as that given in Kuhn [21], except that we
use information pieces $W$, rather than information sets. When the structure of $\Gamma$ is known,
information sets are defined by information pieces, i.e., $\{x : \lambda(x) = w\}$ for $w \in W$. In this sense,
our definition of an extensive game is essentially the same as Kuhn's formulation from the
objective point of view. However, the replacement of information sets by information pieces
is substantive from the subjective point of view for our inductive game theory.
For the purpose of comparisons, we first mention the standard interpretation of the theory
of extensive games due to Kuhn [21] (also, cf., Luce-Raiffa [23], Section 3.6). The
interpretation is summarized as follows:
(i) (Full cognizance): each player is fully cognizant of the game structure;
(ii) (Ex Ante decision): each player makes a strategy choice before the actual play of the game.
Under (i), when a player receives an information piece w, he can infer the information set
$\{x : \lambda(x) = w\}$ corresponding to piece w. Interpretation (i) is usually assumed so as to make (ii)
meaningful. This will be discussed at the end of this subsection.
In the inductive context described in Section 1, the assumption (i) is dropped. Instead,
players learn some part of the game structure by playing the game. Early on, a player may
not infer at all the set of possible nodes having information piece w. To explain such
differences, we use one small example of an extensive game, which we will repeatedly use
to illustrate new concepts.
Example 2.1. Consider the extensive game depicted in Fig.2.2. It is an example of a 2-person
extensive game. Player 1 moves at the root $x_0$, and then at the node $x_3$ if it is reached. Player
2 moves at $x_1$ or $x_2$ depending on the choice of player 1 at $x_0$. The information function
assigns $\lambda(x_0) = w$, $\lambda(x_1) = \lambda(x_2) = v$, $\lambda(x_3) = u$. At the endnodes $z_1, z_2, z_3, z_4, z_5$, the information
function is the identity function, i.e., $\lambda(z_t) = z_t$ for $t = 1, \ldots, 5$. At endnode $z_4$ the payoffs to
players 1 and 2 are $(h_1(z_4), h_2(z_4)) = (0, 1)$.
In Kuhn's interpretation, each player has the knowledge of the game tree. In Fig.2.2, for
example, when player 2 receives information piece v, he can infer that either $x_1$ or $x_2$ is
possible, which means that he knows the information set $\{x_1, x_2\}$.
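Under full cognizance, passing from an information piece to its information set is a simple inversion of the information function. A minimal sketch using the assignments of Example 2.1 follows; the dictionary encoding and the name `information_sets` are assumptions for illustration.

```python
# Information function of Example 2.1, written as a dictionary node -> piece.
piece = {
    "x0": "w", "x1": "v", "x2": "v", "x3": "u",
    "z1": "z1", "z2": "z2", "z3": "z3", "z4": "z4", "z5": "z5",
}


def information_sets(piece):
    """Group nodes by information piece: w -> {x : lambda(x) = w}."""
    sets = {}
    for node, w in piece.items():
        sets.setdefault(w, set()).add(node)
    return sets
```

A player in the inductive setting has no access to `piece` as a whole, so this inversion is exactly the step that is unavailable to him early on.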
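[Figure 2.2: game tree. Player 1 moves at the root $x_0$ (piece $w$, actions $a$, $b$); player 2 moves at $x_1$ and $x_2$ (piece $v$, actions $c$, $d$); player 1 moves again at $x_3$ (piece $u$, actions $a$, $b$). Endnodes: $z_1, \ldots, z_5$; payoff labels shown: 2,3; 0,1; 1,2; 4,3; 0,1.]
Fig. 2.2. 2-person extensive game.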
Under the inductive interpretation, when player 2 receives information piece v, he may not
come to either of the conclusions mentioned in the previous paragraph. He might not even
be aware of the existence of player 1; player 1 may think that the structure could be like
Fig.2.1. In such a case, piece v implies neither the information set $\{x_1, x_2\}$ nor the choices by
player 1. Thus, in the inductive situation, receiving information piece v may be totally
different from knowing the corresponding information set.
The above consideration suggests that there are multiple interpretations of the knowledge a
player gets from an information piece. Here, we specify the minimal content a player gets
from each information piece w in $\Gamma$:
M1: the set $A_w$ of available actions;
M2: the value $\pi(w)$ of the player assignment $\pi$ if w is a decision piece;
M3: his own payoff $h_i(w)$ (as a numerical value) if w is an endpiece.
These are interpreted as being written on each piece w. These conditions will be discussed
further when we consider some specific memory functions in Section 2.2 and the inductive
derivation of a view in Section 4.
Let us return to (i) and (ii) of the standard interpretation of an extensive game given by
Kuhn [21]. In our inductive game theory, since we drop the cognizance assumption (i), the
ex ante decision making of (ii) does not make sense before an individual constructs a view of
the game. We presume that until he constructs a view, he follows some regular behavior and
makes occasional trials in an effort to learn the game he is playing. At some point of time, he
will try to construct a view based on his accumulated memories of his experiences. Once a
view is constructed, it may then be used by the player to construct an optimal strategy for
future plays.
2.2 Memory functions and views
It is standard in the literature of extensive games to describe the memory ability of a player
in terms of information sets (cf. Kuhn [21]). This does not separate the role of an information
piece (set) as information transmission from the role of an individual memory capability. In
our inductive game theory, the treatment of various types of memories is crucial, and thus,
we need an explicit formulation of individual memories in addition to an extensive game.
For this reason, we introduce the concept of a memory function, which describes short-term
memories of a player within a play of an extensive game.
A memory function expresses a player's short-term memory about the history of the current
play of a game. Let $\Gamma = ((X,<), (\lambda,W), \{(\varphi_x, A_x)\}_{x \in X}, (\pi,N), h)$ be an extensive game in the weak
or strong sense. Recall that for each node $x \in X$, there is a unique path to x which is denoted
by $(x_1, \ldots, x_{m+1})$ with $x_{m+1} = x$. Also, the actions taken at $x_1, \ldots, x_m$ on the path to x are uniquely
determined, i.e., for each $t = 1, \ldots, m$, there is a unique $a_t \in A_{x_t}$ satisfying $\varphi_{x_t}(x_{t+1}) = a_t$. We
define the complete history of information pieces and actions up to x by
$$\theta(x) = \langle (\lambda(x_1), a_1), \ldots, (\lambda(x_m), a_m), \lambda(x_{m+1}) \rangle. \qquad (2.1)$$
The history $\theta(x)$ consists of observable elements for players, while the path $(x_1, \ldots, x_{m+1})$ to x
consists of unobservables for players. Memories will be defined in terms of these observable
elements.
A short-term memory consists of memory threads, which look somewhat like the historical
sequence $\theta(x)$. However, we allow a player to be forgetful, which is expressed by incomplete
threads or multiple threads. Formally, a memory thread is a finite sequence
$$\mu = \langle (v_1, a_1), \ldots, (v_m, a_m), v_{m+1} \rangle, \qquad (2.2)$$
where
$$v_t \in W,\ a_t \in A_{v_t} \text{ for all } t = 1, \ldots, m \text{ and } v_{m+1} \in W. \qquad (2.3)$$
Each component $(v_t, a_t)$ ($t = 1, \ldots, m$) or $v_{m+1}$ in $\mu$ is called a memory knot. A finite nonempty set
of memory threads is called a memory yarn. See Fig.1.3 for an illustration of these concepts.
Now, we have the definition of a memory function.
Definition 2.2 (Memory Functions). We say that a function $\mathsf{m}_i$ is a memory function of player
$i$ iff for each node $x \in X_i = \{x \in X : i \in \pi(x)\}$, $\mathsf{m}_i(x)$ is a memory yarn satisfying:
$$w = \lambda(x) \text{ for all } \langle\xi, w\rangle \in \mathsf{m}_i(x). \tag{2.4}$$
The memory function $\mathsf{m}_i$ gives a memory yarn consisting of a finite number of memory
threads at each node for player $i$. The multiplicity of threads in a yarn describes uncertainty
at a point in time about the past.
In Fig.1.3, the memory yarn $\mathsf{m}_i(x)$ consists of three memory threads. The first one is a long
one, the second and third are memory threads of short lengths. Condition (2.4) states that
the tails of any memory threads at a node $x$ are identical to the correct piece $w = \lambda(x)$. This is
interpreted as meaning that the player correctly perceives the current information piece.
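The definitions of threads, yarns and condition (2.4) can be mirrored in a small data structure. The following Python sketch is only an illustration: the tuple encoding of threads, the function names, and the toy pieces and action sets are assumptions introduced here, not part of the original text.

```python
# A memory thread <(v1,a1),...,(vm,am),v_{m+1}> is encoded as a tuple whose
# first m entries are (piece, action) pairs and whose last entry is a piece.
# A memory yarn is a nonempty frozenset of threads. (This encoding is assumed.)

def is_thread(thread, actions_at):
    """Condition (2.3): each v_t is a known piece and a_t is available at v_t."""
    if not thread:
        return False
    *knots, tail = thread
    if tail not in actions_at:
        return False
    return all(v in actions_at and a in actions_at[v] for v, a in knots)

def satisfies_2_4(yarn, current_piece):
    """Condition (2.4): the tail of every thread equals the current piece."""
    return len(yarn) > 0 and all(thread[-1] == current_piece for thread in yarn)

# Hypothetical pieces W = {w, u, z} with their available action sets.
ACTIONS_AT = {"w": {"a", "b"}, "u": {"a", "b"}, "z": set()}

# A yarn with three threads of different lengths, all ending in the piece z.
yarn = frozenset({
    (("w", "b"), ("u", "a"), "z"),  # a long thread
    (("u", "a"), "z"),              # a shorter, forgetful thread
    ("z",),                         # only the present piece
})

print(all(is_thread(t, ACTIONS_AT) for t in yarn))
print(satisfies_2_4(yarn, "z"))
```

Several threads in one yarn encode uncertainty about the past, while condition (2.4) only pins down the last knot of each thread.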
Here, we mention four classes of memory functions and one specific one. In the first
memory function, which is the self-scope perfect-recall memory function, player i recalls
what information he received during the current game and what actions he took, but
nothing about the other players. For this example, we define player $i$'s own history: For a
node $x \in X_i$, let $\theta(x) = \langle(\lambda(x_1), a_1), \ldots, (\lambda(x_m), a_m), \lambda(x_{m+1})\rangle$, and let
$\langle x_{k_1}, \ldots, x_{k_l}, x_{k_{l+1}}\rangle$ be the $i$-part of $\langle x_1, \ldots, x_m, x_{m+1}\rangle$, i.e., the maximal
subsequence of nodes in the path $\langle x_1, \ldots, x_m, x_{m+1}\rangle$ to $x$ satisfying $i \in \pi(x_{k_t})$ for
$t = 1, \ldots, l+1$. Then we define player $i$'s (objective) history of information pieces and
actions up to $x$ by
$$\theta(x)_i = \langle(\lambda(x_{k_1}), a_{k_1}), \ldots, (\lambda(x_{k_l}), a_{k_l}), \lambda(x_{k_{l+1}})\rangle. \tag{2.5}$$
(1) Self-scope⁶ perfect-recall memory function: It is formulated as follows:
$$\mathsf{m}_i^{spr}(x) = \{\theta(x)_i\} \text{ for each } x \in X_i. \tag{2.6}$$
With the memory function $\mathsf{m}_i^{spr}$, player $i$ recalls his own information pieces and actions
taken in the current play of the game. This memory function will have a special status in the
discourse of this paper. In the following, we call $\mathsf{m}_i^{spr}$ the SPR function.
In Fig.2.2, the SPR function $\mathsf{m}_1^{spr}$ for player 1 is given as:
$$\begin{aligned}
&\mathsf{m}_1^{spr}(x_0) = \{\langle w\rangle\}, \text{ and } \mathsf{m}_1^{spr}(x_3) = \{\langle(w,b), u\rangle\};\\
&\mathsf{m}_1^{spr}(z_t) = \{\langle(w,a), z_t\rangle\} \text{ for } t = 1, 2, \text{ and } \mathsf{m}_1^{spr}(z_3) = \{\langle(w,b), z_3\rangle\};\\
&\mathsf{m}_1^{spr}(z_4) = \{\langle(w,b),(u,a), z_4\rangle\} \text{ and } \mathsf{m}_1^{spr}(z_5) = \{\langle(w,b),(u,b), z_5\rangle\}.
\end{aligned} \tag{2.7}$$
⁶ We have chosen the name "self-scope" to mean that he has only himself in his scope. Of course, we
allow for perfect-recall memory functions where the player has other players in his scope.
At node $x_3$, player 1 receives piece $u$ and recalls his choice $b$ at $w$. By the minimal
requirement M1, he knows the available actions $A_w = \{a, b\}$ and $A_u = \{a, b\}$. Without adding
any other source than $\mathsf{m}_1^{spr}$, player 2 does not appear in the scope of player 1. We will
discuss later that Fig.2.1 is an inductively derived view in this example.
The next example is the Markov memory function. As its name suggests, a player recognizes
only the present piece and forgets everything after he moves.
(2) Markov memory function: It is formulated as
$$\mathsf{m}_i^{M}(x) = \{\langle\lambda(x)\rangle\} \text{ for each } x \in X_i. \tag{2.8}$$
It gives only the present information piece. Nonetheless, by the minimal requirement M1,
the player can extract his available action set $A_{\lambda(x)}$ whenever he receives an information
piece $\lambda(x)$.
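The SPR and Markov memory functions can be computed mechanically from a game tree. The Python sketch below is a hedged illustration: the tree is a reconstruction of Fig.2.2 inferred from (2.7), and player 2's information piece `v` and his second action `d` are assumptions, since they are not stated in this section. The function names and the tuple encoding of threads are also introduced here for illustration only.

```python
# Reconstruction of the game of Fig.2.2, inferred from (2.7).
# Player 2's piece "v" and his second action "d" are ASSUMED names.
PARENT = {  # node -> (predecessor, action taken at the predecessor)
    "x1": ("x0", "a"), "x2": ("x0", "b"),
    "z1": ("x1", "c"), "z2": ("x1", "d"),
    "z3": ("x2", "c"), "x3": ("x2", "d"),
    "z4": ("x3", "a"), "z5": ("x3", "b"),
}
PIECE = {"x0": "w", "x1": "v", "x2": "v", "x3": "u",
         "z1": "z1", "z2": "z2", "z3": "z3", "z4": "z4", "z5": "z5"}
PLAYERS = {"x0": {1}, "x1": {2}, "x2": {2}, "x3": {1},
           "z1": {1, 2}, "z2": {1, 2}, "z3": {1, 2}, "z4": {1, 2}, "z5": {1, 2}}

def path_with_actions(x):
    """Return [(x_1, a_1), ..., (x_m, a_m)] for the unique path to x."""
    steps, node = [], x
    while node in PARENT:
        pred, action = PARENT[node]
        steps.append((pred, action))
        node = pred
    return list(reversed(steps))

def own_history(i, x):
    """theta(x)_i of (2.5): pieces and actions at player i's own nodes only."""
    knots = tuple((PIECE[n], a) for n, a in path_with_actions(x) if i in PLAYERS[n])
    return knots + (PIECE[x],)

def spr(i, x):
    """Self-scope perfect-recall memory function (2.6)."""
    return frozenset({own_history(i, x)})

def markov(i, x):
    """Markov memory function (2.8): only the present piece."""
    return frozenset({(PIECE[x],)})

for node in ["x0", "x3", "z1", "z3", "z4", "z5"]:
    print(node, sorted(spr(1, node)), sorted(markov(1, node)))
```

Under these assumptions the values of `spr(1, ...)` agree with the list in (2.7), which is a useful consistency check on the reconstructed tree.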
For both $\mathsf{m}_i^{spr}$ and $\mathsf{m}_i^{M}$, we would have no difficulty in presuming that each player only
receives his own information pieces and gets the minimal information described by M1, M2
and M3. As we will see now, some other memory functions provide a player with
information about some other players' information pieces and actions. The first such
memory function is the perfect-information memory function.
(3) Perfect-information memory function: This is formulated as
$$\mathsf{m}_i^{PI}(x) = \{\theta(x)\} \text{ for each } x \in X_i. \tag{2.9}$$
Recall that $\theta(x)$ is given by (2.1). Thus, if player $i$ has this memory function, he recalls the
perfect history even including the other players' pieces. By M1 and M2, he also knows the
available actions and the player who moves at each decision piece.
There are at least two possible interpretations of how he comes to know the perfect history.
One interpretation is that player $i$ observes other players' moves as the game is played.
Another interpretation is that player $i$'s information pieces contain the complete history, i.e.,
$\theta(x)$ is written on piece $\lambda(x)$. Under either interpretation, a player gets more than the
minimal amount of information described in M1-M3.
The next memory function typically gives a player less information than the perfect
information memory function.
(4) Classical memory function: This memory function is formulated as
$$\mathsf{m}_i^{C}(x) = \{\theta(y) : y \in X_i \text{ and } \lambda(y) = \lambda(x)\} \text{ for each } x \in X_i. \tag{2.10}$$
Observe that this function gives player $i$ the set of complete histories up to nodes with his
current information piece. The multiplicity of memory threads can be interpreted as some
ambiguity about the past. This memory function can also be interpreted in the ways
suggested for $\mathsf{m}_i^{PI}$. We should mention yet another interpretation which is the motivation
for the name "classical memory function". In this interpretation, player $i$ knows the
structure of the extensive game. Consequently, he can infer the set of possible complete
histories compatible with the present information piece. The classical memory function
together with this interpretation is less compatible with our inductive game theory than the
other memory functions. Since it is still mathematically allowed and is closer to the classical
game theory, we consider it.
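The difference between the perfect-information and classical memory functions becomes visible at nodes that share an information piece. The Python sketch below illustrates (2.9) and (2.10) on a reconstruction of the game of Fig.2.2. It rests on assumptions that are not stated in this section: player 2 is taken to receive the same piece `v` at both of his nodes and to have the actions `c` and `d`. All function names are hypothetical.

```python
# Reconstruction of the game of Fig.2.2. Player 2's shared piece "v" and
# his second action "d" are ASSUMED for the sake of illustration.
PARENT = {  # node -> (predecessor, action taken at the predecessor)
    "x1": ("x0", "a"), "x2": ("x0", "b"),
    "z1": ("x1", "c"), "z2": ("x1", "d"),
    "z3": ("x2", "c"), "x3": ("x2", "d"),
    "z4": ("x3", "a"), "z5": ("x3", "b"),
}
PIECE = {"x0": "w", "x1": "v", "x2": "v", "x3": "u",
         "z1": "z1", "z2": "z2", "z3": "z3", "z4": "z4", "z5": "z5"}
PLAYERS = {"x0": {1}, "x1": {2}, "x2": {2}, "x3": {1},
           "z1": {1, 2}, "z2": {1, 2}, "z3": {1, 2}, "z4": {1, 2}, "z5": {1, 2}}

def theta(x):
    """Complete history (2.1) of pieces and actions up to node x."""
    knots, node = [], x
    while node in PARENT:
        pred, action = PARENT[node]
        knots.append((PIECE[pred], action))
        node = pred
    return tuple(reversed(knots)) + (PIECE[x],)

def perfect_information(i, x):
    """Perfect-information memory function (2.9): one complete thread."""
    return frozenset({theta(x)})

def classical(i, x):
    """Classical memory function (2.10): all complete histories to player
    i's nodes that carry the same piece as x."""
    return frozenset(theta(y) for y in PLAYERS
                     if i in PLAYERS[y] and PIECE[y] == PIECE[x])

print(sorted(perfect_information(2, "x1")))  # a single thread
print(sorted(classical(2, "x1")))            # two threads: ambiguity about the past
```

With the shared piece `v`, the classical yarn at `x1` contains two threads, one for each action of player 1, whereas the perfect-information yarn contains only the true history.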
Fig. 2.3. False memory (root $x$; action $a$ leads to endnode $z_1$, action $b$ leads to endnode $z_2$)
The general definition of a memory function allows it to even involve false components. We
give one example of false memories using the following simple extensive game. Consider
the 1-person extensive game $(\Gamma, \mathsf{m}_1)$ depicted as Fig.2.3 with the identity information
function.
A false memory function $\mathsf{m}_1$ is given as:
$$\mathsf{m}_1(x) = \{\langle x\rangle\},\ \mathsf{m}_1(z_1) = \{\langle(x,a), z_1\rangle\} \text{ and } \mathsf{m}_1(z_2) = \{\langle(x,a), z_2\rangle\}. \tag{2.11}$$
This $\mathsf{m}_1$ takes a false value at $z_2$, at which player 1 incorrectly recalls having chosen $a$ at $x$
though he actually chose $b$ at $x$.
Having described an extensive game and memory functions, we now have the basic
ingredients for objective descriptions and subjective personal views.
(Objective description): A pair $(\Gamma^o, \mathsf{m}^o)$ is called an objective description iff $\Gamma^o$ is an
extensive game in the strong sense and $\mathsf{m}^o = (\mathsf{m}_1^o, \ldots, \mathsf{m}_n^o)$ is an $n$-tuple of memory
functions in $\Gamma^o$.
We use the superscript "o" to denote the objective description. We will put a superscript "i" to
denote a personal view of player $i$.
(Personal view): A pair $(\Gamma^i, \mathsf{m}^i)$ is a personal view for player $i$ iff $\Gamma^i$ is an extensive game in
the weak sense specifying only the payoff function of player $i$, and $\mathsf{m}^i$ is a memory function
for player $i$ in $\Gamma^i$.
A personal view $(\Gamma^i, \mathsf{m}^i)$ of player $i$ describes the game player $i$ believes he is playing. Since
his belief is based on his experiences, we do not include the memory functions or payoffs of
other players. We regard payoff values and memory values as personal.⁷
2.3 Behavior patterns
Let $\Gamma = ((X,<), (\lambda,W), \{(\varphi_x, A_x)\}_{x \in X}, (\pi,N), h)$ be an extensive game in the weak or
strong sense and let $\mathsf{m}_i$ be a memory function for player $i \in N$. The extensive game and
memory function may be either the objective description or a personal view. We give a
definition of a behavior pattern to be applied to both cases.
We say that a function $\sigma_i$ on the set of nodes $X_i^D := \{x \in X^D : i \in \pi(x)\}$ is a behavior pattern
(strategy) of player $i$ iff it satisfies conditions (2.12) and (2.13):
$$\text{for all } x \in X_i^D,\ \sigma_i(x) \in \{a \in A_x : \varphi_x(y) = a \text{ for some } y \in X\}; \tag{2.12}$$
$$\text{for all } x, y \in X_i^D,\ \mathsf{m}_i(x) = \mathsf{m}_i(y) \text{ implies } \sigma_i(x) = \sigma_i(y). \tag{2.13}$$
⁷ As stated several times, we regard this as an alternative assumption adopted in the present discourse.
This can be extended to include other players as we have done in Kaneko-Kline [17].
Condition (2.12) means that a behavior pattern $\sigma_i$ prescribes an action leading to some
decision node. This slightly complicated statement is required since $\Gamma$ may be of the weak
sense⁸. Condition (2.13) means that a strategy depends upon local memories.
These are standard conditions for the definition of a strategy. We denote, by $\Sigma_i$, the set of all
behavior patterns for player $i$ in $\Gamma$. We say that an $n$-tuple of strategies $\sigma = (\sigma_1, \ldots, \sigma_n)$ is a
profile of behavior patterns.
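Conditions (2.12) and (2.13) can be checked directly once the memory yarns at a player's decision nodes are known. The following Python sketch is a minimal illustration under assumptions: the node names `x1`, `x2`, the piece `v` and the actions `c`, `d` refer to a hypothetical player 2 who moves at two nodes, and the function and variable names are introduced here only for this example.

```python
def is_behavior_pattern(sigma, memory, leading_actions):
    """Check (2.12) and (2.13) for sigma, a dict from decision nodes to actions.

    memory[x] is the player's memory yarn at node x, and leading_actions[x]
    is the set of actions at x that lead to some node of the tree.
    """
    # (2.12): each prescribed action must lead to some node.
    if any(action not in leading_actions[x] for x, action in sigma.items()):
        return False
    # (2.13): identical local memories must yield identical actions.
    return all(sigma[x] == sigma[y]
               for x in sigma for y in sigma
               if memory[x] == memory[y])

# Hypothetical player 2 with decision nodes x1 and x2 (names are assumptions).
LEADING = {"x1": {"c", "d"}, "x2": {"c", "d"}}
# Markov-type memory: the same yarn {<v>} at both nodes.
MARKOV = {"x1": frozenset({("v",)}), "x2": frozenset({("v",)})}
# Perfect-information-type memory: the yarns differ across the two nodes.
PERFECT = {"x1": frozenset({(("w", "a"), "v")}),
           "x2": frozenset({(("w", "b"), "v")})}

same = {"x1": "c", "x2": "c"}
mixed = {"x1": "c", "x2": "d"}
print(is_behavior_pattern(same, MARKOV, LEADING))    # allowed under both memories
print(is_behavior_pattern(mixed, MARKOV, LEADING))   # violates (2.13)
print(is_behavior_pattern(mixed, PERFECT, LEADING))  # allowed: memories differ
```

The example shows how a coarser memory function shrinks the set of admissible behavior patterns.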
We use the term behavior pattern (strategy) to acknowledge that the behavior of a player
may initially represent some default behavior with no strategic considerations. Once a
player has gathered enough information about the game, his behavior may become
strategic. This will be discussed in a remark at the end of Section 3.2.
In order to evaluate a behavior pattern, we introduce the concepts of compatible endnodes
and compatible endpieces. All evaluations of strategies in this paper will be done in terms of
compatible endpieces. Each behavior profile $\sigma = (\sigma_1, \ldots, \sigma_n)$ determines the set of
compatible endnodes:
$$\begin{aligned}
z(\sigma) = \{z \in X^E : {}&\theta(z) = \langle(\lambda(x_1), \sigma(x_1)), \ldots, (\lambda(x_k), \sigma(x_k)), \lambda(x_{k+1})\rangle\\
&\text{for the path } \langle x_1, \ldots, x_k, x_{k+1}\rangle \text{ to } z\}.
\end{aligned} \tag{2.14}$$
Thus, the actions in the history $\theta(z)$ were prescribed by the behavior profile $\sigma = (\sigma_1, \ldots, \sigma_n)$.
Each behavior profile $\sigma$ also determines the set of compatible endpieces:
$$\lambda(\sigma) = \{w : \lambda(x) = w \text{ for some } x \in z(\sigma)\}. \tag{2.15}$$
When $\Gamma$ is an extensive game in the strong sense, $z(\sigma)$ and $\lambda(\sigma)$ are singleton sets. However,
for extensive games in the weak sense, these sets may have multiple elements.
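Definitions (2.14) and (2.15) amount to filtering the endnodes whose paths use only prescribed actions. The Python sketch below illustrates this on a reconstruction of the game of Fig.2.2, using the regular behavior given later in Example 3.1. Player 2's second action `d`, the encoding of a profile as a single dict from decision nodes to actions, and all function names are assumptions made for this illustration.

```python
# Reconstruction of the game of Fig.2.2 (player 2's action "d" is ASSUMED).
PARENT = {  # node -> (predecessor, action taken at the predecessor)
    "x1": ("x0", "a"), "x2": ("x0", "b"),
    "z1": ("x1", "c"), "z2": ("x1", "d"),
    "z3": ("x2", "c"), "x3": ("x2", "d"),
    "z4": ("x3", "a"), "z5": ("x3", "b"),
}
ENDNODES = ["z1", "z2", "z3", "z4", "z5"]
PIECE = {z: z for z in ENDNODES}  # endnodes carry their own names as pieces

def compatible_endnodes(sigma):
    """z(sigma) of (2.14): endnodes whose path uses only prescribed actions."""
    result = set()
    for z in ENDNODES:
        node, compatible = z, True
        while node in PARENT:
            pred, action = PARENT[node]
            compatible = compatible and sigma[pred] == action
            node = pred
        if compatible:
            result.add(z)
    return result

def compatible_endpieces(sigma):
    """lambda(sigma) of (2.15): pieces attached to the compatible endnodes."""
    return {PIECE[z] for z in compatible_endnodes(sigma)}

# Regular behavior of Example 3.1, merged into one dict over decision nodes.
SIGMA_O = {"x0": "a", "x3": "a", "x1": "c", "x2": "c"}
print(compatible_endnodes(SIGMA_O))
print(compatible_endpieces(dict(SIGMA_O, x0="b")))  # player 1 deviates at x0
```

Because this reconstructed game is in the strong sense, each profile yields a single compatible endnode, in line with the remark above.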
3. Bounded memory abilities and accumulation of local memories
In this section, we first define a domain of accumulation of short-term memories. This
definition is based on the presumption that a player has a quite restricted memory
capability. Theoretically, however, there are still many other possibilities. In Section 3.2, we
will give one informal theory about the accumulation of short-term memories as long-term
ones. This informal theory suggests a particular domain which we call the active domain,
which turns out to be linked to Nash equilibrium behavior, as will be shown in Section 7.2.
Informal and premathematical discussions of this type are intended to provoke further
discussions and debates over the appropriate domain(s) for consideration.
3.1 The objective recurrent situation and domains of accumulation of memories
Let an extensive game $\Gamma^o = ((X^o,<^o), (\lambda^o,W^o), \{(\varphi^o_x, A^o_x)\}_{x \in X^o}, (\pi^o,N^o), h^o)$ in the strong
sense and a profile $\mathsf{m}^o = (\mathsf{m}_1^o, \ldots, \mathsf{m}_n^o)$ of memory functions be the description of the
objective situation. The present purpose is to consider the accumulation of memories from
playing in $(\Gamma^o, \mathsf{m}^o)$ repeatedly.
From the objective point of view, an individual player $i$ has been experiencing short-term
memories:
$$\ldots,\ \underbrace{\mathsf{m}_i^o(x_1^t), \ldots, \mathsf{m}_i^o(x_{\ell_t}^t)}_{(\Gamma^o,\, \mathsf{m}^o) \text{ at } t},\ \underbrace{\mathsf{m}_i^o(x_1^{t+1}), \ldots, \mathsf{m}_i^o(x_{\ell_{t+1}}^{t+1})}_{(\Gamma^o,\, \mathsf{m}^o) \text{ at } t+1},\ \ldots \tag{3.1}$$
where $\langle x_1^t, \ldots, x_{\ell_t}^t\rangle$ is the realized sequence of player $i$'s nodes in the occurrence of
$(\Gamma^o, \mathsf{m}^o)$ at time $t$. Due to bounded memory, player $i$ will only accumulate some part of
these as long-term memories.
⁸ If $\varphi_x$ is a surjection, then $\{a \in A_x : \varphi_x(y) = a \text{ for some } y \in X\} = A_x$. However, since a personal
view may satisfy only K33$_0$, we require this condition.
In the extensive game $(\Gamma^o, \mathsf{m}^o)$, the domain of accumulation for player $i$ is a nonempty
subset $D_i$ of the set $X_i^o = \{x \in X^o : i \in \pi^o(x)\}$ of nodes for player $i$. Player $i$ is relevant in his
own domain $D_i$ iff $D_i$ contains at least one decision node for player $i$. This definition will be
important later in this paper.
A memory kit $(T_{D_i}, Y_{D_i})$ for domain $D_i$ is given by
$$T_{D_i} = \bigcup_{x \in D_i} \mathsf{m}_i^o(x); \text{ and } Y_{D_i} = \{\mathsf{m}_i^o(x) : x \in D_i\}. \tag{3.2}$$
A memory kit is determined by both the domain of accumulation $D_i$ and the objective
memory function $\mathsf{m}_i^o$ of player $i$. It will be the source for an inductive construction of a
personal view. The set $T_{D_i}$ of memory threads is used to construct a skeleton of the tree for a
personal view. The set $Y_{D_i}$ of yarns is used to construct a perceived memory function.
Mathematically speaking, the latter set gives the former, but we keep those two sets to
emphasize that they have different usages.
For a memory kit, we assume that player i has accumulated some incidences of short-term
memories as both threads and yarns. However, a kit includes neither a full record of short-
term memories nor frequencies. In Section 3.2, we will discuss one rationale for this
treatment.
Here, we give three domains of accumulation. The first two are trivial ones, and the third
example is the one we are going to explore in this paper.
(1): Full domain: This is simply given as the entire set $D_i^F = X_i^o$ of player $i$'s nodes. When
the game is small, is repeated often enough, and the accumulation ability of player $i$ is
strong enough, this domain may be appropriate.
(2): Cane domain: A cane domain is a complete set of nodes for player $i$ on one play.
Formally, let $\langle x_0, \ldots, x_m\rangle$ be the path to an endnode $x_m$. Then the cane domain of player $i$ to
$x_m$ is given as $\{x_0, \ldots, x_m\} \cap X_i^o$. A cane domain may arise if every player always behaves
following some regular behavior pattern with no deviations.
Now, let $\sigma^o = (\sigma_1^o, \ldots, \sigma_n^o)$ be a profile of behavior patterns in the extensive game $(\Gamma^o, \mathsf{m}^o)$.
Then, this $\sigma^o$ determines a unique path to an endnode. Hence, the cane domain for player $i$ is
uniquely determined, which is denoted by $D_i^c(\sigma^o)$. Using this concept, we can define the
active domain relative to a profile of behavior patterns.
(3): Active domain: The active domain relative to a profile $\sigma^o = (\sigma_1^o, \ldots, \sigma_n^o)$ of behavior
patterns for player $i$ is given as
$$D_i^A(\sigma^o) = \bigcup_{\sigma_i \in \Sigma_i^o} D_i^c(\sigma_i, \sigma_{-i}^o). \tag{3.3}$$
Here, $\Sigma_i^o$ is the set of all behavior patterns for player $i$ in $(\Gamma^o, \mathsf{m}^o)$ and $(\sigma_i, \sigma_{-i}^o)$ is the profile
obtained from $\sigma^o$ by substituting $\sigma_i$ for $\sigma_i^o$ in $\sigma^o$. That is, the active domain $D_i^A(\sigma^o)$ is the set
of nodes for player $i$ that are reached by unilateral deviations of player $i$.
For a unified treatment of the above domains, we introduce one definition. We say that a
domain $D_i$ for player $i$ is closed iff $D_i$ is expressed as some union of cane domains of player $i$.
The above three examples of domains are closed. A domain which is not closed is the set
$X^{oE}$ of endnodes.
Example 3.1. Let us continue with the example of Fig.2.2. Let the regular behavior be given
by $\sigma_1^o(x_0) = \sigma_1^o(x_3) = a$ and $\sigma_2^o(x_1) = \sigma_2^o(x_2) = c$. The cane domain and active domain of
player 1 determined by $\sigma^o$ are given as
$$D_1^c(\sigma^o) = \{x_0, z_1\} \text{ and } D_1^A(\sigma^o) = \{x_0, z_1, z_3\}. \tag{3.4}$$
The full domain is simply given as $D_1^F = X_1^o = \{x_0, x_3, z_1, z_2, z_3, z_4, z_5\}$.
The memory kit of player 1 depends also on his objective memory function $\mathsf{m}_1^o$. For the
three domains mentioned above and the Markov and SPR memory functions, we have a total
of six memory kits. We mention two and leave the reader to consider the other four.
For the SPR function $\mathsf{m}_1^o = \mathsf{m}_1^{spr}$ and the cane domain, we have
$$T_{D_1^c(\sigma^o)} = \{\langle w\rangle, \langle(w,a), z_1\rangle\} \text{ and } Y_{D_1^c(\sigma^o)} = \{\{\langle w\rangle\}, \{\langle(w,a), z_1\rangle\}\}.$$
For the Markov memory function $\mathsf{m}_1^o = \mathsf{m}_1^{M}$ and the active domain, we have
$$T_{D_1^A(\sigma^o)} = \{\langle w\rangle, \langle z_1\rangle, \langle z_3\rangle\} \text{ and } Y_{D_1^A(\sigma^o)} = \{\{\langle w\rangle\}, \{\langle z_1\rangle\}, \{\langle z_3\rangle\}\}.$$
3.2 An informal theory of behavior and accumulation of memories
Our mathematical theory starts with a memory kit. Behind a memory kit, there is some
underlying process of behavior and accumulation of short-term memories. We now describe
one such underlying process informally, which justifies the active domain of accumulation.
This description is given in terms of some informal postulates.
(1): Postulates for behavior and trials: The first postulate is the rule-governed behavior of
each player in the recurrent situation $\ldots, (\Gamma^o, \mathsf{m}^o), \ldots, (\Gamma^o, \mathsf{m}^o), \ldots$.
Postulate BH1 (Regular behavior): Each player typically behaves regularly following his
behavior pattern $\sigma_i^o$.
Player i may have adopted his regular behavior for some time without thinking, perhaps
since he found it worked well in the past or he was taught to follow it. Without assuming
regular behavior and/or patterns, a player may not be able to extract any causal pattern
from his experiences. In essence, learning requires some regularity.
To learn some other part than that regularity experienced, the players need to make some
trial deviations. We postulate that such deviations take place in the following manner.
Postulate BH2 (Occasional deviations): Once in a while (infrequently), each player
unilaterally and independently makes a trial deviation $\sigma_i \in \Sigma_i^o$ from his regular behavior
$\sigma_i^o$ and then returns to his regular behavior.
Early on, such deviations may be unconscious or not well thought out. Nevertheless, a
player might find that a deviation leads to a better outcome, and he may start making
deviations consciously in the future. Once he has become conscious of his behavior-
deviation, he might make more and/or different trials.
The set of trial deviations for a player is not yet well specified. In the remainder of this
paper, we explore one extreme case where he tries every possible behavior. The following
postulate is made for simplicity in our discourse and since it connects our theory to standard
game theory.
Postulate BH3 (All possible trials): Each player experiments over all his possible behaviors.
Postulate BH3 is an extreme case that each player tries all his alternative behaviors. We do
not take this as basic. The choice of a smaller set of trial deviations is very relevant, since a
player might not have prior knowledge of his available behaviors.
(2): Epistemic postulates: Each player may learn something through his regular behavior
and deviations. What he learns in an instant is described by his short-term memory. For the
transition from short-term memories to long-term memories, there are various possibilities.
Here we list some postulates based on bounded memory abilities that suggest only the
active domain of accumulation.
The first postulate states that if a short-term memory does not occur frequently enough, it
will disappear from the mind of a player. We give this as a postulate for a cognitive bound
on a player.
Postulate EP1 (Forgetfulness): If experiences are not frequent enough, then they would
disappear from a player's mind.
This is a rationale for not assuming that a player has a full record of short-term memories, as
well as for the term "short-term memory". This explains also the assumption that he cannot
keep the relative frequency of a short-term memory: It may remain for some short periods,
but unless it is reinforced by other occurrences or the player is very conscious of it, it may
disappear from his mind, i.e., many such memories disappear. This means that a memory
remaining after some time loses relative positions with other memories and is isolated.
Hence, it is difficult to calculate its frequency relative to others.
In the face of the cognitive bound, only some memories become lasting. The first type of
memories that become lasting are the regular ones since they occur quite frequently. The
process of making a memory last by repetition is known as habituation.
Postulate EP2 (Habituation): A short-term (local) memory becomes lasting as a long-term
memory in the mind of a player by habituation, i.e., if he experiences something frequently
enough, it remains in his memory as a long-term memory even without conscious effort.
By EP2, when all players follow their regular behavior patterns, the short-term memories
given by them will become long-term memories by habituation.
The remaining possibilities for long-term memories are the memories of trials made by some
players. We postulate that a player may consciously spend some effort to memorize the
outcomes of his own trials.
Postulate EP3 (Conscious memorization effort): A player makes a conscious effort to
memorize the result of his own trials. These efforts are successful if they occur frequently
enough relative to his trials.
Postulate EP3 means that when a player makes a trial deviation, he also makes a conscious
effort to record his experience in his long-term memory. These memories are more likely to
be successful if they are repeated frequently enough relative to his trials. Since the players
are presumed to behave independently, the trial deviations involving multiple players will
occur infrequently, even relative to one player's trials. Thus, the memories associated with
multiple players' trials do not remain as long-term memories. This has the implication that
our experiential foundation is typically incompatible with the subgame perfect concept of
Selten [30], which will be discussed again in Section 9.
In sum, postulates EP1 to EP3 and BH1 to BH3 suggest that we can concentrate on the active
domain of a player.
Some other domains such as a cane domain and the full domain might emerge as candidates
in slightly different situations. For example, if no trials are made, then EP2 (Habituation)
gives the cane domain corresponding to $\sigma^o$. Alternatively, if the game is small enough and if
it is repeated enough, then each player has experienced every outcome. And if he has an
ability to recall all the incidences, then we would get the full domain. The additional
assumption of full recall seems plausible for very small games.
Remark (Default decision and all the possible behaviors): One may criticize our
treatments in that:
(1) $\sigma_i^o$ has the total domain $X_i^o$ and
(2) $\sigma_i$ varies over the entire $\Sigma_i^o$ of (3.3),
since these might conflict with the assumption of no a priori knowledge of the structure of
the game for player $i$.
We can answer (1) by interpreting one action at every decision node as a default action.
When a player receives an unknown (unfamiliar) information piece, he just takes the default
action. This assumption avoids a player's need to plan for his behavior over the entire
domain.
We take (2) as a legitimate criticism, particularly when the game is large. We have chosen
(3.3) as a working assumption in this paper.
4. Inductively derived views
In this section, we give a definition of an inductively derived (personal) view, which we
abbreviate as an i.d.view. Here, player $i$ uses only his memory kit $(T_{D_i}, Y_{D_i})$ as a summary of
his experiences to construct an i.d.view. Before the definition, we talk about our basic
principles to be adopted in this paper. After the definition, we will consider various
examples to see the details of the definition.
4.1 Observables, observed, and additional components
The central notion in inductive game theory is the process of inductive inferences. An
inductive inference is distinguished from a deductive inference in that the former allows
some generalization of observations by adding some hypothetical components, while the
latter changes expressions following well-formed inference rules and keeps the same or less
contents. A player $i$ having a memory kit $(T_{D_i}, Y_{D_i})$ may add some hypothetical components
to the kit in his inductive process to develop a personal view.
The need for this addition of hypothetical components may be found in the assumption that
a player can only observe some elements of the objective extensive game $\Gamma^o$. As remarked in
Section 2.2, only information pieces and actions are observable for each player, while nodes
are hypothetical and unobservable. In addition, many or some pieces and actions do not
end up in the memory kit. Pieces and actions only along some of the paths in a game tree are
more likely observed for players. Moreover, the bounds on their memory capabilities will
allow them to accumulate memories of only some of what they have observed. The memory
kit $(T_{D_i}, Y_{D_i})$ for player $i$ is the collection of observed parts effectively remaining in the mind
of player $i$.
Since player $i$ describes his view $(\Gamma^i, \mathsf{m}^i)$ as an extensive game in the weak sense with a
memory function, he needs to invent a tree structure by adding hypothetical nodes. In this
sense he already goes beyond deductive inferences. To construct a coherent view, a player
may add other components, e.g., more information pieces, actions, and possible histories to
his memories. In this paper, however, we adhere to the basic principle that only elements in
the memory kit $(T_{D_i}, Y_{D_i})$ can be used as the observables in $(\Gamma^i, \mathsf{m}^i)$. In Section 4.2, we will
adopt a specific inductive process called the initial-segment procedure and use this procedure
to define an i.d.view. With this procedure, a player forms the underlying skeletal structure
of his view by adding hypothetical nodes.
4.2 Definition and examples
Now, consider the recurrent situation of $(\Gamma^o, \mathsf{m}^o)$ illustrated in Fig.1.2. Here,
$\Gamma^o = ((X^o,<^o), (\lambda^o,W^o), \{(\varphi^o_x, A^o_x)\}_{x \in X^o}, (\pi^o,N^o), \{h^o_i\}_{i \in N^o})$ is an extensive game in the
strong sense and $\mathsf{m}^o = (\mathsf{m}_1^o, \ldots, \mathsf{m}_n^o)$ is an $n$-tuple of memory functions. Recall that a personal
view is given as a pair $(\Gamma^i, \mathsf{m}^i)$, where
$\Gamma^i = ((X^i,<^i), (\lambda^i,W^i), \{(\varphi^i_x, A^i_x)\}_{x \in X^i}, (\pi^i,N^i), h^i)$ is an extensive game in the weak
sense specifying only the payoff function $h^i$ of player $i$ and $\mathsf{m}^i$ is a memory function for
player $i$ in that game. We assume that player $i$ uses his memory kit $(T_{D_i}, Y_{D_i})$ in the sense of
(3.2) to construct his personal view $(\Gamma^i, \mathsf{m}^i)$.
Strictly speaking, we will not consider the precise process of inductive derivation of a view
$(\Gamma^i, \mathsf{m}^i)$. Instead, we consider possible candidates of $(\Gamma^i, \mathsf{m}^i)$ for the result of inductive
derivation. For the definition of such a candidate, we need a bridge between $(T_{D_i}, Y_{D_i})$ and
$(\Gamma^i, \mathsf{m}^i)$. We can think of various procedures to have such bridges, but we will use one
procedure, called the initial-segment procedure, as stated in Section 4.1. It will become clear
shortly why we have chosen this name.
First, for a given candidate $(\Gamma^i, \mathsf{m}^i)$, we define the set $\Theta(\Gamma^i)$ of possible histories in $\Gamma^i$:
$$\Theta(\Gamma^i) = \{\theta^i(y) : y \in X^i\}, \tag{4.1}$$
where $\theta^i(y) = \langle(w_1, a_1), \ldots, (w_m, a_m), w_{m+1}\rangle$ is the complete history up to $y$ in $\Gamma^i$. With the
initial-segment procedure, we will connect $\Theta(\Gamma^i)$ with $T_{D_i}$.
For the sake of rigor, we make the following definitions. First, a subsequence of
$[(w_1, a_1), \ldots, (w_m, a_m)]$ is simply defined in the standard manner by regarding each $(w_t, a_t)$
as a component of the sequence. Second, $\langle(w_1, a_1), \ldots, (w_m, a_m), w_{m+1}\rangle$ is said to be a
subsequence of $\langle(v_1, b_1), \ldots, (v_k, b_k), v_{k+1}\rangle$ iff $[(w_1, a_1), \ldots, (w_m, a_m), (w_{m+1}, a)]$ is a
subsequence of $[(v_1, b_1), \ldots, (v_k, b_k), (v_{k+1}, a)]$ for some $a$. A supersequence is defined in the
dual manner. We say that $\langle(w_1, a_1), \ldots, (w_m, a_m), w_{m+1}\rangle$ is a maximal sequence in a given set
of sequences iff there is no proper supersequence in that set. An initial segment of
$\langle(w_1, a_1), \ldots, (w_m, a_m), w_{m+1}\rangle$ is a subsequence of the form $\langle(w_1, a_1), \ldots, (w_k, a_k), w_{k+1}\rangle$
with $k \leq m$.
Now, we can define the set of initial segments of memory threads in $T_{D_i}$ as:
$$T^*_{D_i} := \{\langle\xi, w\rangle : \langle\xi, w\rangle \text{ is an initial segment of some maximal sequence in } T_{D_i}\}. \tag{4.2}$$
We require $\Theta(\Gamma^i)$ to be the same as $T^*_{D_i}$ for $\Gamma^i$ to be inductively derived from $T_{D_i}$. This is why
the following is called the initial-segment procedure. A player uses all his initial segments in
$T_{D_i}$ to construct the histories in $\Gamma^i$.
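The construction of the set of initial segments in (4.2) from a set of memory threads can be sketched in a few lines. The Python code below is an illustration only: threads are encoded as tuples of (piece, action) pairs followed by a final piece, and the function names and the use of `None` as a filler action are assumptions made here. The subsequence test follows the definition in the text, in which the final piece is padded with a common action before comparing.

```python
def padded(thread, filler):
    """Turn <(w1,a1),...,(wm,am),w_{m+1}> into a list of pairs by attaching
    the filler action to the final piece."""
    return list(thread[:-1]) + [(thread[-1], filler)]

def is_subsequence(small, big):
    """Subsequence relation on threads: it must hold for some filler action."""
    fillers = {action for _, action in big[:-1]} | {None}
    for filler in fillers:
        remaining = iter(padded(big, filler))
        # 'knot in remaining' consumes the iterator, so order is respected.
        if all(knot in remaining for knot in padded(small, filler)):
            return True
    return False

def maximal(threads):
    """Threads that have no proper supersequence within the given set."""
    return {t for t in threads
            if not any(s != t and is_subsequence(t, s) for s in threads)}

def initial_segments(thread):
    """All <(w1,a1),...,(wk,ak),w_{k+1}> with k <= m."""
    m = len(thread) - 1
    segments = set()
    for k in range(m + 1):
        tail = thread[k][0] if k < m else thread[m]
        segments.add(tuple(thread[:k]) + (tail,))
    return segments

def t_star(threads):
    """The set (4.2): initial segments of the maximal threads."""
    return set().union(*(initial_segments(t) for t in maximal(threads)))

# SPR kit on the cane domain of Example 3.1: the set is already closed.
T_cane = {("w",), (("w", "a"), "z1")}
print(t_star(T_cane) == T_cane)

# A single long thread generates all of its initial segments.
print(sorted(t_star({(("w", "b"), ("u", "a"), "z4")}), key=len))
```

The first check corresponds to the equality of the two sets that is used for the cane domain in the examples following Definition 4.1.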
We now give the full set of requirements for an inductively derived personal view based on
the initial-segment procedure. As mentioned above, we will give a more general definition
of an i.d.view in another paper, which will allow for other inductive procedures (see Section
9.3). In the following definition, we assume that player $i$ is relevant in his own domain $D_i$,
i.e., $D_i$ contains at least one decision node of player $i$.
Definition 4.1 (Inductively derived view). A personal view $(\Gamma^i, \mathsf{m}^i)$ for player $i$ is inductively
derived from the memory kit $(T_{D_i}, Y_{D_i})$ iff
P1 (Construction of an extensive game): $\Gamma^i$ is an extensive game in the weak sense satisfying:
(a) (Preservation of the informational structure): $\Theta(\Gamma^i) = T^*_{D_i}$;
(b) (Action sets): $A^i_x = A^o_{\lambda^i(x)}$ for each $x \in X^i$;
(c) (Player assignment at decision nodes): $\pi^i\lambda^i(x) = \pi^o\lambda^i(x)$ for all $x \in X^{iD}$;
(d) (Own payoffs): $h^i\lambda^i(x) = h^o_i\lambda^i(x)$ for each $x \in X^{iE}$;
P2 (Construction of a memory function): $\mathsf{m}^i$ is a memory function on
$X^i_i = \{x \in X^i : i \in \pi^i\lambda^i(x)\}$ satisfying:
(a) (Preservation of memory yarns): $\{\mathsf{m}^i(x) : x \in X^i_i\} \subseteq Y_{D_i}$;
(b) (Internal consistency): $\theta^i(x) \in \mathsf{m}^i(x)$ for any $x \in X^i_i$;
(c) (Dependence up to observables): if $\theta^i(x) = \theta^i(y)$, then $\mathsf{m}^i(x) = \mathsf{m}^i(y)$.
We abbreviate an inductively derived view as an i.d.view.
For an i.d.view, the extensive game $\Gamma^i$ is constructed based on the set $T^*_{D_i}$ of initial segments
of maximal memory threads in $T_{D_i}$. P1a states that the game tree is based on $T^*_{D_i}$. Conditions
P1b, P1c, P1d are the minimum requirements M1, M2, M3 stated in Section 2.1. By P1c and
K42, the player set for $\Gamma^i$ is determined as
$$N^i = \{j \in N^o : j \in \pi^i\lambda^i(x) \text{ for some } x \in X^{iD}\}.$$
Since $\lambda^i$ is a surjection from $X^i$ to $W^i$ by K2, and since $\Theta(\Gamma^i) = T^*_{D_i}$ by P1a, we have $W^i \subseteq W^o$.
Hence, P1b and P1c are well-defined. For the well-definedness of P1d, it should hold that for
any $x \in X^{iE}$, the associated piece $\lambda^i(x)$ is an endpiece in the objective game $\Gamma^o$.
The personal memory function $\mathsf{m}^i$ is constructed based on the set $Y_{D_i}$ of memory yarns. This
principle explains condition P2a, while player $i$ is not required to use all of them. Condition
P2b states that each yarn $\mathsf{m}^i(x)$ should contain the complete history $\theta^i(x)$. The reason for this
is that $(\Gamma^i, \mathsf{m}^i)$ is now in the mind of player $i$ and can be seen by player $i$ as the objective
observer. Still, P2b is one alternative among several possible internal consistency
requirements. Condition P2c is more basic, stating that his subjective memory yarns should
include no elements additional to what, he believes, has been observed in the play in his
view $\Gamma^i$.
An analogy with a jigsaw puzzle may help understand the above definition of an i.d.view.
Treating the memory threads as the picture on each piece and memory yarns as pieces in a
jigsaw puzzle, a player tries to reconstruct an extensive game, though his memory kit may
be very incomplete and does not allow him to reach a meaningful view.
To see how an i.d.view is obtained, we look at several examples.
Example 4.1 (SPR function
1
).
spr
m For this memory function, any i.d.view will be a 1-person
game played by player 1, even if the objective game (
o
, m
o
) involves multiple players.
Inductive Game Theory: A Basic Scenario 103
Consider this memory function on the cane domain described in Example 3.1. The memory kit is given as
$T_{D^c_1(\sigma^o)} = \{\langle w\rangle, \langle (w,a), z_1\rangle\}$ and $Y_{D^c_1(\sigma^o)} = \{\{\langle w\rangle\}, \{\langle (w,a), z_1\rangle\}\}$.
Then $T^{*}_{D^c_1(\sigma^o)} = T_{D^c_1(\sigma^o)}$, and an i.d.view is given as Fig.4.1. It consists of the set of nodes $X^1 = \{y_0, y_1\}$, with $\lambda^1(y_0) = w$, $\lambda^1(y_1) = z_1$, $\pi^1(w) = \pi^1(z_1) = \{1\}$, $h^1(z_1) = 2$, and his memory function is given as $m^1(y_0) = \{\langle w\rangle\}$ and $m^1(y_1) = \{\langle (w,a), z_1\rangle\}$. Since $A^1_{y_0} = A^o_w = \{a, b\}$ by P1b, condition K33 (bijection requirement) is violated, but K33$_0$ is satisfied.
Fig. 4.1. Cane. Fig. 4.2. Duplicated.
Now, let us observe that some multiplicity of i.d.views is involved in Definition 4.1, which is caused by the use of hypothetical elements of nodes. In the original game $(\Gamma^o, m^o)$ as well as in the derived game $(\Gamma^1, m^1)$, the nodes are unobservable and auxiliary. We can use different symbols for $y_0$ and $y_1$ without changing the informational structure of the game; the cane with nodes $y'_0$ and $y'_1$ differs from the cane of Fig.4.1. This causes also another type of multiplicity; the game having the duplication of $(\Gamma^1, m^1)$ described in Fig.4.2 satisfies all the requirements of Definition 4.1. We will introduce the concept of a game theoretic p-morphism in Section 6 as a means for dealing with those types of multiplicity.
The definition of an inductive derivation based on the initial-segment procedure may not
work to deliver an i.d.view. Here, we give two negative examples and one positive one.
Example 4.2 (Markov memory function $m^M_i$: General failure). Let player i have the Markov memory function $m^o_i = m^M_i$. Suppose that player i is relevant in his domain $D_i$ in $\Gamma^o$, i.e., $D_i$ has at least one decision node y. Let $\lambda^o(y) = w$. Since $m^M_i$ is the Markov memory function, we have $T^{*}_{D_i} = T_{D_i} = \{\langle v\rangle : \lambda^o(x) = v \text{ and } x \in D_i\}$. This prevents player i from having an i.d.view, since all elements in $T^{*}_{D_i}$ have no successors but $\lambda^o(y) = w$ cannot have a payoff, i.e., P1d cannot be satisfied.
Example 4.3 (Perfect information memory function $m^{PI}_1$: Full recoverability). Let player 1 have the perfect-information memory function $m^{PI}_1$ and let the domain be the full domain $D^F_1 = X^o_1$ in the game of Fig.2.2. In this case, player 1 can reconstruct the objective game $\Gamma^o$ from his memory kit, except for player 2's payoffs and memory function. This full-recoverability result can be generalized into any game.
When player i has the classical memory function $m^C_i$ and the full domain $D^F_i$, we have also the full-recoverability result. When the domain $D_i$ is smaller than $D^F_i$, we may encounter some difficulty.
Example 4.4 (Classical memory $m^C_1$ with the cane domain: failure). Let player 1 have the classical memory function $m^C_1$ on the cane domain $D^c_1 = D^c_1(\sigma^o) = \{x_0, z_1\}$ of (3.4) in Example 3.1. Then $T_{D^c_1} = \{\langle w\rangle, \langle (w,a), v\rangle, \langle (w,b), v\rangle\}$; one candidate for an i.d.view is described as Fig.4.3, which violates conditions K2 and K31. Thus, there is no i.d.view in this case.
Fig. 4.3. Failure with $m^C_1$.
5. Direct views
In Section 4, we gave the definition of an inductively derived view for a given memory kit $(T_{D_i}, Y_{D_i})$ and found that there may be many i.d.views for each $(T_{D_i}, Y_{D_i})$. In this section, we single out one of those views which we call the direct view. We will argue that it has a special status among i.d.views or simply among views. Here, we give some results for a direct view to be an i.d.view. In Section 6, we will show that our analysis of direct views is sufficient to describe the game theoretic contents of any i.d.view.
A direct view for a given memory kit $(T_{D_i}, Y_{D_i})$ is constructed by treating each thread in $T^{*}_{D_i}$ as a node in the derived game. As in Section 4, we assume that player i is relevant in his own domain $D_i$.
Definition 5.1 (Direct view). A direct view $(\Gamma^d, m^d) = (((X^d, <^d), (\lambda^d, W^d), \{(\varphi^d_x, A^d_x)\}_{x \in X^d}, (\pi^d, N^d), h^d), m^d)$ from a memory kit $(T_{D_i}, Y_{D_i})$ is defined in the following manner:
d1: $X^d = T^{*}_{D_i}$;
d2: $\langle\xi, v\rangle <^d \langle\eta, w\rangle$ iff $\langle\xi, v\rangle$ is a proper initial segment of $\langle\eta, w\rangle$;
d3 (Information function): $\lambda^d\langle\xi, v\rangle = v$ for all $\langle\xi, v\rangle \in X^d$; and $W^d = \{v : \langle\xi, v\rangle \in X^d \text{ for some } \xi\}$;
d4 (Action sets): $A^d_{\langle\xi, v\rangle} = A^o_v$ for all $\langle\xi, v\rangle \in X^d$; and if $\langle\xi, v\rangle \in X^{dD}$, then $\varphi^d_{\langle\xi, v\rangle}\langle\xi, (v,a), u\rangle = a$ for each immediate successor $\langle\xi, (v,a), u\rangle$ of $\langle\xi, v\rangle$;
d5 (Player assignment): $\pi^d(v) = \pi^o(v)$ for all $\langle\xi, v\rangle \in X^{dD}$; and $\pi^d(v) = N^d$ for all $\langle\xi, v\rangle \in X^{dE}$, where $N^d = \{j : j \in \pi^o(v) \text{ for some } \langle\xi, v\rangle \in X^{dD}\}$;
d6 (Payoff function): for any $\langle\xi, v\rangle \in X^{dE}$, if $\lambda^o(x) = v$ for some $x \in X^{oE}$, then $h^d(v) = h^o_i(v)$; and otherwise, $h^d(v)$ is arbitrary;
d7 (Memory function): for any node $\langle\xi, v\rangle$ in $X^d_i$, if some $y \in Y_{D_i}$ contains $\langle\xi, v\rangle$, then $m^d\langle\xi, v\rangle$ is such a $y \in Y_{D_i}$; and otherwise, $m^d\langle\xi, v\rangle = \{\langle\xi, v\rangle\}$.
In the following, $\Gamma^d = ((X^d, <^d), (\lambda^d, W^d), \{(\varphi^d_x, A^d_x)\}_{x \in X^d}, (\pi^d, N^d), h^d)$ defined by d1 to d6 is called a direct structure, and $m^d$ defined by d7 is a direct memory function.
Condition d6 has an arbitrariness if some $\langle\xi, v\rangle \in X^{dE}$ does not come from an endpiece in $\Gamma^o$. If this is avoided, i.e., a direct structure is an extensive game in the weak sense, it is uniquely determined. Condition d7 may still allow multiple memory functions.
A direct view $(\Gamma^d, m^d)$ for $(T_{D_i}, Y_{D_i})$ may not be a personal view; specifically, conditions K2 and K31 may be violated. Example 4.4 violates K2 and K31, and also, when the objective memory function is the Markov memory function, a direct view always violates K31. In Theorem 5.2, we will give a condition for a direct view to be a personal view as well as an i.d.view.
Fig. 5.1. Unique direct view.
Another important comment is about the avoidance of additional hypothetical components such as nodes. A direct view is constructed directly from the components in the memory kit, focusing on the initial segments of memory threads in $T_{D_i}$. Consequently, the complete history up to each node $x \in X^d$ is the same as x itself, which is stated as Lemma 5.1.
Lemma 5.1. For any direct structure $\Gamma^d$, $\theta^d(x) = x$ for all $x \in X^d$.
Proof. Let $x \in X^d$. By d1, $x = \langle\xi, v\rangle = \langle (w_1, a_1), \ldots, (w_k, a_k), v\rangle$ is an initial segment of a maximal thread in $T_{D_i}$. The path to $\langle\xi, v\rangle$ is $\langle w_1\rangle, \langle (w_1, a_1), w_2\rangle, \ldots, \langle (w_1, a_1), \ldots, (w_{k-1}, a_{k-1}), w_k\rangle, \langle\xi, v\rangle$. The complete history up to $\langle\xi, v\rangle$ is the sequence $\langle (w_1, a_1), \ldots, (w_{k-1}, a_{k-1}), (w_k, a_k), v\rangle$, which is x itself.
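The argument of this proof can be replayed mechanically. The sketch below is an illustration only: it uses a hypothetical tuple encoding in which a thread is a tuple of (piece, action) pairs followed by a final piece, walks the path of initial segments to a node, and collects the pieces and actions met on the way. The result is the node itself, as Lemma 5.1 states.

```python
def initial_segments(thread):
    """Path of nodes leading to `thread`: root first, the thread itself last."""
    *pairs, _last = thread
    return [tuple(pairs[:j]) + (pairs[j][0],) for j in range(len(pairs))] + [tuple(thread)]


def complete_history(x):
    """Rebuild the complete history up to node x from the path to x."""
    path = initial_segments(x)
    history = []
    for node, nxt in zip(path, path[1:]):
        action = nxt[len(node) - 1][1]      # action on the edge node -> nxt (d4)
        history.append((node[-1], action))  # (piece at node, action taken there)
    history.append(x[-1])                   # piece at x itself (d3)
    return tuple(history)


x = (("w1", "a1"), ("w2", "a2"), "v")
print(complete_history(x) == x)
```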
Let us now look at an example of a direct view.
Example 4.1 (continued): In Fig.4.1 and Fig.4.2, we gave two examples of i.d.views for player 1. This example has a unique direct view, which is given in Fig.5.1 and is an i.d.view with the memory function $m^i(x) = \{x\}$ for all $x \in X^d$.
Now, we give conditions for a direct view to be an i.d.view. Recall the assumption that player i is relevant for his own domain $D_i$.
Theorem 5.2 (Conditions for a direct view to be I.D.): Let $(T_{D_i}, Y_{D_i})$ be a memory kit.
(i): The direct structure $\Gamma^d$ for $(T_{D_i}, Y_{D_i})$ is uniquely determined and is an extensive game in the weak sense satisfying P1a-P1d if and only if for any maximal $\langle\xi, v\rangle$ in $T^{*}_{D_i}$, $v = \lambda^o(x)$ for some $x \in X^{oE}$.
(ii): Let $\Gamma^d$ be a direct structure for $T_{D_i}$. Then there is a direct memory function $m^d$ for $\Gamma^d$ satisfying P2a-P2c if and only if for any $\langle\xi, w\rangle \in T^{*}_{D_i}$ with $i \in \pi^o(w)$,
there is an $x \in D_i$ such that $\langle\xi, w\rangle \in m^o_i(x)$. (5.1)
This theorem will be proved at the end of this section. Part (i) states that a condition for the unique determination of a direct structure is that every maximal thread in $T^{*}_{D_i}$ occurs at an endnode in the objective game. Part (ii) gives a necessary and sufficient condition for a direct memory function prescribed by d7 to satisfy P2a-P2c. When both of these conditions are satisfied, there is a direct view that is i.d.; there is still, however, some arbitrariness in the memory function, which allows for multiple direct views. This is shown by Example 5.1.
Example 5.1. Consider the objective 1-person sequential move game of Fig.5.2. Here, the information function is given by $\lambda^o(y_t) = v$ for t = 1, 2, and it is the identity function everywhere else. Suppose that the domain of accumulation is the full domain $D^F_1 = X^o_1 = X^o$.
Fig. 5.2. 1-person game.
Let the objective memory function $m^o_1$ be defined by:
$$m^o_1(y_t) = \begin{cases} \{\theta^o(y_t)\} & \text{if } t \neq 1, 2; \\ \{\langle (y_0, a), v\rangle, \langle (y_0, b), v\rangle, \langle v\rangle\} & \text{if } t = 1; \\ \{\langle (y_0, a), v\rangle, \langle (y_0, b), v\rangle\} & \text{if } t = 2. \end{cases} \qquad (5.2)$$
In this example, the direct structure $\Gamma^d$ is uniquely determined, which has the same structure as Fig.5.2 consisting of nodes $\theta^o(y_1), \ldots, \theta^o(y_6)$. However, a memory function has some arbitrariness at the nodes $\theta^o(y_1)$ and $\theta^o(y_2)$. For example, assigning the memory $m^d(\theta^o(y_1)) = m^o_1(y_2)$ and $m^d(\theta^o(y_2)) = m^o_1(y_1)$, together with $m^d(\theta^o(y_t)) = m^o_1(y_t)$ for $t \neq 1$ and $t \neq 2$, gives one i.d.direct view. In this view, the player mixes up his memories at $y_1$ and $y_2$. In Section 8.2, we will see how this mixing up may create some difficulties. Another view is where he assigns his memory yarns correctly. Still two other views are obtained if he assigns the same memory yarn to both of those nodes.
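The count of direct memory functions in Example 5.1 can be checked by brute force. The following sketch is illustrative only. It assumes, as a reading of Fig.5.2, that $y_3, y_4$ follow $y_1$ and $y_5, y_6$ follow $y_2$ via actions a and b; it encodes threads as tuples; and it applies d7 literally: a node may be given any yarn of the kit that contains it, and the singleton yarn if there is none.

```python
from itertools import product

v1 = (("y0", "a"), "v")   # theta^o(y1)
v2 = (("y0", "b"), "v")   # theta^o(y2)

# Complete histories of the seven nodes (tree shape assumed from Fig.5.2).
histories = {
    "y0": ("y0",), "y1": v1, "y2": v2,
    "y3": (("y0", "a"), ("v", "a"), "y3"),
    "y4": (("y0", "a"), ("v", "b"), "y4"),
    "y5": (("y0", "b"), ("v", "a"), "y5"),
    "y6": (("y0", "b"), ("v", "b"), "y6"),
}


def objective_memory(t):
    """The memory function (5.2)."""
    if t == "y1":
        return frozenset({v1, v2, ("v",)})
    if t == "y2":
        return frozenset({v1, v2})
    return frozenset({histories[t]})


yarns = {objective_memory(t) for t in histories}   # the set of memory yarns
nodes = list(histories.values())                   # nodes of the direct structure


def candidates(node):
    """d7: any yarn containing the node, or the singleton yarn if none does."""
    found = [y for y in yarns if node in y]
    return found or [frozenset({node})]


direct_memory_functions = [
    dict(zip(nodes, choice)) for choice in product(*(candidates(n) for n in nodes))
]
print(len(direct_memory_functions))
```

Only the two nodes with piece v have more than one candidate yarn, so the enumeration yields the four direct memory functions described in the example.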
We now introduce two conditions on a memory function, that we will use in combination
with Theorem 5.2 to provide a sufficient condition for the uniqueness of a direct view.
(Recall of past memories - RPM): for all $x, y \in X^o_i$, if $\langle\xi, w\rangle \in m^o_i(x)$ and $x <^o y$, then $\langle\xi, w\rangle$ is a proper initial segment of some $\langle\eta, v\rangle \in m^o_i(y)$.
(Single thread yarns - STY): $|m^o_i(x)| = 1$ for all $x \in X^o_i$.
The first condition states that every memory thread occurring at a node x of player i will
occur as a subsequence of a thread at any later node y of player i. This is interpreted as
meaning that player i recalls what past memories he had in the current play of the game.
The second condition is simply that each yarn consists of a single thread.
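Both conditions are easy to test on a finite memory function. The sketch below is a hedged illustration with hypothetical names: a memory function is a dict from nodes to sets of threads (tuples), and `earlier` lists the pairs (x, y) of player i's nodes with $x <^o y$. On a two-node path, a perfect-information style memory passes both tests, while a Markov style memory satisfies STY but not RPM.

```python
def initial_segments(thread):
    """Initial segments of <(w1,a1),...,(wk,ak),v>, ending with the thread itself."""
    *pairs, _last = thread
    return [tuple(pairs[:j]) + (pairs[j][0],) for j in range(len(pairs))] + [tuple(thread)]


def is_proper_initial_segment(s, t):
    return s != t and s in initial_segments(t)


def satisfies_rpm(memory, earlier):
    """RPM: each thread at x is a proper initial segment of some thread at any later y."""
    return all(
        any(is_proper_initial_segment(s, t) for t in memory[y])
        for (x, y) in earlier
        for s in memory[x]
    )


def satisfies_sty(memory):
    """STY: every yarn consists of a single thread."""
    return all(len(yarn) == 1 for yarn in memory.values())


earlier = {("x0", "x1")}
pi_like = {"x0": {("w",)}, "x1": {(("w", "a"), "v")}}
markov_like = {"x0": {("w",)}, "x1": {("v",)}}
print(satisfies_rpm(pi_like, earlier), satisfies_sty(pi_like))
print(satisfies_rpm(markov_like, earlier), satisfies_sty(markov_like))
```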
The following corollary gives a sufficient condition for the unique determination of a direct
view, which guarantees that it is an i.d.view.
Corollary 5.3. Let $D_i$ be a closed domain, and let $(T_{D_i}, Y_{D_i})$ be a memory kit determined by a memory function $m^o_i$ satisfying RPM and STY. Furthermore, suppose the latter part, (5.1), of Theorem 5.2.(ii). Then, the direct view $(\Gamma^d, m^d)$ is uniquely determined by d1-d7, and $m^d(x) = \{x\}$ for all $x \in X^d_i$. Moreover, $(\Gamma^d, m^d)$ is an i.d.view.
It is straightforward to check that the SPR function $m^{spr}_i$ and the perfect-information memory function $m^{PI}_i$ on a closed domain satisfy the conditions of Corollary 5.3. Thus, in those cases, we can speak of a unique direct view. We prove this corollary after proving Theorem 5.2.
Proof of Theorem 5.2.(i) (If): Suppose that for any maximal $\langle\xi, v\rangle$ in $T^{*}_{D_i}$, $v = \lambda^o(x)$ for some $x \in X^{oE}$. Under this supposition, we first show that the direct structure is a uniquely determined extensive game in the weak sense.
Let $\Gamma^d$ be a direct structure satisfying d1 to d7. First, observe that the verification of each of K11 to K13 is straightforward by d1, d2, the non-emptiness of $D_i$ and the finite number of threads for each yarn of the memory function $m^o_i$. Condition K2 follows from K2 for $\Gamma^o$, d1, d2, d3, condition (2.3) for $m^o_i$, and the supposition of the if part. Condition K31 also follows from the supposition of the if part together with K31 on $\Gamma^o$ and d4. Conditions K32 and K33$_0$ follow from d1, d2, d3, and d4. K4 uses d5 and d6. Finally, condition K5$_0$ follows from d6. The supposition of the if part implies the payoff function $h^d_i$ is uniquely determined by d6. Thus, we have shown that the direct structure $\Gamma^d$ is determined uniquely as an extensive game in the weak sense.
Next we show that P1a holds. By Lemma 5.1, $\Theta(\Gamma^d) = X^d$, and by d1, $X^d = T^{*}_{D_i}$. Hence, $\Theta(\Gamma^d) = T^{*}_{D_i}$. The other parts of P1 follow immediately from the definition of a direct structure.
(Only-if): Suppose that there is a maximal $\langle\xi, v\rangle$ in $T^{*}_{D_i}$ and $v = \lambda^o(x)$ for some $x \in X^{oD}$. By K33 for $\Gamma^o$, $A^o_x \neq \emptyset$. By d4, we have $A^d_{\langle\xi, v\rangle} = A^o_v \neq \emptyset$. However, $\langle\xi, v\rangle \in X^{dE}$ since $\langle\xi, v\rangle$ is maximal in $T^{*}_{D_i}$. Hence, K31 is violated for $\Gamma^d$, and thus $\Gamma^d$ is not an extensive game in the weak sense.
(ii)(If): Suppose that for any $\langle\xi, w\rangle \in T^{*}_{D_i}$ with $i \in \pi^o(w)$, there is an $x \in D_i$ such that $\langle\xi, w\rangle \in m^o_i(x)$. Then we can define $m^d\langle\xi, w\rangle = m^o_i(x)$. This is a direct memory function of player i for the direct structure $\Gamma^d$, since it associates a memory yarn from $Y_{D_i}$ to each $\langle\xi, w\rangle \in T^{*}_{D_i} = X^d_i$. Then, P2a and P2b are satisfied since by Lemma 5.1, $\theta^d\langle\xi, w\rangle = \langle\xi, w\rangle$. Finally, $m^d$ satisfies P2c, since by Lemma 5.1, $\theta^d\langle\xi, w\rangle = \theta^d\langle\eta, v\rangle$ implies $\langle\xi, w\rangle = \langle\eta, v\rangle$.
(Only-if): If $m^d$ is a direct memory function for $\Gamma^d$, then the result follows by P2a and P2b for $m^d$.
Proof of Corollary 5.3. The right-hand side of Theorem 5.2.(i) is equivalent to the statement that if $\langle\xi, w\rangle \in T^{*}_{D_i}$ and $\lambda^o(x) = w$ for some decision node $x \in D_i$, then $\langle\xi, w\rangle$ is not maximal in $T^{*}_{D_i}$. Let $\langle\xi, w\rangle \in T^{*}_{D_i}$ and suppose that $\lambda^o(x) = w$ for some decision node $x \in D_i$. Then either $\langle\xi, w\rangle$ is a proper initial segment of some $\langle\eta, v\rangle \in T^{*}_{D_i}$, or $\langle\xi, w\rangle \in T_{D_i}$. In the first case, $\langle\xi, w\rangle$ cannot be maximal in $T^{*}_{D_i}$. Suppose that $\langle\xi, w\rangle \in T_{D_i}$. Then, $\langle\xi, w\rangle \in m^o_i(x')$ for some $x' \in D_i$. By K2, (2.4), and the supposition that $\lambda^o(x) = w$ for some decision node $x \in D_i$, it follows that $x'$ must also be a decision node in $D_i$. Then, by closedness we have a $z \in D_i$ with $x' <^o z$. By RPM, there is a $\langle\eta, v\rangle \in m^o_i(z)$ such that $\langle\xi, w\rangle$ is a proper subsequence of $\langle\eta, v\rangle$. Thus, $\langle\xi, w\rangle$ is not maximal in $T^{*}_{D_i}$.
By Theorem 5.2.(i), the direct structure $\Gamma^d$ is uniquely determined and is an extensive game in the weak sense satisfying P1a-P1d. It remains to show that the memory function $m^d(x) = \{x\}$ is the only memory function for $\Gamma^d$ that satisfies P2. By the supposition in the corollary that for any $\langle\xi, w\rangle \in T^{*}_{D_i}$ with $i \in \pi^o(w)$, there is an $x \in D_i$ such that $\langle\xi, w\rangle \in m^o_i(x)$, it follows by Theorem 5.2.(ii) that there is a direct memory function for $\Gamma^d$ that satisfies P2. By STY, $m^d(x) = \{x\}$ is the only possible memory function for $\Gamma^d$.
6. Game theoretical p-morphisms: comparisons of views
In this section, we will show that for any i.d.view $(\Gamma^i, m^i)$, there is a direct i.d.view $(\Gamma^d, m^d)$ having the same game theoretical structure. This result reduces the multiplicity of i.d.views, and allows us to concentrate on the direct views for our analysis of i.d.views. For example, the existence of an i.d.view is equivalent to the existence of a direct i.d.view. This consideration will be possible by introducing the concept of a game theoretical p-morphism, which is a modification of a p-morphism in the modal logic literature (cf. Ono [26] and Blackburn-de Rijke-Venema [3]). We call it simply a g-morphism.
6.1 Definition and results
In the following definition, we abbreviate the superscript i for each component of $(\Gamma^i, m^i)$ and $(\hat\Gamma^i, \hat m^i)$ to avoid unnecessary complications.
Definition 6.1 (Game theoretical p-morphism): Let $(\Gamma, m)$ and $(\hat\Gamma, \hat m)$ be personal views of player i. A function $\psi$ from $X$ to $\hat X$ is called a g-morphism (game theoretical p-morphism) iff
g0: $\psi$ is a surjection from $X$ to $\hat X$;
g1: for all $x, y \in X$ and $a \in A_x$, $x <_a y$ implies $\psi(x) \mathrel{\hat{<}_a} \psi(y)$;
g2: for all $\hat x, \hat y \in \hat X$, $y \in X$ and $a \in \hat A_{\hat x}$, $\hat x \mathrel{\hat{<}_a} \hat y$ and $\hat y = \psi(y)$ imply $x <_a y$ and $\hat x = \psi(x)$ for some $x \in X$;
g3 (Information pieces): $\lambda(x) = \hat\lambda(\psi(x))$ for all $x \in X$;
g4 (Action sets): $A_x = \hat A_{\psi(x)}$ for all $x \in X$;
g5 (Player assignment): $\pi(\lambda(x)) = \hat\pi(\hat\lambda(\psi(x)))$ for all $x \in X$;
g6 (Payoff function): $h(\lambda(x)) = \hat h(\hat\lambda(\psi(x)))$ for all $x \in X^E$;
g7 (Memory function): $m(x) = \hat m(\psi(x))$ for all $x \in X_i$.
We say that (, m) is g-morphic to

( , ), I m denoted by (, m)

( , ), I m iff there is a g-
morphism from (, m) to

( , ). I m
A g-morphism compares one personal view to another one. When a g-morphism $\psi$ exists from $(\Gamma, m)$ to $(\hat\Gamma, \hat m)$, the set of nodes in $\Gamma$ is mapped onto the set of nodes in $\hat\Gamma$, while the game theoretic components of $(\Gamma, m)$ are preserved. Since $\psi$ is a surjection from $X$ to $\hat X$, we cannot take the direct converse of g1, but we take a weak form, g2, which requires that the image $(\hat\Gamma, \hat m)$ should not have any additional structure. In sum, the mapping $\psi$ embeds $(\Gamma, m)$ into $(\hat\Gamma, \hat m)$ without losing the game structure. Nevertheless, a g-morphism allows a comparison of quite different games.
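The tree conditions g0 to g2 can be checked by enumeration on small games. The sketch below is illustrative and rests on assumptions of its own: a game tree is given by its immediate-successor edges as a dict {(x, a): y}, the labelled order $x <_a y$ is read as "y is reached from x by first taking a", and g3 to g7 are not checked. It maps the duplicated cane of Fig.4.2 onto the cane of Fig.4.1.

```python
def labelled_order(edges):
    """Set of triples (x, a, y) with x <_a y, derived from immediate-successor edges."""
    children = {}
    for (x, _a), y in edges.items():
        children.setdefault(x, []).append(y)

    def descendants(y):
        out = {y}
        for c in children.get(y, []):
            out |= descendants(c)
        return out

    return {(x, a, d) for (x, a), y in edges.items() for d in descendants(y)}


def satisfies_g0_g2(psi, nodes, edges, nodes_hat, edges_hat):
    """Check g0 (surjection), g1 (forth) and g2 (back) for the map psi."""
    lt, lt_hat = labelled_order(edges), labelled_order(edges_hat)
    g0 = {psi[x] for x in nodes} == set(nodes_hat)
    g1 = all((psi[x], a, psi[y]) in lt_hat for (x, a, y) in lt)
    g2 = all(
        any(psi[x] == xh and (x, a, y) in lt for x in nodes)
        for (xh, a, yh) in lt_hat
        for y in nodes
        if psi[y] == yh
    )
    return g0 and g1 and g2


# Duplicated cane (Fig.4.2) mapped onto the cane (Fig.4.1).
dup_nodes = ["y0", "y1", "y0p", "y1p"]
dup_edges = {("y0", "a"): "y1", ("y0p", "a"): "y1p"}
cane_nodes = ["y0", "y1"]
cane_edges = {("y0", "a"): "y1"}
psi = {"y0": "y0", "y1": "y1", "y0p": "y0", "y1p": "y1"}
print(satisfies_g0_g2(psi, dup_nodes, dup_edges, cane_nodes, cane_edges))
```

A map that sends a primed root to the endnode of the cane fails g1, which is the sense in which the order structure must be preserved.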
In the modal logic literature, the concept of a p-morphism is used to compare two Kripke
models and their validities. As mathematical objects, Kripke models and extensive games
have some similarity in that their basic structures are expressed as some graphs (or trees)
(cf., Ono [26] and Blackburn et al. [3]). In our case, the other game theoretical components
including a memory function are placed on the basic tree structure. Therefore, we require
our g-morphism to preserve those components, i.e., g3-g7. It will be seen that this concept is
useful for comparisons of i.d.views for a given memory kit.
Let us consider a few examples to understand g-morphisms.
Example 6.1 (Infinite number of p.v.s g-morphic to a given p.v.). Given a personal view $(\Gamma, m)$, we can construct a larger personal view by simply replicating $(\Gamma, m)$. The replicated game with twice as many nodes is g-morphic to $(\Gamma, m)$; for example, Fig.4.2 is obtained from Fig.4.1 by replication. By this method, we can construct personal views of any size that are g-morphic to $(\Gamma, m)$. Thus, there are infinitely many personal views g-morphic to $(\Gamma, m)$.
Fig. 6.1. Non-trivial g-morphism.
The following is a less trivial example than the above.
Example 6.2. Fig.6.1 gives a g-morphism between two 1-person games, where the memory function for each personal view is assumed to be the perfect-information memory function $m^{PI}$. Define $\psi$ as the identity mapping everywhere except $\psi(x'_1) = x_1$ and $\psi(x'_2) = x_2$. This is a g-morphism from the left game to the right game.
Here, we give an example where two i.d.views have no g-morphisms. The fact is caused by
attached memory functions.
Example 6.3 (Negative example). Consider the objective description of Example 5.1. In this
case, the player has four distinct direct views, each of which is an i.d.view. The direct
structure is uniquely determined, but there are four possible direct memory functions. No g-
morphisms are admitted between each pair of direct views.
Now, we show that a g-morphism fully preserves the i.d.property. All the results presented
here will be proved in Section 6.2.
Theorem 6.1 (Preservation of the i.d. property). Suppose that $(\Gamma, m)$ is g-morphic to $(\hat\Gamma, \hat m)$. Then, $(\Gamma, m)$ is an i.d.view for $(T_{D_i}, Y_{D_i})$ if and only if $(\hat\Gamma, \hat m)$ is an i.d.view for $(T_{D_i}, Y_{D_i})$.
It follows from this theorem and Example 6.1 that if a given memory kit $(T_{D_i}, Y_{D_i})$ admits at least one i.d.view, then there are, in fact, an infinite number of i.d.views for $(T_{D_i}, Y_{D_i})$. Thus, we should consider which i.d.views are more appropriate than others. We will see that the direct views have a special status among the i.d.views. Before that, we give the following simple but basic observations, which can be proved just by looking at the definitions carefully.
Lemma 6.2.(1): The g-morphic relation satisfies reflexivity and transitivity.
(2): Suppose that (, m)

( , ), I m i.e., (, m)

( , ) I m and (, m)

( , ). I m Then the g-
morphism from (, m) to

( , ) I m satisfies
g0*: $\psi$ is a bijection from $X$ to $\hat X$;
g1*: for all $x, y \in X$ and $a \in A_x$, $x <_a y$ if and only if $\psi(x) \mathrel{\hat{<}_a} \psi(y)$.
By (1), the relation is an equivalence relation over personal views. We can use this
relation to consider the equivalence classes of personal views. Any two views in one
equivalence class are isomorphic in the sense of g0*, g1* and g3-g7, where g2 is included in
g1*. These two views are identical in our game theoretical sense except for the names of nodes.
In the next theorem we show that every i.d.view is g-morphic to a direct view.
Theorem 6.3. (g-Morphism to a direct personal view). Let $(T_{D_i}, Y_{D_i})$ be a memory kit. For each i.d.view $(\Gamma, m)$, there is a direct view $(\Gamma^d, m^d)$ such that $(\Gamma^d, m^d)$ is a personal view and $(\Gamma, m)$ is g-morphic to $(\Gamma^d, m^d)$.
The direct view $(\Gamma^d, m^d)$ given in Theorem 6.3 is also an i.d.view for $(T_{D_i}, Y_{D_i})$ by Theorem 6.1. This has the implication that we can focus our attention on direct views without loss of generality. The following corollary states that the existence of an i.d.view is characterized by the existence of a direct i.d.view, which in turn is characterized by Theorem 5.2.
Corollary 6.4. (Existence of an i.d.view). Let $(T_{D_i}, Y_{D_i})$ be a memory kit. There is an i.d.view for $(T_{D_i}, Y_{D_i})$ if and only if there is a direct view that is an i.d.view for $(T_{D_i}, Y_{D_i})$.
6.2 Proofs of the results
First, we start with giving a simple observation.
Lemma 6.5. Let $\psi$ be a g-morphism from $(\Gamma, m)$ to $(\hat\Gamma, \hat m)$. Then $x \in X^D$ if and only if $\psi(x) \in \hat X^D$.
Proof. Let $x \in X^D$. Then x has an immediate successor. Thus, $A_x \neq \emptyset$ by K33$_0$, which implies $\hat A_{\psi(x)} \neq \emptyset$ by g4. By K31, $\psi(x) \in \hat X^D$. The converse follows by tracing back this argument starting with $\psi(x) \in \hat X^D$.
The next lemma translates g1 and g2 into the corresponding conditions in terms of the immediate successor relation $<^I_a$.
Lemma 6.6. Suppose that $\psi$ is a g-morphism from $(\Gamma, m)$ to $(\hat\Gamma, \hat m)$. Then:
g1: for all $x, y \in X$ and $a \in A_x$, $x <^I_a y$ implies $\psi(x) \mathrel{\hat{<}^I_a} \psi(y)$;
g2: for all $\hat x, \hat y \in \hat X$, $y \in X$ and $a \in \hat A_{\hat x}$, $\hat x \mathrel{\hat{<}^I_a} \hat y$ and $\hat y = \psi(y)$ imply $x <^I_a y$ and $\hat x = \psi(x)$ for some $x \in X$.
Proof. g1: Let $x <^I_a y$ for some $x, y \in X$. Now, on the contrary, suppose that $\psi(x) \mathrel{\hat{<}_a} \hat z \mathrel{\hat{<}_b} \psi(y)$ for some $\hat z$ and b. Then, by g2 of Definition 6.1, there is some $z \in X$ such that $\psi(z) = \hat z$ and $z <_b y$. By K12 for $\Gamma$, we have $x <_a z <_b y$ or $z <_b x <_a y$. The first case, $x <_a z <_b y$, is impossible since it contradicts $x <^I_a y$. In the second case, we have $\hat z \mathrel{\hat{<}_b} \psi(x)$ by g1 of Definition 6.1, and then, by $\psi(x) \mathrel{\hat{<}_a} \hat z$, we have $\hat z \mathrel{\hat{<}} \hat z$ by the transitivity of K11 for $\hat\Gamma$, which contradicts the irreflexivity of K11 for $\hat\Gamma$. Thus, we must have $\psi(x) \mathrel{\hat{<}^I_a} \psi(y)$.
g2: Let $\hat x \mathrel{\hat{<}^I_a} \hat y$ and $\psi(y) = \hat y$ for some $\hat x, \hat y \in \hat X$, $y \in X$, and $a \in \hat A_{\hat x}$. By g2 of Definition 6.1, there is some $x \in X$ such that $x <_a y$ and $\psi(x) = \hat x$. Now, on the contrary, suppose that $x <_a z <_b y$ for some z and b. Then, by g1 of Definition 6.1, we have $\psi(x) \mathrel{\hat{<}_a} \psi(z) \mathrel{\hat{<}_b} \psi(y)$, which is a contradiction to $\hat x \mathrel{\hat{<}^I_a} \hat y$. Thus, we must have $x <^I_a y$.
The next lemma makes use of the previous one.
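Lemma 6.7. Suppose that $\psi$ is a g-morphism from $(\Gamma, m)$ to $(\hat\Gamma, \hat m)$. Then:
(1): If $\langle x_1, \ldots, x_m\rangle$ is a path in $(\Gamma, m)$, then $\langle\psi(x_1), \ldots, \psi(x_m)\rangle$ is a path in $(\hat\Gamma, \hat m)$ and $\theta(x_t) = \hat\theta(\psi(x_t))$ for $t = 1, \ldots, m$.
(2): If $\langle\hat x_1, \ldots, \hat x_m\rangle$ is a path in $(\hat\Gamma, \hat m)$, then there is a path $\langle x_1, \ldots, x_m\rangle$ in $(\Gamma, m)$ such that $\psi(x_t) = \hat x_t$ and $\theta(x_t) = \hat\theta(\hat x_t)$ for $t = 1, \ldots, m$.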
Proof. (1) Let $\langle x_1, \ldots, x_m\rangle$ be a path in $(\Gamma, m)$. Then there are $a_1, \ldots, a_{m-1}$ such that $x_t <^I_{a_t} x_{t+1}$ for $t = 1, \ldots, m-1$. Thus, $\psi(x_t) \mathrel{\hat{<}^I_{a_t}} \psi(x_{t+1})$ for $t = 1, \ldots, m-1$ by g1 of Lemma 6.6. This means that $\langle\psi(x_1), \ldots, \psi(x_m)\rangle$ is a path in $(\hat\Gamma, \hat m)$ and, by g3, $\theta(x_t) = \hat\theta(\psi(x_t))$ for $t = 1, \ldots, m$.
(2) Let $\langle\hat x_1, \ldots, \hat x_m\rangle$ be a path in $(\hat\Gamma, \hat m)$. Then there are $a_1, \ldots, a_{m-1}$ such that $\hat x_t \mathrel{\hat{<}^I_{a_t}} \hat x_{t+1}$ for $t = 1, \ldots, m-1$. Then, by g0, we can choose an $x_m \in X$ with $\psi(x_m) = \hat x_m$. Then, applying g2 of Lemma 6.6 to the last pair $(\hat x_{m-1}, \hat x_m)$ and $\psi(x_m) = \hat x_m$, there is an $x_{m-1} \in X$ such that $\psi(x_{m-1}) = \hat x_{m-1}$ and $x_{m-1} <^I_{a_{m-1}} x_m$. Repeating this argument (exactly speaking, by mathematical induction), we construct $\langle x_1, \ldots, x_m\rangle$ with $x_t <^I_{a_t} x_{t+1}$ for $t = 1, \ldots, m-1$ and $\psi(x_t) = \hat x_t$ for $t = 1, \ldots, m$. This is a path in $(\Gamma, m)$ having the required properties.
We have the immediate result from Lemma 6.7 that the mapping $\psi$ preserves the complete histories of information pieces and actions, and the values of the memory yarns.
Lemma 6.8. Suppose that $\psi$ is a g-morphism from $(\Gamma, m)$ to $(\hat\Gamma, \hat m)$. Then:
(a): $\Theta(\Gamma) = \Theta(\hat\Gamma)$;
(b): $\{m(x) : x \in X_i\} = \{\hat m(\hat x) : \hat x \in \hat X_i\}$.
Proof. (a) Lemma 6.7.(1) states that $\theta(x) = \hat\theta(\psi(x))$ for all $x \in X$. Thus, $\Theta(\Gamma) \subseteq \Theta(\hat\Gamma)$. Conversely, take any $\hat x \in \hat X$. Lemma 6.7.(2) states that there is an x such that $\theta(x) = \hat\theta(\hat x)$. Thus, $\Theta(\hat\Gamma) \subseteq \Theta(\Gamma)$.
(b) By g7, we have $\{m(x) : x \in X_i\} \subseteq \{\hat m(\hat x) : \hat x \in \hat X_i\}$. The converse inclusion follows from the surjectivity of $\psi$ by g0.
Now, we prove Theorem 6.1. Actually, we prove a more precise claim than the theorem: when there is a g-morphism from $(\Gamma, m)$ to $(\hat\Gamma, \hat m)$, each of P1a-P1d and P2a-P2c for $(\Gamma, m)$ is equivalent to the corresponding one for $(\hat\Gamma, \hat m)$.
Proof of Theorem 6.1. Suppose that there is a g-morphism $\psi$ from $(\Gamma, m)$ to $(\hat\Gamma, \hat m)$. As stated above, we prove that each requirement of P1a-P1d and P2a-P2c for $(\Gamma, m)$ is equivalent to the corresponding one for $(\hat\Gamma, \hat m)$.
P1a: By Lemma 6.8.(a), we have $\Theta(\Gamma) = \Theta(\hat\Gamma)$. P1a holds for $\Gamma$, i.e., $T^{*}_{D_i} = \Theta(\Gamma)$, if and only if $T^{*}_{D_i} = \Theta(\hat\Gamma)$, i.e., P1a for $\hat\Gamma$.
P1b: Let P1b hold for $\Gamma$, i.e., $A_x = A^o_{\lambda(x)}$. Consider any $\hat x \in \hat X$. Then we have some $x \in X$ with $\psi(x) = \hat x$. By g4, $\hat A_{\hat x} = A_x$. Thus, $\hat A_{\hat x} = A^o_{\lambda(x)}$. Since $\lambda(x) = \hat\lambda(\hat x)$ by g3, we have $\hat A_{\hat x} = A^o_{\hat\lambda(\hat x)}$. The converse can be proved similarly.
P1c: Suppose P1c holds for $\hat\Gamma$, i.e., $\hat\pi(\hat\lambda(\hat x)) = \pi^o(\hat\lambda(\hat x))$ for any $\hat x \in \hat X$. Let $x \in X$. By g3, g5 and P1c for $\hat\Gamma$, we have $\pi(\lambda(x)) = \hat\pi(\hat\lambda(\psi(x))) = \pi^o(\hat\lambda(\psi(x))) = \pi^o(\lambda(x))$. Thus, we have P1c for $\Gamma$. The converse is similar.
P1d: Suppose P1d for $\Gamma$. Consider any $\hat x \in \hat X^E$, and take $x \in X$ with $\psi(x) = \hat x$. We should show $\hat h(\hat\lambda(\hat x)) = h^o_i(\hat\lambda(\hat x))$. By g3, g6 and P1d for $\Gamma$, we have $\hat h(\hat\lambda(\hat x)) = \hat h(\hat\lambda(\psi(x))) = h(\lambda(x)) = h^o_i(\lambda(x)) = h^o_i(\hat\lambda(\psi(x))) = h^o_i(\hat\lambda(\hat x))$. Thus, P1d holds for $\hat\Gamma$. The converse is similar.
P2a: By Lemma 6.8.(b), $\{m(x) : x \in X_i\} = \{\hat m(\hat x) : \hat x \in \hat X_i\}$. Hence, m satisfies P2a if and only if $\hat m$ does.
P2b: By g7 and Lemma 6.7, m satisfies P2b if and only if $\hat m$ does.
P2c: Suppose P2c for m. Let $\hat\theta(\hat x) = \hat\theta(\hat y)$. Since $\psi$ is a surjection, we have some $x, y \in X$ such that $\psi(x) = \hat x$ and $\psi(y) = \hat y$. By Lemma 6.7, $\theta(x) = \hat\theta(\hat x)$ and $\theta(y) = \hat\theta(\hat y)$. Hence $m(x) = m(y)$ by P2c for m. Then, by g7, $m(x) = \hat m(\hat x)$ and $m(y) = \hat m(\hat y)$. Thus, P2c holds for $\hat m$. The converse is similar.
The next target is to prove Theorem 6.3. We take two steps to have the assertion of the theorem: Under the supposition that $(\Gamma, m)$ is an i.d.view for memory kit $(T_{D_i}, Y_{D_i})$, (1) we can find a direct view so that it is a personal view; and (2) $(\Gamma, m)$ is g-morphic to it. The first part is given as a lemma, and the second is given as the proof of the theorem.
Lemma 6.9. Suppose that $(\Gamma, m)$ is an i.d.view for memory kit $(T_{D_i}, Y_{D_i})$. Then $(\Gamma^d, m^d)$ is a personal view where $\Gamma^d$ is the unique direct structure for $(T_{D_i}, Y_{D_i})$ and $m^d$ is defined by:
for all $x \in X^d_i$, $m^d(x) = m(y_x)$ for some $y_x \in X_i$ satisfying $\theta(y_x) = x$. (6.1)
Proof. Let $(\Gamma, m)$ be an i.d.view for memory kit $(T_{D_i}, Y_{D_i})$. We first show the right hand side of Theorem 5.2.(i). This implies that $\Gamma^d$ is the unique direct structure for $(T_{D_i}, Y_{D_i})$ and $\Gamma^d$ is an extensive game in the weak sense. We next show that (6.1) defines a memory function for $\Gamma^d$, from which it follows that $(\Gamma^d, m^d)$ is a personal view.
Suppose, on the contrary, that there is some maximal thread $\langle\xi, v\rangle \in T^{*}_{D_i}$ such that $v = \lambda^o(x)$ for some $x \in X^{oD}$. Then, $A^o_v \neq \emptyset$ by K33 for $\Gamma^o$. Since $(\Gamma, m)$ is an i.d.view for memory kit $(T_{D_i}, Y_{D_i})$, we have $\Theta(\Gamma) = T^{*}_{D_i}$ by P1a. Also, since $\langle\xi, v\rangle$ is maximal in $T^{*}_{D_i}$, there exists $y \in X^E$ such that $\theta(y) = \langle\xi, v\rangle$. Then, by P1b, $A_y = A^o_v \neq \emptyset$. This contradicts that y is an endnode in $\Gamma$. Hence, the right hand side of Theorem 5.2.(i) holds.
Now let us see that $m^d$ defined by (6.1) is a memory function for $\Gamma^d$. By P1a, $W = W^d$. Then by d4 and P1b, $m^d$ is a memory function for $\Gamma^d$ since m is a memory function for $\Gamma$.
Proof of Theorem 6.3. Let $(\Gamma, m)$ be an i.d.view for $(T_{D_i}, Y_{D_i})$. By Lemma 6.9, $(\Gamma^d, m^d)$ is a personal view, where $\Gamma^d$ is the unique direct structure for $(T_{D_i}, Y_{D_i})$ and $m^d$ is defined by (6.1).
First we show that $(\Gamma^d, m^d)$ is a direct view. Since $\Gamma^d$ is the unique direct structure, we need only to show that $m^d$ satisfies d7. Let $x \in X^d_i$. By (6.1) and P2b for m, $x = \theta(y_x) \in m(y_x) = m^d(x)$ for some $y_x \in X_i$.
We define the function $\psi$ from $(\Gamma, m)$ to $(\Gamma^d, m^d)$ by:
$\psi(x) = \theta(x)$ for all $x \in X$. (6.2)
The proof will be completed if we show that $\psi$ is a g-morphism from $(\Gamma, m)$ to $(\Gamma^d, m^d)$.
g0: We have $X^d = T^{*}_{D_i}$ by d1, and also $\Theta(\Gamma) = T^{*}_{D_i}$ by P1a for $(\Gamma, m)$. Thus, $X^d = \Theta(\Gamma)$ and so $\psi$ is a surjection from $X$ to $X^d$.
g1: Let $x < y$. Then, $\theta(x)$ is an initial segment of $\theta(y)$, i.e., $\psi(x) = \theta(x) <^d \theta(y) = \psi(y)$ by d2.
g2: Let $\hat x <^d \hat y$ and $\hat y = \psi(y)$. Then, $\hat x$ and $\hat y$ can be written as $\langle\xi, v\rangle$ and $\langle\eta, w\rangle$ respectively, and by d2, $\langle\xi, v\rangle$ is an initial segment of $\langle\eta, w\rangle$. Since $\hat y = \psi(y) = \theta(y) = \langle\eta, w\rangle$, and $\langle\xi, v\rangle$ is an initial segment of $\langle\eta, w\rangle$, we can find a unique x on the path to y with $\theta(x) = \langle\xi, v\rangle$. Thus, $x < y$ and $\psi(x) = \theta(x) = \hat x$.
For g3-g7 we will use the generic history $\theta(x) = \langle\xi, v\rangle$ for the node x in question.
g3: Let $x \in X$. Then $\psi(x) = \theta(x) = \langle\xi, v\rangle$. Hence, $\lambda^d(\psi(x)) = \lambda^d\langle\xi, v\rangle = v$ where the last equality follows from d3. Hence, we have shown that $\lambda^d(\psi(x)) = \lambda(x)$.
g4: Let $x \in X$. Then, by d4, $A^d_{\langle\xi, v\rangle} = A^o_v$. By P1b, we have $A_x = A^o_{\lambda(x)} = A^o_v$. Hence, $A_x = A^d_{\psi(x)}$.
g5: Let $x \in X$. By d3, $\pi^d(\lambda^d(\psi(x))) = \pi^d(v)$. If $x \in X^D$, then by P1c, $\pi(\lambda(x)) = \pi^o(\lambda(x)) = \pi^o(v)$. Also, since $x \in X^D$, it follows by Lemma 6.2 that $\langle\xi, v\rangle \in X^{dD}$. Hence, by d5, $\pi^d(v) = \pi^o(v)$. Thus, for $x \in X^D$ we have the desired result that $\pi(\lambda(x)) = \pi^d(\lambda^d(\psi(x)))$. Next consider $x \in X^E$. Then by K42, $\pi(\lambda(x)) = \{j : j \in \pi(\lambda(y)) \text{ for some } y \in X^D\}$. By Lemma 6.5, g0 and d5, it follows that this set is equivalent to $\pi^d(v)$.
g6: Let $x \in X^E$. By P1a and P1d, $v = \lambda(x) = \lambda^o(y)$ for some $y \in X^{oE}$, and $h(v) = h^o_i(v)$. By Lemma 6.2 and g3, $\psi(x) \in X^{dE}$ and $\lambda^d(\psi(x)) = v = \lambda^o(y)$ for some $y \in X^{oE}$. So, by d6, $h^d(v) = h^o_i(v)$. Hence, we have shown that $h^d(\lambda^d(\psi(x))) = h(\lambda(x))$.
g7: Let $x \in X_i$. Then by the definition of $\psi$, (6.1) and P2c for m, it follows that $m^d(\psi(x)) = m^d(\theta(x)) = m(y) = m(x)$.


7. Decision making and prescribed behavior in IGT
The inductive derivation of an individual view from past experiences is not the end of the entire scenario of our theory. The next step is to use an i.d.view for decision making and to bring the prescribed (or modified) behavior back to the objective situation. This is the third stage of Fig.1.1. Because this paper aims to present a basic and entire scenario of our theory, we will here concentrate on a clear-cut case. Specifically, we assume in this and the next sections that the objective memory function $m^o_i$ for each player i is given as the SPR function $m^{spr}_i$, and that player i has the active domain $D^A_i(\sigma^o)$. Then, we will discuss how he can use the inductively derived view for his decision making as well as how the prescribed behavior helps his objective behavior. This gives an experiential foundation for Nash equilibrium.
7.1 Decision making using a personal view
Fig.7.1 describes the steps from experimentation (trial and error) to decision making using
an i.d.view. One basic question is whether the i.d.view helps the player for his decision
making, as well as whether the decision can be used in the objective situation when he
brings it back there. In this and next sections, we will discuss these questions.
We assume that each player i:
(7a): is relevant in his own domain;
(7b): has the SPR function $m^o_i = m^{spr}_i$;
(7c): follows a behavior pattern $\sigma^o_i$;
(7d): accumulates memories over his active domain $D^A_i(\sigma^o)$;
(7e): adopts the direct view $(\Gamma^d, m^d)$.
Under these assumptions, it is already proved in Corollary 5.3 that there is a unique direct i.d.view for each player i. Now, we consider the case where player i adopts this direct i.d.view $(\Gamma^d, m^d)$.
Nevertheless, the direct structure $\Gamma^d$ may not be an extensive game in the strong sense,
which may create some complications in the following discourse. Thus, we make the
following assumption to avoid it: for each player i,
(7f): for all $x, y \in D^A_i \cup X^{oE}$, $\theta^o(x)_i$ is not a proper subsequence of $\theta^o(y)_i$.
Under this assumption, the direct view $(\Gamma^d, \mathsf{m}^d)$ is an extensive game in the strong sense,
which will be stated in Lemma 7.1.
[Figure 7.1: flow diagram with the phases: Objective situation $(\Gamma^o, \mathsf{m}^o)$ with $\sigma^o$; trial and error over $D_i$; Accumulation: memory kit $(T_{D_i}, Y_{D_i})$; Inductive Derivation: personal view $(\Gamma^i, \mathsf{m}^i)$; Decision Making: subjective strategy $\sigma^i$; Objective situation $(\Gamma^o, \mathsf{m}^o)$ with $(\sigma^i, \sigma^o_{-i})$.]
Fig. 7.1. Various Phases


Condition 7f is implied by Kuhn's [21] condition that each information piece for player i
occurs at most once in each play in $\Gamma^o$, which was stated in terms of information sets in [21].
Fig.7.2.A, called the absent-minded driver game in Piccione-Rubinstein [28], with the SPR
function $\mathsf{m}^{spr}_i$ violates Condition 7f. In this case, $\langle (E,e), 1 \rangle$ belongs to $T_{D_1}$, but not to $T^*_{D_1}$,
since $\langle (E,c), (E,e), 1 \rangle$ is a proper supersequence of $\langle (E,e), 1 \rangle$. Fig.7.2.B is the direct view but is
not an extensive game in the strong sense.
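The violation of Condition 7f described above can be checked mechanically once histories are written as tuples. The following Python sketch is only an illustration: it assumes that "subsequence" means an order-preserving selection of entries (the chapter's own definition appears in an earlier section and may be narrower), and the function names are hypothetical.

```python
def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by deleting entries."""
    remaining = iter(long)
    # `entry in remaining` consumes the iterator, so order is respected.
    return all(entry in remaining for entry in short)


def is_proper_subsequence(short, long):
    return len(short) < len(long) and is_subsequence(short, long)


def violations_of_7f(histories):
    """Return all pairs (s, t) where s is a proper subsequence of t."""
    return [(s, t) for s in histories for t in histories
            if is_proper_subsequence(s, t)]


# Two histories from the absent-minded driver example in the text.
exit_at_first = (("E", "e"), 1)
exit_at_second = (("E", "c"), ("E", "e"), 1)
violations = violations_of_7f([exit_at_first, exit_at_second])
```

Under this assumed reading, `violations` contains the single pair formed by the two histories named in the text, which is why the example fails Condition 7f.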
The proofs of the results will be given at the end of this subsection.
Lemma 7.1. The direct view $(\Gamma^d, \mathsf{m}^d)$ for $(T_{D_i}, Y_{D_i}) = (T_{D^A_i(\sigma^o)}, Y_{D^A_i(\sigma^o)})$ is uniquely determined
and is an i.d.view satisfying:
(a): $\Gamma^d$ is a 1-person extensive game in the strong sense with $N^d = \{i\}$;
(b): $\mathsf{m}^d$ satisfies P2a with equality, i.e., $\{\mathsf{m}^d(x) : x \in X^d\} = Y_{D^A_i(\sigma^o)}$.
For the consideration of utility maximization of a behavior pattern $\sigma_i$, player i needs to
consider the sets of compatible endnodes for various behavior patterns. Recall from (2.15)
that $\lambda(\sigma)$ denotes the set of compatible endpieces for a profile of behavior patterns
$\sigma = (\sigma_1, ..., \sigma_n)$. Since $\Gamma^d$ is a 1-person extensive game in the strong sense, the set of compatible
endpieces will be a singleton set for each behavior pattern $\sigma_i$ of player i. Consequently, we
will use $\lambda^d(\sigma_i)$ here to denote the compatible endpiece in $\Gamma^d$ for $\sigma_i$.
Then, player i has a subjective strategy $\sigma^d_i$ in $\Gamma^d$ to maximize $h^d$ in the following sense:
$h^d \lambda^d(\sigma^d_i) \geq h^d \lambda^d(\sigma_i)$ for all $\sigma_i \in \Sigma^d_i$. (7.1)
Once again, we emphasize that this decision is made in the personal view $(\Gamma^d, \mathsf{m}^d)$ of player i,
i.e., in the mind of player i. This conceptually differs from the payoff maximization in the
objective situation, which is now the subject to be considered.
After the choice of the subjective strategy in (7.1), player i brings back $\sigma^d_i$ to the objective
situation $(\Gamma^o, \mathsf{m}^o)$, adjusting his behavior pattern $\sigma^o_i$ with $\sigma^d_i$. The adjustment from his
objective behavior $\sigma^o_i$ into $\sigma^1_i$ is as follows: for all $x \in X^o_i$,
$$\sigma^1_i(x) = \begin{cases} \sigma^d_i\langle \xi, v \rangle & \text{if } \mathsf{m}^o_i(x) = \{\langle \xi, v \rangle\} \in Y_{D^A_i(\sigma^o)}; \\ \sigma^o_i(x) & \text{if } \mathsf{m}^o_i(x) \notin Y_{D^A_i(\sigma^o)}. \end{cases} \qquad (7.2)$$
[Figure 7.2: panel A, the game tree with information piece E and actions c and e; panel B, the direct view built from the histories over E.]
Fig. 7.2. Violation of condition 7f and the direct view
That is, player i follows $\sigma^d_i$ whenever a memory yarn in $Y_{D^A_i(\sigma^o)}$ occurs; and otherwise, he
keeps the old behavior pattern. This adjustment produces a behavior pattern for player i in
$\Gamma^o$, i.e., $\sigma^1_i \in \Sigma^o_i$. The next theorem states that the modified strategy $\sigma^1_i$ of player i defined by
(7.2) is objectively utility maximizing for player i in $\Gamma^o$ when the other players follow their
regular behavior $\sigma^o_{-i}$ in $\Gamma^o$.
Before the next theorem, we give a small remark. Since the objective game $\Gamma^o$ is also an
extensive game in the strong sense, the set of compatible endpieces $\lambda^o(\sigma_i, \sigma^o_{-i})$ will also be a
singleton for player i's behavior pattern $\sigma_i$ and the other players' behavior patterns $\sigma^o_{-i}$. We
follow the convention of using $\lambda^o(\sigma_i, \sigma^o_{-i})$ to denote the compatible endpiece, not the set of
compatible endpieces.
Theorem 7.2 (One-person utility maximization in the n-person game): The strategy $\sigma^1_i$
defined by (7.2) satisfies the objective payoff maximization for player i, i.e.,
$h^o_i \lambda^o(\sigma^1_i, \sigma^o_{-i}) \geq h^o_i \lambda^o(\sigma_i, \sigma^o_{-i})$ for all $\sigma_i \in \Sigma^o_i$. (7.3)
We emphasize that this is not the utility maximization obtained directly in the objective
situation. Instead, the utility maximization is made in his i.d.view $(\Gamma^d, \mathsf{m}^d)$, and then the
modified strategy $\sigma^1_i$ is brought to the objective situation $(\Gamma^o, \mathsf{m}^o)$. It happens that it
maximizes his objective utility function. This process of obtaining the objective utility
maximization occurs only after many repetitions of collecting data to construct his view.
Thus, we have succeeded in having individual utility maximization in the well-defined form
in both subjective and objective senses. Nevertheless, once we leave the case of 7a-7f, player
i would have many difficulties at various steps in Fig.7.1. These problems will be discussed
in Section 8.2 and in separate papers.
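The adjustment rule (7.2) amounts to a simple override: use the subjective strategy wherever the memory yarn received is one the player has already accumulated, and keep the old behavior elsewhere. The sketch below is a minimal illustration under simplifying assumptions: nodes and yarns are plain hashable labels, a memory yarn is represented directly by the history it contains, and all names (`adjust_behavior`, the node labels, the actions) are hypothetical.

```python
def adjust_behavior(objective_pattern, subjective_strategy, memory, known_yarns):
    """Sketch of rule (7.2).

    objective_pattern:   dict node -> action (the regular behavior)
    subjective_strategy: dict yarn -> action (chosen in the personal view)
    memory:              dict node -> yarn received at that node
    known_yarns:         set of yarns in the accumulated memory kit
    """
    adjusted = {}
    for node, old_action in objective_pattern.items():
        yarn = memory[node]
        if yarn in known_yarns:
            # The yarn occurs in the memory kit: follow the subjective strategy.
            adjusted[node] = subjective_strategy[yarn]
        else:
            # Unknown yarn: keep the old behavior pattern.
            adjusted[node] = old_action
    return adjusted


# Hypothetical data: two nodes, only the first lies in the experienced domain.
regular = {"x1": "left", "x2": "left"}
memory = {"x1": ("w1",), "x2": (("w1", "left"), "w2")}
kit = {("w1",)}
subjective = {("w1",): "right"}
modified = adjust_behavior(regular, subjective, memory, kit)
```

Here `modified` changes the action at `x1` only, since the yarn at `x2` is not in the hypothetical memory kit.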
Proof of Lemma 7.1.(a): The condition $N^d = \{i\}$ follows immediately since $\mathsf{m}^o_i = \mathsf{m}^{spr}_i$. By
Corollary 5.3, it suffices to show that $\Gamma^d$ satisfies K14 and K33.
K14: Since $\Gamma^o$ is an extensive game in the strong sense, each strategy combination determines
a unique play. Let $\langle x_1, ..., x_m \rangle$ be the unique play determined by $\sigma^o$, and let $x_t$ be the first node
of player i in this play, i.e., $i \in \pi^o \lambda^o(x_t)$ and $i \notin \pi^o \lambda^o(x_s)$ for all s < t. Then
$\theta^o(x_t) = \langle (\lambda^o(x_1), \sigma^o_{j_1}(x_1)), ..., (\lambda^o(x_{t-1}), \sigma^o_{j_{t-1}}(x_{t-1})), \lambda^o(x_t) \rangle$, where $j_1, ..., j_{t-1}$ denote the players
moving at $x_1, ..., x_{t-1}$ respectively. Let $(\sigma_i, \sigma^o_{-i})$ be any other strategy combination where all
the players other than player i choose according to $\sigma^o$. Then, the first t nodes in the play
determined by this strategy combination must also be $x_1, ..., x_t$. Hence, for any play
determined on the active domain, $x_t$ is the first node of player i. Thus, $x_t$ determines the
smallest node $\langle \lambda^o(x_t) \rangle$ in $X^d$.
K33: We show that for each $\langle \xi, v \rangle \in X^{dD}$, the function $\varphi^d_{\langle \xi, v \rangle}$ defined in d4 is a bijection. Let
$\langle \xi, v \rangle \in X^{dD}$ and let a be an arbitrary action in $A_{\langle \xi, v \rangle}$. Since $\langle \xi, v \rangle \in X^{dD}$ and the memory
function is $\mathsf{m}^{spr}_i$, we have $\langle \xi, v \rangle = \theta^o(x)_i$ for some $x \in X^o_i$, and x is on the path determined by
some $(\sigma_i, \sigma^o_{-i})$. Consider the strategy $\sigma'_i$ defined by:
$$\sigma'_i(y) = \begin{cases} \sigma_i(y) & \text{if } \mathsf{m}^o_i(y) \neq \mathsf{m}^o_i(x); \\ a & \text{if } \mathsf{m}^o_i(y) = \mathsf{m}^o_i(x). \end{cases}$$
Since $\mathsf{m}^o_i = \mathsf{m}^{spr}_i$, it follows that $\mathsf{m}^o_i(y) \neq \mathsf{m}^o_i(x)$ for any $y \in X^{oD}_i$ with $y <^o x$. Hence x is on the
play determined by $(\sigma'_i, \sigma^o_{-i})$. Since the other players follow their strategies in $\sigma^o$, the action
a determines a unique immediate successor x' of x with $\mathsf{m}^{spr}_i(x') = \{\langle \xi, (v, a), u \rangle\}$. Then we
find also an endnode z coming from x'. Then, $\langle \xi, (v, a), u \rangle$ is an initial segment of $\theta^o(z)_i$. By
condition 7f, $\theta^o(z)_i$ is a maximal sequence in $T_{D_i}$. These mean that $\langle \xi, (v, a), u \rangle \in T^*_{D_i} = X^d$.
We can show similarly that a different action $a' \in A_{\langle \xi, v \rangle}$ determines a different immediate
successor $\langle \xi, (v, a'), u' \rangle \in X^d$, so the mapping $\varphi^d_{\langle \xi, v \rangle}$ from $\langle \xi, (v, a), u \rangle$ to a is a bijection.
(b): Let $x \in D_i$. We show that $\mathsf{m}^o_i(x) \in \{\mathsf{m}^d(y) : y \in X^d\}$. Since $\mathsf{m}^o_i = \mathsf{m}^{spr}_i$, we have
$T_{D_i} = T^*_{D_i}$. Since $\mathsf{m}^o_i(x) = \{\theta^o(x)_i\}$, it follows that $\theta^o(x)_i \in T^*_{D_i} = X^d$. Corollary 5.3 states that
the direct view $(\Gamma^d, \mathsf{m}^d)$ exists uniquely and $\mathsf{m}^d(y) = \{y\}$ for all $y \in X^d_i$. Hence,
$\mathsf{m}^d(\theta^o(x)_i) = \{\theta^o(x)_i\} = \mathsf{m}^o_i(x)$.
Proof of Theorem 7.2. Consider any $\sigma_i \in \Sigma^o_i$. Recall that the endnode determined by $(\sigma_i, \sigma^o_{-i})$
in $\Gamma^o$ is denoted by $z(\sigma_i, \sigma^o_{-i})$. Let $x = z(\sigma_i, \sigma^o_{-i})$. Consider the history of player i at x, i.e.,
$\theta^o(x)_i = \langle (w_1, a_1), ..., (w_m, a_m), w_{m+1} \rangle$ with $w_{m+1} = \lambda^o(x)$, and also, let the corresponding history of nodes
be given as $\langle x_1, ..., x_m, x_{m+1} \rangle$ with $x_{m+1} = x$. Then, $\lambda^o(x_t) = w_t$ and $\sigma_i(x_t) = a_t$ for all t = 1, ..., m.
Hence, we choose a strategy $\tau^d_i$ having the property that $\tau^d_i \langle (w_1, a_1), ..., (w_{t-1}, a_{t-1}), w_t \rangle = \sigma_i(x_t)$
for t = 1, ..., m. Then, the compatible endpiece $\lambda^o(\sigma_i, \sigma^o_{-i})$ is the same as $\lambda^d(\tau^d_i)$. Hence,
$\lambda^o(\sigma_i, \sigma^o_{-i}) = \lambda^d(\tau^d_i)$. If we apply this procedure to $\sigma^1_i$, then we have $\sigma^d_i$ satisfying (7.1).
Hence, we have $\lambda^o(\sigma^1_i, \sigma^o_{-i}) = \lambda^d(\sigma^d_i)$.
By d7 and using the above result, we have
$h^o_i \lambda^o(\sigma^1_i, \sigma^o_{-i}) = h^d \lambda^d(\sigma^d_i) \geq h^d \lambda^d(\tau^d_i) = h^o_i \lambda^o(\sigma_i, \sigma^o_{-i})$.
7.2 An experiential foundation for Nash equilibrium
It is straightforward to extend Theorem 7.2 to all players relevant in their own domains and
to obtain a Nash equilibrium. Here, we still state this theorem, since it gives one explanation
of Nash equilibrium from the experiential viewpoint. For it, however, we need some more
notation and one more definition.
First, since our discussion involves more than one i.d.view, we put subscript i to the direct
i.d.view of player i, i.e., $(\Gamma^d_i, \mathsf{m}^d_i)$. Second, for each player i who is relevant in his own
domain, we define the induced strategy $\sigma^d_i$ of $\sigma^o$ to the direct i.d.view $(\Gamma^d_i, \mathsf{m}^d_i)$ for
$(T_{D^A_i(\sigma^o)}, Y_{D^A_i(\sigma^o)})$ by: for all $\langle \xi, w \rangle \in X^d_i$,
$\sigma^d_i \langle \xi, w \rangle = \sigma^o_i(x)$ for any $x \in X^o_i$ with $\theta^o(x)_i = \langle \xi, w \rangle$. (7.4)
The well-definedness of (7.4) is verified as follows. First, by the properties of the SPR
function, for each $\langle \xi, w \rangle \in X^d_i$, there is an $x \in X^o_i$ such that $\theta^o(x)_i = \langle \xi, w \rangle$. Secondly, since
$\theta^o(x)_i = \theta^o(y)_i$ implies $\mathsf{m}^{spr}_i(x) = \mathsf{m}^{spr}_i(y)$, the strategy defined by (7.4) does not depend upon
the choice of x. Finally, we verify (2.12) and (2.13) for $\sigma^d_i$. The condition (2.12) follows from
d4. Condition (2.13) is also satisfied since by Corollary 5.3, the direct memory function of
player i is uniquely determined as $\mathsf{m}^d_i \langle \xi, w \rangle = \{\langle \xi, w \rangle\}$.
Then we have the following theorem, which is a straightforward implication of Theorem 7.2.
Theorem 7.3 (Experiential foundation for Nash equilibrium): A profile $\sigma^o$ of behavior
patterns is a Nash equilibrium in $(\Gamma^o, \mathsf{m}^o)$ if and only if for each player $i \in N^o$ who is relevant
in his domain $D^A_i(\sigma^o)$, the induced strategy $\sigma^d_i$ of $\sigma^o$ to the direct view $(\Gamma^d_i, \mathsf{m}^d_i)$ for the
memory kit $(T_{D^A_i(\sigma^o)}, Y_{D^A_i(\sigma^o)})$ satisfies condition (7.1).
Recall that we have adopted the assumptions 7a-7f. Under these assumptions, each player
makes his decision in his 1-person derived view. The theorem states that the behavior pattern
$\sigma^o$ is a Nash equilibrium in the objective game $(\Gamma^o, \mathsf{m}^o)$ if and only if the induced strategy for
each player i maximizes his utility in the direct view $(\Gamma^d_i, \mathsf{m}^d_i)$. Thus, this theorem decomposes
the Nash equilibrium in $(\Gamma^o, \mathsf{m}^o)$ into utility maximizations in n one-person games.
As discussed in Section 3, the accumulation of $(T_{D^A_i(\sigma^o)}, Y_{D^A_i(\sigma^o)})$ and the inductive derivation
of $(\Gamma^d_i, \mathsf{m}^d_i)$ need many repetitions of the game $(\Gamma^o, \mathsf{m}^o)$. Also, in the present scenario, each
player revises his behavior over $D^A_i(\sigma^o)$, and other players may be influenced by his
revision, and may change their personal views. This revision process may continue. The
above theorem describes a stationary state in the revision process.
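The decomposition stated by Theorem 7.3 can be illustrated in the simplest setting, a two-player simultaneous-move game. The sketch below assumes that each player's one-person view consists of the payoffs he obtains from his own unilateral deviations while the other player keeps the regular behavior; the payoff numbers and function names are hypothetical and are not taken from the chapter.

```python
# Hypothetical 2x2 game: PAYOFFS[row][col] = (payoff to player 0, payoff to player 1).
PAYOFFS = [
    [(3, 3), (0, 4)],
    [(4, 0), (1, 1)],
]


def one_person_view(payoffs, profile, player):
    """Payoffs the player sees over his own actions, the other player held fixed."""
    row, col = profile
    if player == 0:
        return [payoffs[r][col][0] for r in range(len(payoffs))]
    return [payoffs[row][c][1] for c in range(len(payoffs[0]))]


def maximizes_in_view(view, action):
    """Analogue of condition (7.1) inside a one-person view."""
    return view[action] == max(view)


def is_nash_by_decomposition(payoffs, profile):
    """A profile is an equilibrium iff every player maximizes in his own view."""
    return all(
        maximizes_in_view(one_person_view(payoffs, profile, player), profile[player])
        for player in (0, 1)
    )


equilibria = [
    (r, c) for r in range(2) for c in range(2)
    if is_nash_by_decomposition(PAYOFFS, (r, c))
]
```

For these hypothetical payoffs only the profile `(1, 1)` passes both one-person checks, which matches the usual direct computation of the equilibrium of this game.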
The revision process may take a long time to reach a Nash equilibrium or even may not
reach a Nash equilibrium. Furthermore, we did not explicitly consider the case where the
players' trials and errors are restricted. If we take these limitations on experimentation into account,
the above Nash equilibrium is understood as a Nash equilibrium relative to the restricted
domains of actions.
In the above senses, Theorem 7.3 is one characterization of Nash equilibrium from the
experiential viewpoint. In separate papers, we will discuss other characterizations of Nash
equilibrium and/or difficulties arising for them. Finally, we give one example to suggest the
nonconvergence of the process of revising behavior via constructed personal views. If the
objective game $(\Gamma^o, \mathsf{m}^o)$ has no Nash equilibria, then the above process does not converge.
The following example has a Nash equilibrium.
[Figure 7.3: the entire game and the active domain, with player 1's actions $s_{11}, s_{12}, s_{13}$ and player 2's actions $s_{21}, s_{22}, s_{23}$.]
Fig. 7.3. Nonconvergence example
Example 7.1. (Nonconvergence): Consider the 2-person simultaneous game which is
described in Fig.7.3 and whose payoffs are given in Fig.7.4. The bold arrow is the regular path
$(s_{12}, s_{22})$ and each player is presumed to have the SPR function.

            s_21          s_22      s_23
s_11      (3, 3) NE     (2, 2)    (2, 2)
s_12      (2, 2)        (4, 2)    (2, 4)
s_13      (2, 2)        (2, 4)    (4, 2)
Fig. 7.4.
Player 1's direct i.d.view is the 1-person game summarized by the matrix form of Fig.7.5,
and player 2's i.d.view is the 1-person game summarized in Fig.7.6.

s_11    2
s_12    4
s_13    2
Fig. 7.5.

s_21    s_22    s_23
  2       2       4
Fig. 7.6.
In this case, player 1 maximizes his utility in his i.d.view by choosing $s_{12}$. Thus, he has no
incentive to change his objective behavior from the regular pattern. However, player 2
maximizes his utility in his i.d.view by changing from $s_{22}$ to $s_{23}$.
By this revision, the regular behavior becomes $(s_{12}, s_{23})$. After experiencing this pair as well
as some trials, the personal views of the players will be revised to the 1-person games
summarized by the matrices of Fig.7.7 and Fig.7.8.

s_11    2
s_12    2
s_13    4
Fig. 7.7

s_21    s_22    s_23
  2       2       4
Fig. 7.8
With this new view, player 1 now finds that he should change his behavior, while player 2
does not. The revised behavior becomes $(s_{13}, s_{23})$. In this manner, the players move cyclically
through the four regular behaviors depicted in the bottom right corner of Fig.7.9, and never
converge to the Nash equilibrium $(s_{11}, s_{21})$.

            s_21      s_22      s_23
s_11      (3, 3)    (2, 2)    (2, 2)
s_12      (2, 2)    (4, 2)    (2, 4)
s_13      (2, 2)    (2, 4)    (4, 2)
[Arrows in the bottom right 2x2 block trace the cycle $(s_{12}, s_{22})$, $(s_{12}, s_{23})$, $(s_{13}, s_{23})$, $(s_{13}, s_{22})$, and back to $(s_{12}, s_{22})$.]
Fig. 7.9
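The cycle of Example 7.1 can be reproduced with a few lines of Python. The sketch below uses the payoffs of Fig.7.4 and assumes, as a simplification, that in each round every player looks only at his own payoffs along his unilateral deviations from the current regular profile and switches only when his current action is not maximal there. Indices 0, 1, 2 stand for the first, second and third action of each player; the function names are hypothetical.

```python
# PAYOFFS[row][col] = (payoff to player 1, payoff to player 2), as in Fig.7.4.
PAYOFFS = [
    [(3, 3), (2, 2), (2, 2)],
    [(2, 2), (4, 2), (2, 4)],
    [(2, 2), (2, 4), (4, 2)],
]


def revise(profile):
    """One round of revision based on each player's one-person view."""
    row, col = profile
    view1 = [PAYOFFS[r][col][0] for r in range(3)]   # player 1's view
    view2 = [PAYOFFS[row][c][1] for c in range(3)]   # player 2's view
    new_row = row if view1[row] == max(view1) else view1.index(max(view1))
    new_col = col if view2[col] == max(view2) else view2.index(max(view2))
    return (new_row, new_col)


def revision_path(profile, rounds):
    """Regular profiles visited, starting from `profile`."""
    path = [profile]
    for _ in range(rounds):
        profile = revise(profile)
        path.append(profile)
    return path


# Start from the regular path (s_12, s_22), i.e. indices (1, 1).
path = revision_path((1, 1), 4)
```

Starting from `(1, 1)`, the path returns to its starting point after four rounds without ever visiting `(0, 0)`, the equilibrium $(s_{11}, s_{21})$, although `(0, 0)` itself is a rest point of `revise`.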
8. g-Morphism analysis of decision making
In Section 6, we showed, using the concept of a g-morphism, that the direct view can be
regarded as a representative one. On the other hand, in Section 7, we assumed that a player
makes a decision using the direct view $(\Gamma^d, \mathsf{m}^d)$. Here, we apply the g-morphism analysis to
the decision making of a player. The concept of a g-morphism helps us analyze decision
making within some class of i.d.views. Here we do not restrict ourselves to the memory kits
based on the SPR function $\mathsf{m}^{spr}_i$ and on the active domain $D^A_i(\sigma^*)$. Although the g-morphism
analysis works well, we still find some difficulties in decision making with
personal views and in transitions from subjective optimality to objective behavior.
8.1 Subjective optimality and g-morphism analysis
Let $(\Gamma, \mathsf{m})$ be a personal view of player i. We assume that $\Gamma$ satisfies $N = \{i\}$, i.e., it is a
1-person game. We call such a view a purely personal view.
We compare subjective optimality across g-morphic views of player i. For this purpose, let
$(\Gamma, \mathsf{m})$ and $(\hat\Gamma, \hat{\mathsf{m}})$ be two purely personal views of player i, and let $\sigma_i \in \Sigma_i$ and $\hat\sigma_i \in \hat\Sigma_i$. Here,
we follow the convention that each notion in $(\hat\Gamma, \hat{\mathsf{m}})$ is distinguished from the
corresponding one in $(\Gamma, \mathsf{m})$ by the cap, e.g., $\Sigma_i$ and $\hat\Sigma_i$ are the sets of strategies of $(\Gamma, \mathsf{m})$
and $(\hat\Gamma, \hat{\mathsf{m}})$, respectively. We say that $\sigma_i$ and $\hat\sigma_i$ are endpiece-equivalent iff
$\lambda(\sigma_i) = \hat\lambda(\hat\sigma_i)$. (8.1)
Recall that $\lambda(\sigma_i)$ is the set of compatible endpieces for $\sigma_i$, defined in (2.15). Endpiece-equivalent
strategies $\sigma_i$ and $\hat\sigma_i$ lead to the same endpieces in $(\Gamma, \mathsf{m})$ and $(\hat\Gamma, \hat{\mathsf{m}})$. When we
have a g-morphism from $(\Gamma, \mathsf{m})$ to $(\hat\Gamma, \hat{\mathsf{m}})$, we can carry over any strategy in $(\Gamma, \mathsf{m})$ to
$(\hat\Gamma, \hat{\mathsf{m}})$ keeping endpiece-equivalence; and the converse needs one additional condition on
$(\Gamma, \mathsf{m})$.
The additional condition on $(\Gamma, \mathsf{m})$ is as follows:
K33$^S$: for any $x \in X$, $\varphi_x$ is a surjection from the set of immediate successors of x to $A_x$.
Condition K33$^S$ is a weakening of K33, which requires $\varphi_x$ to be a bijection. Under this
condition on $(\Gamma, \mathsf{m})$, we will have the converse that an endpiece-equivalent strategy is carried
over from $(\hat\Gamma, \hat{\mathsf{m}})$ to $(\Gamma, \mathsf{m})$. The proofs will be given at the end of this subsection.

Theorem 8.1 (g-morphism and behavior). Let $(\Gamma, \mathsf{m})$ and $(\hat\Gamma, \hat{\mathsf{m}})$ be two purely personal
views of player i, and let $\psi$ be a g-morphism from $(\Gamma, \mathsf{m})$ to $(\hat\Gamma, \hat{\mathsf{m}})$.
(a): Let $(\Gamma, \mathsf{m})$ satisfy condition K33$^S$. For each $\hat\sigma_i \in \hat\Sigma_i$, the function $\sigma_i$ defined by (8.2) is a
strategy in $\Sigma_i$ and is endpiece-equivalent to $\hat\sigma_i$: for all $x \in X^D_i$,
$\sigma_i(x) = \hat\sigma_i \psi(x)$. (8.2)
(b): For each $\sigma_i \in \Sigma_i$, the function $\hat\sigma_i$ defined by (8.3) is a strategy in $\hat\Sigma_i$ and is
endpiece-equivalent to $\sigma_i$: for each $\hat x \in \hat X^D_i$,
$\hat\sigma_i(\hat x) = \sigma_i(x)$ for some $x \in X^D_i$ with $\psi(x) = \hat x$. (8.3)
In general, a g-morphism $\psi$ embeds a larger game to a smaller game preserving certain
game theoretical properties described in Definition 6.1. Assertion (a) converts a strategy
from the smaller game to the larger game. A larger game may be too sparse to allow this
conversion. Condition K33$^S$ requires the larger game to be appropriately dense to allow it.
On the other hand, (b) has no difficulty since the conversion of a strategy is along the
g-morphism in the direction from a larger game to a smaller game.
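The two conversions in (8.2) and (8.3) can be sketched with dictionaries. The code below is only an illustration under simplifying assumptions: the g-morphism is represented by a plain dictionary `psi` from decision nodes of the larger view to decision nodes of the smaller view, strategies are dictionaries from nodes to actions, and none of the structural conditions g0-g7 are checked. All names and data are hypothetical.

```python
def pull_back(sigma_hat, psi):
    """Analogue of (8.2): sigma(x) = sigma_hat(psi(x)) on the larger view."""
    return {x: sigma_hat[x_hat] for x, x_hat in psi.items()}


def push_forward(sigma, psi):
    """Analogue of (8.3): sigma_hat(x_hat) = sigma(x) for any x with psi(x) = x_hat.

    The definition only makes sense if sigma takes one value on each fiber of
    psi; the proof of Theorem 8.1(b) derives this from g7 and (2.13).
    """
    sigma_hat = {}
    for x, x_hat in psi.items():
        if sigma_hat.setdefault(x_hat, sigma[x]) != sigma[x]:
            raise ValueError("strategy is not constant on the fiber over %r" % (x_hat,))
    return sigma_hat


# Hypothetical data: two nodes of the larger view are mapped to the same node.
psi = {"x1": "u", "x2": "u", "x3": "v"}
sigma = {"x1": "a", "x2": "a", "x3": "b"}
sigma_hat = push_forward(sigma, psi)
round_trip = pull_back(sigma_hat, psi)
```

In this toy example the round trip returns the original strategy, and a strategy that chose different actions at `x1` and `x2` would be rejected by `push_forward`.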
Condition K33$^S$ itself may appear to be simply a mathematical condition for $(\Gamma, \mathsf{m})$, though
we already mentioned its game theoretical interpretation that each action leads to some
consequence. In fact, this condition corresponds to one non-basic axiom called N3
(History-Independent Extension) in the theory of information protocols in Kaneko-Kline [16]. There,
an information protocol with three non-basic axioms and two basic axioms is shown to be
equivalent to an extensive game in the strong sense of the present paper. The other
condition, K33$^I$, obtained from K33$^S$ by replacing surjection by injection, corresponds to
another non-basic axiom in [16] called N2 (Determination). This axiom was shown, in
Kaneko-Kline [17], to also have some important behavioral implications. Thus, these
conditions, K33$^S$ and K33$^I$, are not only mathematically clear-cut, but also essential in the
theory of extensive games in the strong and weak senses.
We should consider the implications of Theorem 8.1 in two respects. One is in terms of
subjective optimality, and the other is about when player i brings back his modified
behavior in the objective situation. From the viewpoint of g-morphisms, everything works
well even in these respects. However, there are still some remaining difficulties in those two
respects that are not captured by g-morphisms. These will be discussed in Section 8.2.
(1): g-morphism and subjective optimality: Since we do not assume that $\mathsf{m}^o_i = \mathsf{m}^{spr}_i$ and
$D^o_i = D^A_i(\sigma^*)$, some i.d.views may be extensive games only in the weak sense. In such cases,
the utility maximization (7.3) in Section 7 needs some modification. Here, we give one
possible modification.
Let $(\Gamma, \mathsf{m})$ be a purely personal view of player i. A strategy $\sigma_i$ is subjectively optimal in $(\Gamma, \mathsf{m})$ iff
$\min_{w \in \lambda(\sigma_i)} h(w) \geq \min_{w' \in \lambda(\sigma'_i)} h(w')$ for all $\sigma'_i \in \Sigma_i$. (8.4)
This is the maximin criterion for his decision making: The worst outcome compatible with
this strategy is better than or equal to the worst outcome of any other strategy.
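The maximin criterion (8.4) is straightforward to compute once the set of compatible endpieces of each strategy is known. The sketch below assumes these sets and the payoffs on endpieces are given as dictionaries; the strategy labels, endpiece labels and payoff values are hypothetical.

```python
def worst_payoff(endpieces, payoff):
    """Worst payoff over the endpieces compatible with one strategy."""
    return min(payoff[w] for w in endpieces)


def subjectively_optimal(compatible, payoff):
    """Strategies satisfying the maximin criterion (8.4).

    compatible: dict strategy -> set of compatible endpieces
    payoff:     dict endpiece -> payoff of the player
    """
    worst = {s: worst_payoff(ws, payoff) for s, ws in compatible.items()}
    best = max(worst.values())
    return sorted(s for s, value in worst.items() if value == best)


# Hypothetical view in the weak sense: strategy "a" is compatible with two endpieces.
compatible = {"a": {"w1", "w2"}, "b": {"w3"}}
payoff = {"w1": 0, "w2": 5, "w3": 1}
optimal = subjectively_optimal(compatible, payoff)
```

Here strategy `"a"` can reach the best payoff 5 but also the worst payoff 0, so under (8.4) it loses to `"b"`, whose only compatible endpiece pays 1. When every set of compatible endpieces is a singleton, as in Section 7, the criterion reduces to ordinary payoff maximization.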
Corollary 8.2 (g-morphism and subjective optimality). Let $(\Gamma, \mathsf{m})$ and $(\hat\Gamma, \hat{\mathsf{m}})$ be two purely
personal views of player i, and let $\psi$ be a g-morphism from $(\Gamma, \mathsf{m})$ to $(\hat\Gamma, \hat{\mathsf{m}})$.
(a): Let $(\Gamma, \mathsf{m})$ satisfy condition K33$^S$. If $\hat\sigma_i$ satisfies (8.4) in $(\hat\Gamma, \hat{\mathsf{m}})$, then the
endpiece-equivalent strategy $\sigma_i$ defined by (8.2) satisfies (8.4) in $(\Gamma, \mathsf{m})$.
(b): If $\sigma_i$ satisfies (8.4) in $(\Gamma, \mathsf{m})$, then the endpiece-equivalent strategy $\hat\sigma_i$ defined by (8.3)
satisfies (8.4) in $(\hat\Gamma, \hat{\mathsf{m}})$.
Again, we talk about the corollary in the context of i.d.views. By the results of Section 6, we
can regard $(\hat\Gamma, \hat{\mathsf{m}})$ as a direct one. By this result, we lose nothing in terms of subjective
optimality by focusing on a direct view.
(2): g-morphism and objective behavior: After his decision making in an i.d.view, a player
modifies his behavior pattern with his subjective strategy, and brings it back to the objective
situation. This modification might depend upon the particular i.d.view of the player. In fact,
we will show that the prescriptions for objective strategies are not different across g-morphic
i.d.views. This implies that we can focus on the direct view even in the step of taking the
prescription back to the objective world.
For the above consideration, we first modify (7.2) in the following way. Let $(\Gamma, \mathsf{m})$ be a
purely personal view of player i and let $\sigma_i$ satisfy (8.4). We define the prescribed behavior of
player i in $(\Gamma^o, \mathsf{m}^o)$ by: for all $x \in X^o_i$,
$$\sigma^1_i(x) = \begin{cases} \sigma_i(x') & \text{if } \mathsf{m}^o_i(x) = \mathsf{m}(x') \text{ for some } x' \in X; \\ \sigma^o_i(x) & \text{if } \mathsf{m}^o_i(x) \neq \mathsf{m}(x') \text{ for any } x' \in X. \end{cases} \qquad (8.5)$$
This strategy prescribes the same behavior as (7.2) in the case of Section 7. The next corollary
states that g-morphic views give the same prescriptions for behavior in the objective situation.
Corollary 8.3 (g-morphism and modified behavior). Let $(\Gamma, \mathsf{m})$ and $(\hat\Gamma, \hat{\mathsf{m}})$ be two purely
personal views of player i, and let $\psi$ be a g-morphism from $(\Gamma, \mathsf{m})$ to $(\hat\Gamma, \hat{\mathsf{m}})$.
(a): Let $(\Gamma, \mathsf{m})$ satisfy condition K33$^S$. Let $\hat\sigma_i$ be a strategy in $(\hat\Gamma, \hat{\mathsf{m}})$, and let $\sigma_i$ be the
endpiece-equivalent strategy defined by (8.2). Then $\sigma_i$ and $\hat\sigma_i$ prescribe the same behavior to
player i in $(\Gamma^o, \mathsf{m}^o)$.
(b): Let $\sigma_i$ be a strategy in $(\Gamma, \mathsf{m})$, and let $\hat\sigma_i$ be the endpiece-equivalent strategy defined by
(8.3). Then $\sigma_i$ and $\hat\sigma_i$ prescribe the same behavior to player i in $(\Gamma^o, \mathsf{m}^o)$, that is, the modified
behaviors defined by (8.5) with $\sigma_i$ and $\hat\sigma_i$ are the same.
In this corollary, we did not refer to the optimization condition (8.4). Of course, we can
assume that $\sigma_i$ in (a) or $\hat\sigma_i$ in (b) satisfies (8.4). Although Corollary 8.2 states that subjective
optimality is invariant with personal views, subjective optimality may not guarantee, in
general, the objective optimality of the prescribed behavior in contrast to Theorem 7.2.
Now we prove Theorem 8.1 and the corollaries. To prove (a) of Theorem 8.1, we first present
the following lemma.
Lemma 8.4. Suppose that $(\Gamma, \mathsf{m})$ satisfies K33$^S$. Let $\psi$ be a g-morphism from $(\Gamma, \mathsf{m})$ to
$(\hat\Gamma, \hat{\mathsf{m}})$. Then $\psi$ satisfies: for all $\hat x, \hat y \in \hat X$, $x \in X$, and $a \in \hat A_{\hat x}$, if $\hat x <^I_a \hat y$ and $\hat x = \psi(x)$, then $x <^I_a y$
for some $y \in X$.
Proof. Let $\hat x, \hat y \in \hat X$, $a \in \hat A_{\hat x}$, and $x \in X$ with $\psi(x) = \hat x$ and $\hat x <^I_a \hat y$. By $\psi(x) = \hat x$ and g4, we have
$A_x = \hat A_{\hat x}$. Thus, $a \in \hat A_{\hat x} = A_x$. So, by K33$^S$ on $\Gamma$, there is some $y \in X$ such that $x <^I_a y$.
Proof of Theorem 8.1.(a): Let $\hat\sigma_i \in \hat\Sigma_i$. Consider $\sigma_i$ defined by (8.2). First, we show that $\sigma_i$ is a
function over $X^D_i$ and satisfies (2.12) and (2.13) on $(\Gamma, \mathsf{m})$.
Consider $x \in X^D_i$. By Lemma 6.5, we have $\psi(x) \in \hat X^D_i$. Thus, (8.2) assigns one action $\hat\sigma_i \psi(x)$
as $\sigma_i(x)$. Hence, $\sigma_i$ is a function over $X^D_i$.
Next, we show (2.12) for $\sigma_i$. Let $\psi(x) = \hat x$ and $\hat\sigma_i(\hat x) = a$. Then, $\sigma_i(x) = \hat\sigma_i(\hat x) = a$ by (8.2). It
suffices to show that $\varphi_x(y) = a$ for some $y \in X$. By (2.12) for $\hat\sigma_i$, we have $\hat\sigma_i(\hat x) = \hat\varphi_{\hat x}(\hat y) = a$ for
some $\hat y \in \hat X$, i.e., $\hat x <^I_a \hat y$. By Lemma 8.4, we have $x <^I_a y$ for some $y \in X$, which implies
$\varphi_x(y) = a$.
To prove (2.13) for $\sigma_i$ defined by (8.2), consider $x, y \in X^D_i$ with $\mathsf{m}(x) = \mathsf{m}(y)$. Then, by g7,
$\hat{\mathsf{m}}\psi(x) = \mathsf{m}(x) = \mathsf{m}(y) = \hat{\mathsf{m}}\psi(y)$. Since $\hat\sigma_i$ satisfies (2.13), we have $\sigma_i(x) = \hat\sigma_i\psi(x) = \hat\sigma_i\psi(y) = \sigma_i(y)$.
Next we show that the two strategies are endpiece-equivalent. This has two parts,
$\lambda(\sigma_i) \subseteq \hat\lambda(\hat\sigma_i)$ and $\hat\lambda(\hat\sigma_i) \subseteq \lambda(\sigma_i)$. We show the former. The latter is proved in the same way.
First, let $w \in \lambda(\sigma_i)$. Then, there is a play $\langle x_1, ..., x_k, x_{k+1} \rangle$ in $\Gamma$ with $\lambda(x_{k+1}) = w$ and
$\theta(x_{k+1}) = \langle (\lambda(x_1), \sigma_i(x_1)), ..., (\lambda(x_k), \sigma_i(x_k)), \lambda(x_{k+1}) \rangle$. We denote $\psi(x_t)$ by $\hat x_t$ for t = 1, ..., k + 1.
By Lemma 6.7, $\langle \hat x_1, ..., \hat x_k, \hat x_{k+1} \rangle$ is a play in $\hat\Gamma$ and $\hat\theta(\hat x_{k+1}) = \theta(x_{k+1})$. By g3, $\hat\lambda(\hat x_t) = \lambda(x_t)$ for
t = 1, ..., k + 1, and by (8.2), $\hat\sigma_i(\hat x_t) = \sigma_i(x_t)$ for t = 1, ..., k. Hence,
$\hat\theta(\hat x_{k+1}) = \langle (\hat\lambda(\hat x_1), \hat\sigma_i(\hat x_1)), ..., (\hat\lambda(\hat x_k), \hat\sigma_i(\hat x_k)), \hat\lambda(\hat x_{k+1}) \rangle$, which means $w \in \hat\lambda(\hat\sigma_i)$.
(b): Let $\sigma_i \in \Sigma_i$. We start by showing that $\hat\sigma_i$ defined by (8.3) is well-defined and satisfies
(2.12) and (2.13) on $(\hat\Gamma, \hat{\mathsf{m}})$.
Consider $\hat x \in \hat X^D_i$. Since $\psi$ is a surjection by g0, $\psi(x) = \hat x$ for some $x \in X$. By Lemma 6.5, we
have $x \in X^D_i$. Observe that there may be distinct $x, y \in X^D_i$ satisfying $\psi(x) = \psi(y) = \hat x$.
Nevertheless, we can show that $\psi(x) = \psi(y)$ implies $\sigma_i(x) = \sigma_i(y)$, so that $\hat\sigma_i$ defined by (8.3) is
well defined. To see this fact, observe that if $\psi(x) = \psi(y)$, then by g7, $\mathsf{m}(x) = \mathsf{m}(y)$, which
together with (2.13) for $\sigma_i$ implies $\sigma_i(x) = \sigma_i(y)$.
By (2.12) for $\sigma_i$, we have a $y \in X$ so that $\varphi_x(y) = \sigma_i(x)$. Let $\sigma_i(x) = a$. Then, $x <^I_a y$, so by Lemma
6.6, $\psi(x) <^I_a \psi(y)$. Thus, $\hat\varphi_{\hat x}(\psi(y)) = a$, which implies (2.12) for $\hat\sigma_i$.
Consider (2.13) for $\hat\sigma_i$. Let $\hat x, \hat y \in \hat X^D_i$ and $\hat{\mathsf{m}}(\hat x) = \hat{\mathsf{m}}(\hat y)$. By g0 (surjection), we can find x and y
so that $\psi(x) = \hat x$ and $\psi(y) = \hat y$. By g7, $\mathsf{m}(x) = \mathsf{m}(y)$. Hence, by (2.13) for $\sigma_i$ and (8.3), we have
$\hat\sigma_i(\hat x) = \hat\sigma_i(\hat y)$.
It remains to check that $\hat\sigma_i$ and $\sigma_i$ are endpiece-equivalent, which is shown in almost the
same way as in the proof of (a) using (8.3) in place of (8.2).
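Proof of Corollary 8.2. We prove only (b). Let $\sigma_i$ satisfy (8.4) in $(\Gamma, \mathsf{m})$, and let $\hat\sigma_i$ be the
endpiece-equivalent strategy defined by (8.3). By g3, g6, and endpiece-equivalence of $\sigma_i$ and
$\hat\sigma_i$, we have $\min_{w \in \lambda(\sigma_i)} h(w) = \min_{\hat w \in \hat\lambda(\hat\sigma_i)} \hat h(\hat w)$. For each $\hat\sigma'_i \in \hat\Sigma_i$, Theorem 8.1 guarantees that there
is an endpiece-equivalent strategy $\sigma'_i \in \Sigma_i$ defined by (8.2) and
$\min_{w' \in \lambda(\sigma'_i)} h(w') = \min_{\hat w' \in \hat\lambda(\hat\sigma'_i)} \hat h(\hat w')$.
Hence, since $\sigma_i$ satisfies (8.4) in $(\Gamma, \mathsf{m})$, we have
$\min_{\hat w \in \hat\lambda(\hat\sigma_i)} \hat h(\hat w) \geq \min_{\hat w' \in \hat\lambda(\hat\sigma'_i)} \hat h(\hat w')$ for all $\hat\sigma'_i \in \hat\Sigma_i$.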
Proof of Corollary 8.3.(b): Let $\sigma_i$ satisfy (8.4) in $(\Gamma, \mathsf{m})$, and let $\hat\sigma_i$ be the strategy defined by
(8.3). By Corollary 8.2, $\hat\sigma_i$ satisfies (8.4) in $(\hat\Gamma, \hat{\mathsf{m}})$. We let $\sigma^1_i(x)$ and $\hat\sigma^1_i(x)$ denote the
behavior prescribed by (8.5) in $(\Gamma, \mathsf{m})$ and $(\hat\Gamma, \hat{\mathsf{m}})$, respectively. Let $x \in X^o_i$. If $\mathsf{m}^o_i(x) = \mathsf{m}(x')$
for some $x' \in X$, then by g0 there is an $\hat x' \in \hat X$ where $\psi(x') = \hat x'$. By (8.3), $\hat\sigma_i(\hat x') = \sigma_i(x')$,
so $\hat\sigma^1_i(x) = \sigma^1_i(x)$. If, alternatively, $\mathsf{m}^o_i(x) \neq \mathsf{m}(x')$ for any $x' \in X$, then
$\hat\sigma^1_i(x) = \sigma^1_i(x) = \sigma^o_i(x)$.
Part (a) is proved in almost the same way as (b).
8.2 Difficulties involved in subjective thinking and in playing in the objective situation
In Section 7, we assumed that player i has the memory function $\mathsf{m}^o_i = \mathsf{m}^{spr}_i$ and the active
domain $D^A_i(\sigma^o)$. Then, he succeeds in having the unique direct view, in finding an optimal
strategy in $(\Gamma^d, \mathsf{m}^d)$ as well as in bringing it back to the objective situation. However, if we
drop these assumptions, then a subjectively optimal strategy may not help him behave
properly in the objective situation. We can find many difficulties in decision making here,
but we restrict ourselves to only some of them.
(1): Difficulty in subjective thinking: We start with a difficulty involved in subjective
thinking. In Corollary 5.3, we gave a necessary and sufficient condition for a direct view to
be unique and inductively derived. When the direct view is uniquely determined, the
results of Section 6 state that it is essentially the smallest i.d.view. Also, the results of Section
8.1 imply that decision making is invariant to the choice of a personal view.
Problems may arise because of multiplicity of direct views for a given memory kit $(T_{D_i}, Y_{D_i})$.
In this case, player i faces a difficulty first in choosing an i.d.view.
In Example 5.1 there are four direct i.d.views, which all differ in terms of the memory
function. Fig.8.1 gives two of those direct i.d.views with only the relevant memory yarns
listed, and the payoffs are now attached. In Fig.8.1.A, the memory yarns are mixed up at the
nodes $\langle (y_0, a), v \rangle$ and $\langle (y_0, b), v \rangle$ as $\mathsf{m}^o_1(y_1)$ and $\mathsf{m}^o_1(y_2)$, while the objective game has the same
structure with the opposite assignment of $\mathsf{m}^o_1(y_1)$ and $\mathsf{m}^o_1(y_2)$. In Fig.8.1.B, he expects the
same memory yarn $\mathsf{m}^o_1(y_1)$ at each of his second decision nodes. In the view A, he does not
use the memory yarn $\mathsf{m}^o_1(y_2)$ in $Y_{D_1}$. This multiplicity of views causes some difficulty for
the player in deciding which view to use for his decision making. His choice of a view may
influence his decision making since, e.g., in the view A he can make different choices at
$\langle (y_0, a), v \rangle$ and $\langle (y_0, b), v \rangle$, while in view B, he is required to make the same choice.
(2): Difficulty in objective optimality: Suppose that player 1 has chosen a direct i.d.view
and a behavior pattern for it that is subjectively optimal in the sense of (8.4). Consider the
direct view of Fig.8.1.A. One subjectively optimal strategy $\sigma_1$ is defined by choosing action a
at the root node and the left node with $\mathsf{m}^o_1(y_2)$, while choosing b at the right node with
$\mathsf{m}^o_1(y_1)$. When he modifies his regular behavior in the objective game by this strategy $\sigma_1$ and
brings it back to the objective situation, he receives the payoff 0. Thus he fails to behave
optimally in the objective situation.
Next, consider the view B. In this view, he has a subjectively optimal strategy prescribing
the choice of b at all the decision nodes. If he takes this strategy to the objective world, he
will receive the memory yarn $\mathsf{m}^o_1(y_2)$, which he does not expect and, indeed, is not
contained in his constructed personal view. Thus, the player finds a further difficulty with
his view and a reason to revise his behavior or his view.
This difficulty is caused by the weak inclusion condition of P2a, allowing the possibility of
$\{\mathsf{m}^i(x) : x \in X^i_i\} \subsetneq Y_{D_i}$. By strengthening P2a to equality, this difficulty could be avoided as in
the view B. Nevertheless, the multiplicity of views remains, and so does the difficulty that a
subjectively optimal strategy may not be objectively optimal.
Thus, when there are multiple direct i.d.views, player i may meet some difficulties both
subjectively and objectively. Either of these difficulties gives a player a reason to revise his
behavior or his view. In this paper, however, we do not consider those revisions.
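[Figure 8.1: panels A and B, two direct i.d.views with actions a and b and payoffs 0 and 1. In panel A the second decision nodes carry the memory yarns $\mathsf{m}^o_1(y_2)$ and $\mathsf{m}^o_1(y_1)$; in panel B both carry $\mathsf{m}^o_1(y_1)$.]
Fig. 8.1. Difficulty in objective optimality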
9. Concluding comments
We have given a discourse of inductive game theory by confining ourselves to clear-cut
cases. It would be, perhaps, appropriate to start this section with comments on our
discourse. Then we will discuss some implications for extant game theory.
9.1 Comments on our discourse
We have made particular choices of assumptions and definitions for our discourse. One
important methodological choice is to adopt extensive games in the strong and weak senses
for objective and subjective descriptions. First, we will give some comments on this choice,
and then, we will discuss the definition of an inductively derived view given in Section 4
based on the initial segment procedure.
As pointed out in Section 4, an extensive game contains observable and unobservable
elements. The nodes with the successor relation are unobservable for the players and even
for the outside observer, in which sense those are highly hypothetical. The components in a
memory kit are all observables and actually observed. Thus, our definition of the inductive
derivation of a personal view from a memory kit extends the observed observables by
adding hypothetical elements. This may be interpreted as an inductive process of adding
unobservable elements to observed data. However, this freedom of adding hypothetical
elements leads us to a proliferation of possible views. To prevent this proliferation, we need
some criterion to choose a view from many possible ones. In this paper, we have used the
concept of a g-morphism (game theoretical p-morphism) to choose a smallest one.
Conceptually speaking, the choice of a personal view is supposed to be done by a player,
rather than us. While the definition of an inductive derivation allows many views, a player
cannot construct a large one because of his bounded cognitive ability. Thus, the criteria of
smallness and constructiveness are important from this point of view. The direct view
defined in Section 5 has a constructive nature as well as being a smallest one for a given
memory kit. In this sense, the direct view has a special status among those possible views.
Nevertheless, Definition 4.1 may admit no inductively derived views for a given memory
kit, as characterized by Theorem 5.2. In fact, the initial segment procedure adopted in
Definition 4.1 still gives a strong restriction on the addition of hypothetical elements. If we
allow more freedom in using hypothetical elements in an inductive derivation, we could
avoid the nonexistence result. For example, if we allow a player to add nature nodes to his
personal view, we could even avoid the use of an extensive game in the weak sense. On the
other hand, this creates vast arbitrariness in inductive derivations; and we expect serious
difficulties in finding natural criteria to narrow down the use of nature nodes. Until we
find natural criteria, we should refrain from the cheap use of nature nodes.
The above conclusion may sound negative to any extension of our definition of an inductive
derivation, but we have different opinions. We could actually have a more general
procedure to construct a personal view than the initial segment procedure. Since this paper
is intended to provide an entire scenario, we have chosen the initial segment procedure as a
clear-cut case. In separate papers, we will discuss less restrictive definitions. See Section 9.3.
Another comment should be given on the choice of extensive games. In fact, we can avoid
the adoption of extensive games; instead, the present authors ([16]) have developed a theory
of information protocols, which avoids the use of nodes and describes game situations directly
in terms of information pieces and actions together with a history-event relation. If we adopt
this theory, then we could avoid a proliferation of personal views generated by the use of
hypothetical nodes. In the theory of information protocols it may be easier to discuss
extensions of inductive derivations. One reason for our adoption of extensive games here is
their familiarity within our profession. The choice of extensive games makes the distinction
between observables and unobservables explicit, which is another reason for our choice.
We expect gradual developments of inductive game theory to come about by deeper
analysis and alternative approaches to the various stages mentioned in the diagram of
Fig.1.1. By such gradual developments, we may find natural criteria for steps such as the use
of nature nodes, and some experimental tests of inductive game theory.
9.2 Implications to extant game theory
It is a main implication of our discourse that a good individual view on society is difficult to
construct from the experiential point of view: There are many places for a player to get stuck
in his inductive process and analysis process. Nevertheless, we gave a characterization
theorem of Nash equilibrium in Section 7. Here, we discuss some other implications to
extant game theory and economics chiefly with respect to Nash equilibrium.
There are various interpretations of Nash equilibrium (cf. Kaneko [14], Act 4). Nash [25]
himself described his concept from the viewpoint of purely ex ante decision making, but in
economic applications, it is typically more natural to interpret Nash equilibrium as a
strategically stable stationary state in a recurrent situation. The characterization given in
Section 7 is along this line of interpretations, including also ex ante decision making in a
player's constructed personal view.
Reaching Nash equilibrium, if it is reached at all, takes a long time. Also, the process
of trial and error may not allow all possible available actions. The Nash equilibrium reached
should be regarded as a Nash equilibrium in the game with respect to the actually
experienced domains. Thus, the characterization of Nash equilibrium in Section 7 should not
merely be interpreted as a positive result. It means that the characterization would be
obtained if all those processes go through well and if reservations about restrictions on trials
are taken into account.
From the same point of view, the subgame perfect equilibrium of Selten [30] involves even
deeper difficulties from our experiential point of view, which was already pointed out in
Kaneko-Matsui [18]. The reason is that subgame perfection requires higher order
experimentations. When one player deviates from his regular behavior, other players in turn
need, again, to make experimentations from regular behavior. This second or higher order
experimentation is already problematic and violates some principles discussed in the
informal theory in Section 3.2. In fact, a similar criticism is applied to Nash equilibrium, as
already stated. Nash equilibrium itself is regarded as one limit notion, and subgame
perfection is a higher limit one.
Taking the above criticism seriously, one important problem arises: the complexities, in a
certain sense, of an inductively derived view as well as of experimentations need to be measured
and restricted. In the epistemic logic context, Kaneko-Suzuki [20] introduced the concept of
contentwise complexity, which measures complexity of a single instance of a game. This
notion can be converted to our inductive game theory. Then, we will be able to give
restrictions on individual views as well as experiments. In this manner, our inductive game
theory will be developed in the direction of bounded rationalities.
We have restricted our attention to the purely experiential sources. In our society, usually,
we have different sources of beliefs/knowledge such as from other people or through
education. These suggest that a player may get more beliefs/knowledge on the social
structure, but do not suggest that he can guess other people's thinking, which has usually
been assumed in the standard game theory (cf., Harsanyi [10] for incomplete information
game and Kaneko [13] for the epistemic logic approach). At least, the assumption of
common knowledge is far beyond experiences. If we restrict interpersonal thinking to very
shallow levels, deductive game theory may have some connections to inductive game
theory (cf. Kaneko-Suzuki [19] for such a direction of deductive game theory).
9.3 Postscript
By now, several new developments along the line of the scenario given in this paper have
been made in Kaneko-Kline [15], [16], [17], and Akiyama-Ishikawa-Kaneko-Kline [1]. We
use this postscript section to present some small summaries of those papers to help the
reader catch up to the present state of inductive game theory.
The main concern of Kaneko-Kline [15] is the size of an inductively derived view for a
player with bounded cognitive abilities. If the objective situation is too large, a player may
have difficulty: 1) analyzing it strategically; and 2) accumulating enough experiences to
have a rich view. The premise of that paper is that the number of experiences and the size of
a view must be small for it to be managed by a player. The concept of marking some parts
and actions as important was introduced in that paper and shown to be successful in
allowing a player to obtain a manageable, though potentially biased, view.
As already mentioned in Section 9.1, Kaneko-Kline [16] introduced a new construct called an
information protocol, based on actions and information pieces as tangible elements for
each player rather than hypothetical non-tangible concepts such as nodes. This approach gives
a more direct and simpler description of a game situation from the perspective of a player. It
has another merit to classify extensive games in a more clear-cut manner. With an appropriate
choice of axioms, it fully characterizes an extensive game in the weak and strong senses. It also
enables us to avoid g-morphisms, since we have no multiplicity in i.d.views caused by
hypothetical nodes and branches. The theory of information protocols has been adopted in our
more recent research including Kaneko-Kline [17].
Kaneko-Kline [17] took up the task of constructing i.d.views with more partiality in a
player's memory. Accordingly, the definition of an i.d.view had to be weakened to admit a
view. By these generalizations, the induction becomes less deterministic and we meet some
multiplicity of consistent views with a given set of memories. The interactions between a
player's i.d.view, his future behavior, and future views become the topics of that paper and
also serve as potential sources for resolving the multiplicity problem.
Finally, Akiyama et al. [1] took a computer simulation approach in order to look into the
process of experiencing and memorizing experiences in a one-person problem called
"Mike's bike commuting". That paper tries to clarify the informal theory of behavior and
accumulation of memories discussed in Section 3.2 of this paper. The simulation approach is
based on finite experiences and accumulations of memories. The use of marking
introduced in Kaneko-Kline [15] was found to be crucial for obtaining a rich enough view.
These developments are, more or less, consistent with the scenario spelled out in this paper
and give more details into each step in the basic scenario. We are presently continuing our
research along those lines, making progress into experiential foundations of
beliefs/knowledge on other players' thinking.
10. Acknowledgement
This paper appeared in Journal of Mathematical Economics 44 (2008) 1332-1362. It contained a
mathematical error, which is corrected in a Corrigendum (the same journal, 46 (2010) 620-622).
Section 8 of the present paper incorporates this correction. We thank Elsevier for
allowing us to publish this paper in this volume.
We thank Chih Chang, Takashi Ikegami and Ryuichiro Ishikawa for comments on earlier
drafts of this paper. Also, we are grateful for the hospitality of the Institute of Economics at
Academia Sinica, Taiwan: part of this paper was written during the visit of the authors
to the institute.
The authors are partially supported by Grant-in-Aids for Scientific Research No.18330034,
Ministry of Education, Science and Culture, and Australian Research Council Discovery
Grant DP0560034.
11. References
[1] Akiyama, E., R. Ishikawa, M. Kaneko, J. J. Kline, (2008), A Simulation Study of Learning a
Structure: Mike's Bike Commuting, SSM.DP.1190. University of Tsukuba.
http://www.sk.tsukuba.ac.jp/SSM/libraries/pdf1176/1190.pdf
[2] Bacon, F., (1889; 1589), Novum Organum, edited with Introduction, Notes, etc., by
Thomas Fowler, 2nd ed., Oxford.
[3] Blackburn, P., M. De Rijke, and Y. Venema, (2002), Modal Logic, Cambridge University
Press, Cambridge.
[4] Brandenburger, A., and E. Dekel (1993), Hierarchies of Beliefs and Common Knowledge,
Journal of Economic Theory, 59, 189-198.
[5] Camerer, C., (2003), Behavioral Game Theory, Princeton University Press, Princeton.
[6] Dubey, P., and M. Kaneko, (1984), Information Patterns and Nash Equilibria in Extensive
Games I, Mathematical Social Sciences 8, 111-139.
[7] Fudenberg, D., and D.K. Levine, (1993), Self-confirming Equilibrium, Econometrica 61,
523-545.
[8] Gilboa, I., and D. Schmeidler (1995), Case-based decision theory, Quarterly Journal of
Economics 110, 605-639.
[9] Harper, W.L. and O. Schulte (2005), Scientific Method, McMillan Encyclopaedia of
Philosophy.
[10] Harsanyi, J. C., (1967/68), Games with Incomplete Information Played by Bayesian
Players, Parts I, II, and III, Management Science 14, 159-182, 320-334, and 486-502.
[11] Hume, D., (1889; 1759), An Enquiry Concerning Human Understanding, Longmans, Green
and Co. London.
[12] Kalai, E., and E. Lehrer, (1993), Subjective Equilibrium in Repeated Games, Econometrica
61, 1231-1240.
[13] Kaneko, M., (2002), Epistemic Logics and their Game Theoretical Applications:
Introduction. Economic Theory 19, 7-62.
[14] Kaneko, M., (2004), Game Theory and Mutual Misunderstanding, Springer, Heidelberg.
[15] Kaneko, M., and J. J. Kline, (2007a), Small and Partial Views derived from Limited
Experiences, SSM.DP.1166, University of Tsukuba.
http://www.sk.tsukuba.ac.jp/SSM/libraries/pdf1151/1166.pdf.
[16] Kaneko, M., and J. J. Kline, (2008a), Information Protocols and Extensive Games in
Inductive Game Theory, Game Theory and Applications 13, 57-83.
[17] Kaneko, M., and J. J. Kline, Partial Memories, Inductively Derived Views, and their
Interactions with Behavior, to appear in Economic Theory.
[18] Kaneko, M., and A. Matsui, (1999), Inductive Game Theory: Discrimination and
Prejudices, Journal of Public Economic Theory 1, 101-137. Errata: the same journal 3
(2001), 347.
[19] Kaneko, M., and N.-Y. Suzuki, (2002), Bounded interpersonal inferences and decision
making, Economic Theory 19, 63-103.
[20] Kaneko, M., and N.-Y. Suzuki, (2005), Contentwise Complexity of Inferences in
Epistemic Logics of Shallow Depths I: General Development. University of
Tsukuba, Mimeo.
[21] Kuhn, H. W., (1953), Extensive Games and the Problem of Information, Contributions to
the Theory of Games II, Kuhn, H. W. and A. W. Tucker, eds. 193-216. Princeton
University Press.
[22] Kuhn, T. (1964), The Structure of Scientific Revolutions, Chicago University Press, Chicago.
[23] Luce, R. D., and H. Raiffa (1957), Games and Decisions, John Wiley and Sons Inc., Boston.
[24] Mertens, J., and S. Zamir (1985), Formulation of Bayesian analysis for games with
incomplete information, International Journal of Game Theory 14, 1-29.
[25] Nash, J. F., (1951), Noncooperative Games, Annals of Mathematics 54, 286-295.
[26] Ono, H., (1994), Logic in Information Sciences (in Japanese), Nihon-hyoron-sha. Tokyo.
[27] Osborne, M., and A. Rubinstein, (1994), A Course in Game Theory, MIT Press, Cambridge.
[28] Piccione, M., and Rubinstein, A. (1997), On the Interpretation of Decision Problems with
Imperfect Recall, Games and Economic Behavior 20, 3-24.
[29] Ritzberger, K., (2002), Foundations of Non-cooperative Game Theory, Oxford
University Press, Oxford.
[30] Selten, R., (1975), Reexamination of the Perfectness Concept of Equilibrium Points in
Extensive Games, International Journal of Game Theory 4, 25-55.
[31] Weibull, J. W., (1995), Evolutionary Game Theory, MIT Press. London.
[32] von Neumann, J., and O. Morgenstern, (1944), Theory of Games and Economic Behavior,
Princeton University Press, Princeton.
6
Cooperative Logistics Games
Juan Aparicio (1), Natividad Llorca (1), Joaquin Sanchez-Soriano (1),
Julia Sancho (2) and Sergio Valero (3)
(1) Center of Operations Research (CIO). University Miguel Hernandez of Elche
(2) Consejeria de Educacion, Region de Murcia
(3) Dept. of Engineering of Industrial Systems. University Miguel Hernandez of Elche
(1, 2, 3) Spain
1. Introduction
Roughly speaking, Game Theory deals with analysing conflict and cooperation situations in
which two or more rational and intelligent agents are involved. There are many real and
theoretical situations which can be examined from the point of view of Game Theory.
Therefore it is not difficult to find in the literature a rich variety of applications of Game
Theory to many and very diverse fields of knowledge. In particular, Game Theory plays a
significant role in Economics, but we can also find applications to Computer Science and
Engineering.
Game Theory can be roughly divided into two main areas: cooperative and non-cooperative
games. The basic key for distinguishing between these two areas is whether it is possible or not
to reach binding agreements. When binding agreements are possible, we are then faced with a
cooperative situation. Thus, in a cooperative environment the concept of coalition plays an
important role and very often the main goal is to achieve the cooperation of all agents. In this
chapter we will assume that binding agreements among the agents are possible and therefore
we will use the cooperative approach for analysing some logistics problems.
On the other hand, there are a number of theoretical and conceptual connections between
Game Theory and Operations Research (OR). For example, we should mention the
connection between the duality in mathematical programming and the minimax theorems
for zero-sum games (see Raghavan, 1994); the linear complementarity theory and the bimatrix
games (see Lemke, 1965), or the optimal control theory and the differential games
(see Friedman, 1994) among others. Furthermore we can find applications of OR to Game
Theory, for example the characterization of balanced games using the duality concept
(Bondareva, 1963 and Shapley, 1967). Likewise, Game Theory contributes to completing the
analysis of OR problems when there is more than one agent involved in the corresponding
situation. Thus, after optimising a particular system by means of OR techniques, in which
there are two or more agents involved, who have to collaborate in order to be able to
achieve that optimal result, saying something about how to distribute the extra benefits or
the costs saved by cooperation among those agents seems reasonable and necessary. Hence
cooperative games can play a role in the complete analysis of the situation.
In the literature, we can find not only many OR problems studied from the point of view of
cooperative games in the sense mentioned previously, but also OR problems analysed from
a strategic or non-cooperative approach. However, in this chapter we are more interested in
the cooperative approach. Some of the first OR situations studied using cooperative games
are assignment problems (Shapley & Shubik, 1971), linear production problems (Owen,
1975), network flow problems (Kalai & Zemel, 1982) and minimum cost spanning tree
problems (Claus & Kleitman, 1973 and Bird, 1976), obtaining the so-called assignment
games, linear production games and so on. The games obtained from OR problems are
usually called OR-games (see Borm et al., 2001 for a survey on this topic).
In general, the methodology to analyse an OR problem from a cooperative approach
consists of associating with each problem a coalitional game (a game in characteristic
function form) that summarises the gains or savings from cooperation for each possible
coalition of the agents involved and, thereby, analysing different topics of Game Theory such
as solution concepts, stability, etc. Thus we can try to answer the question posed before,
namely, how to distribute the extra benefits or the costs saved by achieving cooperation among
the different agents involved.
Logistics includes the analysis and management of many different situations which can be
formulated or modelled as OR problems. Thus problems related to transportation,
inventory, supply chain, distribution, location, routing or storage among others, arise
frequently in logistics. One can also consider that all of these problems may have more than
one agent involved, so a game theoretical approach could be used to tackle them either from
a cooperative point of view or from a non-cooperative point of view. In the literature we can
find both approaches for the different logistics problems but we will concentrate our
attention on the cooperative approach.
In this chapter we will only analyse two logistics problems from a cooperative point of view:
transportation situations and some related problems and supply chain situations. The two
problems selected are representative of a particular problem in logistics, such as the
transportation of goods from stores or production sources to points of sale or distribution
and a general problem, such as the supply chain which embraces many (or all) logistics
tasks. Therefore we have selected one particular problem and a more general problem. In
this sense it is possible to consider logistics as being a part of supply chain management but
we have considered the supply chain inside logistics in order to be able to analyse
separately different interesting optimisation problems under the same umbrella. On the
other hand, we are aware that these two problems do not cover all possible logistics
situations but we believe that the analysis of these problems together with the references
provided throughout the chapter can provide a good starting point for the reader interested
in this topic.
Finally, since we will use the cooperative approach to analyse the different problems and
hence are interested in cooperation between the agents, we will study the concept of
coalitional stability represented by the core of the game. To this end, we will analyse the
non-emptiness of the core of the corresponding game and therefore the existence of
coalitional stable distributions. Likewise, we will explore other possible solution concepts
and their relationship to the core of the game.
The rest of the chapter is organised as follows. In Section 2 we provide the basic definitions,
concepts and solutions of cooperative games. We also describe the methodology for defining
a cooperative OR game and introduce logistics games. Section 3 analyses the cooperative
approach for transportation situations and some related problems which can arise in
logistics situations. In Section 4 we review the literature for the cooperative approach for
supply chain situations and explore the possibility of analysing supply chain situations
without storage from a cooperative standpoint through two particular examples. Finally, in
Section 5 we briefly review the literature for other logistics games.
2. Preliminaries
In this section we formally introduce some basic definitions, concepts and solutions for
cooperative games in order to provide the reader with all the necessary background to
follow this chapter. Likewise, we present what we mean by Operations Research Games
and the definition of logistics games.
2.1 Basic notions on cooperative games
First, a cooperative game in characteristic function form is a pair $(N, v)$ where $N$ is a finite
set of agents called players and $v$ is a function that associates to each set $S \subseteq N$ a
real value $v(S)$, satisfying $v(\emptyset)=0$. This value $v(S)$ represents the joint gain that
the agents in $S$ can guarantee by themselves if they cooperate, independently of what the agents
in $N \setminus S$ could do. Therefore, in some sense, $v(S)$ measures the worth of coalition $S$.
On the other hand, when the characteristic function represents costs instead of gains or benefits,
we will denote it by $c$ and refer to cost games. Of course, it is possible to transform a cost
game $(N, c)$ into a benefit game through the so-called savings game. The definition of a savings
game $(N, v_c)$ associated with a cost game $(N, c)$ is the following:

$$v_c(S) = \sum_{i \in S} c(i) - c(S). \tag{1}$$

Therefore the savings game is simply the saved costs from cooperation with respect to all
the individual costs. Thus, the savings game represents the gains of cooperation as opposed
to acting separately.
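Equation (1) can be illustrated with a minimal Python sketch. The three-player cost game below is purely hypothetical (the numbers are invented for illustration), and the helper names `coalitions` and `savings_game` are assumptions of this sketch rather than notation from the chapter. Coalitions are stored as frozensets so that they can be used as dictionary keys.

```python
from itertools import combinations


def coalitions(players):
    """Yield every non-empty coalition of `players` as a frozenset."""
    players = sorted(players)
    for size in range(1, len(players) + 1):
        for members in combinations(players, size):
            yield frozenset(members)


def savings_game(players, cost):
    """Return v_c(S) = sum_{i in S} c(i) - c(S) for every coalition S."""
    v = {frozenset(): 0}
    for S in coalitions(players):
        v[S] = sum(cost[frozenset([i])] for i in S) - cost[S]
    return v


# Hypothetical cost game (illustrative numbers only).
players = [1, 2, 3]
cost = {
    frozenset([1]): 10, frozenset([2]): 12, frozenset([3]): 8,
    frozenset([1, 2]): 18, frozenset([1, 3]): 15, frozenset([2, 3]): 17,
    frozenset([1, 2, 3]): 24,
}

v = savings_game(players, cost)
print(v[frozenset([1, 2])])     # 10 + 12 - 18
print(v[frozenset([1, 2, 3])])  # 10 + 12 + 8 - 24
```

By construction every single-player coalition has zero savings, since a player acting alone saves nothing relative to acting alone.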
We will denote by $G^N$ the set of all (benefit or profit) games with set of players $N$ and by
$CG^N$ the set of all cost games with set of players $N$. Furthermore, we will denote by $G$ the
set of all (benefit or profit) games and by $CG$ the set of all cost games.
Some properties of the characteristic function suggest, at first glance, that cooperation is
profitable for the agents in any game satisfying them and hence that the possibility of
cooperation exists. However, a more careful analysis is necessary, as we will see later.
For profit or benefit games the properties are the following:
- Monotonicity: if $v(S) \le v(T)$ for all $S \subseteq T \subseteq N$.
- Superadditivity: if $v(S \cup T) \ge v(S) + v(T)$ for all $S, T \subseteq N$ such that $S \cap T = \emptyset$.
- Convexity: if $v(S \cup T) + v(S \cap T) \ge v(S) + v(T)$ for all $S, T \subseteq N$.
For cost games their counterparts can be written as:
- Monotonicity: if $c(T) \le c(S)$ for all $S \subseteq T \subseteq N$.
- Subadditivity: if $c(S \cup T) \le c(S) + c(T)$ for all $S, T \subseteq N$ such that $S \cap T = \emptyset$.
- Concavity: if $c(S \cup T) + c(S \cap T) \le c(S) + c(T)$ for all $S, T \subseteq N$.
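For small games the three properties for benefit games can be checked by brute force over all pairs of coalitions. The following Python sketch does this for a hypothetical three-player game; the function names and the numbers are assumptions made for illustration only.

```python
from itertools import combinations


def subsets(players):
    """Return all coalitions of `players`, including the empty one."""
    players = sorted(players)
    return [frozenset(c) for r in range(len(players) + 1)
            for c in combinations(players, r)]


def is_monotone(players, v):
    all_s = subsets(players)
    return all(v[S] <= v[T] for S in all_s for T in all_s if S <= T)


def is_superadditive(players, v):
    all_s = subsets(players)
    return all(v[S | T] >= v[S] + v[T]
               for S in all_s for T in all_s if not (S & T))


def is_convex(players, v):
    all_s = subsets(players)
    return all(v[S | T] + v[S & T] >= v[S] + v[T]
               for S in all_s for T in all_s)


# Hypothetical benefit game (illustrative numbers only).
players = [1, 2, 3]
v = {
    frozenset(): 0,
    frozenset([1]): 0, frozenset([2]): 0, frozenset([3]): 0,
    frozenset([1, 2]): 4, frozenset([1, 3]): 3, frozenset([2, 3]): 3,
    frozenset([1, 2, 3]): 6,
}

print(is_monotone(players, v))       # True
print(is_superadditive(players, v))  # True
# False: for S = {1, 2} and T = {1, 3}, v(N) + v({1}) = 6 < 4 + 3.
print(is_convex(players, v))
```

The example shows that a game can be monotone and superadditive without being convex. The checks enumerate all pairs of coalitions, so they are only practical for a handful of players.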
Given a game $(N, v)$ (resp. cost game $(N, c)$), a distribution or allocation for it is a vector
$z \in \mathbb{R}^N$ such that $\sum_{i \in N} z_i \le v(N)$ (resp. $\sum_{i \in N} z_i \ge c(N)$).
We will denote $z(S) = \sum_{i \in S} z_i$. A distribution $z$ is called efficient if
$z(N)=v(N)$ (resp. $z(N)=c(N)$).
A solution for $G$ (resp. $CG$) is a map $\varphi: G \to \mathbb{R}^N$ (resp.
$\varphi: CG \to \mathbb{R}^N$) such that $\varphi(N, v) \subseteq \mathbb{R}^N$ for all
$(N, v) \in G$ (resp. $\varphi(N, c) \subseteq \mathbb{R}^N$ for all $(N, c) \in CG$) and
$z(N)=v(N)$ (resp. $z(N)=c(N)$) for all $z \in \varphi(N, v)$. If $\varphi$ is always a single
point then it is called a value, otherwise it is called a set-valued solution or simply a
solution. A solution for a game is a set of efficient distributions
of the total gain or cost. One of the most outstanding solutions is the core. The core of a game
is the set of all coalitional stable distributions and, therefore, any coalition obtains at least
what the members of it can achieve by themselves. In formulas for benefit/profit games and
cost games respectively:

$$Core(N, v) = \{ z \in \mathbb{R}^N : z(S) \ge v(S) \text{ for all } S \subseteq N \text{ and } z(N) = v(N) \}. \tag{2}$$

$$Core(N, c) = \{ z \in \mathbb{R}^N : z(S) \le c(S) \text{ for all } S \subseteq N \text{ and } z(N) = c(N) \}. \tag{3}$$

The distributions in the core of a game are interesting because there is no incentive for any
coalition to reject them. However, the core of a game can be empty. The games with non-empty
core are called balanced. Shapley (1971) proved that all convex games (resp. concave
for the case of cost games) have a non-empty core and hence they are balanced.
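Membership in the core of a benefit game, as in definition (2), amounts to one efficiency check plus one inequality per coalition. The sketch below tests this directly for a hypothetical three-player game; the function name `in_core`, the tolerance parameter and the numbers are assumptions of this illustration.

```python
from itertools import combinations


def in_core(players, v, z, tol=1e-9):
    """Check z(N) = v(N) and z(S) >= v(S) for every non-empty proper coalition S."""
    players = sorted(players)
    # Efficiency: the whole value v(N) is distributed.
    if abs(sum(z[i] for i in players) - v[frozenset(players)]) > tol:
        return False
    # Coalitional stability: no coalition can do better on its own.
    for size in range(1, len(players)):
        for members in combinations(players, size):
            if sum(z[i] for i in members) < v[frozenset(members)] - tol:
                return False
    return True


# Hypothetical benefit game (illustrative numbers only).
players = [1, 2, 3]
v = {
    frozenset(): 0,
    frozenset([1]): 0, frozenset([2]): 0, frozenset([3]): 0,
    frozenset([1, 2]): 4, frozenset([1, 3]): 3, frozenset([2, 3]): 3,
    frozenset([1, 2, 3]): 6,
}

print(in_core(players, v, {1: 2, 2: 2, 3: 2}))  # True
# False: coalition {2, 3} receives 2 but can guarantee 3 by itself.
print(in_core(players, v, {1: 4, 2: 1, 3: 1}))
```

For a cost game the same test applies with the inequality reversed, as in definition (3).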
On the other hand, another interesting set of distributions is the imputation set. It is defined
as the set of all efficient and individually stable (or rational) distributions. In formulas for
benefit/profit games and cost games respectively:

$$I(N, v) = \{ z \in \mathbb{R}^N : z_i \ge v(i) \text{ for all } i \in N \text{ and } z(N) = v(N) \}. \tag{4}$$

$$I(N, c) = \{ z \in \mathbb{R}^N : z_i \le c(i) \text{ for all } i \in N \text{ and } z(N) = c(N) \}. \tag{5}$$

Given a game $(N, v)$ the marginal contribution of player $i$ to coalition $S$ ($i \notin S$) is
given by $v(S \cup i) - v(S)$ (resp. $c(S \cup i) - c(S)$). Based on this concept another
outstanding solution for cooperative games is defined: the Shapley value (Shapley, 1953). For each
player the Shapley value is the average of all her possible marginal contributions. The
mathematical expression of the Shapley value is the following:

$$Sh_i(N, v) = \sum_{S \subseteq N,\, i \notin S} \frac{s!\,(n-s-1)!}{n!} \left( v(S \cup i) - v(S) \right), \quad i \in N, \tag{6}$$

where $n = |N|$ and $s = \mathrm{card}(S)$.
The Shapley value always exists but does not belong to the core in general. However, Shapley
(1971) proved that if the game is convex (resp. concave for cost games) then the Shapley
value is always in the core of the game.
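Formula (6) can be evaluated directly for small games by enumerating, for each player, the coalitions that do not contain her. The following Python sketch does so with exact rational arithmetic; the function name `shapley_value` and the three-player game are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_value(players, v):
    """Weighted average of marginal contributions v(S u {i}) - v(S) over all S not containing i."""
    players = sorted(players)
    n = len(players)
    sh = {}
    for i in players:
        others = [j for j in players if j != i]
        total = Fraction(0)
        for s in range(n):  # coalition sizes 0, ..., n-1
            weight = Fraction(factorial(s) * factorial(n - s - 1), factorial(n))
            for members in combinations(others, s):
                S = frozenset(members)
                total += weight * (v[S | {i}] - v[S])
        sh[i] = total
    return sh


# Hypothetical benefit game (illustrative numbers only).
players = [1, 2, 3]
v = {
    frozenset(): 0,
    frozenset([1]): 0, frozenset([2]): 0, frozenset([3]): 0,
    frozenset([1, 2]): 4, frozenset([1, 3]): 3, frozenset([2, 3]): 3,
    frozenset([1, 2, 3]): 6,
}

sh = shapley_value(players, v)
print(sh)                # players 1 and 2 receive 13/6 each, player 3 receives 5/3
print(sum(sh.values()))  # equals v(N) = 6, so the value is efficient
```

The enumeration visits every coalition once per player, so the running time grows exponentially with the number of players; this direct approach is only meant for small examples.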
Schmeidler (1969) introduced a value, called the nucleolus, which always belongs to the core of
the game when it is non-empty. The definition of the nucleolus is based on the concept of
excess (or complaint) of a coalition with regard to a distribution. Given a game $(N, v)$ (resp.
$(N, c)$), a coalition $S \subseteq N$ and a distribution $z$, the excess of coalition $S$ with
regard to distribution $z$ is given by $e(S; z) = v(S) - z(S)$ (resp. $e(S; z) = z(S) - c(S)$).
Likewise, we define $\theta(z)$ as the vector of all excesses with regard to $z$ written in
decreasing order. The nucleolus of a game $(N, v)$ (analogously for a cost game $(N, c)$) is
defined as

$$nu(N, v) = \{ z \in I(N, v) : \theta(z) \le_L \theta(x) \text{ for all } x \in I(N, v) \}, \tag{7}$$

where $\le_L$ is the lexicographic order. Therefore, the nucleolus is the distribution that
minimises the maximal excess or complaint of all coalitions.
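The excess vector and its lexicographic comparison can be made concrete with a short Python sketch. It does not compute the nucleolus (that generally requires solving a sequence of linear programs); it only builds the vector of excesses in decreasing order for given imputations and compares them, relying on the fact that Python compares lists lexicographically. The function name, the game and the three candidate imputations are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def excess_vector(players, v, z):
    """Excesses e(S; z) = v(S) - z(S) of all non-empty coalitions, in decreasing order."""
    players = sorted(players)
    excesses = []
    for size in range(1, len(players) + 1):
        for members in combinations(players, size):
            excesses.append(v[frozenset(members)] - sum(z[i] for i in members))
    return sorted(excesses, reverse=True)


# Hypothetical benefit game (illustrative numbers only).
players = [1, 2, 3]
v = {
    frozenset(): 0,
    frozenset([1]): 0, frozenset([2]): 0, frozenset([3]): 0,
    frozenset([1, 2]): 4, frozenset([1, 3]): 3, frozenset([2, 3]): 3,
    frozenset([1, 2, 3]): 6,
}

# Three candidate imputations, each distributing v(N) = 6.
equal_split = {1: Fraction(2), 2: Fraction(2), 3: Fraction(2)}
candidate_b = {1: Fraction(13, 6), 2: Fraction(13, 6), 3: Fraction(5, 3)}
# candidate_c equalises the excesses of the three two-player coalitions.
candidate_c = {1: Fraction(7, 3), 2: Fraction(7, 3), 3: Fraction(4, 3)}

theta_a = excess_vector(players, v, equal_split)
theta_b = excess_vector(players, v, candidate_b)
theta_c = excess_vector(players, v, candidate_c)

# Python list comparison is lexicographic, matching the order used in (7).
print(theta_c < theta_b < theta_a)  # True
```

Among these three candidates, the one whose largest complaints are smallest comes first in the lexicographic order, which is the criterion that definition (7) applies over the whole imputation set.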
There are a number of different solutions for cooperative games in characteristic function
form. For this reason it is necessary to know which solutions are more suitable for a particular
situation. One way to understand the solutions better is through the properties they satisfy.
The main objective is to know which reasonable properties characterise each solution. Thus,
depending on which properties are meaningful or important in a particular situation, we
would be able to find out which solutions fit better too. Therefore, we can find many papers in
the literature characterising solutions for cooperative games using different sets of properties.
2.2 Cooperative Operations Research Games (ORGs)
Consider a system where there are one or more agents interested in optimising it. One way
to deal with this situation is to have recourse to Operations Research and we are then faced
with an operations research problem. The simplest situation is when there is only one agent
or decision-maker involved in the problem and, therefore, there is no conflict of interests. In
that case the analysis of the system is complete once an optimal solution has been obtained
for it using the appropriate optimisation techniques. However, it is not difficult to find that,
on many occasions, there would be more than one agent or decision-maker involved in the
system and, consequently, some kind of conflict of interests could arise. In that case, each
agent could own or control one or more parts of the system and if they wanted to optimise
the system then they should cooperate but, perhaps, they should agree on how to distribute
the profits/benefits or saved costs among themselves. Therefore, the analysis of the system
does not end with the procurement of one optimal solution but it is necessary to go a step
further in order to convince the agents involved to cooperate, most likely, via a good
distribution of the profits or saved costs. One way to tackle this last step in the analysis is
using cooperative games.
Given an operations research problem $A$ in which there is a finite set $N$ of agents involved,
we define an associated cooperative game in characteristic function form $(N, v_A)$ in the
following way:

$$v_A(\emptyset)=0, \quad v_A(N)=\mathrm{Optval}(A) \quad \text{and} \quad v_A(S)=\mathrm{Optval}(A_S) \text{ for all } S \subset N, \tag{8}$$

where $\mathrm{Optval}(A)$ is the optimal value for problem $A$ and $\mathrm{Optval}(A_S)$ is the
optimal value for problem $A_S$, where $A_S$ is the problem obtained using only the parts of
problem $A$ owned or controlled by the agents in coalition $S$. In the case that problem $A$ is a
cost problem we can analogously define the cost game $(N, c_A)$. These games are called
(cooperative) operations research games. Furthermore, if the operations research problems are
related to logistics situations then we will call them cooperative logistics games.
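Construction (8) can be sketched in Python for a small assignment problem, one of the OR problems mentioned in the Introduction. In this hypothetical example each seller can be matched with at most one buyer, the profit table is invented for illustration, and the restricted problem for a coalition uses only the sellers and buyers who belong to it. The names `optval` and `or_game` and the brute-force search are assumptions of this sketch; a real application would use an appropriate optimisation algorithm.

```python
from itertools import combinations


def optval(sellers, buyers, profit):
    """Optimal value of the assignment problem restricted to the given agents (brute force)."""
    sellers, buyers = sorted(sellers), sorted(buyers)
    if not sellers or not buyers:
        return 0
    first, rest = sellers[0], sellers[1:]
    # Option 1: leave the first seller unmatched.
    best = optval(rest, buyers, profit)
    # Option 2: match the first seller with one of the available buyers.
    for b in buyers:
        remaining = [x for x in buyers if x != b]
        best = max(best, profit[first, b] + optval(rest, remaining, profit))
    return best


def or_game(sellers, buyers, profit):
    """Characteristic function v_A(S) = Optval(A_S) for every coalition S."""
    agents = sorted(set(sellers) | set(buyers))
    v = {frozenset(): 0}
    for size in range(1, len(agents) + 1):
        for members in combinations(agents, size):
            S = frozenset(members)
            v[S] = optval(S & set(sellers), S & set(buyers), profit)
    return v


# Hypothetical data (illustrative numbers only).
sellers = ["s1", "s2"]
buyers = ["b1", "b2"]
profit = {("s1", "b1"): 5, ("s1", "b2"): 8, ("s2", "b1"): 7, ("s2", "b2"): 2}

v = or_game(sellers, buyers, profit)
print(v[frozenset(["s1", "s2", "b1", "b2"])])  # 15: match s1 with b2 and s2 with b1
print(v[frozenset(["s1", "s2", "b1"])])        # 7: only b1 is available, s2 is the better match
print(v[frozenset(["s1", "s2"])])              # 0: sellers alone cannot trade
```

The resulting dictionary is an ordinary characteristic function, so the solution concepts of Section 2.1 can be applied to it without further reference to the underlying assignment problem.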
Once we have defined a cooperative game associated with an operations research problem,
then we could obtain different answers to the question of how to distribute the
profits/benefits or saved costs among the agents involved using the solutions defined for
cooperative games, such as the core, the Shapley value, the nucleolus, etc. Note that if we
only use the characteristic function of the game then we may lose some of the essence of the
problem. However, it would also be possible to think of the primal and dual optimal
solutions of the operations research problem to obtain distributions of the profits/saved
costs among the agents involved. Thus, in the latter approach, we would be considering, in
some manner, the particular features of the operations research problem. Of course, the
choice of one approach or another will depend on the particular situation.
Two examples of solutions based on the primal optimal solutions of the corresponding
operations research problems are the Bird solution for minimum cost spanning tree games
(Bird, 1976) which is based on the application of the Prim algorithm (Prim, 1957) and the
pairwise solutions for transportation games (Sanchez-Soriano, 2003 and 2006). The first is a
solution based on an algorithm while the second are solutions based directly on the optimal
solutions of the problem. Therefore, we have two different examples of how to use the
Operations Research techniques to obtain the distribution of the total profits/saved costs
among the agents taking part in the problem. In both cases the relationship between the
solution and the core of the game is studied.
Another possibility is to deal with the optimal solutions of the dual problem. Two examples
of this approach are (Shapley & Shubik, 1971) for assignment problems and (Owen, 1975) for
linear production problems. In the first paper, the authors proved that the core of the game
and the set of dual optimal solutions coincide. In the second paper, the inclusion of the set of
distributions based on the dual optimal solutions in the core of the game is demonstrated.
The set of distributions based on the dual optimal solutions is called the Owen set (van
Gellekom et al., 2000).
3. Transportation, distribution and warehouse sharing games
In this section we will study some transportation problems from the point of view of
cooperative games. We will start with the simplest transportation situation with only two
types of agents (suppliers and demanders), which we call a two-sided transportation problem.
A problem of this kind describes three possible logistics situations of transportation of
goods: producers-retailers, producers-wholesalers or wholesalers-retailers. In each case, the
mathematical treatment is essentially the same. Secondly, we will analyse
transportation situations with three types of agents (suppliers, intermediaries and
demanders), which we call three-sided transportation problems. A situation of this kind
corresponds to producers-wholesalers-retailers distribution problems. Finally, we will study
warehouse sharing problems in which the agents involved in the situation must share the
warehouses in order to optimise their transportation profits/costs.
3.1 Two-sided transportation games
Basically, a two-sided transportation problem consists of two sets of agents, called
producers and retailers, which produce and demand goods. Each producer produces a
quantity of goods and each retailer demands a certain amount of goods. The transport of the
goods from the producers to the retailers is costly (profitable) and, therefore, the main
objective is to transport the goods from the producers to the retailers at minimum cost (at
maximum profit). The way to achieve this objective is by means of cooperation: if
each agent made decisions on their own, the final result of the transportation
would be unpredictable and, perhaps, far from the optimal situation. Therefore, if
cooperation is profitable then it should be promoted through a good distribution of the
extra profits or saved costs.
Let P and R be the sets of producers and retailers respectively. We denote by $p_i$ the
production of goods of producer $i \in P$ and by $d_j$ the demand of goods of retailer $j \in R$. The
unitary cost (resp. benefit) of transportation from producer i to retailer j is denoted by $c_{ij}$
(resp. $b_{ij}$). The mathematical model of this problem can be described by:
$$
\begin{aligned}
\min \quad & \sum_{i \in P} \sum_{j \in R} c_{ij} x_{ij} \\
\text{s.t.:} \quad & \sum_{j \in R} x_{ij} \le p_i, \quad i \in P \\
& \sum_{i \in P} x_{ij} \ge d_j, \quad j \in R \\
& x_{ij} \ge 0, \quad i \in P,\ j \in R
\end{aligned}
\tag{9}
$$
where $x_{ij}$ is the number of units transported from producer i to retailer j.
Problem (9) has feasible solutions if $\sum_{i \in P} p_i \ge \sum_{j \in R} d_j$, that is, if total
production covers total demand; otherwise it is infeasible. However, if we consider
that each transported unit yields a benefit b (large enough to compensate any unitary
cost) then we can consider a maximisation problem with coefficients $b_{ij} = b - c_{ij}$ and relax the
second block of constraints by changing the direction of the inequalities. This new problem
always has feasible solutions, so that drawback is avoided. Therefore, from now on, we
will consider transportation problems with benefits instead of costs. Consequently, the
corresponding mathematical program is given by
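$$
\begin{aligned}
\max \quad & \sum_{i \in P} \sum_{j \in R} b_{ij} x_{ij} \\
\text{s.t.:} \quad & \sum_{j \in R} x_{ij} \le p_i, \quad i \in P \\
& \sum_{i \in P} x_{ij} \le d_j, \quad j \in R \\
& x_{ij} \ge 0, \quad i \in P,\ j \in R.
\end{aligned}
\tag{10}
$$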
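As a rough illustration of how problem (10) can be solved and how coalition values in the sense of (8) can be evaluated, the following Python sketch treats (10) as a flow problem and augments along the most profitable path until no profitable path is left. This successive-shortest-path routine is chosen here only because it needs nothing beyond the standard library; it is not a method taken from the chapter, and the function names `solve_transport` and `coalition_value` are hypothetical. The data are those of the numerical example with producers A, B and retailers 1, 2, 3 discussed in this section (unit benefit 9 minus the unit costs).

```python
INF = float("inf")


def solve_transport(supply, demand, benefit):
    """Solve the benefit problem (10); returns (optimal value, shipped units)."""
    nodes = ["s", "t"] + [("p", i) for i in supply] + [("r", j) for j in demand]
    edges = []  # [tail, head, residual capacity, cost]; edge k ^ 1 reverses edge k
    arc = {}    # index of the forward edge for each producer-retailer pair

    def add_edge(u, v, cap, cost):
        edges.append([u, v, cap, cost])
        edges.append([v, u, 0, -cost])

    for i in supply:
        add_edge("s", ("p", i), supply[i], 0)
    for j in demand:
        add_edge(("r", j), "t", demand[j], 0)
    for i in supply:
        for j in demand:
            arc[i, j] = len(edges)
            add_edge(("p", i), ("r", j), INF, -benefit[i, j])  # cost = -benefit

    value = 0
    while True:
        # Bellman-Ford: cheapest path from s to t in the residual network.
        dist = dict.fromkeys(nodes, INF)
        dist["s"] = 0
        pred = {}
        for _ in range(len(nodes) - 1):
            for k, (u, v, cap, cost) in enumerate(edges):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v], pred[v] = dist[u] + cost, k
        if dist["t"] >= 0:  # no path with positive benefit remains
            break
        path, node = [], "t"
        while node != "s":
            path.append(pred[node])
            node = edges[pred[node]][0]
        push = min(edges[k][2] for k in path)
        for k in path:
            edges[k][2] -= push
            edges[k ^ 1][2] += push
        value -= push * dist["t"]
    plan = {ij: edges[k ^ 1][2] for ij, k in arc.items() if edges[k ^ 1][2] > 0}
    return value, plan


supply = {"A": 12, "B": 15}
demand = {"1": 10, "2": 10, "3": 10}
cost = {("A", "1"): 3, ("A", "2"): 5, ("A", "3"): 6,
        ("B", "1"): 5, ("B", "2"): 4, ("B", "3"): 3}
benefit = {ij: 9 - c for ij, c in cost.items()}  # b_ij = b - c_ij with b = 9


def coalition_value(S):
    """v_T(S) as in (8): problem (10) restricted to the agents in S."""
    return solve_transport({i: p for i, p in supply.items() if i in S},
                           {j: d for j, d in demand.items() if j in S},
                           benefit)[0]


value, plan = solve_transport(supply, demand, benefit)
print(value, plan)
```

A coalition containing only producers or only retailers has no admissible shipment, so `coalition_value` returns 0 for it, in line with the convention that such coalitions are worth nothing.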
Now, we can define a cooperative game in characteristic function form associated with each
(benefit) transportation problem T. The set of players is $N = P \cup R$ and the characteristic
function $v_T$ is defined following the general formulas given in (8). The game $(N, v_T)$ is called
a transportation game. Transportation games are superadditive but not convex in general.
Furthermore, the core of these games is always non-empty. On the other hand, if (u; w) is an
optimal solution for the dual problem of (10), then
$((p_i u_i)_{i \in P}; (d_j w_j)_{j \in R}) \in \mathrm{Core}(N, v_T)$. Therefore, the set
$$\mathrm{Owenset}(N, v_T) = \{((p_i u_i)_{i \in P}; (d_j w_j)_{j \in R}) \in \mathbb{R}^P \times \mathbb{R}^R : (u; w) \text{ is a dual optimal solution}\}$$
is contained in the core of the game. However, the core and the Owen set of transportation
games do not coincide in general (see Sanchez-Soriano et al., 2001). (Thompson, 1980) studies
the extreme points of the Owen set, which that author calls the core.
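The link between dual optimal solutions and Owen allocations can be checked numerically. The sketch below assumes the standard linear programming dual of (10), namely minimising $\sum_{i \in P} p_i u_i + \sum_{j \in R} d_j w_j$ subject to $u_i + w_j \ge b_{ij}$ and $u, w \ge 0$, and uses the data of the numerical example of this section. The dual prices `u` and `w` are obtained by dividing the Owen allocation (48,75;20,0,10) reported in that example by the productions and demands; the helper name `is_dual_feasible` is hypothetical.

```python
supply = {"A": 12, "B": 15}
demand = {"1": 10, "2": 10, "3": 10}
benefit = {("A", "1"): 6, ("A", "2"): 4, ("A", "3"): 3,
           ("B", "1"): 4, ("B", "2"): 5, ("B", "3"): 6}

# Optimal transportation plan reported in the numerical example.
plan = {("A", "1"): 10, ("A", "2"): 2, ("B", "2"): 5, ("B", "3"): 10}

# Candidate dual prices: Owen allocation divided by productions and demands.
u = {"A": 4, "B": 5}
w = {"1": 2, "2": 0, "3": 1}


def is_dual_feasible(u, w, benefit):
    """Check u, w >= 0 and u_i + w_j >= b_ij for every producer-retailer pair."""
    nonnegative = all(val >= 0 for val in list(u.values()) + list(w.values()))
    covers = all(u[i] + w[j] >= b for (i, j), b in benefit.items())
    return nonnegative and covers


primal_value = sum(benefit[ij] * units for ij, units in plan.items())
dual_value = (sum(supply[i] * u[i] for i in supply)
              + sum(demand[j] * w[j] for j in demand))

# Owen allocation: each producer gets p_i * u_i, each retailer gets d_j * w_j.
owen_allocation = {**{i: supply[i] * u[i] for i in supply},
                   **{j: demand[j] * w[j] for j in demand}}

print(is_dual_feasible(u, w, benefit), primal_value, dual_value, owen_allocation)
```

Since the feasible plan and the feasible dual prices attain the same objective value, weak duality implies that both are optimal, so the resulting vector is indeed an Owen allocation for this instance.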
In (Sanchez-Soriano, 2003 and 2006) the pairwise solutions for transportation games are
introduced. These solutions are based directly on the optimal solutions of the corresponding
transportation problem. Since transportation problems can have more than one optimal
solution, the pairwise solutions are set-valued (but discrete). However, on many occasions,
transportation problems have only one optimal solution and, hence, we could consider that
pairwise solutions are essentially values. The philosophy behind the pairwise solutions is
simply that the benefit obtained by each producer-retailer pair in an optimal solution is
distributed between them in some way. The proportion of benefit achieved by a player in a
producer-retailer pair will depend on the bargaining abilities of both or on their relative
weight (power) in the whole transportation system. When we assume that nothing is known
about the relative weights of the agents and, therefore, we could consider that they all have
the same weight, then we obtain the pairwise egalitarian solution. Given a weight vector t,
such that $t_k > 0$ for all $k \in N$, and an optimal solution $x^*$ for the corresponding problem (10),
the pairwise solution associated with t and $x^*$ is defined as follows:
$$
\begin{aligned}
ps_i(t, x^*) &= \sum_{j \in R} \frac{t_i}{t_i + t_j}\, b_{ij} x^*_{ij}, \quad i \in P \\
ps_j(t, x^*) &= \sum_{i \in P} \frac{t_j}{t_i + t_j}\, b_{ij} x^*_{ij}, \quad j \in R.
\end{aligned}
\tag{11}
$$
The pairwise solution with weight vector t for the game $(N, v_T)$ is defined as
$$PS^{t}(N, v_T) = \{ps(t, x^*) \in \mathbb{R}^P \times \mathbb{R}^R : x^* \in \mathrm{Opt}(T)\}, \tag{12}$$
where Opt(T) is the set of all optimal solutions for the corresponding transportation problem
T.
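Formula (11) is straightforward to evaluate once an optimal plan is known. The following Python sketch is a minimal illustration: the function name `pairwise_solution` is hypothetical, and the benefits and the optimal plan are those of the numerical example with producers A, B and retailers 1, 2, 3 given in this section. Exact fractions are used so that the shares add up to the total benefit without rounding error.

```python
from fractions import Fraction


def pairwise_solution(t, plan, benefit):
    """Split the benefit of every active pair (i, j) in proportion to t_i and t_j."""
    payoff = {k: Fraction(0) for k in t}
    for (i, j), units in plan.items():
        pair_benefit = benefit[i, j] * units
        share_i = Fraction(t[i]) / (Fraction(t[i]) + Fraction(t[j]))
        payoff[i] += share_i * pair_benefit
        payoff[j] += (1 - share_i) * pair_benefit
    return payoff


benefit = {("A", "1"): 6, ("A", "2"): 4, ("A", "3"): 3,
           ("B", "1"): 4, ("B", "2"): 5, ("B", "3"): 6}
plan = {("A", "1"): 10, ("A", "2"): 2, ("B", "2"): 5, ("B", "3"): 10}

# Equal weights give the pairwise egalitarian solution.
egalitarian = pairwise_solution({"A": 1, "B": 1, "1": 1, "2": 1, "3": 1},
                                plan, benefit)
# A non-uniform weight vector, written (1,2;3,4,5) in the text.
weighted = pairwise_solution({"A": 1, "B": 2, "1": 3, "2": 4, "3": 5},
                             plan, benefit)

print({k: float(x) for k, x in egalitarian.items()})
print({k: round(float(x), 2) for k, x in weighted.items()})
```

Because each pair's benefit is divided completely between its two members, any pairwise solution distributes exactly the optimal value of problem (10).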
On the other hand, we could use a more general concept, weight systems (Kalai &
Samet, 1987), instead of a simple weight vector. A weight system on a set N is a pair $(\Sigma, t)$
where $\Sigma$ is a partition of N, $(N_1, N_2, \dots, N_q)$, and t is a weight vector whose coordinates are
ordered in the same order as the partition, such that the weight of agents in $N_h$ is zero with
respect to the agents in $N_k$ if $h < k$. Inside each $N_h$ each agent has a positive weight. In this
situation we can define the pairwise solution with weight system $(\Sigma, t)$ for the game $(N, v_T)$,
$PS^{(\Sigma, t)}(N, v_T)$, analogously to (11) and (12). The pairwise solutions do not belong to the core
of the game in general, but in (Sanchez-Soriano, 2006) it is proved that
$$\mathrm{Core}(N, v_T) \subset \bigcup_{(\Sigma, t)} PS^{(\Sigma, t)}(N, v_T). \tag{13}$$
Therefore, each core allocation can be seen as a pairwise solution for particular weight
systems but there are, in general, pairwise solutions which do not belong to the core of the
corresponding transportation game.
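A weight system can be handled with a small extension of formula (11). The sketch below rests on one reading of the definition, stated here as an assumption: when the two members of a pair lie in different blocks of the partition, the member in the earlier block has zero weight and the whole benefit of the pair goes to the member in the later block; inside a block the split is proportional to the weights. The function name `pairwise_solution_ws` is hypothetical and the data are those of the numerical example of this section. The weight of retailer 3 in the second system is taken as 43/17, the value for which (11) splits the 60 units of benefit of the pair (B, 3) into 17 and 43.

```python
from fractions import Fraction


def pairwise_solution_ws(partition, t, plan, benefit):
    """Pairwise solution for a weight system given as (ordered blocks, weights)."""
    level = {k: h for h, block in enumerate(partition) for k in block}
    payoff = {k: Fraction(0) for k in level}
    for (i, j), units in plan.items():
        pair_benefit = benefit[i, j] * units
        if level[i] == level[j]:
            # Same block: proportional split as in (11).
            share_i = Fraction(t[i]) / (Fraction(t[i]) + Fraction(t[j]))
        else:
            # Different blocks: the agent in the later block takes everything.
            share_i = Fraction(1 if level[i] > level[j] else 0)
        payoff[i] += share_i * pair_benefit
        payoff[j] += (1 - share_i) * pair_benefit
    return payoff


benefit = {("A", "1"): 6, ("A", "2"): 4, ("A", "3"): 3,
           ("B", "1"): 4, ("B", "2"): 5, ("B", "3"): 6}
plan = {("A", "1"): 10, ("A", "2"): 2, ("B", "2"): 5, ("B", "3"): 10}

# Retailers first, producers last: producers collect the whole benefit.
first = pairwise_solution_ws(
    [{"1", "2", "3"}, {"A", "B"}],
    {"A": 1, "B": 1, "1": 1, "2": 1, "3": 1}, plan, benefit)

# Retailer 2 alone in the last block; 43/17 for retailer 3 is an assumption.
second = pairwise_solution_ws(
    [{"A", "B", "1", "3"}, {"2"}],
    {"A": 1, "B": 1, "1": Fraction(53, 7), "3": Fraction(43, 17), "2": 1},
    plan, benefit)

print({k: float(x) for k, x in first.items()})
print({k: float(x) for k, x in second.items()})
```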
Let us consider a transportation situation T with two producers (called A and B) and three
retailers (called 1, 2, and 3). The productions of A and B are 12 and 15 units respectively and
the demand of each retailer is 10 units. The unitary costs of transportation are $c_{A1}=3$, $c_{A2}=5$,
$c_{A3}=6$, $c_{B1}=5$, $c_{B2}=4$ and $c_{B3}=3$, and the unitary benefit obtained by each good is 9. Solving the
corresponding transportation problem (10), we obtain that the only optimal solution for the
(benefit) transportation problem is $x_{A1}=10$, $x_{A2}=2$, $x_{B2}=5$, $x_{B3}=10$ and $x_{ij}=0$ otherwise. The
characteristic function of the game $(N, v_T)$ is the following:
$v_T(N)=153$; $v_T(A123)=68$, $v_T(B123)=85$, $v_T(AB12)=110$, $v_T(AB13)=120$, $v_T(AB23)=100$;
$v_T(A12)=68$, $v_T(A13)=66$, $v_T(A23)=46$, $v_T(B12)=70$, $v_T(B13)=80$, $v_T(B23)=85$, $v_T(AB1)=60$,
$v_T(AB2)=50$, $v_T(AB3)=60$; $v_T(A1)=60$, $v_T(A2)=40$, $v_T(A3)=30$, $v_T(B1)=40$, $v_T(B2)=50$,
$v_T(B3)=60$; otherwise $v_T(S)=0$.
In this case, $\mathrm{Owenset}(N, v_T)=\{(48,75;20,0,10)\}$. We know that this allocation is in the core of
the game but it seems unfair to retailer 2, who receives nothing although this player contributes
significantly to the benefit of the grand coalition; in particular, $v_T(N)-v_T(AB13)=33$. As for the
core of the game, the segment between the allocations (68,85;0,0,0) and (7,17;53,33,43) is contained
in the core of the game. Therefore, $\mathrm{Core}(N, v_T)$ is larger than $\mathrm{Owenset}(N, v_T)$. Likewise, if we
consider the following two weight systems $(\Sigma_1,t^1)=(\{1,2,3\},\{A,B\};(1,1,1,1,1))$ and
$(\Sigma_2,t^2)=(\{A,B,1,3\},\{2\};(1,1,53/7,43/17,1))$, then we obtain the following two pairwise solutions:
$PS^{(\Sigma_1,t^1)}(N, v_T)=\{(68,85;0,0,0)\}$ and $PS^{(\Sigma_2,t^2)}(N, v_T)=\{(7,17;53,33,43)\}$.
On the other hand, if we simply consider $t=(1,1;1,1,1)$, then we obtain the pairwise egalitarian solution
$PS^{(1,1;1,1,1)}(N, v_T)=\{(34,42.5;30,16.5,30)\}$ which, in this example, belongs to the core of the game.
Finally, if we consider the vector of weights $t=(1,2;3,4,5)$, then we obtain the pairwise
solution $PS^{(1,2;3,4,5)}(N, v_T)=\{(16.60,25.48;45.00,23.07,42.86)\}$ which does not belong to the core
of the game.
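Core membership of the allocations mentioned in this example can be tested by brute force over all coalitions. The Python sketch below is one possible way to do so, under the following assumptions: coalition values are recomputed from problem (10) with a small successive-shortest-path routine (a technique picked for self-containedness, not taken from the chapter), and the names `optimal_value`, `worth`, `blocking_coalitions` and `in_core` are hypothetical. The allocation for weights (1,2;3,4,5) is entered as exact fractions, which round to the figures quoted in the text.

```python
from fractions import Fraction
from itertools import combinations

INF = float("inf")


def optimal_value(supply, demand, benefit):
    """Optimal value of the benefit problem (10), via successive shortest paths."""
    nodes = ["s", "t"] + [("p", i) for i in supply] + [("r", j) for j in demand]
    edges = []  # [tail, head, residual capacity, cost]; edge k ^ 1 reverses edge k

    def add_edge(u, v, cap, cost):
        edges.append([u, v, cap, cost])
        edges.append([v, u, 0, -cost])

    for i in supply:
        add_edge("s", ("p", i), supply[i], 0)
        for j in demand:
            add_edge(("p", i), ("r", j), INF, -benefit[i, j])
    for j in demand:
        add_edge(("r", j), "t", demand[j], 0)
    value = 0
    while True:
        dist, pred = dict.fromkeys(nodes, INF), {}
        dist["s"] = 0
        for _ in range(len(nodes) - 1):  # Bellman-Ford passes
            for k, (u, v, cap, cost) in enumerate(edges):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v], pred[v] = dist[u] + cost, k
        if dist["t"] >= 0:  # no profitable path left
            return value
        path, node = [], "t"
        while node != "s":
            path.append(pred[node])
            node = edges[pred[node]][0]
        push = min(edges[k][2] for k in path)
        for k in path:
            edges[k][2] -= push
            edges[k ^ 1][2] += push
        value -= push * dist["t"]


supply = {"A": 12, "B": 15}
demand = {"1": 10, "2": 10, "3": 10}
benefit = {("A", "1"): 6, ("A", "2"): 4, ("A", "3"): 3,
           ("B", "1"): 4, ("B", "2"): 5, ("B", "3"): 6}
players = list(supply) + list(demand)


def worth(S):
    """Coalition value as in (8): problem (10) restricted to the agents in S."""
    return optimal_value({i: supply[i] for i in supply if i in S},
                         {j: demand[j] for j in demand if j in S}, benefit)


def blocking_coalitions(x):
    """Proper coalitions whose members receive less than their own worth."""
    return [S for r in range(1, len(players)) for S in combinations(players, r)
            if sum(x[k] for k in S) < worth(S)]


def in_core(x):
    return sum(x.values()) == worth(players) and not blocking_coalitions(x)


owen = {"A": 48, "B": 75, "1": 20, "2": 0, "3": 10}
egalitarian = {"A": 34, "B": 42.5, "1": 30, "2": 16.5, "3": 30}
weighted = {"A": Fraction(83, 5), "B": Fraction(535, 21), "1": Fraction(45),
            "2": Fraction(346, 15), "3": Fraction(300, 7)}

print(in_core(owen), in_core(egalitarian), in_core(weighted))
print(blocking_coalitions(weighted))
```

For the weighted allocation the blocking coalitions include producer A together with retailer 2, who jointly receive less than the 40 units they can secure on their own.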
3.2 Three-sided transportation games
A three-sided transportation problem consists of three sets of agents, called producers,
wholesalers and retailers, which produce, store and demand goods. Each producer
produces an amount of goods, each wholesaler has a capacity of storage and each retailer
demands a certain amount of goods. The transport of the goods from the producers to the
retailers via a wholesaler is costly (profitable) and, therefore, the main objective is to
transport the goods from the producers to the retailers via the wholesalers at minimum cost
(at maximum profit). We will call this situation the distribution problem. The same reasoning
about the interest of cooperation and the benefit approach holds for these problems.
Let P, W and R be the sets of producers, wholesalers and retailers respectively. We denote
by $p_i$ the production of goods of producer $i \in P$, $c_j$ the capacity of storage of wholesaler j and
by $d_k$ the demand of goods of retailer $k \in R$. The unitary benefit of transportation from
producer i to retailer k via wholesaler j is denoted by $b_{ijk}$. The mathematical program that
models this problem is the following:
$$
\begin{aligned}
\max \quad & \sum_{i \in P} \sum_{j \in W} \sum_{k \in R} b_{ijk} x_{ijk} \\
\text{s.t.:} \quad & \sum_{j \in W} \sum_{k \in R} x_{ijk} \le p_i, \quad i \in P \\
& \sum_{i \in P} \sum_{k \in R} x_{ijk} \le c_j, \quad j \in W \\
& \sum_{i \in P} \sum_{j \in W} x_{ijk} \le d_k, \quad k \in R \\
& x_{ijk} \ge 0, \quad i \in P,\ j \in W,\ k \in R
\end{aligned}
\tag{14}
$$
where $x_{ijk}$ is the number of units transported from producer i to retailer k via wholesaler j.
Now, we can define a cooperative game in characteristic function form associated with each
distribution problem D. The set of players is $N = P \cup W \cup R$ and the characteristic function $v_D$ is
defined following the formulas in (8). The game $(N, v_D)$ is called a distribution game.
On the one hand, in (Quint, 1991) it is shown that the core of m-sided assignment games can
be empty, therefore if we consider that the goods are indivisible then distribution games can
have empty cores. In this sense, there will be many distribution situations in which a core
allocation is not possible. Furthermore, the Owen set could consist of non-efficient
allocations because of the duality gap. However, we can always find reasonable allocations
based on the primal optimal solutions of problem (14), defined analogously to pairwise
solutions, which we call triplewise solutions.
On the other hand, if we consider that the goods are perfectly divisible then distribution
games have non-empty cores since the Owen set of these games is always non-empty and it
is contained in the core of the game. Of course, in distribution situations with perfectly
divisible goods, it is also possible to consider the triplewise solutions as reasonable
solutions.
We would like to point out that, in the case of two-sided transportation situations, we have
not distinguished between indivisible and perfectly divisible goods because the constraint
matrix in problem (10) is totally unimodular and therefore we can relax the indivisibility
condition when necessary.
Finally, (Perea et al., 2008) study from a cooperative standpoint a class of distribution
problems and prove that the corresponding cooperative games have non-empty core.
Likewise, the authors introduce two new solutions which satisfy certain interesting
properties related to fairness.
3.3 Warehouse sharing games
Now, we consider another situation, also related to transportation problems, in which there
are two or more distribution systems, each of them consisting of producers, warehouses and
retailers. In principle several producers and retailers could belong to different distribution
systems but the warehouses can only belong to one distribution system. In this situation the
distribution systems involved in the problem could share their warehouses in order to
increase the efficiency of all systems considered as a whole. Therefore, if cooperation is
profitable then this should be promoted through a good distribution of the extra profits or
saved costs. A similar reasoning about the benefit approach holds for these problems. We
will call these optimisation situations warehouse sharing problems.
Each distribution system faces the same optimization problem which is modelled as (14).
Likewise, if two or more distribution systems collaborate then the corresponding
optimisation problem is also modelled as (14). Therefore, we can approach this situation as
an operations research game.
Let D be the set of distribution systems and $P_i$, $W_i$ and $R_i$ the sets of producers, warehouses
and retailers in distribution system $i \in D$. We denote by $p_{if}$ the production of goods of
producer $f \in P_i$, $c_{ig}$ the capacity of storage of warehouse $g \in W_i$ and by $d_{ih}$ the demand of goods
of retailer $h \in R_i$. The unitary benefit of transportation from producer $f \in \bigcup_{i \in D} P_i$ to retailer
$h \in \bigcup_{i \in D} R_i$ via warehouse $g \in \bigcup_{i \in D} W_i$ is denoted by $b_{fgh}$. If one producer (resp. retailer)
belongs to more than one distribution system then, when these distribution systems
collaborate, the production (resp. demand) to take into account for that producer (resp.
retailer) is the sum of its productions (resp. demands). As it is not difficult to see, the
mathematical formulation of this problem is as in (14).
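The pooling rule just described (productions and demands of shared agents are added up, while every warehouse belongs to a single system) can be sketched in a few lines of Python. The data layout, the function name `pool` and all numbers below are hypothetical; the output is simply the data of a problem of type (14) for the collaborating systems.

```python
from collections import Counter


def pool(systems):
    """Merge the data of collaborating distribution systems into one problem."""
    production, demand, capacity = Counter(), Counter(), {}
    for system in systems:
        production.update(system["production"])  # adds p_if over systems
        demand.update(system["demand"])          # adds d_ih over systems
        for g, c in system["capacity"].items():
            if g in capacity:
                raise ValueError(f"warehouse {g} belongs to two systems")
            capacity[g] = c
    return dict(production), capacity, dict(demand)


# Two hypothetical systems sharing producer f2 and retailer h1.
system_1 = {"production": {"f1": 10, "f2": 5},
            "capacity": {"g1": 12},
            "demand": {"h1": 8}}
system_2 = {"production": {"f2": 7},
            "capacity": {"g2": 9},
            "demand": {"h1": 4, "h2": 6}}

production, capacity, demand = pool([system_1, system_2])
print(production, capacity, demand)
```

Calling `pool` on any subset of systems yields the restricted problem whose optimal value defines the worth of that coalition of systems.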
Next, we can define a cooperative game in characteristic function form associated with each
warehouse sharing problem WS. In this case, the set of players is N = D and the characteristic
function $v_{WS}$ is defined following the formulas in (8). The game $(N, v_{WS})$ is called a warehouse
sharing game.
In this kind of situation we can observe two levels of cooperation. On the one hand, we find
the cooperation among producers, warehouses and retailers inside a distribution system.
On the other hand, we have the cooperation among the different distribution systems.
It is obvious that if we are only interested in the warehouse sharing game then similar
comments as in Sections 3.1 and 3.2 regarding the allocation of the extra benefits among the
agents involved can be made. However, if we are interested in the two levels simultaneously,
considering the problem as a whole system, then we may be dealing with a game
with a priori unions or restricted cooperation and, consequently, we should take
this fact into account in order to analyse the situation.
Finally, this situation can resemble the cooperation among supply chains with deterministic
productions/demands and without penalties and, therefore, it could be considered within
the literature of supply chain games. However, we have considered its analysis more
appropriate as an operations research game because the mathematical model describing this
problem is closely related to a three-sided transportation situation, as we have shown. On the
other hand, several papers in which different levels of cooperation (horizontal, vertical or
lateral) are analysed for transportation or supply chain situations are (Cruijssen et al., 2007),
(Mason et al., 2007) and (Simatupang & Sridharan, 2002).
4. Supply chain games
For researchers in Operations Research and Economics, supply chains are a key
topic. This section brings together a series of works which
present different paradigms and results related to cooperative game theory as applied to
supply chain management. These comprise review-oriented papers that look at the kind of
methodologies that have been applied, in addition to theoretical papers discussing new
developments and results. We hope that this section will
serve as a source for current and future researchers in this field.
Moreover, another aim of this part is to show the applicability of cooperative game theory as
a tool with which to analyse supply chains, since a main feature of any supply chain is
cooperation. In particular, the central contribution of cooperative game theory lies in
determining a suitable allocation rule among the agents of the supply chain. However, we
would like to point out that the use of cooperative game theory to analyse problems in
supply chain management is a very recent development.
4.1 Definition of a supply chain
There are numerous definitions for the term supply chain. For example, (Christopher,
1998) defined this notion as a network of organizations that are involved, through
upstream and downstream linkages, in the different processes and activities that produce
value in the form of products and services in the hands of the ultimate consumer, whereas
(Ganeshan et al., 1999) define a supply chain as a system of suppliers, manufacturers,
distributors, retailers and customers where materials flow downstream from suppliers to
customers and information flows in both directions. On the other hand, supply chain
management is defined as a set of management processes (Leng & Parlar, 2005). However,
all definitions in the literature share the idea that supply chains are based on cooperation in
order to obtain a higher benefit. In fact, (Thun, 2005) claims that, in the future, competition
will take place between supply chains instead of between individual firms. In order to yield
the benefits related to cooperation, contracts for vertical cooperation must be established
within supply chains.
Nevertheless, there are two main obstacles to sound supply chain management. First,
trust can be seen as the most critical factor of cooperation between firms (Poirer, 1999). In
this respect, modelling supply chains via cooperative games can be important to analyse the
impact of rationality on the final allocation (Thun, 2005). Secondly, there is a phenomenon
commonly referred to as the bullwhip effect, which was first observed at P&G concerning
disposable diapers (Lee et al., 1997). Sharing information across the supply chain is a way to
mitigate its negative effects (Thun, 2005).
4.2 Examples of supply chain games
In this section we show two examples of situations related to supply chain management.
The first example is based on (Müller et al., 2002), while the second one is based on (Granot
& Sosic, 2003).
Example 1. We consider the usual newsvendor game where each agent (store) faces a
stochastic demand (of newspapers, for example). These demands are actually correlated,
although this fact has usually been ignored in the literature for the sake of simplicity. We will
take this feature of the game into account. So, any coalition of agents that faces a demand x and
orders a quantity y of newspapers incurs a cost as follows,
$$
\phi(y, x) =
\begin{cases}
h\,(y - x), & \text{if } y \ge x, \\
t\,(x - y), & \text{if } y < x,
\end{cases}
\tag{15}
$$
where h is the holding cost per unit of stocking more newspapers than are actually
demanded, and t is the opportunity cost related to not ordering enough newspapers.
Continuing with the description of the game, each agent i experiences a random demand $X_i$.
For coalition $S \subset N$, we define the total demand as $X_S = \sum_{i \in S} X_i$. For technical purposes, we
focus on random demands such that $E[\phi(y, X)] < \infty$. In this way, the optimal quantity
ordered by S is $y^*_S = \arg\min_y E[\phi(y, X_S)]$ and coincides with the $t/(h+t)$ quantile of the
distribution of the random variable $X_S$. Consequently, the value (cost) of the characteristic
function of coalition S in this kind of game is defined as $C(S) = E[\phi(y^*_S, X_S)]$. Finally, let N
be the finite set of agents. In this way, we are able to define a cooperative game as (N, C).
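The construction of the newsvendor game can be mimicked on simulated data. The following Python sketch replaces the true demand distribution by an empirical one built from joint demand scenarios, orders the $t/(h+t)$ quantile of the pooled demand and evaluates the expected cost (15). All names, cost parameters and the demand model (two stores hit by a common random shock, which makes their demands correlated) are hypothetical choices made only for illustration.

```python
import math
import random


def phi(y, x, h, t):
    """Cost (15): holding cost if the order y covers demand x, shortage cost otherwise."""
    return h * (y - x) if y >= x else t * (x - y)


def expected_cost(y, demands, h, t):
    return sum(phi(y, x, h, t) for x in demands) / len(demands)


def optimal_order(demands, h, t):
    """Smallest t/(h+t) quantile of the empirical demand distribution."""
    xs = sorted(demands)
    k = max(1, math.ceil(t / (h + t) * len(xs)))
    return xs[k - 1]


def coalition_cost(S, scenarios, h, t):
    """C(S): expected cost of coalition S when it orders its optimal quantity."""
    pooled = [sum(row[i] for i in S) for row in scenarios]
    return expected_cost(optimal_order(pooled, h, t), pooled, h, t)


random.seed(1)
h, t = 1.0, 3.0  # hypothetical holding and opportunity costs
scenarios = []
for _ in range(500):
    common = random.gauss(0, 10)  # shared shock: makes the demands correlated
    scenarios.append({"store1": max(0.0, 100 + common + random.gauss(0, 15)),
                      "store2": max(0.0, 80 + common + random.gauss(0, 10))})

C = {S: coalition_cost(S, scenarios, h, t)
     for S in [("store1",), ("store2",), ("store1", "store2")]}
print(C)
```

On such data the cost of the two stores ordering jointly never exceeds the sum of their stand-alone costs, which is the kind of saving that the allocation rules discussed in this section have to divide.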
Example 2. In this example we briefly show a three-stage game of a supply chain consisting
of n retailers, each of whom experiences a random demand for an identical product. Next
we explain the different steps of the game. Before the demand is realised, each player orders
her initial inventory in an independent way (first stage). After the demand is actually
realised, each player decides how much of their residual stock they wish to share with the
other retailers (second stage). In the final stage, a total profit should be allocated among the
players due to the fact that residual stocks are transhipped to meet the joint demand. In this
way, in the third (cooperative) stage, residual inventories are transhipped to meet residual
demands, and the additional profit has to be allocated among the retailers. Obviously, this
example excludes the possibility of storing at one or several shared warehouses.
4.3 Review of the literature on supply chain games
Many articles on supply chain management point towards the relevance of cooperation
among the supply chain members in order to increase the supply chain benefits and the
overall performance. However, only a few researchers so far have deployed cooperative
game theory to analyse the stability and rationality of collaboration within a supply chain.
Authors such as (Cachon & Netessine, 2004) have reviewed the literature describing supply
chain and game theory concluding that papers employing cooperative game theory have
been scarce, but are becoming more popular. Something similar has been pointed out in
other reviews such as (Leng & Parlar, 2005) and (Nagarajan & Sosic, 2008). This section is
partially based on these good reviews. Nevertheless, we have added very recent
publications on this issue which were not mentioned in those three reviews. On the other
hand, for a specific review of the literature on inventory centralization we refer to (Meca &
Timmer, 2008).
(Chacko, 1961) analysed the impact of coalition formation between a multi-plant
multi-product manufacturing company, two suppliers and several customers.
Unfortunately, this paper did not become the starting point for the use of cooperative game
theory in supply chains. Only some twenty years later does one find another paper combining
supply chain management and cooperative game theory: (Jeuland & Shugan, 1983) explored the problem
of coordination of the members of a channel, which includes as a particular case the
manufacturer-retailer-consumer channel. They also proposed the form of the quantity
discount schedule that results in optimum channel profits. (Kim & Hwang, 1989) studied
how the supplier can formulate the terms of a quantity-discount pricing schedule, under the
assumption that the supplier behaves in an optimal way. In particular, they show the
formula for price and order size that maximises the sum of the profits of both agents and the
corresponding allocation between the parties.
(Gerchak & Gupta, 1991) analysed the effectiveness of four popular schemes of cost
allocation in the context of a continuous review order quantity reorder point (Q, r) inventory
system with complete back ordering. They also proposed a proportional method that has the
notable feature that any customer's post-centralization share of overheads does not exceed
its costs without consolidation. Inspired by this paper, (Robinson, 1993) showed that the
best allocation rule proposed in (Gerchak & Gupta, 1991) does not necessarily belong to the
core. Furthermore, he also showed the formulation of the Shapley value for this game and
proved that this allocation rule does actually belong to the core.
(Wang & Parlar, 1994) proposed a single-stage game to model a particular inventory
problem where three retailers try to determine their optimal order amount. They assume
stochastic demands and substitutable products. In this context, they determine the
conditions that assure that the core of this game is non-empty.
So far the papers reviewed focus on horizontal cooperation in a supply chain. Nevertheless,
there are papers devoted to vertical cooperation. One example is the paper by (Li & Huang,
1995). They explored the simple (monopolistic) buyer-seller channel from a cooperative
approach. The authors showed the common incentives and the individual disincentives for
cooperation. A rule, based on quantity discount, is also proposed to implement a profit
sharing mechanism for achieving equal division of additional cooperative system profits.
In (Hartman & Dror, 1996) the cost allocation problem for the centralized and continuous-
review inventory system is studied. They proposed three necessary criteria (stability,
justifiability and polynomial computability) for the appropriate selection of an allocation rule.
They showed that common allocation schemes may not meet the three criteria and
introduced a method that meets them all. Following this line, (Hartman et al., 2000)
considered a set of n stores with centralized ordering and inventory with holding and
penalty costs. They showed the (restrictive) conditions under which such a cooperative game has a
non-empty core and conjectured that the core is non-empty at least for independent
demands. (Hartman & Dror, 2003) proved the non-emptiness of the core for a single period
inventory game in which n retailers experience normally distributed, correlated individual
demands. On the other hand, (Müller et al., 2002) proved a stronger result than that
conjectured by (Hartman et al., 2000). In particular, they showed that the core of this type of
games is always non-empty regardless of the joint distribution of the stochastic demands.
(Slikker et al., 2005) studied a more complex situation, called the general newsvendor game,
where the agents could use transhipments after demand is satisfied. Their main result states
that the general newsvendor game has a non-empty core.
(Anupindi et al., 2001) analysed a supply chain problem with n independent retailers of an
identical item for consumption. Each agent experiences a random demand and must order
their inventory before the demand is realised. After realising such a demand, some retailers
might meet their residual demand by means of the other retailers' residual supplies. This
game is very similar to example number 2 above. Nevertheless, it is played as a
decentralised two-stage distribution model, whereas example 2 consists of three stages. In
addition, (Anupindi et al., 2001) assumed that all retailers will share all their residual
supply/demand in the second stage. Regarding the allocation schemes, these authors
suggested an allocation rule based on a dual solution for the transhipment problem. This
solution is always in the core of the game and, hence, it gives no group of retailers an incentive
to break away and form a coalition of its own. Later, (Granot & Sosic, 2003) extended the two-stage model of (Anupindi et al.,
2001) allowing each retailer to decide how much of their residual supply/demand they
would like to share with others in a third and final stage. They found that allocations based
on dual solutions will not induce the retailers to share their total residuals with others.
Furthermore, they proved that the Shapley value is a value-preserving allocation scheme,
i.e., it induces all the retailers to share their residual supply/demand in quantities that do
not result in a decrease in the total additional profit.
We now turn to vertical cooperation in supply chain problems and consider the paper of
(Raghunathan, 2003). This author studied a situation where a manufacturer and n retailers
share demand information. The author used the Shapley value to analyse the expected
manufacturer and retailer shares of the surplus generated from the cooperative game.
Mainly, (Raghunathan, 2003) showed that higher demand correlation increases the
manufacturer's allocation and has the opposite effect on the retailers' allocations.
Under horizontal cooperation, (Meca et al., 2004) studied a simple inventory model with n
retailers who experience deterministic demand. The firms can cooperate to reduce their
ordering costs. This approach is called the basic inventory model because it forms the basis
for a wide variety of inventory models. Also, the authors developed a proportional rule to
allocate joint ordering cost among the retailers. They showed that this rule leads to an
allocation in the core. For a more general study of holding games see (Meca, 2007).
(Hartman & Dror, 2005) studied the problem faced by the management of independent
stores, with a similar product, of cost management for a centralised operation of their
inventory. They modelled the centralised cost as a metric space obtained from the Cholesky
factorisation of the corresponding covariance matrix. They considered two cooperative
games, one based on optimal expected costs and another based on demand realisations. For
the first game, they showed that when holding and penalty shortage costs are identical and
demands are normally distributed, the corresponding game has a non-empty core.
Unfortunately, for the second game they showed that even in the case of identical holding
and penalty costs the game might have an empty core.
(Klijn & Slikker, 2005) analysed a location-inventory model with m customers and n
distribution centres. In this context, they proved the non-emptiness of the core of the corresponding
cooperative game when demand processes are identically and independently distributed.
(Reinhardt & Dada, 2005) considered a problem with n firms who collaborate by pooling
their critical resources in order to make their cost structure more efficient. They proposed to
use the Shapley value as the allocation scheme among the players. For coalition symmetric
games, i.e., situations where the pooled savings depend on the sum of each player's
demand, they introduced a pseudo-polynomial algorithm for its computation.
In a vertical cooperation framework, (Leng & Parlar, 2005) analysed an information-sharing
cooperative game involving a supplier, a manufacturer and a retailer. They derived the
necessary conditions for stability of each coalition. They also studied the implications of
using the Shapley value and the nucleolus as allocation schemes for this type of games.
More recently, (Dror & Hartman, 2007) analysed cost allocation in a multiple product
inventory system following an economic order quantity policy to order, where part of the
ordering cost is shared and part is specific to each item. They showed that if the part of the
ordering cost common to all items is not too small, then the core of the game is non-empty.
(Montrucchio & Scarsini, 2007) considered a newsvendor game with stochastic demand of a
single item. They proved that the game is balanced in great generality considering a
possibly infinite number of retailers. Under several conditions, they also showed that with a
continuum of retailers the core becomes a singleton.
Under vertical cooperation, (Guardiola et al., 2007) analysed a supply chain under
decentralised control with a single supplier and n retailers. They proved that the
cooperation in this game is stable and proposed a specific allocation rule that is always in
the core. This last point is important since the well-known Shapley value does not always
belong to the core for games of this type.
(Guardiola et al., 2009) introduced a new class of production-inventory games, in which
agents cooperate by sharing production processes and warehouse facilities. In this
context, the authors proved that the corresponding cooperative game is totally balanced and
that the set of Owen allocations reduces to a single point (called the Owen point). The authors
also showed the relationship between the Owen point, the Shapley value and the nucleolus.
(Özen et al., 2008) conducted a game-theoretical analysis of a supply chain with warehouses,
in which retailers can reallocate their product orders after demand has been
realised. In this context, the authors considered a cooperative game between the retailers
and proved that this game has a non-empty core.
(Chen & Zhang, 2009) demonstrated the power of the stochastic programming duality approach
in studying stochastic inventory games. In fact, their approach is readily applicable to more
general models. As a main result, they showed that stochastic programming
provides a way to compute a solution in the core of games of this kind.
Finally, (Özen et al., 2010) considered a simple newsvendor game and investigated the
convexity of this type of situation. Whereas it is known that the general newsvendor game
is not convex, they focused on the particular family of newsvendor games with independent
symmetric unimodal demand distributions. It allowed them to identify several interesting
subclasses containing convex games only.
4.4 Further research in supply chain management
We devote this section to suggesting several avenues for follow-up research in
cooperative supply chain games. To this end, we present two contexts drawn from
real supply chains. The first is based on (Plambeck & Taylor, 2005) and shows
the benefits of collaboration between a pharmaceutical company and a manufacturer. The
second is inspired by the actual Spanish electricity market. We propose to analyse
the cooperation between electricity consumers, retailers and the network operator by means
of cooperative game theory. In a certain sense, such a framework generalises the approach
introduced in (Pettersen et al., 2005) for a single consumer, a single retailer and the network
operator in the Nordic electricity market. It is worth mentioning that neither context
involves holding costs or inventory problems, a feature that is unusual in the supply
chain literature, as we have shown previously.
4.4.1 Contracting manufacturers in the pharmaceutical industry
As pointed out in (Plambeck & Taylor, 2005), firms in the pharmaceutical industry are
characterised by long development cycles and intense time-to-market pressure. In this
industry, any firm that produces its own drug must make a significant capital investment in
a plant before the product has completed regulatory trials. Unfortunately, if the drug finally
fails, then the plant belonging to the pharmaceutical company (PC) will have little value
(Tully, 1994). This drawback is usual in industries where production capacity is low in
contrast to their investment power. In this case, contract manufacturing offers the
opportunity to outsource production to contract manufacturers (CMs). They are able to pool
the total demand from many different pharmaceutical companies and, consequently,
achieve high capacity utilization.
Following (Plambeck & Taylor, 2005), we consider two symmetric PCs, $j = 1, 2$, each of which is
developing a new drug. The price per unit when $q_j$ units are sold is $M_j - q_j$. With probability
$e$, the product is successful and $M_j = H_j$, where $H_j$ represents the potential market size.
Otherwise, $M_j = L$ with $L < H_j$. In addition, each PC must invest in production
capacity $c$ at a cost of $k > 0$ per unit before the demand is known. Furthermore, the marginal
cost of production is negligible.
Investments by the PCs in innovation (product development) may influence demand
in two ways. On the one hand, they may increase the potential market size, $H_j$. On the
other hand, they may increase the probability that a drug passes clinical trials, and hence the final
success probability. We consider here the first case, i.e., when investment in innovation
influences $H_j$. Let $f(H_j)$ be the total cost of innovation of firm $j$. It is assumed
that this function is increasing in the market size, twice differentiable and convex. Each PC
selects a market size $H_j$ that maximizes its total expected profit, $V_j$.
$$V_j = \max_{H_j} \left\{ \max_{c \ge 0} \left\{ e\,(H_j - c)\,c + (1 - e) \max_{q_j \in [0, c]} (L - q_j)\,q_j - kc \right\} - f(H_j) \right\}. \tag{16}$$
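The inner capacity decision in (16) can be explored numerically. The following Python sketch fixes the market size $H_j$, so that the innovation cost $f(H_j)$ is a constant and is left out, and searches a grid of capacities. It assumes the linear price $M_j - q_j$ described above; the function names, the grid resolution and the parameter values are illustrative assumptions and do not come from (Plambeck & Taylor, 2005).

```python
def expected_profit(c, H, L, e, k):
    """Expected profit of one PC for capacity c and a fixed market size H.

    The innovation cost f(H) is omitted because H is held fixed here.
    """
    success = (H - c) * c        # as in (16): the whole capacity is sold on success
    q = min(c, L / 2.0)          # (L - q) q is concave, so the best q in [0, c] is min(c, L/2)
    failure = (L - q) * q
    return e * success + (1.0 - e) * failure - k * c


def best_capacity(H, L, e, k, n=1000):
    """Grid search over c in [0, H]; returns the pair (capacity, expected profit)."""
    grid = [H * i / n for i in range(n + 1)]
    c_star = max(grid, key=lambda c: expected_profit(c, H, L, e, k))
    return c_star, expected_profit(c_star, H, L, e, k)


# Illustrative parameter values (assumptions, not data from the chapter).
c_star, profit = best_capacity(H=10.0, L=4.0, e=0.5, k=1.0)
print(c_star, profit)
```

With these illustrative numbers the search selects a capacity of 4 and an expected profit of 10. The outer maximisation over $H_j$ could be added in the same way once a concrete convex cost function $f$ is assumed.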
Consider now that the two PCs pool their production capacity in this game ($c + c = 2c$). In
other words, we assume that $\{PC_1, PC_2\}$ is a coalition. In this way, the maximum expected
profit that they can achieve is
$$V_{\{1,2\}} = \max_{H_1, H_2} \left\{ \max_{c \ge 0} \left\{ R(c, H_1, H_2) - 2kc \right\} - f(H_1) - f(H_2) \right\}, \tag{17}$$
where
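$$
\begin{aligned}
R(c, H_1, H_2) = {} & e^2 \max_{\substack{q_1, q_2 \ge 0 \\ q_1 + q_2 \le 2c}} \left\{ (H_1 - q_1)\,q_1 + (H_2 - q_2)\,q_2 \right\} \\
& + e\,(1 - e) \sum_{j = 1, 2} \; \max_{\substack{q_H, q_L \ge 0 \\ q_H + q_L \le 2c}} \left\{ (H_j - q_H)\,q_H + (L - q_L)\,q_L \right\} \\
& + (1 - e)^2 \, 2 \max_{q \in [0, c]} (L - q)\,q .
\end{aligned} \tag{18}
$$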
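To see how pooling changes expected revenue, the three terms of (18) can be evaluated in closed form, because each inner problem maximises a concave quadratic under a single capacity constraint. The Python sketch below is a minimal illustration under the linear-price assumption of this section; the helper names (`pooled_max`, `pooled_revenue`, `standalone_revenue`) and the parameter values are hypothetical.

```python
def pooled_max(a1, a2, K):
    """Maximise (a1 - q1) q1 + (a2 - q2) q2 subject to q1, q2 >= 0 and q1 + q2 <= K."""
    q1, q2 = a1 / 2.0, a2 / 2.0            # unconstrained optimum
    if q1 + q2 > K:                         # capacity binds: equalise marginal revenues
        q1 = min(max(K / 2.0 + (a1 - a2) / 4.0, 0.0), K)
        q2 = K - q1
    return (a1 - q1) * q1 + (a2 - q2) * q2


def pooled_revenue(c, H1, H2, L, e):
    """Expected revenue R(c, H1, H2) of the coalition with pooled capacity 2c, as in (18)."""
    both = pooled_max(H1, H2, 2 * c)                           # both drugs succeed
    one = pooled_max(H1, L, 2 * c) + pooled_max(H2, L, 2 * c)  # exactly one succeeds
    q = min(c, L / 2.0)
    none = 2 * (L - q) * q                                     # both fail
    return e ** 2 * both + e * (1 - e) * one + (1 - e) ** 2 * none


def standalone_revenue(c, H, L, e):
    """Expected revenue of a single PC with capacity c that sells the best quantity in [0, c]."""
    qH, qL = min(c, H / 2.0), min(c, L / 2.0)
    return e * (H - qH) * qH + (1 - e) * (L - qL) * qL


# Illustrative parameter values (assumptions, not data from the chapter).
pooled = pooled_revenue(c=2.0, H1=10.0, H2=10.0, L=4.0, e=0.5)
alone = 2 * standalone_revenue(c=2.0, H=10.0, L=4.0, e=0.5)
print(pooled, alone)
```

For these numbers the pooled revenue is 22.25 against 20 for the two firms acting separately; the gain comes from the states in which only one drug succeeds and the idle capacity of the other firm can be redirected.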
We now turn to the situation where an independent CM (player number 3) possesses the
capability for producing. We consider that the CM invests in production capacity at a cost of
$k_{CM}$ per unit, with $k_{CM} < k$. Therefore, we are considering a situation slightly different from that
in (Plambeck & Taylor, 2005).
It is obvious that the CM alone achieves zero profit. This type of firm needs to collaborate
with at least one PC to obtain a strictly positive profit through the production of the final
product. Then, the joint profit for the coalition $\{j, 3\}$, $j = 1, 2$, is equivalent to $V_j$ with $k_{CM}$ instead
of $k$. In the same manner, the profit of the grand coalition would be equivalent to $V_{\{1,2\}}$ with
$k_{CM}$ instead of $k$. Cooperative game theory is the natural way to allocate the value of the
grand coalition among all firms. In particular, it could be interesting to analyse the stability of
cooperation between the pharmaceutical companies and the manufacturer and to look for
a reasonable and fair distribution of the extra benefits among them.
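As a rough illustration of what such an analysis involves, the following Python sketch computes the Shapley value of a three-player profit game (players 1 and 2 are the PCs, player 3 is the CM) and checks whether it lies in the core. The coalition values are hypothetical numbers chosen only so that the CM alone earns zero, as stated above; they are not derived from (16)-(18).

```python
from fractions import Fraction
from itertools import combinations, permutations

PLAYERS = (1, 2, 3)  # 1, 2: pharmaceutical companies; 3: contract manufacturer

# Hypothetical coalition values (illustration only); v({3}) = 0 as in the text.
v = {
    frozenset(): 0,
    frozenset({1}): 4, frozenset({2}): 4, frozenset({3}): 0,
    frozenset({1, 2}): 9, frozenset({1, 3}): 6, frozenset({2, 3}): 6,
    frozenset({1, 2, 3}): 12,
}


def shapley(players, v):
    """Average marginal contribution of each player over all orderings."""
    orders = list(permutations(players))
    phi = {i: Fraction(0) for i in players}
    for order in orders:
        seen = frozenset()
        for i in order:
            phi[i] += v[seen | {i}] - v[seen]
            seen = seen | {i}
    return {i: phi[i] / len(orders) for i in players}


def in_core(x, players, v):
    """Profit-game core: x(N) = v(N) and x(S) >= v(S) for every proper coalition S."""
    if sum(x.values()) != v[frozenset(players)]:
        return False
    return all(
        sum(x[i] for i in S) >= v[frozenset(S)]
        for r in range(1, len(players))
        for S in combinations(players, r)
    )


phi = shapley(PLAYERS, v)
print(phi, in_core(phi, PLAYERS, v))
```

For this hypothetical game each PC receives 31/6 and the CM receives 5/3, and the allocation happens to be stable; with other coalition values the Shapley value may fall outside the core.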
4.4.2 Supply chain without storage: electricity games
Following the description of the Spanish electricity market, we propose several games which
could be interesting to study. These games have the special feature that electricity cannot
be stored and, therefore, there are no holding or inventory costs in this context. This aspect is
unusual in the supply chain literature.
In 1998, the Spanish government liberalised the market for generating electricity and
introduced a spot market for electricity. The basic design of this electricity spot market is
similar to the previously deregulated UK market and even closer to the Californian
electricity market that was deregulated at about the same time. Market liberalisation
was not new to Spain, since other sectors, such as the media, telephony, oil and gas, had
already been liberalised during the 1990s. Although deregulation
was a slow process that was not completed until 2009, it did not
provide the electricity market with a large number of companies selling energy to
small consumers of power. Spain still has too few
companies in the market to stimulate competition and thereby bring about the expected
reduction in prices. The main characteristics of the Spanish electricity sector are the
existence of the wholesale Spanish generation market (Spanish pool), and the fact that all
consumers have been considered qualified since 2003. This means that they can choose the
electricity company that supplies them and therefore participate in the pool
in an active manner. The electricity production market in Spain is organised around a series
of auctions and technical procedures for operating the system: Daily Market, Intraday
Market, Bilateral Contracts, International Contracts, Technical Constraints, Technical
Management, etc. (see, for example, (Sancho et al., 2008)). Since 2006, bilateral contracts and
the forward market have become a larger part of the market. On the other hand, generation
facilities in Spain operate either under the Spanish ordinary regime or the Spanish special
regime. The electricity system must acquire all electricity offered by special regime
generators, which consist of small or renewable energy facilities, at tariffs fixed by Royal
Decree or Order that vary depending on the type of generation and are generally higher
than Spanish market prices. Ordinary regime generators provide electricity at market prices
to the Spanish pool and under bilateral contracts to qualified consumers and other suppliers
at agreed prices. Suppliers, including last resort suppliers, and consumers can buy electricity
in this pool. Foreign companies may also buy and sell in the Spanish pool. The market
operator and agency responsible for the market's economic management and bidding
process is the Electrical Market Operator (OMEL - www.omel.es). Market participants are
undertakings that are authorised to act directly in the electric power market as buyers and
sellers of electricity. The following can be market participants:
- Distributors: They come to the market to purchase the electricity needed
to supply consumers at regulated tariffs or to supply other distributors.
- Resellers: They go into the market to purchase power to sell to qualified consumers.
- Qualified consumers: They can purchase power directly in the organised market,
through a reseller, by signing a physical bilateral contract with a producer or by
continuing temporarily as a regulated tariff consumer.
Transmission companies and regulated distributors must provide network access to all
consumers that have chosen to be supplied on the free market. However, these consumers
must pay an access tariff to the distribution companies if such access is provided. The
electricity transport grid comprises transmission lines, stations, transformers and other
electrical equipment with a voltage above 220 kV, as well as other facilities, regardless
of their voltage, that provide transport or international and extra-peninsular
interconnections. Red Eléctrica de España (Spanish Electrical System Operator), REE -
www.ree.es, manages most of the transmission network in Spain. It is responsible for the
technical management of the Spanish electricity system with regard to developing the high
voltage network, in order to guarantee electricity supply and proper coordination between
the supply and transmission system, as well as the management of international electricity
flows. The system operator carries out its duties in coordination with the market operator.
Liberalised suppliers are free to set a price for their consumers. The main direct activity
costs of these entities are the wholesale market price and the regulated access tariffs to be
paid to the distribution companies. Electricity generators and liberalised suppliers or
qualified consumers may also engage in bilateral contracts without participating in the
wholesale market. Since 2009, last resort suppliers, appointed by the Spanish
government, have supplied electricity at a regulated tariff set by the government to last
resort consumers (low-voltage electricity consumers whose contracted power is less than or
equal to 10 kW). Since then, distributors have not been allowed to supply electricity to consumers.
All generation facilities that are not governed by the Spanish special regime are governed by
the Spanish ordinary regime. Under said ordinary regime, there are four methods of
contracting for the sale of electricity and determining a price for the electricity:
- Wholesale energy market or pool. This pool was created on January 1, 1998 and
includes a variety of transactions that result from the participation of market agents
(including generators, distributors, suppliers and direct consumers) in the daily and
intraday market sessions.
- Bilateral contracts. Bilateral contracts are private contracts between market agents,
whose terms and conditions are freely negotiated and agreed.
- Auctions for purchase options or primary emissions of energy. Principal market
participants are required by law to offer purchase options for a pre-established amount
of their power. Some of the remaining market participants are entitled to purchase such
options during a certain specified period.
- Energy Auctions for Last Resort Demand. Last resort suppliers in the Iberian Peninsula
can acquire electricity in the spot or forward markets to meet last resort demand.
However, beginning in 2007, these last resort suppliers were permitted to begin holding
energy auctions to purchase electricity at lower prices. Since 2003, all consumers have
become qualified consumers. All of them may now choose to acquire electricity under
any form of free trading through contracts with suppliers, by going directly to the
organised market or through bilateral contracts with producers.
With the coming into force of the Last Resort Supply in 2009, the integral tariff system was
replaced by a last resort tariff system. Last resort tariffs are set on an additive basis and
can only be applied to low-voltage electricity consumers whose contracted power is less
than or equal to 10 kW. Last resort consumers can choose either to be supplied at last resort
tariffs or to be supplied in the liberalised market.
Within the regulatory framework, it is important to point out that small and medium
consumers participate very little, almost insignificantly, in the Spanish electricity market.
Over the last few years, different independent system operators
(ISOs) in Europe, Oceania and North America have continued to develop load
response programmes (LRPs) with the objective of changing the electricity demand of large
power users. Nevertheless, some medium commercial or industrial users may submit offers
and bids in new energy markets thanks to lighter requirements for demand reduction, with
levels of about 100 kW (New York ISO or New England ISO). In addition, some ISOs
encourage demand aggregation through commercial entities (see the pilot
programmes developed by NYISO since 2002 for small load aggregators and ISO New England
Market - www.iso-ne.com) so that users can reach the minimum level required for participation. As
in these international markets, in the medium term, commercial and aggregating
companies will have to offer users in Spain a selling price for power that fits the
consumption profile of a specific segment of customers (Verdu et al., 2006). They must also
offer customers various demand participation schemes that allow the electricity
companies to group together sufficient levels of power to be able to buy energy on the
electricity market. At the same time, customers signing up to the schemes will receive
special offers to reduce or modify their consumption levels (Valero et al., 2007).
The description of the Spanish electricity market above suggests several
games that could be analysed. For example, many papers in the literature
analyse the electricity auction market from a game theoretical standpoint (see, for
example, (Aparicio et al., 2008) and (Sancho et al., 2008) and their lists of references).
Another interesting problem is the game played by electricity consumers, retailers and the
network operator. In (Pettersen et al., 2005) this game is studied from a non-cooperative
point of view for only one electricity consumer, one retailer and one network operator. A
generalisation of this approach could be to consider a higher number of agents involved in
the game. Alternatively, this game could be studied from a cooperative point of view by
restricting the possibilities of cooperation in order to respect some level of competition in
the market.
Taking into account the possibility of bilateral agreements in the electricity market,
horizontal cooperation among users or consumers could be an interesting problem to
study from a game theoretical point of view. At first sight, collaboration among
consumers could be profitable for them because together they might obtain better
electricity prices. In this context, we could consider two sides of the electricity market. One
side of the market would consist of the suppliers of electricity, who compete to
sell electricity. The other side would consist of the consumers, who could
collaborate in order to obtain a better position in the market. The analysis of this situation could
provide insights into the level of competition among the suppliers and the incentives for
cooperation among the consumers.
The last game we would like to mention in this part is related to vertical cooperation. At first
glance the functioning of the electricity market with respect to small or renewable energy
facilities seems appropriate because the market is promoting the use of green energy.
However, this could provoke inefficiencies in the system, such as a loss of productivity in
firms because of higher electricity costs. Therefore, all agents involved in the electricity
market should collaborate in some sense. Of course, this cooperation should not imply a loss
of competition in the market but a restructuring of some aspects of it, for example, the
determination of different quotas of electricity production depending on the energy source.
Likewise, in the analysis of this problem, the implications of the CO2 market or the production of
obnoxious residues might also be taken into account. In this situation, a
cooperative game theoretical approach could perhaps be used to obtain some insight into
the electricity market.
5. Other logistics games
There are a considerable number of papers concerned with other situations related to
logistics problems. In this section we present some of these works as examples of the
scope and relevance of cooperative game theory in this area. In particular, we
focus on routing, packing and location games. For each category we present some
approaches that illustrate their relationship with logistics. For this reason, we pay
special attention to the modelling stage. In other words, we try to explain how to go
from the logistics problem to cooperative game theory. We also show the main results
of each contribution. For a specific review of the literature on connection and routing
problems and cooperative game theory we refer to (Borm et al., 2001).
We start with a couple of problems related to routing (see (Borm et al., 2001), (Hamers et al.,
1999) and (Potters et al., 1992)). First, we will study the classical Chinese postman game.
Second, we will discuss the well-known travelling salesman game. Both problems are
related to the logistics problem of how to design efficient routes to deliver the commodities
from the supply nodes to the demand nodes.
In the classical Chinese postman situation, a postman must deliver mail to each street of a city.
Obviously, she has to start and finish at the post office. Moreover, each street has an
associated cost, related to the time that the postman spends on each visit. The aim in this
problem is to select the optimal route. To describe this situation mathematically we need a
4-tuple $(N, G, v_0, t)$, where $N$ is the set of players (streets), $G = (V, E)$ is a connected undirected
graph with vertex set $V$ and edge set $E$, $v_0 \in V$ is the post office and $t$ is a nonnegative cost
function. We denote a route for coalition $S \subseteq N$ as $(v_0, e_1, \ldots, e_k, v_0)$, which starts and finishes
at the post office and visits each player in $S$ at least once. Finally, $D(S)$ represents the set of
all routes for coalition $S$.
The Chinese postman game $(N, c)$ associated with the 4-tuple $(N, G, v_0, t)$ is defined by the
following cost function for every coalition $S \subseteq N$:
$$c(S) = \min_{(v_0, e_1, \ldots, e_k, v_0) \in D(S)} \left\{ \sum_{j=1}^{k} t(e_j) - \sum_{i \in S} t(i) \right\}. \tag{19}$$
One result we would like to highlight is that games of this type need not be balanced. For
this reason, the Chinese postman game has been studied in the literature under several
additional constraints on the underlying graph: efficiency, bridge cluster symmetry,
condensation property and so on.
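For small graphs the cost function (19) can be evaluated by a shortest-path search over pairs (current vertex, set of streets of $S$ already visited). The Python sketch below is one possible way to do this, not the method of the cited papers; it assumes that players are identified with edge indices, that the costs of the coalition's own streets are subtracted from the tour cost as in (19), and it uses a hypothetical triangle graph as data.

```python
import heapq
from math import inf


def cp_cost(edges, t, v0, S):
    """c(S) for a coalition S of edge indices: cheapest closed walk from v0 that
    traverses every edge in S at least once, minus the costs of the edges in S."""
    S = sorted(S)
    bit = {edge: 1 << pos for pos, edge in enumerate(S)}
    full = (1 << len(S)) - 1
    adj = {}
    for idx, (u, w) in enumerate(edges):
        adj.setdefault(u, []).append((w, idx))
        adj.setdefault(w, []).append((u, idx))
    dist = {(v0, 0): 0}
    heap = [(0, v0, 0)]                      # Dijkstra over (vertex, visited-mask) states
    while heap:
        d, u, mask = heapq.heappop(heap)
        if d > dist.get((u, mask), inf):
            continue                         # outdated heap entry
        if u == v0 and mask == full:
            return d - sum(t[i] for i in S)
        for w, idx in adj.get(u, []):
            state = (w, mask | bit.get(idx, 0))
            if d + t[idx] < dist.get(state, inf):
                dist[state] = d + t[idx]
                heapq.heappush(heap, (d + t[idx], *state))
    raise ValueError("some street in S cannot be reached from the post office")


# Hypothetical instance: a triangle with the post office at vertex 0 and unit costs.
edges = [(0, 1), (1, 2), (0, 2)]
t = [1, 1, 1]
for S in ({0}, {1}, {0, 1, 2}):
    print(sorted(S), cp_cost(edges, t, 0, S))
```

In this instance the street opposite the post office (edge 1) has cost 2, each street adjacent to it has cost 1, and the grand coalition has cost 0 because one tour of the triangle serves everybody. The state space grows exponentially with the size of $S$, so the sketch is only meant for very small examples.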
Regarding another routing situation, the travelling salesman problem is similar to the Chinese
postman problem, but in this case there is a set of cities (vertices or nodes) which have to be
visited by the salesman, and each link connecting two cities has a cost (distance, time, etc.).
The objective is to determine a route or tour that visits each city exactly once at minimal
cost. The travelling salesman problem can be described formally by means of a triple $(N, 0, t)$,
where $N$ is the set of players as usual, $0$ represents the home location and $t$ is a nonnegative
cost function. The costs are attached to the edges linking the vertices in $N \cup \{0\}$. In this case, the
characteristic function of the cooperative game generated from the
travelling salesman problem assigns to each coalition $S$ the minimal cost of a Hamiltonian circuit in the
graph associated with $S$. This type of game need not be balanced, i.e., the
core could be empty. Nevertheless, (Potters et al., 1992) showed that the travelling salesman
game with three players has a non-empty core. Other authors have proved that games with
four and five players are balanced as well (see (Borm et al., 2001)).
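A brute-force Python sketch makes the construction of the travelling salesman game concrete for a handful of cities. It enumerates all visiting orders for each coalition and then tests a candidate allocation against the core conditions of a cost game. The cost matrix and the allocations are hypothetical, and the enumeration is only practical for very small instances.

```python
from itertools import combinations, permutations


def tsp_cost(S, t, home=0):
    """Minimal cost of a tour that leaves home, visits every city in S exactly once and returns."""
    S = tuple(S)
    if not S:
        return 0
    return min(
        sum(t[a][b] for a, b in zip((home, *order), (*order, home)))
        for order in permutations(S)
    )


def in_core(x, players, t):
    """Cost-game core: x(N) = c(N) and x(S) <= c(S) for every proper coalition S."""
    if sum(x[i] for i in players) != tsp_cost(players, t):
        return False
    return all(
        sum(x[i] for i in S) <= tsp_cost(S, t)
        for r in range(1, len(players))
        for S in combinations(players, r)
    )


# Hypothetical instance: home 0 and cities 1, 2, 3 placed around a 4-cycle with
# unit edges; t[a][b] is the shortest-path distance between a and b.
t = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
players = (1, 2, 3)
print(tsp_cost(players, t), in_core({1: 1, 2: 2, 3: 1}, players, t))
```

Here the grand coalition's tour costs 4 and the allocation (1, 2, 1) is a core element, in line with the three-player result quoted above, whereas charging the whole cost to city 1 violates that city's stand-alone bound of 2.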
We now turn to a different class of games: packing games. Imagine a set of manufacturers,
called $A$, and a set of transport companies, called $B$. Each firm $i \in A$ has an item of size $a_i$,
while each individual $j \in B$ possesses a truck of capacity $b_j$. The items yield a profit
proportional to their size. Nevertheless, each item has to be brought to a
certain market by means of a truck. Moreover, we assume that each truck can make only one
trip to the market. We define a packing as an assignment of some items in $A$ to the
trucks in $B$ such that the total size of the items assigned to each truck does not exceed that
truck's capacity. The value of a
packing is the sum of the sizes of all packed items. The goal of a bin packing
problem is thus to determine a packing of maximal value. Cooperative game theory
tries to share the total profit among the individuals of sets $A$ and $B$ in a reasonable way.
(Faigle & Kern, 1993) introduced these games in the literature. They studied the emptiness
of the core, showing that (bin) packing games need not be balanced. For this reason, (Faigle
& Kern, 1993) used a generalisation of the core notion, called the $\varepsilon$-core. The $\varepsilon$-core of a game
$(N, v)$ is defined as
$$\varepsilon\text{-core}(v) = \left\{ x \in \mathbb{R}^n : x(N) = v(N),\ x(S) \ge (1 - \varepsilon)\, v(S) \ \ \forall S \subseteq N \right\}. \tag{20}$$
Using this concept, (Faigle & Kern, 1993) proved that if $v(N) \ge 0$, then the $\varepsilon$-core is
non-empty for a value of $\varepsilon$ sufficiently large. Following (Faigle & Kern, 1993), (Kuipers, 1998)
determined the minimal $\varepsilon$ such that the $\varepsilon$-core is nonempty. This
author also studied, for a specific class of packing games, the minimal $\varepsilon$ such that all games in
this class have a nonempty $\varepsilon$-core. For computational purposes, it is also worth
noting that general bin packing problems are NP-complete. Nevertheless, the
constraint that all trucks have capacity 1 and that all items are strictly larger than 1/3 makes
the problem easier to solve.
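The relaxation in (20) is easy to test for a given allocation. The following Python sketch checks membership in the $\varepsilon$-core of a profit game stored as a dictionary over coalitions. The three-player game used as data is a hypothetical example with an empty core; it is not a packing game and serves only to show how increasing $\varepsilon$ restores non-emptiness.

```python
from fractions import Fraction
from itertools import combinations


def in_epsilon_core(x, v, eps):
    """True if x(N) = v(N) and x(S) >= (1 - eps) v(S) for every non-empty coalition S."""
    players = sorted(x)
    if sum(x.values()) != v[frozenset(players)]:
        return False
    return all(
        sum(x[i] for i in S) >= (1 - eps) * v[frozenset(S)]
        for r in range(1, len(players) + 1)
        for S in combinations(players, r)
    )


# Hypothetical game: any coalition with at least two of the three players earns 1.
v = {frozenset(S): (1 if len(S) >= 2 else 0)
     for r in range(1, 4) for S in combinations((1, 2, 3), r)}

# Equal split, stored as exact fractions to avoid rounding issues in the comparisons.
x = {i: Fraction(1, 3) for i in (1, 2, 3)}

for eps in (Fraction(0), Fraction(1, 4), Fraction(1, 3)):
    print(eps, in_epsilon_core(x, v, eps))
```

In this example the equal split fails the test for $\varepsilon = 0$ and $\varepsilon = 1/4$ and passes it for $\varepsilon = 1/3$, which is the smallest value for which the $\varepsilon$-core of this particular game is non-empty.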
For a more recent study of packing games and their applications see (Sanchez-Soriano et al.,
2002). There the authors analysed the transport system for university students in the
province of Alacant (Spain). The question is how to connect different villages and towns in
Alacant efficiently with the different university campuses. The authors proposed a possible
approach to model this situation. They also considered a particular cost sharing rule based
on the egalitarian solution.
In a realistic logistics problem, such as the previous one, we could combine the routing
problem and the packing problem because they are closely related. In these
situations, we would be interested in determining the number of trucks or containers, taking
into account their capacities, and their routes to deliver the different possible commodities
from the supply nodes to the demand nodes at minimal cost. Of course, a prior logistics
problem that could be considered is the location of warehouses or factories in order to
improve the efficiency of the subsequent delivery chain, which would be related to the
combination of the routing and packing problems.
So, next, we briefly discuss location games. (Puerto et al., 2001) introduced a family of
cooperative games arising from continuous single facility location problems. In such a
situation, there are $n$ users of a certain facility (for example, a hospital), placed at $n$ different
points (towns) in $\mathbb{R}^m$, $m \ge 1$. In this structure, the costs depend on the distances from the
users to the facility. We seek a location in $\mathbb{R}^m$ for the facility that minimises the total
transportation cost. (Puerto et al., 2001) gave two sufficient conditions for their
location game to have a non-empty core. They also studied under which conditions the
proportional egalitarian solution provides core allocations for Weber and minimax
(continuous) location games. More recently, (Goemans & Skutella, 2004) analysed
non-continuous location games in depth. In such a problem, there is a set $F$ of possible locations for
the facilities and we have to decide which facilities to build. In addition,
each user must be connected to an open facility. Both opening facilities and connecting users
have a fixed cost. As above, the goal is to minimise the total cost of the system. In this context,
(Goemans & Skutella, 2004) established strong links between fair cost allocations and linear
programming relaxations. In particular, they proved that a fair cost allocation exists if and
only if there is no integrality gap for a corresponding linear programming relaxation. More
interestingly, they also showed that it is in general NP-complete to decide
whether a fair allocation scheme exists and whether a given cost rule is fair.
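For the continuous single-facility case with Euclidean distances (the Weber problem), the cost of a coalition can be approximated with the classical Weiszfeld iteration. The Python sketch below is an illustration under that assumption and does not reproduce the model of (Puerto et al., 2001) in detail; the user locations are hypothetical, and the handling of an iterate that coincides with a user location is deliberately simplified.

```python
import math


def total_distance(points, x):
    """Sum of Euclidean distances from the facility location x to all points."""
    return sum(math.dist(p, x) for p in points)


def weiszfeld(points, iters=500, tol=1e-12):
    """Approximate the point minimising the total Euclidean distance to `points`."""
    dim = len(points[0])
    x = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))  # start at centroid
    for _ in range(iters):
        weights = []
        for p in points:
            dist = math.dist(p, x)
            if dist < tol:
                return x          # simplification: stop if the iterate hits a user location
            weights.append(1.0 / dist)
        total = sum(weights)
        new_x = tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)
        )
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


def coalition_cost(locations, S):
    """Approximate cost c(S): total distance from the users in S to their best facility."""
    pts = [locations[i] for i in S]
    return total_distance(pts, weiszfeld(pts))


# Hypothetical users at the corners of the unit square.
locations = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 1.0), 4: (1.0, 1.0)}
print(coalition_cost(locations, (1, 2, 3, 4)), coalition_cost(locations, (1, 2)))
```

For the four corners of the unit square the facility is placed at the centre and the total cost is $2\sqrt{2}$, while a pair of adjacent corners has cost 1. Evaluating `coalition_cost` for every coalition yields an approximate characteristic function to which core conditions or allocation rules can then be applied.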
6. Acknowledgements
J. Aparicio and S. Valero acknowledge the financial support from the Conselleria d'Educacio
of Generalitat Valenciana through the project GV/2010/080. J. Sanchez-Soriano and N. Llorca
acknowledge the financial support from the Ministerio de Educacion y Ciencia of Spain through
the project MTM2008-06778-C02-01 and from the Conselleria d'Educacio of Generalitat
Valenciana through the project ACOMP/2010/102.
7. References
Anupindi, R.; Bassok, Y. & Zemel, E. (2001). A general framework for the study of
decentralized distribution system. Manufacturing and Service Operations
Management, Vol. 3, No. 4, 349-368, ISSN 1523-4614
Aparicio, J.; Ferrando, J.C.; Meca, A. & Sancho, J. (2008). Strategic bidding in continuous
electricity auctions: an application to the Spanish electricity market. Annals of
Operations Research, Vol. 109, 41-60, ISSN 0254-5330
Bird, C. (1976). On cost allocation for a spanning tree: a game theoretic approach. Networks,
Vol. 6, 335-350, ISSN 1534-6457
Bondareva, O.N. (1963). Certain applications of the methods of linear programming to the
theory of cooperative games, Problemy Kibernet (Problems of Cybernetics), Vol. 10,
119-139, ISSN 0555-2567
Borm, P.; Hamers, H. & Hendrickx, R. (2001). Operations research games: a survey. TOP,
Vol. 9, No. 2, 139-216, ISSN 1134-5764
Cachon, G. & Netessine, S. (2004). Game theory in supply chain analysis, In: Handbook of
quantitative supply chain analysis: modeling in the ebusiness era, D. Simchi-Levi, S.D. Wu
and Z.-J. Shen, (Ed.), 13-66, Kluwer Academic Publishers, ISBN 1-4020-7952-4
Chacko, G.K. (1961). Bargaining strategy in a production and distribution system, Operations
Research, Vol. 9, 811-827, ISSN 1526-5463
Chen, X. & Zhang, J. (2009). A stochastic programming duality approach to inventory
centralization games. Operations Research, Vol. 57, No. 4, 840-851, ISSN 1526-5463
Christopher, M. (1998). Logistic and supply chain management: strategies for reducing cost and
improving service, Berrett-Kochler Publishers, ISBN 1-5767-5052-0, San Francisco
Claus, A. & Kleitman, D.J. (1973). Cost allocation for a spanning tree. Networks, Vol. 3, 289-
304, ISSN 1534-6457
Cruijssen, F.; Dullaert, W. & Fleuren, H. (2007). Horizontal cooperation in transport and
logistics: a literature review. Transportation Journal, Vol. 46, No. 3, 22-39, ISSN 0041-1612
Dror, M. & Hartman, B.C. (2007). Shipment consolidation: who pays for it and how much.
Management Science, Vol. 53, No. 1, 78-87, ISSN 0025-1909
Faigle, U. & Kern, W. (1993). On some approximately balanced combinatorial cooperative
games. Mathematical Methods of Operations Research, Vol. 38, 141-152, ISSN 1432-2994
Friedman, A. (1994). Differential Games, In: Handbook of Game Theory with Economic
Applications, Vol. 2, Aumann, R. & Hart, S. (Ed.), 781-799, North Holland, Elsevier
Science Publishers B.V., ISBN 0-444-89427-6
Ganeshan, R.; Jack, E.; Magazine, M.J. & Stephens, P. (1999). A taxonomic review of supply
chain management research, In: Quantitative Models for Supply Chain Management,
Tayur, S.; Ganeshan, R. & Magazine, M., (Ed.), 839-873, Kluwer Academic
Publishers, ISBN 0-7923-8344-3, Boston
Gerchak, Y. & Gupta, D. (1991). On apportioning costs to customers in centralized
continuous review systems. Journal of Operations Management, Vol. 10, No. 4, 546-
551, ISSN 0272-6963
Goemans, M.X. & Skutella, M. (2004). Cooperative facility location games. Journal of
Algorithms, Vol. 50, 194-214, ISSN 0196-6774
Granot, D. & Sosic, G. (2003). A three stage model for a decentralized distribution system of
retailers. Operations Research, Vol. 51, 771-784, ISSN 1526-5463
Guardiola, L.A.; Meca, A. & Timmer, J. (2007). Cooperation and profit allocation in
distribution chains. Decision Support System, Vol. 44, No. 1, 17-27, ISSN 0167-9236
Guardiola, L.A.; Meca, A. & Puerto, J. (2009). Production-inventory games: a new class of
totally balanced combinatorial optimization games. Games and Economic Behavior,
Vol. 65, No. 1, 205-219, ISSN 0899-8256
Hamers, H.; Borm, P.; Leensel, R. van de & Tijs, S. (1999). Cost allocation in the Chinese
postman problem. European Journal of Operational Research, Vol. 118, 153-163, ISSN
0377-2217
Hartman, B.C. & Dror, M. (1996). Cost allocation in continuous-review inventory models.
Naval Research Logistics, Vol. 43, 549-561, ISSN 0894-069X
Hartman, B.C. & Dror, M. (2003). Optimizing centralized inventory operations in a
cooperative game theory setting. IIE Transactions, Vol. 35, 243-257, ISSN 0740-817X
Hartman, B.C. & Dror, M. (2005). Allocation of gains from inventory centralization in
newsvendor environments. IIE Transactions, Vol. 37, 93-107, ISSN 0740-817X
Hartman, B.C.; Dror, M. & Shaked, M. (2000). Cores of inventory centralization games.
Games and Economic Behavior, Vol. 31, 26-49, ISSN 0899-8256
Jeuland, A.P. & Shugan, S.M. (1983). Managing channel profits. Marketing Science, Vol. 2,
239-272, ISSN 0732-2399
Kalai, E. & Samet, D. (1987). On weighted Shapley values. International Journal of Game
Theory, Vol. 16, 205-222, ISSN 0020-7276
Kalai, E. & Zemel, E. (1982). Totally balanced games and games of flow. Mathematics of
Operations Research, Vol. 7, 476-478 ISSN 0364-765X
Kim, K.H. & Hwang, H. (1989). Simultaneous improvement of supplier's profit and buyer's
cost by utilizing quantity discount. Journal of the Operational Research Society, Vol. 40,
No. 3, 255-265, ISSN 0160-5682
Klijn, F. & Slikker, M. (2005). Distribution center consolidation games. Operations Research
Letters, Vol. 33, 285-288, ISSN 0167-6377
Kuipers, J. (1998). Bin packing games. Mathematical Methods of Operations Research, Vol. 47,
499-510, ISSN 1432-2994
Lee, H.L.; Padmanabhan, V. & Whang, S. (1997). The bullwhip effect in supply chains. Sloan
Management Review, Vol. 38, No. 3, 93-102, ISSN 1532-9194
Lemke, C.E. (1965) Bimatrix Equilibrium Points and Mathematical Programming.
Management Science, Vol. 11, No. 7, 681-689, ISSN 0025-1909
Leng, M. & Parlar, M. (2005). Game theoretic applications in supply chain management: a
review. INFOR, Vol. 43, No. 3, 187-220, ISSN 0315-5986
Li, S.X. & Huang, Z. (1995). Managing buyer-seller system cooperation with quantity
discount considerations. Computer and Operations Research, Vol. 22, No. 9, 947-958,
ISSN 0305-0548
Mason, R., Lalwani, L. & Boughton, R. (2007). Combining vertical and horizontal
collaboration for transport optimisation. Supply Chain Management: An International
Journal, 12 (3): 187-199, ISSN 1359-8546
Meca, A. (2007). A core-allocation family for generalized holding cost games. Mathematical
Methods of Operations Research, Vol. 65, No. 3, 499-517, ISSN 1432-2994
Meca, A.; Timmer, J.; Garcia-Jurado, I. & Borm, P. (2004). Inventory games. European Journal
of Operational Research, Vol. 156, 127-139, ISSN 0377-2217
Meca, A. & Timmer, J. (2008). Supply chain collaboration, In: Supply Chain: Theory and
Applications, Kordic, V., (Ed.), Chapter 1, I-Tech Education and Publishing, ISBN
978-3-902613-22-6, Vienna
Montrucchio, L. & Scarsini, M. (2007). Large newsvendor games. Games and Economic
Behavior, Vol. 58, No. 2, 316-337, ISSN 0899-8256
Müller, A.; Scarsini, M. & Shaked, M. (2002). The newsvendor game has a nonempty core.
Games and Economic Behavior, Vol. 38, 118-126, ISSN 0899-8256
Nagarajan, M. & Sosic, G. (2008). Game-theoretic analysis of cooperation among supply
chain agents: review and extensions. European Journal of Operational Research, Vol.
187, 719-745, ISSN 0377-2217
Owen, G. (1975). The core of linear production games. Mathematical Programming, Vol. 9,
358-370 ISSN 0025-5610
Özen, U.; Fransoo, J.; Norde, H. & Slikker, M. (2008). Cooperation between multiple
newsvendors with warehouses. Manufacturing and Service Operations Management,
Vol. 10, No. 2, 311-324, ISSN 1523-4614
Özen, U.; Norde, H. & Slikker, M. (2010). On the convexity of newsvendor games.
International Journal of Production Economics, forthcoming, ISSN 0925-5273
Perea, F.; Puerto, J. & Fernández, F.R. (2008). Modeling cooperation on a class of distribution
problems. European Journal of Operational Research, 198 (3), 726-733, ISSN 0377-2217
Pettersen, E.; Philpott, A.B. & Wallace, S.W. (2005). An electricity market game between
consumers, retailers and network operators. Decision Support System, Vol. 40, 427-
438, ISSN 0167-9236
Plambeck, E.L. & Taylor, T.A. (2005). Sell the plant? The impact of contract manufacturing
on innovation, capacity, and profitability. Management Science, Vol. 51, No. 1, 133-
150, ISSN 0025-1909
Poirier, C.C. (1999). Advanced supply chain management: how to build sustained competitive
advantage, Berrett-Koehler Publishers, ISBN 0-7963-9344-3, London
Potters, J.; Curiel, I. & Tijs, S. (1992). Travelling salesman games. Mathematical Programming,
Vol. 53, 199-211, ISSN 0025-5610
Prim, R.C. (1957). Shortest connection networks and some generalizations. Bell System
Technical Journal, Vol. 36, 1389, ISSN 0005-8580
Puerto, J.; Garcia-Jurado, I. & Fernandez, F.R. (2001). On the core of a class of location
games. Mathematical Methods of Operations Research, Vol. 54, 373-385, ISSN 1432-2994
Quint, T. (1991). The core of an m-sided assignment game. Games and Economic Behavior, Vol.
3, 487-503, ISSN 0899-8256
Raghavan, T.E.S. (1994) Zero-sum two-person games, In: Handbook of Game Theory with
Economic Applications, Vol. 2, Aumann, R. & Hart, S. (Ed.), 735-768, North Holland,
Elsevier Science Publishers B.V., ISBN 0-444-89427-6
Raghunathan, S. (2003). Impact of demand correlation in the value of and incentives for
information sharing in a supply chain. European Journal of Operational Research, Vol.
146, 634-649, ISSN 0377-2217
Reinhardt, G. & Dada, M. (2005). Allocating the gains from resource pooling with the
Shapley value. Journal of the Operational Research Society, Vol. 56, 997-1000, ISSN
0160-5682
Robinson, L. (1993). A comment on Gerchak and Gupta's "On Apportioning Costs to
Customers in Centralized Continuous Review Systems". Journal of Operational
Management, Vol. 11, 99-102, ISSN 0899-5682
Sanchez-Soriano, J. (2003). The pairwise egalitarian solution. European Journal of Operational
Research, Vol. 150, 220-231, ISSN 0377-2217
Sanchez-Soriano, J. (2006). Pairwise solutions and the core of transportation situations.
European Journal of Operational Research, Vol. 175, 101-110, ISSN 0377-2217
Sanchez-Soriano, J.; Llorca, N.; Meca, A.; Molina, E. & Pulido, M. (2002). An integrated
transport system for Alacant's students. UNIVERCITY. Annals of Operations
Research, Vol. 109, 41-60, ISSN 0254-5330
Sanchez-Soriano, J. Lopez, M.A. & Garcia-Jurado, I. (2001). On the core of transportation
games. Mathematical Social Sciences, Vol. 41, 215-225, ISSN 0165-4896
Sancho, J.; Sanchez-Soriano, J.; Chazarra, J.A. & Aparicio, J. (2008). Design and
implementation of a decision support system for competitive electricity markets.
Decision Support System, Vol. 44, 765-784, ISSN 0167-9236
Schmeidler, D. (1969). The nucleolus of a characteristic function game. SIAM Journal of
Applied Mathematics, Vol. 17, 1163-1170, ISSN 0036-1399
Shapley, L.S. (1953). A value for N-person games, In: Contributions to the Theory of Games II,
Kuhn, H.W. & Tucker, A.W. (Ed.), 307-317, Princeton University Press, ISBN 978-0-
691-07935-6
Shapley, L.S. (1967). On balanced sets and cores. Naval Research Logistics Quarterly, Vol. 14,
453-460, ISSN 0028-1441
Shapley, L.S. (1971). Cores of convex games. International Journal of Game Theory, Vol. 1, No.
1, 11-26, ISSN 0020-7276
Shapley, L.S. & Shubik, M. (1971). The assignment game I: the core. International Journal of
Game Theory, Vol. 1, No. 1, 111-130, ISSN 0020-7276
Simatupang, T.M. & Sridharan, R. (2002). The collaborative supply chain. International
Journal of Logistics Management, 13 (1):15-30, ISSN 0957-4093
Slikker, M.; Fransoo, J. & Wouters, M. (2005). Cooperations between multiple news-vendors
with transshipments. European Journal of Operational Research, Vol. 167, 370-380,
ISSN 0377-2217
Thompson, G.L. (1980). Computing the core of a market game, In: Extremal methods and
systems analysis, Fiacco, A.V., Kortanek, K.O. (Ed.). Lecture notes in Economics and
Mathematical Systems, vol. 174, 312-334, Springer-Verlag, ISSN 0075-8442
Thun, J.-H. (2005). The potential of cooperative game theory for supply chain management,
In: Research Methodologies in Supply Chain Management, Kotzab, H.; Seuring, S.;
Müller, M. & Reiner, G., (Ed.), 477-491, Physica-Verlag Heidelberg, ISBN 3-7908-
1583-7, Germany
Tully, S. (1994). You'll never guess who really makes... Fortune, Oct 3, 124-128, ISSN 0015-8259
Valero, S.; Ortiz, M.; Senabre, C.; Alvarez, C.; Franco, F.J.G. & Gabaldon, A. (2007). Methods
for customer and demand response policies selection in new electricity markets.
Generation, Transmission & Distribution, IET, Vol. 1, No. 1, 104-110, ISSN 1751-8687
van Gellekom, J.R.G., Potters, J.A.M., Reijnierse, J.H., Engel, M. C. & Tijs, S.H. (2000).
Characterization of the Owen Set of Linear Production Processes. Games and
Economic Behavior, Vol. 32, 139-156, ISSN 0899-8256
Verdu, S.V.; Garcia, M.O.; Senabre, C.; Marin, A.G. & Franco, F.J.G. (2006). Classification,
Filtering, and Identification of Electrical Customer Load Patterns Through the Use
of Self-Organizing Maps. Power Systems, IEEE Transactions, Vol. 21, No. 4, 1672-1682,
ISSN 0885-8950.
Wang, Q. & Parlar, M. (1994). A three-person game theory model arising in stochastic
inventory theory. European Journal of Operational Research, Vol. 76, 83-97, ISSN 0377-
2217
7
Stochastic Game Theory Approach to
Robust Synthetic Gene Network Design
Bor-Sen Chen, Cheng-Wei Li and Chien-Ta Tu
National Tsing Hua University
Taiwan
1. Introduction
The development of foundational technologies such as de novo DNA synthesis, milestone
experiments such as the computational re-design of enzymes, the opportunity to widely
recombine zinc fingers to re-program DNA-binding site specificity and the availability of
well-studied model regulatory systems for the design of engineering-inspired molecular
devices provide a very powerful knowledge and technology basis for building novel
biological entities (Heinemann & Panke, 2006). Synthetic biology aims to engineer artificial
biological systems, both to investigate natural biological phenomena and for a variety of
applications. Synthetic biology will revolutionize how we conceptualize and approach the
engineering of biological systems. The vision and applications of this emerging field will
influence many other scientific and engineering disciplines, as well as affect various aspects
of daily life and society (Andrianantoandro et al., 2006). Synthetic biology builds living
machines from off-the-shelf chemical ingredients, utilizing many of the same strategies
that electrical engineers employ to make computer chips (Tucker & Zilinskas, 2006). The
main goal of the nascent field of synthetic biology is to design and construct biological
systems with the desired behavior (Alon, 2003; Alon, 2007; Andrianantoandro et al., 2006;
Church, 2005; Endy, 2005; Hasty et al., 2002; Heinemann & Panke, 2006; Kobayashi et al.,
2004; Pleiss, 2006; Tucker & Zilinskas, 2006). Using a set of powerful techniques for the
automated synthesis of DNA molecules and their assembly into genes and microbial
genomes, synthetic biology envisions the redesign of natural biological systems for greater
efficiency as well as the construction of functional genetic circuits and metabolic pathways
for practical purposes (Andrianantoandro et al., 2006; Ferber, 2004; Forster & Church, 2007;
Gardner et al., 2000; Heinemann & Panke, 2006; Isaacs et al., 2006; Maeda & Sano, 2006;
Tucker & Parker, 2000). Synthetic biology is foreseen to have important applications in
biotechnology and medicine (Andrianantoandro et al., 2006).
Though the engineering of networks of inter-regulating genes, so-called synthetic gene
networks, has demonstrated the feasibility of synthetic biology (Gardner et al., 2000), the
design of gene networks is still a difficult problem and most newly designed gene
networks do not work properly. These design failures are mainly due to intrinsic
perturbations such as gene expression noise, splicing, mutation and uncertain initial states, and to
disturbances such as changing extra-cellular environments and interactions with the cellular
context. Therefore, how to design a robust synthetic gene network, which can tolerate
uncertain initial conditions, attenuate the effect of all disturbances and function properly in
the host cell, will be an important topic for synthetic biology (Alon, 2003; Alon, 2007;
Andrianantoandro et al., 2006; Batt et al., 2007; Church, 2005; Endy, 2005; Goulian, 2004;
Hasty et al., 2002; Heinemann & Panke, 2006; Kaznessis, 2006; Kaznessis, 2007; Kitano, 2002;
Kitano, 2004; Kobayashi et al., 2004; Pleiss, 2006; Salis & Kaznessis, 2006; Tucker & Zilinskas,
2006). Previously, sensitivity analysis has been used for analysis of the dynamic properties
of gene networks either in qualitative simulations of coarse-grained models or in extensive
numerical simulations of nonlinear differential equation models or stochastic dynamic
models (de Jong, 2002; Szallasi et al., 2006). For applications in synthetic biology, these
approaches are not satisfactory. Local sensitivity analysis can provide only a partial
description of all possible behaviors of a nonlinear gene network. In particular, it cannot
guarantee that a synthetic gene network behaves as expected for all uncertain initial
conditions and disturbances. Moreover, checking convergence over all states and parameters
by extensive numerical simulations quickly becomes computationally intractable as the
size of the synthetic network grows (Batt et al., 2007).
An approach has recently been developed using semidefinite programming to partition the
parameter spaces of polynomial differential equation models into so-called feasible and
infeasible regions (Kuepfer et al., 2007). Following that, a robustness analysis and tuning
approach of synthetic networks was proposed to provide a means to assess the robustness
of the expected behavior of a synthetic gene network in spite of parameter variations (Batt et
al., 2007). This approach has the capability to search for parameter sets for which a given
property is satisfied through a publicly available tool called RoVerGeNe. Several gene
circuit design methods have been introduced to insert circuits into, or delete circuits from, an
existing gene network so as to modify its structure and improve its robust stability or
filtering ability (Chen et al., 2008b; Chen & Chen, 2008; Chen & Wu, 2008). However, robust
synthetic gene network design is a different topic. It requires designing a complete man-made
gene network to be inserted into a host cell. Therefore, synthetic gene networks should
be designed with enough robustness to tolerate uncertain initial conditions and to resist all
possible disturbances on the host cell so that they can function properly in a desired steady
state. This is a so-called robust regulation design that can achieve a desired steady state of
synthetic gene networks despite uncertain initial conditions and disturbances on the host
cell. Recently, robust synthetic gene network design has been developed from the robust
stabilization method (Chen & Wu, 2009) and the minimax method (Chen et al., 2009).
In this study, a robust regulation design of synthetic gene network is proposed to achieve a
desired steady state in spite of uncertain initial conditions, parameter variations and
disturbances on the host cell. Because most information about these uncertain factors in the
host cell is unavailable, the designer should consider their worst-case effect on the
regulation error in the design procedure in order to attenuate their detrimental effects.
The worst-case effect of all possible initial conditions and
disturbances on the regulation error to a desired steady state is minimized for the robust
synthetic gene networks, i.e., the proposed robust synthetic gene network is designed from
the minimax regulation error perspective. The minimax design scheme is a simple robust
synthetic gene network design method because we do not need the precise information of
the initial conditions, parameter variations and disturbances on the host cell, which are not
easy to measure in the design procedure. This minimax regulation design problem for
robust synthetic gene networks could be transformed to an equivalent dynamic game
problem (Basar & Olsder, 1999; Chen et al., 2002). Dynamic game methods have been widely
applied to many fields of robust engineering design problems with external disturbances.
Recently, dynamic game theory has been applied to robust model matching
control of immune systems under environmental disturbances (Chen et al., 2008a). A robust
drug administration (control input) is designed to obtain a prescribed immune response
under uncertain initial states and environmental disturbances. In this study, stochastic
game theory will be used for robust synthetic gene network design so that the engineered
gene network can work properly under uncertain initial conditions and environmental
disturbances on the host cell. The uncertain initial states and disturbances are considered as
a player doing its best to deteriorate the regulation performance from the worst-case point
of view, while the system parameters to be designed are considered as another player
optimizing the regulation performance under the worst-case deterioration caused by the former
player. Since synthetic gene networks are highly nonlinear, it is not easy to solve the
robust synthetic gene network design problem directly by the nonlinear dynamic game
method. Recently, fuzzy systems have been employed to efficiently approximate
nonlinear dynamic systems to solve the nonlinear control problem (Chen et al., 1999; Chen et
al., 2000; Hwang, 2004; Li et al., 2004; Lian et al., 2001; Takagi & Sugeno, 1985). A Takagi-Sugeno
(T-S) fuzzy model (Takagi & Sugeno, 1985) is proposed to interpolate several
linearized genetic networks at different operating points to approximate the nonlinear gene
network via some smooth fuzzy membership functions. Then, with the help of the fuzzy
approximation method, a fuzzy dynamic game scheme (Chen et al., 2002) is developed so
that the minimax regulation design of robust synthetic gene networks can be solved
by the techniques of linear dynamic game theory. The resulting problem is a
constrained optimization problem with linear matrix inequality (LMI) constraints (Boyd et
al., 1994), which can be solved efficiently by the Robust Control Toolbox in Matlab (Balas et al.,
2008). Because the fuzzy model can approximate any nonlinear system, the proposed robust
regulation design method developed from the fuzzy stochastic game theory can be applied
to the robust regulation design problem of any synthetic gene network that can be
interpolated by a T-S fuzzy model. For comparison, the conventional optimal regulation
design method, which does not consider the effect of disturbances, is also presented for the
synthetic gene network. Because the effect of disturbances is not attenuated efficiently, a
synthetic gene network obtained by the optimal regulation design method is strongly affected by the
disturbances on the host cell. Finally, an in silico example is given to illustrate the design
procedure and to confirm the efficiency and efficacy of the proposed minimax regulation
design method for robust synthetic gene networks.
2. Robust synthetic gene network design via stochastic game approach
First, for the convenience of problem description, a simple design example of a four-gene
network from Batt et al. (2007) is provided to give an overview of the design problem of robust
synthetic gene networks. A more general design problem of robust synthetic gene networks
will be given in the sequel. Let us consider a robust regulation design problem of a cascade
loop of transcriptional inhibitions built in E. coli (Hooshangi et al., 2005). The synthetic gene
network is represented in Fig. 1. It consists of four genes, tetR, lacI, cI and eyfp, which code
for three repressor proteins, TetR, LacI and CI, and the fluorescent protein EYFP
(enhanced yellow fluorescent protein), respectively (Batt et al., 2007). aTc (anhydrotetracycline) is the
input to the system. The fluorescence of the system, due to the protein EYFP, is the
measured output. The protein TetR inhibits gene lacI, the protein LacI inhibits gene cI and
the protein CI inhibits gene eyfp. The regulatory dynamic equations of the synthetic
transcriptional cascade in Fig. 1 are given as follows (Batt et al., 2007).
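$$
\begin{aligned}
\dot{x}_{tetR} &= k_{tetR,0} - \gamma_{tetR}\, x_{tetR} + w_1 \\
\dot{x}_{lacI} &= k_{lacI,0} + k_{lacI}\bigl(r_{lacI}(x_{tetR}) + a_{lacI}(u_{aTc}) - r_{lacI}(x_{tetR})\, a_{lacI}(u_{aTc})\bigr) - \gamma_{lacI}\, x_{lacI} + w_2 \\
\dot{x}_{cI} &= k_{cI,0} + k_{cI}\, r_{cI}(x_{lacI}) - \gamma_{cI}\, x_{cI} + w_3 \\
\dot{x}_{eyfp} &= k_{eyfp,0} + k_{eyfp}\, r_{eyfp}(x_{cI}) - \gamma_{eyfp}\, x_{eyfp} + w_4
\end{aligned}
\tag{1}
$$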
with the uncertain initial conditions $x_{tetR}(0)$, $x_{lacI}(0)$, $x_{cI}(0)$ and $x_{eyfp}(0)$ in the host cell. $k_{tetR,0}$, $k_{lacI,0}$, $k_{cI,0}$ and $k_{eyfp,0}$ are the basal production rates of the corresponding proteins, which are assumed to be given constants. $k_{lacI}$, $k_{cI}$ and $k_{eyfp}$ are the production rate parameters, while $\gamma_{tetR}$, $\gamma_{lacI}$, $\gamma_{cI}$ and $\gamma_{eyfp}$ are the decay rate parameters of the corresponding proteins. The regulatory functions $r_{lacI}$, $r_{cI}$ and $r_{eyfp}$ are Hill functions for repressors, and $a_{lacI}$ is a Hill function for an activator.
The Hill function can be derived by considering the equilibrium binding of the transcription factor to its site on the promoter region. For a repressor, the Hill function is an S-shaped curve that can be described in the form

$$
r(x) = \frac{\beta_r}{1 + (x/K_r)^n}
$$

where $\beta_r$ is the maximal expression level of the promoter, $K_r$ is the repression coefficient and the Hill coefficient $n$ governs the steepness of the input function. For an activator, the Hill function can be described in the form

$$
a(x) = \frac{\beta_a x^n}{K_a^n + x^n}
$$

where $\beta_a$ is the maximal expression level of the promoter, $K_a$ is the activation coefficient and $n$ determines the steepness of the input function (Alon, 2007). $w_1$, $w_2$, $w_3$ and $w_4$ are the disturbances of the synthetic gene network, which denote the total of environmental noises, modeling residuals and intrinsic parameter fluctuations in the host cell. Therefore, $w_i$, $i = 1, \ldots, 4$, are assumed to be uncertain but bounded disturbances. The synthetic gene network design is to specify $k_{lacI}$, $k_{cI}$, $k_{eyfp}$ and $\gamma_{tetR}$, $\gamma_{lacI}$, $\gamma_{cI}$, $\gamma_{eyfp}$ such that the system states $x_{tetR}$, $x_{lacI}$, $x_{cI}$ and $x_{eyfp}$ can approach the desired states $x_{d1}$, $x_{d2}$, $x_{d3}$ and $x_{d4}$, respectively, in spite of uncertain initial conditions and disturbances.
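As a rough numerical illustration of the cascade model (1) and the two Hill functions, the following Python sketch integrates the four equations with a forward Euler scheme. Every numerical value in it (basal rates, production rates, decay rates, Hill parameters, the aTc level) is a hypothetical placeholder chosen only so that the example runs; none is taken from Batt et al. (2007). The disturbances $w_i$ default to zero, and the names `PARAMS`, `cascade_rhs` and `simulate` are introduced here for illustration only.

```python
def hill_repressor(x, beta, K, n):
    """Repressor Hill function r(x) = beta / (1 + (x / K)**n)."""
    return beta / (1.0 + (x / K) ** n)


def hill_activator(x, beta, K, n):
    """Activator Hill function a(x) = beta * x**n / (K**n + x**n)."""
    return beta * x ** n / (K ** n + x ** n)


# Hypothetical parameter values, for illustration only.
PARAMS = {
    "k0": (5.0, 1.0, 1.0, 1.0),      # basal rates for tetR, lacI, cI, eyfp
    "k": (100.0, 100.0, 100.0),      # production rates k_lacI, k_cI, k_eyfp
    "gamma": (1.0, 1.0, 1.0, 1.0),   # decay rates for tetR, lacI, cI, eyfp
    "r_lacI": (1.0, 10.0, 2.0),      # (beta, K, n) of each Hill function
    "a_lacI": (1.0, 10.0, 2.0),
    "r_cI": (1.0, 10.0, 2.0),
    "r_eyfp": (1.0, 10.0, 2.0),
}


def cascade_rhs(x, u_atc, p, w=(0.0, 0.0, 0.0, 0.0)):
    """Right-hand side of the four-gene cascade in equation (1)."""
    x_tetR, x_lacI, x_cI, x_eyfp = x
    r = hill_repressor(x_tetR, *p["r_lacI"])
    a = hill_activator(u_atc, *p["a_lacI"])
    k0, k, gamma = p["k0"], p["k"], p["gamma"]
    return (
        k0[0] - gamma[0] * x_tetR + w[0],
        k0[1] + k[0] * (r + a - r * a) - gamma[1] * x_lacI + w[1],
        k0[2] + k[1] * hill_repressor(x_lacI, *p["r_cI"]) - gamma[2] * x_cI + w[2],
        k0[3] + k[2] * hill_repressor(x_cI, *p["r_eyfp"]) - gamma[3] * x_eyfp + w[3],
    )


def simulate(x0, u_atc, p, t_end=100.0, dt=0.01):
    """Forward Euler integration of (1) without disturbances; returns the final state."""
    x = list(x0)
    for _ in range(int(t_end / dt)):
        dx = cascade_rhs(x, u_atc, p)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


if __name__ == "__main__":
    print(simulate((0.0, 0.0, 0.0, 0.0), 0.0, PARAMS))
```

With these placeholder values and no aTc input, the first state settles at $k_{tetR,0}/\gamma_{tetR}$, and each later stage settles at the level set by the repressor upstream of it, which is the cascade behavior the model describes.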
If a synthetic gene network consists of n genes, then equation (1) can be extended to the following n-gene network dynamics:

$$
\dot{x} = k_0 + f(x, k, \gamma) + g(u) + w, \quad x(0) = x_0
\tag{2}
$$

where the state vector $x$ denotes the concentrations of proteins in the synthetic gene network. $k_0$ denotes the vector of basal production rates of the corresponding proteins. $f(x,k,\gamma)$ denotes the regulation vector of the synthetic gene network, which is a function of the production rate parameters $k$ and the decay rate parameters $\gamma$ to be designed. $g(u)$ denotes the input function to the synthetic gene network. $w$ denotes the vector of stochastic disturbances on the host cell, whose statistics may be unavailable. The initial condition $x_0$ is assumed stochastic with unknown covariance. The robust synthetic gene network design is to select parameters $k$ and $\gamma$ from feasible ranges so that the state vector $x$ can approach a desired state vector $x_d$ in spite of the uncertain initial condition $x(0)$ and disturbances $w$ on the host cell, i.e., $x \to x_d$ at the steady state despite uncertain $x(0)$ and $w$. This is a robust regulation problem of synthetic gene networks, i.e., the state vector $x$ of the synthetic gene network is robustly regulated to $x_d$ in the host cell.
Let us denote the regulation error as

$$
\tilde{x} = x - x_d
\tag{3}
$$

Then the regulation error dynamic system is given by

$$
\dot{\tilde{x}} = f(\tilde{x} + x_d, k, \gamma) + v, \quad \tilde{x}(0) = \tilde{x}_0
\tag{4}
$$
where $v = k_0 + g(u) + w$ denotes the total uncertain disturbance in the regulation error system, because these terms always fluctuate in the host cell and are not easily measured correctly. Because of the uncertainty of $v$ and $\tilde{x}(0)$, the minimax regulation design method is an efficient but simple design scheme for robust synthetic gene networks. The uncertainty of the disturbance $v$ and the initial condition $\tilde{x}(0)$ can be considered as a player maximizing their effects on the regulation error in the following robust design problem of synthetic gene networks (Basar & Olsder, 1999; Chen et al., 2002):

$$
\min_{\substack{k \in [k_1, k_2] \\ \gamma \in [\gamma_1, \gamma_2]}} \; \max_{\tilde{x}(0),\, v} \;
\frac{E\left[\int_0^{t_f} \tilde{x}^T Q \tilde{x}\, dt\right]}{E\left[\int_0^{t_f} v^T v\, dt + \tilde{x}^T(0)\tilde{x}(0)\right]}
\tag{5}
$$
where $Q$ is the weighting matrix. In general, $Q$ is a diagonal weighting matrix $Q = \mathrm{diag}([q_{11}, q_{22}, \ldots, q_{nn}])$ that denotes the punishment on the regulation error. If only the last state $x_n$ is required to be regulated to achieve the desired steady state $x_{dn}$, then we can let $q_{nn} = 1$ and $q_{11} = q_{22} = \cdots = q_{n-1,n-1} = 0$. $[k_1, k_2]$ and $[\gamma_1, \gamma_2]$ denote the allowable ranges of the production rate vector $k$ and the decay rate vector $\gamma$, respectively. The allowable ranges are determined by the engineering biotechnologies of synthetic biology. The parameters $k$ and $\gamma$ to be designed can be considered as another player minimizing the worst-case effect of $\tilde{x}(0)$ and $v$ on the regulation error. If the disturbances $v$ and the initial condition $\tilde{x}(0)$ are deterministic, then the expectation operation $E[\cdot]$ in (5) can be omitted.
The physical meaning of (5) is that the worst-case effect of uncertain $\tilde{x}(0)$ and $v$ on the regulation error $\tilde{x}$ must be minimized from the mean energy perspective by $k$ and $\gamma$, which are chosen from the allowable ranges. Therefore, for uncertain $\tilde{x}(0)$ and $v$, the robust synthetic gene network design is to solve the minimax problem in (5) subject to the regulation error dynamic system in (4). This is the so-called stochastic game problem in the robust synthetic gene network design (Basar & Olsder, 1999).
In general, it is not easy to solve the nonlinear stochastic game problem in (5) subject to (4) directly. It is usually solved by a sub-minimax method. First, let the upper bound $g^2$ of (5) be (Basar & Olsder, 1999; Chen et al., 2002)

$$
\min_{\substack{k \in [k_1, k_2] \\ \gamma \in [\gamma_1, \gamma_2]}} \; \max_{\tilde{x}(0),\, v} \;
\frac{E\left[\int_0^{t_f} \tilde{x}^T Q \tilde{x}\, dt\right]}{E\left[\int_0^{t_f} v^T v\, dt + \tilde{x}^T(0)\tilde{x}(0)\right]} \le g^2
\tag{6}
$$
We will first solve the sub-minimax problem in (6) and then decrease the upper bound $g^2$ as much as possible to approach its minimax solution. In general, the minimax problem in (6) is equivalent to the following minimax problem (Basar & Olsder, 1999; Chen et al., 2002)

$$
\min_{\substack{k \in [k_1, k_2] \\ \gamma \in [\gamma_1, \gamma_2]}} \; \max_{v} \;
E\left[\int_0^{t_f} \left(\tilde{x}^T Q \tilde{x} - g^2 v^T v\right) dt\right] \le g^2 E\left[\tilde{x}^T(0)\tilde{x}(0)\right], \quad \forall \tilde{x}(0)
\tag{7}
$$
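The step from (6) to (7) can be sketched as follows for a fixed admissible parameter pair, writing $\tilde{x}$ for the regulation error, $v$ for the total disturbance and $g^2$ for the upper bound, and assuming the denominator in (6) is positive. This is only a short rearrangement, not a replacement for the arguments in the cited references.

$$
\begin{aligned}
\frac{E\left[\int_0^{t_f} \tilde{x}^T Q \tilde{x}\, dt\right]}{E\left[\int_0^{t_f} v^T v\, dt + \tilde{x}^T(0)\tilde{x}(0)\right]} \le g^2
&\iff E\left[\int_0^{t_f} \tilde{x}^T Q \tilde{x}\, dt\right] \le g^2 E\left[\int_0^{t_f} v^T v\, dt\right] + g^2 E\left[\tilde{x}^T(0)\tilde{x}(0)\right] \\
&\iff E\left[\int_0^{t_f} \left(\tilde{x}^T Q \tilde{x} - g^2 v^T v\right) dt\right] \le g^2 E\left[\tilde{x}^T(0)\tilde{x}(0)\right]
\end{aligned}
$$

Requiring the last inequality for every disturbance $v$ and every initial error $\tilde{x}(0)$ is the same as requiring it for the maximizing $v$, which gives the form in (7).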
where $g^2$ is to be minimized because it is the upper bound in (6) and should be as small as possible to approach the minimax solution. Let us denote the cost function as

$$
J(k, \gamma, v) = E\left[\int_0^{t_f} \left(\tilde{x}^T Q \tilde{x} - g^2 v^T v\right) dt\right]
\tag{8}
$$
3. Design procedure and result
3.1 Sub-minimax design for robust synthetic gene networks
From the above analysis, the dynamic game problem in (6) or (7) is equivalent to finding the worst-case disturbance $v^*$ which maximizes $J(k,\gamma,v)$ and then the minimax $k^*$ and $\gamma^*$ which minimize $J(k,\gamma,v^*)$ such that the minimax value $J(k^*,\gamma^*,v^*)$ is less than $g^2 E[\tilde{x}^T(0)\tilde{x}(0)]$, i.e.

$$
J(k^*, \gamma^*, v^*) = \min_{\substack{k \in [k_1, k_2] \\ \gamma \in [\gamma_1, \gamma_2]}} J(k, \gamma, v^*)
= \min_{\substack{k \in [k_1, k_2] \\ \gamma \in [\gamma_1, \gamma_2]}} \max_{v} J(k, \gamma, v)
\le g^2 E\left[\tilde{x}^T(0)\tilde{x}(0)\right], \quad \forall \tilde{x}(0)
\tag{9}
$$
Hence, if there exist $k^*$, $\gamma^*$ and $v^*$ such that the minimax design problem in (9) is solved, then they can satisfy the minimax performance of the robust synthetic gene network design in (6) as well. Therefore, the first step of robust synthetic gene network design is to solve the following dynamic game problem:

$$
\min_{\substack{k \in [k_1, k_2] \\ \gamma \in [\gamma_1, \gamma_2]}} \; \max_{v} \; J(k, \gamma, v)
\tag{10}
$$

subject to the error dynamic equation in (4). Since $J(k^*, \gamma^*, v^*) \le g^2 E[\tilde{x}^T(0)\tilde{x}(0)]$ according to (9) and $g^2$ is the upper bound of the game in (6), the sub-minimax design has to make $g^2$ as small as possible, too.
From the above analysis, we obtain the following sub-minimax result for robust synthetic gene network design.
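Proposition 1: The sub-minimax synthetic gene network design is equivalent to solving the following constrained optimization for $k^*$ and $\gamma^*$,

$$
\min_{\substack{k \in [k_1, k_2] \\ \gamma \in [\gamma_1, \gamma_2]}} g^2
\tag{11}
$$

subject to the following Hamilton-Jacobi inequality (HJI)

$$
\begin{aligned}
&\left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T f(\tilde{x} + x_d, k, \gamma) + \tilde{x}^T Q \tilde{x} + \frac{1}{4g^2}\left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T \left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right) < 0 \\
&E\left[V(\tilde{x}(0))\right] \le g^2 E\left[\tilde{x}^T(0)\tilde{x}(0)\right]
\end{aligned}
\tag{12}
$$

with $V(\tilde{x}) > 0$, and the worst-case disturbance is given by

$$
v^* = \frac{1}{2g^2}\frac{\partial V(\tilde{x})}{\partial \tilde{x}}
\tag{13}
$$

Proof: see Appendix A.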
Remark 1:
1. From (6), $g^2$ is the upper bound of the game. In (11), we minimize the upper bound $g^2$ to achieve the sub-minimax solution for robust synthetic gene networks.
2. The physical meaning of the constrained minimization in (11) and (12) is that we want to specify $k^*$ and $\gamma^*$ from the allowable parameter ranges such that the upper bound $g^2$ is as small as possible until no positive solution $V(\tilde{x}) > 0$ of the HJI in (12) exists.
At present, there exists no efficient analytic or numerical method to solve the HJI in (12) for nonlinear stochastic system control or filtering designs (Zhang & Chen, 2006; Zhang et al., 2005).
3.2 Minimax robust synthetic gene networks via fuzzy interpolation method
Because it is very difficult to solve the nonlinear HJI in (12), no simple approach is available
for solving the constrained optimization problem in (11) for the minimax robust synthetic
gene network design problem. Recently, the Takagi-Sugeno (T-S) fuzzy model has been
widely employed (Chen et al., 1999; Chen et al., 2000; Hwang, 2004; Takagi & Sugeno, 1985)
to approximate the nonlinear system via interpolating several linearized systems at different
operating points so that the nonlinear stochastic Nash game problem can be transformed into a
fuzzy stochastic game problem (Chen et al., 2002). By using such an approach, the HJI in (12)
can be replaced by a set of linear matrix inequalities (LMIs). In this situation, the nonlinear
stochastic game problem in (10) can be solved easily by the fuzzy dynamic game method as a
sub-minimax design problem.
Suppose the nonlinear system in (4) could be approximated by a T-S fuzzy system (Takagi &
Sugeno, 1985). The T-S fuzzy model is a piecewise interpolation of several linearized models
through fuzzy membership functions. The fuzzy model is described by fuzzy if-then rules
and will be employed to deal with the nonlinear stochastic game problem for robust
synthetic gene network design under uncertain initial conditions and disturbances. The ith
rule of fuzzy model for nonlinear systems in (4) is of the following form (Chen et al., 1999;
Takagi & Sugeno, 1985).
Rule i:
If $\tilde{x}_1(t)$ is $F_{i1}$ and $\cdots$ and $\tilde{x}_q(t)$ is $F_{iq}$, then

$$
\dot{\tilde{x}} = A_i(k, \gamma)\,\tilde{x} + v, \quad i = 1, 2, \ldots, L
\tag{14}
$$
where $F_{ij}$ is the fuzzy set, $A_i(k,\gamma)$ is a constant matrix with the elements of $k$ and $\gamma$ contained in its entries, $q$ is the number of premise variables and $\tilde{x}_1, \ldots, \tilde{x}_q$ are the premise variables. The fuzzy system is inferred as follows (Chen et al., 1999; Chen et al., 2000; Li et al., 2004; Lian et al., 2001; Takagi & Sugeno, 1985)

$$
\dot{\tilde{x}}(t) = \frac{\sum_{i=1}^{L} \mu_i(\tilde{x}(t)) \left[A_i(k,\gamma)\,\tilde{x}(t) + v\right]}{\sum_{i=1}^{L} \mu_i(\tilde{x}(t))}
= \sum_{i=1}^{L} h_i(\tilde{x}(t)) \left[A_i(k,\gamma)\,\tilde{x}(t) + v\right], \quad \tilde{x}(0) = \tilde{x}_0
\tag{15}
$$
where

$$
\mu_i(\tilde{x}(t)) = \prod_{j=1}^{q} F_{ij}(\tilde{x}_j(t)), \qquad
h_i(\tilde{x}(t)) = \frac{\mu_i(\tilde{x}(t))}{\sum_{i=1}^{L} \mu_i(\tilde{x}(t))},
$$

and $F_{ij}(\tilde{x}_j(t))$ is the grade of membership of $\tilde{x}_j(t)$ in $F_{ij}$.
We assume

$$
\mu_i(\tilde{x}(t)) > 0 \quad \text{and} \quad \sum_{i=1}^{L} \mu_i(\tilde{x}(t)) > 0
\tag{16}
$$

Therefore, we get the following fuzzy basis functions

$$
h_i(\tilde{x}(t)) > 0 \quad \text{and} \quad \sum_{i=1}^{L} h_i(\tilde{x}(t)) = 1
\tag{17}
$$
The T-S fuzzy model in (15) interpolates $L$ linear systems to approximate the nonlinear system in (4) via the fuzzy basis functions $h_i(\tilde{x}(t))$. We can specify the system parameters $A_i(k,\gamma)$ so that $\sum_{i=1}^{L} h_i(\tilde{x}(t)) A_i(k,\gamma)\,\tilde{x}$ approximates $f(\tilde{x} + x_d, k, \gamma)$ in (4) by the fuzzy identification method (Takagi & Sugeno, 1985).
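The interpolation in (15) to (17) can be sketched in a few lines of Python. The sketch below computes the firing strengths as products of membership grades, normalizes them into fuzzy basis functions, and blends the local matrices. The Gaussian membership shape, the centers and widths, and the function names are assumptions made for this illustration; the chapter does not prescribe a particular membership function.

```python
import math


def gaussian_membership(center, width):
    """Hypothetical smooth membership function; strictly positive everywhere."""
    return lambda z: math.exp(-0.5 * ((z - center) / width) ** 2)


def firing_strengths(premise, rules):
    """mu_i = product over j of F_ij(premise[j]); rules[i] lists the q memberships of rule i."""
    strengths = []
    for memberships in rules:
        mu = 1.0
        for F_ij, z_j in zip(memberships, premise):
            mu *= F_ij(z_j)
        strengths.append(mu)
    return strengths


def fuzzy_basis(premise, rules):
    """h_i = mu_i / sum of all mu_i, so the h_i are positive and sum to one."""
    strengths = firing_strengths(premise, rules)
    total = sum(strengths)
    return [mu / total for mu in strengths]


def blended_matrix(h, A_list):
    """Interpolated system matrix sum_i h_i * A_i (matrices given as nested lists)."""
    n = len(A_list[0])
    return [
        [sum(h_i * A[r][c] for h_i, A in zip(h, A_list)) for c in range(n)]
        for r in range(n)
    ]


if __name__ == "__main__":
    # Two rules on one premise variable, centered at -1 and +1 (placeholder values).
    rules = [[gaussian_membership(-1.0, 1.0)], [gaussian_membership(1.0, 1.0)]]
    h = fuzzy_basis([0.0], rules)
    print(h, blended_matrix(h, [[[-1.0]], [[-3.0]]]))
```

Halfway between the two centers both rules fire equally, so the blended matrix is the average of the two local matrices; closer to one center, the corresponding local model dominates.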
After the nonlinear system in (4) is approximated by the T-S fuzzy system in (15), the
nonlinear dynamic game problem in (10) is replaced by solving a dynamic game problem in
(6) subject to the fuzzy system (15).
Proposition 2: The sub-minimax robust synthetic gene network design is to solve $k^*$ and $\gamma^*$ by the following constrained optimization
$$\min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]} g^2 \tag{18}$$
subject to
$$PA_i(k,\gamma) + A_i^T(k,\gamma)P + Q + \frac{1}{g^2}PP \le 0, \quad i = 1, \ldots, L, \qquad P \le g^2 I, \quad P > 0 \tag{19}$$
and the worst-case disturbance $v^*$ is given by
$$v^* = \frac{1}{g^2}\sum_{i=1}^{L}h_i(\tilde{x})P\tilde{x} \tag{20}$$
Proof: see Appendix B.
By the fuzzy approximation, the HJI in (12) can be approximated by a set of algebraic
inequalities in (19). By Schur complement (Boyd et al., 1994), the constrained optimization
problem in (18)-(19) is equivalent to the following LMI-constrained optimization problem
$$\min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]} g^2 \tag{21}$$
subject to
$$\begin{bmatrix} PA_i(k,\gamma) + A_i^T(k,\gamma)P + Q & P \\ P & -g^2 I \end{bmatrix} \le 0, \quad i = 1, 2, \ldots, L, \qquad P \le g^2 I, \quad P > 0$$
(22)
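The Schur complement step can be checked numerically. The following is a minimal sketch, assuming hypothetical 2x2 matrices `A`, `Q` and `P` that are not taken from the chapter; it tests the algebraic inequality in (19) and the block LMI in (22) for a single rule and shows that both forms give the same feasibility verdict.

```python
import numpy as np

def is_nsd(M, tol=1e-9):
    """Return True if the symmetric matrix M is negative semidefinite."""
    return bool(np.max(np.linalg.eigvalsh((M + M.T) / 2)) <= tol)

def riccati_form_ok(P, A, Q, g2):
    """Inequality (19): P A + A^T P + Q + (1/g^2) P P <= 0."""
    return is_nsd(P @ A + A.T @ P + Q + (P @ P) / g2)

def lmi_form_ok(P, A, Q, g2):
    """Block LMI (22): [[P A + A^T P + Q, P], [P, -g^2 I]] <= 0."""
    n = P.shape[0]
    block = np.block([[P @ A + A.T @ P + Q, P],
                      [P, -g2 * np.eye(n)]])
    return is_nsd(block)

# Hypothetical data for a single rule (illustration only).
A = -2.0 * np.eye(2)
Q = np.eye(2)
P = np.eye(2)

for g2 in (1.0, 0.25):
    print(g2, riccati_form_ok(P, A, Q, g2), lmi_form_ok(P, A, Q, g2))
```

With these numbers both tests pass for $g^2 = 1$ and both fail for $g^2 = 0.25$, as the equivalence requires. The constraints $P \le g^2 I$ and $P > 0$ are separate conditions and are not part of this check.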
Remark 2:
1. The fuzzy basis functions $h_i(\tilde{x})$ in (15) and (17) can be replaced by other interpolation functions, for example, cubic spline functions.
2. By the fuzzy approximation, the HJI in (12) of the nonlinear dynamic game problem reduces to the LMIs in (22), which can be solved efficiently with the Robust Control Toolbox in Matlab (Balas et al., 2008). The constrained optimization in (18) and (19) can be solved by decreasing $g^2$ until there is no positive definite solution $P > 0$ in (22) with $k^* \in [k_1,k_2]$ and $\gamma^* \in [\gamma_1,\gamma_2]$.
3. In the LMI-constrained optimization in (22) for the robust synthetic gene network
design, we do not need the statistics of initial conditions and disturbances on the host
cell, which are not easy to measure. Therefore, the proposed method is simple but
robust for synthetic gene networks.
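The search over $g^2$ described in item 2 can be illustrated on a scalar problem, where feasibility of (19) has a closed form and no LMI solver is needed. The sketch below assumes a hypothetical single-rule scalar error system with $A = -a$ and weight $Q = q$; the function names and numbers are illustrative and are not from the chapter. In the actual design, the feasibility test would be an LMI solve of (22).

```python
import math

def feasible(g2, a, q):
    """Is there p with 0 < p <= g2 and -2*a*p + q + p*p/g2 <= 0 ?"""
    disc = a * a - q / g2                 # sign decides whether real roots exist
    if disc < 0:
        return False                      # the quadratic in p is always positive
    p_min = g2 * (a - math.sqrt(disc))    # smallest p satisfying the inequality
    p_max = g2 * (a + math.sqrt(disc))    # largest p satisfying the inequality
    # Need a positive p in [p_min, p_max] that also obeys p <= g2
    # (the scalar form of P <= g^2 I).
    return p_max > 0 and p_min <= g2

def min_g2(a, q, lo=1e-6, hi=1e3, iters=60):
    """Bisection: decrease g2 until the feasibility test fails."""
    assert feasible(hi, a, q) and not feasible(lo, a, q)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(mid, a, q):
            hi = mid
        else:
            lo = mid
    return hi

print(min_g2(a=2.0, q=1.0))
```

For $a = 2$ and $q = 1$ the two conditions reduce to $g^2 \ge 1/4$ and $g^2 \ge 1/3$, so the bisection converges to $1/3$.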
Remark 3:
For comparison, the conventional optimal regulation design is also proposed for synthetic gene networks. Suppose the effect of external disturbances and uncertain initial conditions on the regulation error is not considered as in (5) in the design procedure, i.e., only the following optimal regulation design is considered:
$$\min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]} E\left[\int_0^{t_f}\tilde{x}^TQ\tilde{x}\,dt\right] \tag{23}$$
subject to (4).
Then we obtain the following sub-optimal regulation design for synthetic gene networks.
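Proposition 3: The sub-optimal synthetic gene network design in (23) is to solve the following constrained optimization
$$\min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]} E\left[V(\tilde{x}(0))\right] \tag{24}$$
subject to
$$V(\tilde{x}) > 0, \quad \left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T f(\tilde{x} + x_d, k, \gamma) + \tilde{x}^TQ\tilde{x} + \frac{1}{2}\left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T\frac{\partial V(\tilde{x})}{\partial \tilde{x}} < 0 \tag{25}$$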
Proof: see Appendix C.
Because it is not easy to solve the above HJI-constrained optimization for the sub-optimal
regulation design in (24) and (25), the fuzzy approximation method is needed to simplify the
design procedure. If the nonlinear error dynamic equation in (4) is represented by the fuzzy
interpolation system in (15), then the optimal synthetic gene network design in (23) is
equivalent to the following optimal regulation design problem.
$$\min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]} E\left[\int_0^{t_f}\tilde{x}^TQ\tilde{x}\,dt\right] \quad \text{subject to} \quad \dot{\tilde{x}} = \sum_{i=1}^{L}h_i(\tilde{x})A_i(k,\gamma)\tilde{x} + v \tag{26}$$
Proposition 4: The sub-optimal regulation design problem in (26) becomes how to solve the following constrained optimization problem
$$\min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]} \mathrm{Tr}(PR_0) \tag{27}$$
subject to
$$P > 0, \quad \begin{bmatrix} A_i^T(k,\gamma)P + PA_i(k,\gamma) + Q & P \\ P & -2I \end{bmatrix} \le 0, \quad i = 1, 2, \ldots, L \tag{28}$$
where $R_0$ denotes the covariance matrix $E[\tilde{x}(0)\tilde{x}^T(0)]$.
Proof: similar to the proof of Proposition 2.
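The objective in (27) rests on the identity $E[\tilde{x}^T(0)P\tilde{x}(0)] = \mathrm{Tr}(PR_0)$ for $V(\tilde{x}) = \tilde{x}^TP\tilde{x}$. A quick numerical check follows, assuming a hypothetical $P$ and hypothetical initial-error samples, neither of which is taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positive definite P and 1000 sampled initial errors (one per row).
P = np.array([[2.0, 0.3],
              [0.3, 1.0]])
x0 = rng.normal(loc=[1.0, -2.0], scale=[0.5, 0.2], size=(1000, 2))

# Sample estimate of R_0 = E[x(0) x(0)^T].
R0 = x0.T @ x0 / len(x0)

# Sample mean of V(x(0)) = x(0)^T P x(0), computed sample by sample.
mean_V = np.mean(np.einsum("ni,ij,nj->n", x0, P, x0))

print(mean_V, np.trace(P @ R0))   # the two numbers agree up to rounding
```

Because the identity holds exactly for sample averages, the design only needs the second-moment matrix $R_0$ of the initial error, not its full distribution.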
Since the effect of stochastic disturbances on $\tilde{x}$ is not considered as in (5) in the above sub-optimal synthetic gene network design, the synthesized gene networks will be more sensitive to the external disturbances or other uncertain factors. They will be compared with the sub-minimax robust synthetic gene network in the simulation example.
Remark 4:
Since the effect of the disturbance $v$ on the regulation error is not efficiently attenuated in the design procedure of the sub-optimal regulation in Propositions 3 and 4, the disturbance will strongly affect the sub-optimal regulation design of the synthetic gene network. This property will be discussed and compared with the proposed robust synthetic gene network in the design example in the following section.
According to the analyses above, a design procedure is developed for the proposed robust
synthetic gene network.
Design Procedure:
1. Give feasible parameter ranges $[k_1,k_2]$ and $[\gamma_1,\gamma_2]$ for production rate parameters $k$ and decay rate parameters $\gamma$, respectively, according to the available biotechnology.
2. Give the desired steady state $x_d$ according to the design purpose and develop a regulation error dynamic (4) for a synthetic gene network.
3. Construct a T-S fuzzy model in (15) to approximate the regulation error dynamic in (4).
4. Solve the constrained optimization problem over the ranges $k \in [k_1,k_2]$ and $\gamma \in [\gamma_1,\gamma_2]$ in (21) and (22) for the robust synthetic gene network design $k^*$ and $\gamma^*$ according to the sub-minimax scheme, or solve the constrained optimization problem in (27) and (28) for the sub-optimal regulation design.
3.3 Design example in silico for the proposed robust design
Consider the man-made synthetic gene network in the dynamic equations (1) (Batt et al., 2007). The synthetic gene network is shown in Fig. 1, where $k_{tetR,0}$, $k_{lacI,0}$, $k_{cI,0}$ and $k_{eyfp,0}$ are basal production rates of the corresponding proteins, which are assumed to be 5000, 587, 210 and 3487, respectively (Batt et al., 2007; Hooshangi et al., 2005). $k_{lacI}$, $k_{cI}$ and $k_{eyfp}$ are the production rate parameters while $\gamma_{tetR}$, $\gamma_{lacI}$, $\gamma_{cI}$ and $\gamma_{eyfp}$ are the decay rate parameters of the corresponding proteins in the host cell (i.e. E. coli). In the robust synthetic gene network design, we should select the parameters $k$ and $\gamma$ from feasible ranges so that the state of the synthetic gene network $x_i$ could approach a desired steady state $x_{d,i}$ for some biotechnical purpose. $r_{lacI}$, $r_{cI}$ and $r_{eyfp}$ are the decreasing Hill functions for regulations of repressors. $a_{lacI}$ is an increasing function since aTc is an activator. The Hill function is an S-shaped curve (Alon, 2007). $u_{aTc}$ is the input to the synthetic gene network system. We assume the anhydrotetracycline input concentration to be a constant value 10000 (i.e. $u_{aTc} = 10000$). For the convenience of simulation, we assume that the extrinsic disturbances $w_1 \sim w_4$ are $w = [500n_1, 10000n_2, 100n_3, 100000n_4]^T$, where $n_i$, $i = 1, 2, 3, 4$, are independent Gaussian white noises with zero mean and unit variance.
From the robust synthetic gene network design procedure, we give the feasible parameter ranges of production rate parameters $k$ and decay rate parameters $\gamma$ as follows (Batt et al., 2007)
$$\begin{aligned}
\gamma_{tetR} &\in [0.05, 5] \\
k_{lacI} &\in [70, 7000] \\
\gamma_{lacI} &\in [0.01314, 0.1517] \\
k_{cI} &\in [75, 8000] \\
\gamma_{cI} &\in [0.7617, 7.2815] \\
k_{eyfp} &\in [30, 30000] \\
\gamma_{eyfp} &\in [0.007, 0.067]
\end{aligned} \tag{29}$$
Then we give the desired steady states of the synthetic gene network as $x_{d,i} = [1000, 50000, 300, 500000]^T$, $i$ = tetR, lacI, cI, eyfp. Then the regulation error dynamic equation in (4) is developed for the synthetic gene network. Because it is very difficult to solve the nonlinear HJI in (12), no simple approach is available to solve the constrained optimization problem in (11) for the robust parameters $k_i^*$ and $\gamma_i^*$. We construct the T-S fuzzy model in (15) to approximate the regulation error dynamic in (4), with the state variables of the regulation error dynamic system as the premise variables, as follows.
Rule i:
If $\tilde{x}_1(t)$ is $F_{i1}$ and $\tilde{x}_2(t)$ is $F_{i2}$ and $\tilde{x}_3(t)$ is $F_{i3}$ and $\tilde{x}_4(t)$ is $F_{i4}$,
then $\dot{\tilde{x}} = A_i(k,\gamma)\tilde{x} + v, \quad i = 1, 2, \ldots, L$
where $A_i(k,\gamma)$ are the parameter matrices and the number of fuzzy rules is $L = 16$. To construct the fuzzy model, we need to find the operating points of the regulation error dynamic system. The operating points for $\tilde{x}_1$ are chosen at $\tilde{x}_{11} = -40$ and $\tilde{x}_{12} = 4040$. Similarly, the operating points of $\tilde{x}_2, \tilde{x}_3, \tilde{x}_4$ are chosen at $\tilde{x}_{21} = -38510$, $\tilde{x}_{22} = 381$, $\tilde{x}_{31} = -16.7$, $\tilde{x}_{32} = 1686$, $\tilde{x}_{41} = -441590$, and $\tilde{x}_{42} = 4372$, respectively. For the convenience of design, triangle-type
membership functions are taken for Rule 1 through Rule 16. We create two triangle-type
membership functions for each state (see Fig. 2).
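The construction of the 16 rule weights can be sketched as follows. This is a minimal illustration that uses the operating points quoted above; the exact shape of the membership functions in Fig. 2 and the ordering of the rules are not given in the text, so the linear interpolation between the two operating points and the rule order used here are assumptions.

```python
import itertools
import numpy as np

# Operating points (low, high) for the four regulation errors, from the text.
OP = np.array([[-40.0, 4040.0],
               [-38510.0, 381.0],
               [-16.7, 1686.0],
               [-441590.0, 4372.0]])

def grades(x):
    """Grades of the two triangular membership functions of each state.

    Assumed shape: the 'low' function falls linearly from 1 at the low
    operating point to 0 at the high one; the 'high' one is its complement.
    """
    lo, hi = OP[:, 0], OP[:, 1]
    low = np.clip((hi - x) / (hi - lo), 0.0, 1.0)
    return np.stack([low, 1.0 - low], axis=1)      # shape (4, 2)

def fuzzy_basis(x):
    """Normalized weights h_i(x) of the 16 rules (rule order is an assumption)."""
    F = grades(np.asarray(x, dtype=float))
    mu = np.array([np.prod([F[j, c[j]] for j in range(4)])
                   for c in itertools.product((0, 1), repeat=4)])
    return mu / mu.sum()

h = fuzzy_basis([0.0, 0.0, 0.0, 0.0])   # zero regulation error
print(h.shape, h.sum())
```

Two membership functions per state and four states give $2^4 = 16$ rules, and the weights satisfy the properties in (17): they are nonnegative and sum to one.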
To simplify the nonlinear stochastic game problem of the robust synthetic gene network, we solve only the sub-minimax problem in (6) instead. With the help of the fuzzy approximation method and the LMI technique, we can easily solve the constrained optimization problem in (21) and (22) instead of the nonlinear constrained optimization problem in (11) and (12) for the minimax robust synthetic gene network design. Finally, we obtain the upper bound of the game in (6), $g^2 = 0.847536$, and a common positive definite symmetric matrix $P$ for (22) as follows
$$P = \begin{bmatrix}
0.45842 & -0.0079 & 0.0143 & -0.00068 \\
-0.0079 & 0.07186 & -0.000557 & 0.00268 \\
0.0143 & -0.000557 & 0.04847 & 0.000718 \\
-0.00068 & 0.00268 & 0.000718 & 0.0578
\end{bmatrix}$$
with the specified robust production rate parameters $k^*_{lacI} = 7000$, $k^*_{cI} = 4037.5$ and $k^*_{eyfp} = 30000$ and robust decay rate parameters $\gamma^*_{tetR} = 5$, $\gamma^*_{lacI} = 0.1517$, $\gamma^*_{cI} = 4.0216$ and $\gamma^*_{eyfp} = 0.067$ of the synthetic gene network. With these design parameters, the parameters $A_i$
of fuzzy model are described in Appendix D.
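The reported solution can be checked against the side constraints $P > 0$ and $P \le g^2 I$ in (22). The short sketch below uses only the values of $P$ and $g^2$ quoted above; it does not verify the LMIs themselves, which would also need the weighting matrix $Q$.

```python
import numpy as np

g2 = 0.847536
P = np.array([[ 0.45842, -0.0079,    0.0143,   -0.00068],
              [-0.0079,   0.07186,  -0.000557,  0.00268],
              [ 0.0143,  -0.000557,  0.04847,   0.000718],
              [-0.00068,  0.00268,   0.000718,  0.0578]])

eig = np.linalg.eigvalsh(P)           # eigenvalues of the symmetric matrix P
print("symmetric:", np.allclose(P, P.T))
print("P > 0:", eig.min() > 0)
print("P <= g^2 I:", eig.max() <= g2)
```

All three checks hold: $P$ is strictly diagonally dominant with positive diagonal entries, so its eigenvalues are positive, and the largest one stays below $g^2$.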
Figure 3 presents the simulation result for robust synthetic gene networks using the Monte Carlo method with 50 rounds and with uncertain initial values. $x_1(0) \sim x_4(0)$ are assumed to be normally distributed random numbers with means 5000, 8000, 2000, 10000 and standard deviations 500, 800, 200, 1000, respectively. As can be seen, the synthetic gene network has robust regulation ability to achieve the desired steady state (black dashed line) in spite of uncertain initial states and the disturbances on the host cell. Obviously, the robust synthetic gene network by the proposed sub-minimax regulation design method has robust stability to the uncertain initial conditions and enough filtering ability to attenuate the disturbances
on the host cell and can approach the desired steady states.
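The random inputs of such a Monte Carlo study can be generated as sketched below. The means, standard deviations, desired steady state and noise scale factors are the ones stated in the text; the seed, the variable names and the use of one Gaussian draw per round to stand in for a white-noise sample are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)          # seed chosen arbitrarily
rounds = 50

x_d = np.array([1000.0, 50000.0, 300.0, 500000.0])    # desired steady state
mean = np.array([5000.0, 8000.0, 2000.0, 10000.0])    # means of x(0)
std = np.array([500.0, 800.0, 200.0, 1000.0])         # standard deviations of x(0)

x0 = rng.normal(mean, std, size=(rounds, 4))   # one row per Monte Carlo round
err0 = x0 - x_d                                # initial regulation errors

# Scale factors of the extrinsic disturbances w_1..w_4 applied to unit
# Gaussian draws n_i (a discretization assumption).
scale = np.array([500.0, 10000.0, 100.0, 100000.0])
w = scale * rng.standard_normal((rounds, 4))

print(err0.mean(axis=0))
```

Each row of `err0` would serve as the initial condition $\tilde{x}(0)$ of one simulated trajectory of the regulation error dynamic.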
For comparison, we solve the sub-optimal regulation design problem in (27) and (28) for the specified production rate parameters $k^*_{lacI} = 70$, $k^*_{cI} = 4037.5$ and $k^*_{eyfp} = 15015$ and decay rate parameters $\gamma^*_{tetR} = 2.525$, $\gamma^*_{lacI} = 0.1517$, $\gamma^*_{cI} = 7.2815$ and $\gamma^*_{eyfp} = 0.067$ of the synthetic gene network. The simulation result of the conventional optimal regulation design is shown in Fig. 4. As can be seen, the conventional optimal regulation design of the synthetic gene network is more sensitive to the initial conditions and disturbances and cannot achieve the desired steady state under the uncertain initial conditions and disturbances.
Remark 5:
The experimental systems in the above example may not be fully observable. If we want to know whether all state variables can approach the desired states $x_d$, several fluorescent proteins (red, green and cyan colour) would be needed in the experimental design to observe the protein expression of all state variables.
4. Discussion
Because the initial conditions and disturbances on the host cell are uncertain, to simplify the
design problem, a robust synthetic biology design is formulated as a stochastic game
problem in this study. The uncertain initial conditions and disturbances due to intrinsic and
extrinsic molecular noises on the host cell are considered as a player maximizing the
regulation error and the design parameters are considered as another player minimizing the
regulation error. In order to avoid solving HJI in the stochastic game theory-based design
problem, a T-S fuzzy interpolation method is introduced to simplify the design procedure of
robust synthetic gene networks via only solving a set of LMIs, which can be efficiently
solved by Robust Control Toolbox in Matlab.
In our study, we can select the weighting matrix $Q = \mathrm{diag}([q_{11}, q_{22}, q_{33}, q_{44}])$, which denotes the punishment on the corresponding tracking error $\tilde{x}$. If we only need to achieve a desired steady state $x_{d4}$ (EYFP), we just assign a value to the fourth diagonal element $q_{44}$ of the weighting matrix $Q$ and set $q_{11} = q_{22} = q_{33} = 0$. The remaining states $x_1 \sim x_3$ will then not approach the given steady states $x_{d1} \sim x_{d3}$ because they are not penalized. However, in this case, some infeasible steady states of $x_1$, $x_2$, and $x_3$ may be obtained even if an optimal $x_4$ can be achieved. In this study, the desired steady states of $x_1$, $x_2$, and $x_3$ are given so that we can avoid obtaining infeasible steady states in $x_1$, $x_2$, and $x_3$ when an optimal $x_4$ is achieved. Further, the undesired steady states of $x_1$, $x_2$, and $x_3$ may also have metabolic toxicity on the host cell and should be avoided. Since the steady states of $x_1 \sim x_3$ are not that important, the desired steady states $x_{d1} \sim x_{d3}$ can be adjusted within feasible ranges, so that the desired steady state $x_{d4}$ can still be optimized as far as possible. This kind of design can avoid hampering the optimization of $x_4$ when $x_1$, $x_2$, and $x_3$ achieve some feasible steady states.
In our in silico design example, we can design the specified robust production rate parameters $k_i^*$ and decay rate parameters $\gamma_i^*$ within the feasible parameter ranges to achieve the desired steady states of the synthetic gene network. As for the biological implementation, we could refer to standard biological parts in biological device datasheets to construct the genetic circuits with the fine-tuned production rate parameters $k_i^*$ and decay rate parameters $\gamma_i^*$. In this way, synthetic biologists can increase the efficiency of gene circuit design through registries of biological parts and standard datasheets, which are concerned with the proper packaging and characterization of modular biological activities so that biological parts or devices with some desired characteristics may be efficiently assembled into gene circuits (Canton et al., 2008).
Quantitative descriptions of devices in the form of standardized, comprehensive datasheets
are widely used in many engineering disciplines. A datasheet is intended to allow an
engineer to quickly determine whether the behavior of a device will meet the requirements
of a system in which a device might be used (Canton, et al., 2008). Such a determination is
based on a set of standard characteristics of device behavior, which are the product of
engineering theory and experience. In the datasheets of engineering, the characteristics
typically reported are common across a wide range of device types, such as sensors, logic
elements and actuators. Recently, biological datasheets have been set as standards for
characterization, manufacture and sharing of information about modular biological devices
for a more efficient, predictable and design-driven genetic engineering science (Arkin, 2008;
Canton, et al., 2008). Because datasheets of biological parts or devices are an embodiment of
engineering standard for synthetic biology (Canton, et al., 2008), a good device standard
should define sufficient information about biological parts or devices to allow the design of
gene circuit systems with the optimal parameters. Datasheets contain a formal set of input-
output transfer functions, dynamic behaviors, compatibility, requirements and other details
about a particular part or device (Arkin, 2008; Canton et al., 2008). Since the parameters $k_i$ are
combinations of transcription and translation, they could be measured from the input-
output transfer functions and dynamic behaviors of biological parts or devices in biological
device datasheets. From properly characterized input-output transfer functions and
dynamic behaviors of parts or devices in biological device datasheets, an engineer can
estimate the corresponding parameters of biological parts or devices. When the biological
parts and devices in datasheets become more complete in future, we can rapidly select from
a vast list the parts that will meet our design parameters $k_i$. Therefore we can ensure that
devices selected from datasheets can fit the optimal parameters and systems synthesized
from them can satisfy the requirements of design specifications for robust synthetic gene
networks.
In order to guarantee the biological feasibility of the calculated optimal parameters, the ranges $[k_1, k_2]$ and $[\gamma_1, \gamma_2]$ of parameters should be determined from the full set of parameters in biological parts repositories (http://partsregistry.org/) so that the optimal parameters selected within these ranges to minimize $g^2$ in equations (21) and (22) have biological meaning. Equivalently, from all the biological parts in biological device datasheets, we can find a set of biological parts whose parameters minimize $g^2$ in equations (21) and (22) to achieve the robust optimal design of the synthetic gene network.
In synthetic gene networks, there is much uncertainty about what affects the behavior of biological circuitry and systems. For example, devices will perturb the cellular functions and there are also likely to be parasitic and unpredictable interactions among components as well as with the host. Since $k_i$ is a combination of promoter strength, ribosome binding site and degradation of the transcript, there are some variations or uncertainties in the parameter value $k_i$. These variations or uncertainties of $k_i$ can be transformed to an equivalent uncertain disturbance $w_i$ in equation (1) from the viewpoint of the mathematical model. The proposed robust minimax synthetic biology design method can predict the most robust value of $k_i$ from the perspective of a stochastic game. In our robust design method, we do not need the statistics of these parameter uncertainties because the proposed synthetic genetic network not only can achieve the desired steady state but also can tolerate the worst-case effect due to these uncertain parameter variations and external noises on the host cell.
For comparison, a sub-optimal regulation design is also
developed for synthetic gene networks. Because the sub-optimal regulation design cannot
efficiently attenuate the effect of uncertain initial conditions and disturbances on the
regulation, it is not suitable for robust synthetic gene networks with uncertain initial
conditions and disturbances on the host cell. As seen in the example in silico, the proposed
robust synthetic gene network can function properly in spite of uncertain initial conditions
and disturbances on the host cell. Design of more robust and complex genetic circuits is
foreseen to have important applications in biotechnology, medicine and biofuel production,
and to revolutionize how we conceptualize and approach the engineering of biological
systems (Andrianantoandro et al., 2006). Therefore, the proposed method has much potential for
robust synthetic gene network design in the near future.
5. Tables and figures
Fig. 1. Synthetic transcription cascade loop in silico design example. aTc represses TetR, TetR
represses lacI, LacI represses cI, CI represses eyfp. aTc is the system input and the fluorescent
protein EYFP is the output.
Fig. 2. Membership functions for four states $\tilde{x}_1$, $\tilde{x}_2$, $\tilde{x}_3$ and $\tilde{x}_4$.
Fig. 3. The robust synthetic gene network design with uncertain initial values and the desired steady states $x_d = [1000, 50000, 300, 500000]^T$, with specified robust production rate parameters $k^*_{lacI} = 7000$, $k^*_{cI} = 4037.5$ and $k^*_{eyfp} = 30000$ and specified robust decay rate parameters $\gamma^*_{tetR} = 5$, $\gamma^*_{lacI} = 0.1517$, $\gamma^*_{cI} = 4.0216$ and $\gamma^*_{eyfp} = 0.067$ of the synthetic gene network. The Monte Carlo simulation method is used with 50 rounds.
Fig. 4. The conventional optimal regulation design with uncertain initial values and the desired steady states $x_d = [1000, 50000, 300, 500000]^T$, with specified production rate parameters $k^*_{lacI} = 70$, $k^*_{cI} = 4037.5$ and $k^*_{eyfp} = 15015$ and specified decay rate parameters $\gamma^*_{tetR} = 2.525$, $\gamma^*_{lacI} = 0.1517$, $\gamma^*_{cI} = 7.2815$ and $\gamma^*_{eyfp} = 0.067$ of the synthetic gene network. It is seen that the conventional optimal regulation design of the synthetic gene network is sensitive to the initial conditions and disturbances and cannot achieve the desired steady states. The Monte Carlo simulation method is used with 50 rounds.
6. Appendixes
6.1 Appendix A: Proof of proposition 1
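Let us consider a Lyapunov energy function $V(\tilde{x}) > 0$; then the cost function in equation (8) is equivalent to
$$J(k,\gamma,v) = E\left[V(\tilde{x}(0)) - V(\tilde{x}(t_f)) + \int_0^{t_f}\left(\tilde{x}^TQ\tilde{x} - g^2v^Tv + \frac{dV(\tilde{x})}{dt}\right)dt\right] \tag{A1}$$
By the chain rule, we get
$$\frac{dV(\tilde{x}(t))}{dt} = \left(\frac{\partial V(\tilde{x}(t))}{\partial \tilde{x}(t)}\right)^T\frac{d\tilde{x}(t)}{dt} = \left(\frac{\partial V(\tilde{x}(t))}{\partial \tilde{x}(t)}\right)^T\left(f(\tilde{x}(t) + x_d, k, \gamma) + v(t)\right) \tag{A2}$$
Substituting (A2) into (A1), we maximize $J(k,\gamma,v)$ by the uncertain disturbance $v$
$$\begin{aligned}
\max_v J(k,\gamma,v)
&= \max_v E\left[V(\tilde{x}(0)) - V(\tilde{x}(t_f)) + \int_0^{t_f}\left(\tilde{x}^TQ\tilde{x} - g^2v^Tv + \left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T f(\tilde{x} + x_d, k, \gamma) + \left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T v\right)dt\right] \\
&= \max_v E\left[V(\tilde{x}(0)) - V(\tilde{x}(t_f)) + \int_0^{t_f}\left(\tilde{x}^TQ\tilde{x} + \left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T f(\tilde{x} + x_d, k, \gamma) - \left(gv - \frac{1}{2g}\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T\left(gv - \frac{1}{2g}\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right) + \frac{1}{4g^2}\left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)dt\right] \\
&= E\left[V(\tilde{x}(0)) - V(\tilde{x}(t_f)) + \int_0^{t_f}\left(\tilde{x}^TQ\tilde{x} + \left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T f(\tilde{x} + x_d, k, \gamma) + \frac{1}{4g^2}\left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)dt\right]
\end{aligned} \tag{A3}$$
with the worst-case disturbance $v^* = \dfrac{1}{2g^2}\dfrac{\partial V(\tilde{x})}{\partial \tilde{x}}$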
.
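The completing-the-square step behind (A3) can be checked numerically. In the sketch below the vector `a` stands in for $\partial V(\tilde{x})/\partial \tilde{x}$ at some state and `g` is a hypothetical attenuation level; both are illustrative values, not taken from the chapter. The check confirms that the $v$-dependent part of the integrand, $a^Tv - g^2v^Tv$, is maximized at $v^* = a/(2g^2)$ with maximum value $a^Ta/(4g^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
g = 0.9                                  # hypothetical attenuation level
a = rng.normal(size=4)                   # stands in for dV/dx at some state

def v_terms(v):
    """The part of the integrand that depends on v: a^T v - g^2 v^T v."""
    return a @ v - g**2 * (v @ v)

v_star = a / (2 * g**2)                  # worst-case disturbance from (A3)
best = v_terms(v_star)

# No randomly perturbed disturbance should exceed the value at v_star.
trials = [v_terms(v_star + rng.normal(size=4)) for _ in range(1000)]
print(best, a @ a / (4 * g**2), max(trials) <= best)
```

Any deviation $d$ from $v^*$ lowers the value by $g^2d^Td$, which is why the squared term in the second line of (A3) vanishes at the maximum.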
By the inequality in (12), it is seen that $V(\tilde{x}(0))$ is the upper bound of (A3), i.e., the sub-minimax problem becomes how to solve the following constrained optimization problem
$$\min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]}\max_v J(k,\gamma,v) \le \min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]} E\left[V(\tilde{x}(0))\right] \tag{A4}$$
subject to (12) and $V(\tilde{x}) > 0$.
By the fact in (9), $g^2E\left[\tilde{x}^T(0)\tilde{x}(0)\right]$ is the upper bound of $\min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]}\max_v J(k,\gamma,v)$. Therefore $E\left[V(\tilde{x}(0))\right]$ in (A4) should be bounded by $g^2E\left[\tilde{x}^T(0)\tilde{x}(0)\right]$, i.e. $E\left[V(\tilde{x}(0))\right] \le g^2E\left[\tilde{x}^T(0)\tilde{x}(0)\right]$.
Therefore the suboptimal solution is to minimize its upper bound. Hence, the sub-minimax problem in (A4) could be replaced by
$$\min_{k,\gamma}\max_v J(k,\gamma,v) \le \min_{k,\gamma} E\left[V(\tilde{x}(0))\right] \le \min_{k,\gamma} E\left[g^2\tilde{x}^T(0)\tilde{x}(0)\right] = \min_{k,\gamma} g^2\,\mathrm{Tr}(R_0) \tag{A5}$$
where each minimization is over $k \in [k_1,k_2]$ and $\gamma \in [\gamma_1,\gamma_2]$, $\mathrm{Tr}(R_0)$ denotes the trace of $R_0$, and $R_0$ denotes the covariance of the initial condition $\tilde{x}(0)$, i.e., $R_0 = E\left[\tilde{x}(0)\tilde{x}^T(0)\right]$, which is independent of the choice of $k$ and $\gamma$. Therefore, the sub-minimax design problem is equivalent to solving the following constrained optimization
$$\min_{k \in [k_1,k_2],\ \gamma \in [\gamma_1,\gamma_2]} g^2$$
subject to (12) and $V(\tilde{x}) > 0$.
6.2 Appendix B: Proof of proposition 2
We replace the error dynamic system in (4) by its fuzzy interpolation system in (15). Then the HJI in (12) can be represented by
$$\left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T\left(\sum_{i=1}^{L}h_i(\tilde{x})A_i(k,\gamma)\tilde{x}\right) + \tilde{x}^TQ\tilde{x} + \frac{1}{4g^2}\left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T\frac{\partial V(\tilde{x})}{\partial \tilde{x}} < 0 \tag{B1}$$
Let us choose the Lyapunov function $V(\tilde{x})$ as $V(\tilde{x}) = \tilde{x}^TP\tilde{x}$ for some positive definite symmetric matrix $P$ and substitute it into (B1). Then we get
$$\sum_{i=1}^{L}h_i(\tilde{x})\,\tilde{x}^T\left\{PA_i(k,\gamma) + A_i^T(k,\gamma)P + Q + \frac{1}{g^2}PP\right\}\tilde{x} \le 0, \qquad P \le g^2 I \tag{B2}$$
where the property in (17) is used.
It is seen that the inequalities in (19) imply (B2). Therefore, the sub-minimax design for the fuzzy equivalent system becomes how we solve the constrained optimization in (18) and (19). By substituting $V(\tilde{x}) = \tilde{x}^TP\tilde{x}$ into (13), we get the worst-case disturbance $v^*$ in (20).
6.3 Appendix C: Proof of proposition 3
Again, let us consider a Lyapunov energy function $V(\tilde{x}) > 0$; then equation (23) is equivalent to
$$\begin{aligned}
\min_{k,\gamma} E\left[\int_0^{t_f}\tilde{x}^TQ\tilde{x}\,dt\right]
&= \min_{k,\gamma} E\left[V(\tilde{x}(0)) - V(\tilde{x}(t_f)) + \int_0^{t_f}\left(\tilde{x}^TQ\tilde{x} + \frac{dV(\tilde{x})}{dt}\right)dt\right] \\
&= \min_{k,\gamma} E\left[V(\tilde{x}(0)) - V(\tilde{x}(t_f)) + \int_0^{t_f}\left(\tilde{x}^TQ\tilde{x} + \left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T f(\tilde{x} + x_d, k, \gamma) + \left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T v\right)dt\right]
\end{aligned}$$
where each minimization is over $k \in [k_1,k_2]$ and $\gamma \in [\gamma_1,\gamma_2]$.
By the fact that $2a^Tb \le a^Ta + b^Tb$ for any two vectors $a$ and $b$, we get
$$\min_{k,\gamma} E\left[\int_0^{t_f}\tilde{x}^TQ\tilde{x}\,dt\right] \le \min_{k,\gamma} E\left[V(\tilde{x}(0)) - V(\tilde{x}(t_f)) + \int_0^{t_f}\left(\tilde{x}^TQ\tilde{x} + \left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T f(\tilde{x} + x_d, k, \gamma) + \frac{1}{2}\left(\frac{\partial V(\tilde{x})}{\partial \tilde{x}}\right)^T\frac{\partial V(\tilde{x})}{\partial \tilde{x}} + \frac{1}{2}v^Tv\right)dt\right]$$
By the inequality in (25), we get the sub-optimal regulation problem as follows
$$\min_{k,\gamma} E\left[\int_0^{t_f}\tilde{x}^TQ\tilde{x}\,dt\right] \le \min_{k,\gamma} E\left[V(\tilde{x}(0)) + \frac{1}{2}\int_0^{t_f}v^Tv\,dt\right]$$
Since the disturbance $v$ is independent of the choice of parameters $k$ and $\gamma$, and only the choice of $V(\tilde{x})$ will influence the above minimization, the sub-optimal design becomes how to solve the constrained optimization problem in (24) and (25).
6.4 Appendix D: Parameters of the T-S fuzzy model with the specified kinetic parameters $k^*$ and decay rates $\gamma^*$
$$A_1 = \begin{bmatrix}
-1.6879 & -0.060601 & 0.11879 & -0.0092833 \\
0.38914 & -0.093297 & 0.010249 & -0.0065119 \\
0.10826 & -0.02841 & -1.4996 & -0.0060343 \\
0.00097167 & -0.0025457 & 0.0053402 & -0.066832
\end{bmatrix}$$
$$A_2 = \begin{bmatrix}
-3.5629 & -0.12704 & 0.25074 & -0.0092833 \\
0.20138 & -0.193 & 0.021476 & -0.0065109 \\
0.22906 & -0.11458 & -3.1644 & -0.0060447 \\
0.0014069 & -0.00054073 & 0.0071284 & -0.066833
\end{bmatrix}$$
$$A_3 = \begin{bmatrix}
-1.5351 & -0.060529 & 0.11879 & -0.0092832 \\
0.40408 & -0.092851 & 0.010249 & -0.0065573 \\
-0.18285 & 0.0322 & -1.4996 & 0.0041303 \\
0.0012516 & -0.0027325 & -0.0017594 & -0.066801
\end{bmatrix}$$
$$A_4 = \begin{bmatrix}
-3.2403 & -0.12689 & 0.25074 & -0.0092832 \\
0.23298 & -0.19466 & 0.021741 & -0.0065573 \\
-0.38598 & 0.039126 & -3.1671 & 0.0041304 \\
0.0019632 & 0.00067731 & -0.00013562 & -0.066801
\end{bmatrix}$$
$$A_5 = \begin{bmatrix}
-3.5287 & -0.060601 & 0.24784 & -0.0093278 \\
0.19497 & -0.093286 & 0.017006 & 0.0019312 \\
0.22212 & -0.080273 & -3.1614 & -0.0042428 \\
0.001744 & -0.0025529 & 0.0072707 & -0.067233
\end{bmatrix}$$
$$A_6 = \begin{bmatrix}
-7.4489 & -0.12704 & 0.52318 & -0.0093278 \\
-0.18351 & -0.1982 & 0.095548 & 0.0014778 \\
0.21344 & -0.11298 & -7.2861 & 0.00040939 \\
-0.012439 & 0.0026832 & -0.025864 & -0.066952
\end{bmatrix}$$
$$A_7 = \begin{bmatrix}
-3.2061 & -0.060529 & 0.24784 & -0.0093277 \\
0.22649 & -0.092851 & 0.01727 & 0.0018016 \\
-0.38483 & -0.019544 & -3.1642 & 0.0068314 \\
0.0023517 & -0.0027325 & 6.7334\times10^{-6} & -0.067149
\end{bmatrix}$$
$$A_8 = \begin{bmatrix}
-6.768 & -0.12689 & 0.52318 & -0.0093277 \\
-0.14191 & -0.19465 & -0.023172 & 0.0018026 \\
-0.81178 & -0.012738 & -6.0679 & 0.0068211 \\
0.0043172 & 0.00067013 & 0.040657 & -0.06715
\end{bmatrix}$$
$$A_9 = \begin{bmatrix}
-1.6879 & 0.12793 & -0.25078 & 0.019598 \\
-0.727 & -0.07319 & -0.026022 & -0.003619 \\
0.10826 & -0.031432 & -0.80806 & -0.005567 \\
0.00097182 & -0.0027504 & 0.0047284 & -0.066801
\end{bmatrix}$$
$$A_{10} = \begin{bmatrix}
-3.5629 & 0.26819 & -0.52934 & 0.019598 \\
-0.91465 & -0.15344 & -0.05495 & -0.003619 \\
0.22793 & -0.094274 & -1.7058 & -0.005567 \\
0.0013385 & 0.00063963 & 0.0057541 & -0.066801
\end{bmatrix}$$
$$A_{11} = \begin{bmatrix}
-1.5351 & 0.12778 & -0.25078 & 0.019598 \\
-0.71206 & -0.073303 & -0.026022 & -0.0036189 \\
-0.18285 & 0.034225 & -0.80806 & 0.0041294 \\
0.0012516 & -0.0026058 & -0.0023716 & -0.066797
\end{bmatrix}$$
$$A_{12} = \begin{bmatrix}
-3.2403 & 0.26787 & -0.52934 & 0.019598 \\
-0.88316 & -0.15367 & -0.054951 & -0.0036189 \\
-0.38597 & 0.043382 & -1.7058 & 0.0041294 \\
0.0019634 & 0.00094337 & -0.0013455 & -0.066797
\end{bmatrix}$$
$$A_{13} = \begin{bmatrix}
-3.5287 & 0.12793 & -0.52322 & 0.019692 \\
-0.92106 & -0.07319 & -0.058507 & 0.0047537 \\
0.22099 & -0.083177 & -1.7026 & -0.0029125 \\
0.0016756 & -0.0027503 & 0.0059171 & -0.067149
\end{bmatrix}$$
$$A_{14} = \begin{bmatrix}
-7.4489 & 0.26819 & -1.1045 & 0.019692 \\
-1.3492 & -0.15343 & -0.12363 & 0.0047547 \\
0.72194 & -0.14614 & -3.593 & -0.0029229 \\
0.018292 & 0.00063245 & 0.0083449 & -0.06715
\end{bmatrix}$$
$$A_{15} = \begin{bmatrix}
-3.2061 & 0.12778 & -0.52322 & 0.019692 \\
-0.88965 & -0.073303 & -0.058507 & 0.0047076 \\
-0.38483 & -0.01752 & -1.7026 & 0.0073033 \\
0.0023519 & -0.0026058 & -0.0011826 & -0.067117
\end{bmatrix}$$
$$A_{16} = \begin{bmatrix}
-6.768 & 0.26787 & -1.1045 & 0.019692 \\
-1.2579 & -0.15367 & -0.12336 & 0.0047076 \\
-0.81291 & -0.0083629 & -3.5957 & 0.0073033 \\
0.0042487 & 0.00094338 & 0.0010809 & -0.067117
\end{bmatrix}$$
7. References
Alon, U. (2003) Biological networks: The tinkerer as an engineer, Science, 301, 1866-1867.
Alon, U. (2007) An Introduction to Systems Biology: Design Principles of Biological Circuits.
Chapman & Hall/CRC.
Andrianantoandro, E., Basu, S., Karig, D.K. and Weiss, R. (2006) Synthetic biology: new
engineering rules for an emerging discipline, Molecular Systems Biology, 2, 1-14.
Arkin, A. (2008) Setting the standard in synthetic biology, Nature Biotechnology, 26, 771-774.
Balas, G., Chiang, R., Packard, A. and Safonov, M. (2008) Robust Control Toolbox User's Guide.
The MathWorks, Inc., Natick, MA.
Basar, T. and Olsder, G.J. (1999) Dynamic Noncooperative Game Theory, 2nd edition. Society for
Industrial and Applied Mathematics, Philadelphia.
Batt, G., Yordanov, B., Weiss, R. and Belta, C. (2007) Robustness analysis and tuning of
synthetic gene networks, Bioinformatics, 23, 2415.
Boyd, S., El Ghaoui, L., Feron, E. and Balakrishnan, V. (1994) Linear Matrix Inequalities in
System and Control Theory. Society for Industrial and Applied Mathematics,
Philadelphia.
Canton, B., Labno, A. and Endy, D. (2008) Refinement and standardization of synthetic
biological parts and devices, Nature Biotechnology, 26, 787-793.
Chen, B.S., Chang, C.H. and Chuang, Y.J. (2008a) Robust model matching control of immune
systems under environmental disturbances: Dynamic game approach, Journal of
Theoretical Biology, 253, 824-837.
Chen, B.S., Chang, Y.T. and Wang, Y.C. (2008b) Robust H∞ stabilization design in gene
networks under stochastic molecular noises: Fuzzy-interpolation approach, IEEE
Trans. on Systems, Man, and Cybernetics, Part B (Special Issue for Systems Biology), 38,
25-42.
Chen, B.S. and Chen, P.W. (2008) Robust engineered circuit design principles for stochastic
biochemical networks with parameter uncertainties and disturbances, IEEE Trans.
on Biomedical circuits and Systems, 2, 114-132.
Chen, B.S., Tseng, C.S. and Uang, H.J. (1999) Robustness design of nonlinear dynamic
systems via fuzzy linear control, IEEE Trans. Fuzzy Systems, 7, 571-585.
Chen, B.S., Tseng, C.S. and Uang, H.J. (2000) Mixed H2/H∞ fuzzy output feedback control
design for nonlinear dynamic systems: an LMI approach, IEEE Trans. Fuzzy
Systems, 8, 249-265.
Chen, B.S., Tseng, C.S. and Uang, H.J. (2002) Fuzzy differential games for nonlinear
stochastic systems: Suboptimal approach, IEEE Trans. Fuzzy Systems, 10, 222-233.
Chen, B.S. and Wu, W.S. (2008) Robust filtering circuit design for stochastic gene networks
under intrinsic and extrinsic molecular noises, Mathematical Biosciences, 211, 342-
355.
Chen, B.S. and Wu, C.H. (2009) A systematic design method for robust synthetic biology to
satisfy design specifications, BMC Systems Biology, 3:66, 1-15.
Chen, B.S., Chang, C.H. and Lee, H.C. (2009) Robust synthetic biology design: stochastic
game theory approach, Bioinformatics, 25, 1822-1830.
Church, G.M. (2005) From systems biology to synthetic biology. Molecular Systems Biology,
1:2005.0032.
de Jong, H. (2002) Modeling and simulation of genetic regulatory systems: A literature
review, Journal of Computational Biology, 9, 67-103.
Endy, D. (2005) Foundations for engineering biology, Nature, 438, 449-453.
Ferber, D. (2004) Synthetic biology: Microbes made to order, Science, 303, 158-161.
Forster, A.C. and Church, G.M. (2007) Synthetic biology projects in vitro, Genome Research,
17, 1-6.
Gardner, T.S., Cantor, C.R. and Collins, J.J. (2000) Construction of a genetic toggle switch in
Escherichia coli, Nature, 403, 339-342.
Goulian, M. (2004) Robust control in bacterial regulatory circuits, Current Opinion in
Microbiology, 7, 198-202.
Hasty, J., McMillen, D. and Collins, J.J. (2002) Engineered gene circuits, Nature, 420, 224-230.
Heinemann, M. and Panke, S. (2006) Synthetic biology - putting engineering into biology,
Bioinformatics, 22, 2790-2799.
Hooshangi, S., Thiberge, S. and Weiss, R. (2005) Ultrasensitivity and noise propagation in a
synthetic transcriptional cascade, Proceedings of the National Academy of Sciences of the
United States of America, 102, 3581-3586.
Hwang, C.L. (2004) A novel Takagi-Sugeno-based robust adaptive fuzzy sliding-mode
controller, IEEE Trans. Fuzzy Systems, 12, 676-687.
Isaacs, F.J., Dwyer, D.J. and Collins, J.J. (2006) RNA synthetic biology, Nature Biotechnology,
24, 545-554.
Kaznessis, Y.N. (2006) Multi-scale models for gene network engineering, Chemical
Engineering Science, 61, 940-953.
Kaznessis, Y.N. (2007) Models for synthetic biology, BMC Systems Biology, 1:47.
Kitano, H. (2002) Systems biology: A brief overview, Science, 295, 1662-1664.
Kitano, H. (2004) Biological robustness, Nature Reviews Genetics, 5, 826-837.
Kobayashi, H., Kaern, M., Araki, M., Chung, K., Gardner, T.S., Cantor, C.R. and Collins, J.J.
(2004) Programmable cells: Interfacing natural and engineered gene networks,
Proceedings of the National Academy of Sciences, 101, 8414-8419.
Kuepfer, L., Sauer, U. and Parrilo, P. (2007) Efficient classification of complete parameter
regions based on semidefinite programming, BMC Bioinformatics, 8, 12.
Li, T.H.S., Chang, S.J. and Tong, W. (2004) Fuzzy target tracking control of autonomous
mobile robots by using infrared sensors, IEEE Trans. Fuzzy Systems, 12, 491-501.
Lian, K.Y., Chiu, C.S., Chiang, T.S. and Liu, P. (2001) LMI-based fuzzy chaotic
synchronization and communications, IEEE Trans. Fuzzy Systems, 9, 539-553.
Maeda, Y.T. and Sano, M. (2006) Regulatory dynamics of synthetic gene networks with
positive feedback, Journal of Molecular Biology, 359, 1107-1124.
Pleiss, J. (2006) The promise of synthetic biology, Applied Microbiology and Biotechnology, 73,
735-739.
Salis, H. and Kaznessis, Y.N. (2006) Computer-aided design of modular protein devices:
Boolean AND gene activation, Physical Biology, 3, 295-310.
Szallasi, Z., Stelling, J. and Periwal, V. (2006) System Modeling in Cellular Biology: From
Concepts to Nuts and Bolts. The MIT Press, Cambridge, MA.
Takagi, T. and Sugeno, M. (1985) Fuzzy identification of systems and its applications to
modeling and control, IEEE Trans. on Systems, Man, and Cybernetics, 15, 116-132.
Tucker, J.B. and Zilinskas, R.A. (2006) The promise and perils of synthetic biology, New
Atlantis, 12, 25-45.
Tucker, M. and Parker, R. (2000) Mechanisms and control of mRNA decapping in
Saccharomyces cerevisiae, Annual Review of Biochemistry, 69, 571-595.
Zhang, W. and Chen, B.S. (2006) State feedback H∞ control for a class of nonlinear stochastic
systems, SIAM Journal on Control and Optimization, 44, 1973-1991.
Zhang, W., Chen, B.S. and Tseng, C.S. (2005) Robust H∞ filtering for nonlinear stochastic
systems, IEEE Trans. on Signal Processing, 53, 589-598.
