Game Theory: Université Paris Dauphine-PSL
Felipe Garrido-Lucero
This course consists of 12 lectures plus 12 sessions of practical work of 1h30 each.
Program
1. Introduction to Game Theory:
Mixed extension
Von Neumann minmax theorem
Nash equilibrium
Complexity aspects of Nash equilibria
4. Potential games
Best-reply dynamic
Congestion games
Price of anarchy and Price of stability in congestion games
5. Repeated games
References
Laraki, Renault, and Sorin, Mathematical Foundations of Game Theory [book]
Gilboa and Zemel, Nash and Correlated Equilibria [paper, on Nash equilibrium complexity]
1 Introduction to game theory
Game theory models situations where several entities (players) make choices and where this
collection of individual behaviors induces an outcome that affects all of them.
Example: waking up in the morning to use the shower.
Since agents may evaluate these outcomes differently, game theory is also the study of rational
behavior within this framework.
In this course, besides studying the mathematical foundations, we will work with the first
three directions mentioned above.
Game theory has been successfully applied to numerous disciplines beyond mathematics: economics,
biology, computer science, operations research, political science, etc. In particular,
several game theorists have received the "Nobel prize in economics". Regarding computer
science, the field of algorithmic game theory has been widely developed over the last years, concerning
mainly the complexity of finding and/or implementing an equilibrium.
A game can usually be described in two ways: the extensive form and the normal form. The
first one involves explaining explicitly the "rules of the game", i.e. saying which player plays
and when, which information she receives during the play, what set of choices she has at her
disposal, and what the consequences of a play are until a terminal outcome. The second one is
a more abstract way of defining a game, as it consists in introducing the set of strategies of each
player and a mapping that associates to every profile of strategies the corresponding outcome,
which determines a payoff for each player.
In any case, once a game is defined, we will focus on two aspects:
1.2 Examples
We start checking some classical examples of game theory.
1.2.1 Stable matchings
Consider two sets of agents I and J (students/universities, doctors/hospitals), both of the same
size. In addition, each agent has a strict order of preferences over the agents of the other set.
The problem is to find a matching µ, an assignment (bijection) between the agents of both sets,
such that:
∄ (i, j) ∈ I × J, not matched together, such that j >_i µ(i) and i >_j µ(j)
Notice that this condition does not say that every agent is matched with his most preferred option, but
that every time an agent a prefers b over his partner in the matching, b does not prefer a, as b
is matched with somebody he prefers more.
1.2.2 Cake sharing
Two players want to share a cake, represented by the interval [0, 1]. Several protocols are possible:
1. One of the players cuts the cake, and the second player chooses one of the pieces.
2. The first player offers a piece of the cake to the second one. If the second player rejects the offer,
both players win 0 and the cake is lost.
3. A third person moves a knife from 0 to 1. The first player to say "stop" gets the piece
from 0 up to the point the knife reached, and the second player obtains the rest of the cake.
1.2.3 Auctions
A house (or any indivisible object) is proposed for auction. n players want the object, each of
them with a different valuation v_i ∈ R. If agent i gets the object at price p, his payoff is v_i − p.
Again, there are many rules for defining the auction:
1. The price of the good starts to decrease on a screen. The first player who says "stop" gets
the good and pays the current price.
2. Agents start to announce increasing prices. The one who proposes the highest price gets
the good and pays the announced price.
3. Players write down the bid (the price) they offer for the good. The player who makes
the highest bid is the winner. Here, we can split into two subcases:
3.1. The winner pays his bid.
3.2. The winner pays the second highest bid.
In addition, each player is given a payoff function g_i : S → R, which depends on the strategies
of all players. We will use the notation g_i(s) := g_i(s_i, s_{−i}). With this, a game G will be denoted
G := (N, (S_i)_{i∈N}, (g_i)_{i∈N}).
Intuitively, in a game all players choose an action simultaneously (this does not mean that
players choose at the same time, but that they do not know the actions chosen by the others when they
choose theirs), seeking to maximize their payoff. Once s ∈ S is materialized, each player i gets
g_i(s).
We make the assumption that all players know the game (or its rules), meaning that players
play a game with complete information.
Example 1 (Rock, paper, scissors). There are two players with S1 = S2 = {R, P, S}. We
represent the game with a matrix.
R P S
R 0,0 -1,1 1,-1
P 1,-1 0,0 -1,1
S -1,1 1,-1 0,0
A matrix like the one above is called a payoff matrix. Player 1 is represented by the rows, while player
2 is represented by the columns. At every cell, the first number represents the payoff of player
1, g1 (s), while the second number corresponds to the payoff of player 2, g2 (s).
Therefore, if player 1 plays rock and player 2 plays paper, the payoff is respectively (−1, 1),
meaning that player 2 wins.
Example 2 (Prisoner's dilemma). Two prisoners are interrogated separately about a robbery.
Each of them has the option of accusing his partner or remaining silent. The following payoff
matrix shows the reduction in years of imprisonment that each prisoner may obtain.
A S
A 1,1 4,0
S 0,4 3,3
Each prisoner has an incentive to accuse his partner since, independently of what the other one
does, accusing decreases his sentence. As both reason the same way, they mutually accuse each
other, decreasing their sentences by only one year. Could they do better? What prevents them
from obtaining a lower sentence?
Definition 2. Let G = (N, (S_i)_{i∈N}, (g_i)_{i∈N}) be a game. We say that a strategy s*_i ∈ S_i is:
a strictly dominant strategy for player i if for any s_i ≠ s*_i and any s_{−i}, g_i(s*_i, s_{−i}) > g_i(s_i, s_{−i}).
In words, s*_i is strictly dominant if, independently of what the other players play,
i always obtains a strictly higher payoff by playing s*_i than by playing anything else;
a weakly dominant strategy for player i if for any s_i ≠ s*_i and any s_{−i}, g_i(s*_i, s_{−i}) ≥ g_i(s_i, s_{−i});
equivalent to s_i if g_i(s*_i, s_{−i}) = g_i(s_i, s_{−i}) for any s_{−i}.
Can we find strictly or weakly dominant strategies for a player in our two previous examples ?
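These checks can be done mechanically. Below is a minimal sketch; the list-of-lists payoff representation and the function name are illustrative assumptions, not from the course.

```python
# Search for (strictly or weakly) dominant strategies of the row player.

def dominant_strategies(payoff, strict=True):
    """Indices of rows that are (strictly/weakly) dominant for the row player.

    payoff[i][j] = row player's payoff when she plays i and the opponent plays j.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    result = []
    for i in range(n_rows):
        dominant = True
        for k in range(n_rows):
            if k == i:
                continue
            for j in range(n_cols):
                if strict and not payoff[i][j] > payoff[k][j]:
                    dominant = False
                if not strict and not payoff[i][j] >= payoff[k][j]:
                    dominant = False
        if dominant:
            result.append(i)
    return result

# Prisoner's dilemma (row player), rows/columns ordered [A, S]:
pd = [[1, 4],
      [0, 3]]
print(dominant_strategies(pd, strict=True))   # [0]: accusing is strictly dominant

# Rock, paper, scissors (row player):
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
print(dominant_strategies(rps, strict=True))  # []: no dominant strategy
```

As expected, the prisoner's dilemma has a strictly dominant strategy for each player, while rock, paper, scissors has none.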
Definition 3. A strategy profile s ∈ S is Pareto optimal if there is no s′ ∈ S such that
g_i(s′) ≥ g_i(s) for all i ∈ N and g_i(s′) > g_i(s) for some i ∈ N. When such an s′ exists, we say that s′ Pareto dominates s.
The intuition is that at a Pareto optimal profile we cannot strictly increase the payoff of a
player without decreasing the payoff of another player.
Can we find a Pareto optimal strategy for the prisoner’s dilemma ? Consider the graphical
representation of the payoff matrix.
[Figure: the four payoff vectors (1, 1), (3, 3), (0, 4) and (4, 0) plotted in the (g1, g2) plane.]
We see that (A, A) is Pareto-dominated by (S, S), as both players strictly increase their payoff.
However, (A, S) and (S, A) are not Pareto-dominated, as any deviation decreases the payoff of
one of the players.
Example 3 (Second price auction). Consider n players (bidders) and one good for sale. Each
bidder i has a personal value for the good vi ≥ 0. If i buys the good at price p, his payoff is
v_i − p. The rules of the game are: all players submit their bids simultaneously, and
1. if b_i > max_{j≠i} b_j, player i wins the good and pays p = max_{j≠i} b_j,
2. if b_i < max_{j≠i} b_j, player i does not get the good and pays nothing,
3. if b_i = max_{j≠i} b_j, several players submit the highest bid.
The third case corresponds to a tie, where the winner is chosen randomly between the highest
bidders. What should the players bid?
Theorem 1 (Vickrey). The strategy b∗i = vi is a weakly dominant strategy for player i.
Proof. Let us prove the theorem. Let v_i be the valuation of player i for the good and consider
p := max_{j≠i} b_j, the highest bid among the other players. Let us split the study into cases:
If v_i ≤ p: bidding b_i = v_i, player i loses (or ties with a null payoff) and wins 0. Any bid
b_i < p also gives payoff 0, while any bid b_i > p wins the good at price p, with payoff v_i − p ≤ 0.
If v_i > p: bidding b_i = v_i, player i wins the good and obtains v_i − p > 0. Any bid b_i > p
gives the same payoff, while any bid b_i < p gives payoff 0.
In any case, player i has no incentive to bid differently from v_i. We conclude that
b_i = v_i is a weakly dominant strategy.
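The case analysis above can also be verified by brute force on a small grid of bids. The following sketch is an illustration; the valuations, the grid and the lowest-index tie-breaking rule are assumptions made for the example, not part of the theorem.

```python
# Empirical check that truthful bidding is weakly optimal in a second-price auction.

def second_price_payoff(bids, values):
    """Payoff vector of a second-price auction for the given bid profile."""
    n = len(bids)
    # Highest bid wins; ties broken in favor of the lowest index (an assumption).
    winner = max(range(n), key=lambda i: (bids[i], -i))
    second = max(b for i, b in enumerate(bids) if i != winner)
    payoffs = [0.0] * n
    payoffs[winner] = values[winner] - second
    return payoffs

values = [3.0, 5.0]
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# For player 0: bidding v_0 is at least as good as any other bid, whatever the opponent bids.
for other in grid:
    truthful = second_price_payoff([values[0], other], values)[0]
    for b in grid:
        assert truthful >= second_price_payoff([b, other], values)[0]
print("truthful bidding is weakly optimal on the grid")
```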
Example 4. Consider a game with n players. Each of them can choose any number s_i ∈ [0, 100].
Let s̄ := (1/n)·Σ_{i=1}^n s_i. The winner of the game is the player closest to (2/3)·s̄. What should we choose?
The iterated deletion of strictly dominated strategies (IDSDS) method, as its name says, consists in
eliminating one by one all the strategies that are strictly dominated, until we arrive at a game
without strictly dominated strategies. The previous example can be solved by the IDSDS
method.
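A minimal sketch of the IDSDS method for two-player bimatrix games follows. It only checks domination by pure strategies (domination by mixed strategies is ignored in this simplified version), and the list-of-lists representation is an assumption.

```python
# Iterated deletion of strictly dominated pure strategies in a bimatrix game.
# A[i][j], B[i][j] are the payoffs of players 1 and 2.

def idsds(A, B):
    """Return the index sets of surviving rows and columns."""
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        for i in list(rows):
            # Row i is deleted if some other surviving row beats it in every column.
            if any(all(A[k][j] > A[i][j] for j in cols) for k in rows if k != i):
                rows.remove(i)
                changed = True
        for j in list(cols):
            # Column j is deleted if some other surviving column beats it in every row.
            if any(all(B[i][k] > B[i][j] for i in rows) for k in cols if k != j):
                cols.remove(j)
                changed = True
    return rows, cols

# Prisoner's dilemma: only the profile (A, A) survives.
A = [[1, 4], [0, 3]]
B = [[1, 0], [4, 3]]
print(idsds(A, B))  # ([0], [0])
```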
Definition 4. A game G is dominance-solvable if G^(∞), the last game obtained by the iterated
deletion of strictly dominated strategies, is trivial, i.e. S_i^(∞) = {s*_i} for every player i, or g is
constant on S^(∞).
Remark 1. The IDSDS is independent of the order of deletion, unlike the ID-weakly-DS.
2 Non-cooperative games: Zero-sum games
As a first example of non-cooperative games, we will study zero-sum games. In this setting, the
players have opposite evaluations of the outcomes. Formally, we work with only two players (n = 2),
each of them endowed with a set of actions S1, S2 and a payoff function g1, g2 : S1 × S2 → R,
respectively. In addition, the payoff functions satisfy,
g1(s1, s2) + g2(s1, s2) = 0, ∀(s1, s2) ∈ S1 × S2
In words, the payoff of a player always corresponds to minus the payoff of the other player. For
simplicity, we change notation: we write I = S1, J = S2 and use only one
payoff function g = g1 = −g2 : I × J → R. We say that player 1 wants to maximize the payoff
of the game, while player 2 wants to minimize it.
Example 5. Rock, paper, scissor is a zero-sum game and its payoff matrix can be rewritten as,
Table 3: Rock, paper, scissors game
    R     P     S
R  0,0  -1,1  1,-1
P  1,-1  0,0  -1,1
S  -1,1  1,-1  0,0

Table 4: Same game, reduced matrix
    R   P   S
R   0  -1   1
P   1   0  -1
S  -1   1   0
As the players in a zero-sum game are antagonists, we study the payoffs that each of
them can guarantee when playing, i.e. payoffs that each player can obtain independently of the
strategy of the opponent.
An important case holds when both players can guarantee the same real value w.
Definition 6. Given a zero-sum game G = (I, J, g), we consider two important quantities,
v̲(g) := sup_{i∈I} inf_{j∈J} g(i, j)   (the maxmin),    v̄(g) := inf_{j∈J} sup_{i∈I} g(i, j)   (the minmax)
Remark that it always holds that v̲(g) ≤ v̄(g). The maxmin is the highest value that player 1 can
guarantee, while the minmax is the lowest value that player 2 can guarantee.
Example 6 (Matching Pennies). Consider the following game: two friends, each of them with
a penny, simultaneously show one of its faces. If the faces match, the second friend pays 1 euro to the
first friend; otherwise, the first friend pays 1 euro to the second friend. The (reduced) payoff matrix of
this game can be written as,
    H   T
H   1  -1
T  -1   1
In the Matching Pennies example, as v̲ = −1 < 1 = v̄, this game does not have a value (however,
we will see later that the value exists when players play mixed strategies).
Remark 2. If both players can guarantee the same value w, then w is the value of the game.
In addition, the value of a game is unique: if both players can guarantee two values w1, w2,
then necessarily w1 = w2.
Definition 8. Let G = (I, J, g) be a zero-sum game with a value v(g). We say that,
Player 1 has an optimal strategy i∗ ∈ I if i∗ guarantees the value v(g),
Player 2 has an optimal strategy j ∗ ∈ J if j ∗ guarantees the value v(g),
Example 7. Consider the zero-sum game G = (ℕ, ℕ, g), where g(i, j) = 1/(1 + i + j). The
value of the game is v = 0; any strategy of player 1 is an optimal strategy, while player 2 does
not have any optimal strategy.
Remark 3. The existence of a value without optimal strategies is due to the presence of supre-
mum and infimum in the definitions of v and v.
Given a finite set I, let ∆(I) := {x ∈ R^I : x_i ≥ 0 ∀i ∈ I, Σ_{i∈I} x_i = 1} be the set of probability
distributions over I. The sets ∆(I), ∆(J) are known as the simplices of I and J, respectively. The mixed extension
of a finite game G = (I, J, g) is thus the game Γ = (∆(I), ∆(J), g), where the payoff function
corresponds to the expected payoff of the players,
g(x, y) := Σ_{(i,j)∈I×J} x_i A_{i,j} y_j, ∀(x, y) ∈ ∆(I) × ∆(J)
For simplicity, we write the mixed extension of the payoff function g simply as g.
Example 8. Let us come back to the Matching Pennies game. Assuming that player 1 picks
heads with probability x1 and tails with probability x2 , such that x1 +x2 = 1, and similarly player
2 picks heads and tails with probabilities y1 and y2 respectively, the payoff function becomes,
g(x, y) = 1 · x1 y1 + (−1) · x1 y2 + (−1) · x2 y1 + 1 · x2 y2
There are two important remarks to make about the mixed extension of a finite game.
1. The expected payoff can be written in matrix form: let (x, y) ∈ ∆(I) × ∆(J), then,
g(x, y) = Σ_{i∈I, j∈J} x_i A_{i,j} y_j = [x_1, ..., x_{|I|}] · A · [y_1, ..., y_{|J|}]ᵀ = xᵀAy
2. Given a pure strategy i ∈ I, we can identify i ≈ δ_i ≈ e⃗_i. Thus, if x ∈ ∆(I), then x = Σ_{i∈I} x_i e⃗_i
with Σ_{i∈I} x_i = 1. This tells us that the simplex ∆(I) is generated as the convex
hull of the canonical basis of R^I.
[Figure: the simplex ∆(I) for |I| = 3: the triangle in R³ with vertices (1, 0, 0), (0, 1, 0) and (0, 0, 1).]
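The matrix form of Remark 1 can be checked numerically on the Matching Pennies matrix; a minimal sketch using numpy for the products:

```python
# g(x, y) = x^T A y on the reduced Matching Pennies matrix.
import numpy as np

A = np.array([[1, -1],
              [-1, 1]])

def g(x, y):
    # Expected payoff of player 1 under mixed strategies x and y.
    return x @ A @ y

x = np.array([0.5, 0.5])  # uniform mixed strategy of player 1
y = np.array([0.5, 0.5])  # uniform mixed strategy of player 2
print(g(x, y))                          # 0.0: the value in mixed strategies
print(g(np.array([1.0, 0.0]), y))       # 0.0: against y, player 1 is indifferent
```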
2.2.1 Linear programming
A linear program is,
(P)   min Σ_{i∈I} c_i x_i
      s.t. Σ_{i∈I} A_{i,j} x_i ≥ b_j, ∀j ∈ J
           x_i ≥ 0, ∀i ∈ I
where A ∈ R^{|I|×|J|} is a matrix, and b ∈ R^{|J|}, c ∈ R^{|I|} are vectors, the three of them known and
fixed. The dual problem of (P) corresponds to another linear program (D) given by,
(D)   max Σ_{j∈J} b_j y_j
      s.t. Σ_{j∈J} A_{i,j} y_j ≤ c_i, ∀i ∈ I
           y_j ≥ 0, ∀j ∈ J
There are two important results to keep in mind about linear programming.
Proposition 2. Given a linear program (P), finding a solution or declaring that the problem
is infeasible can be done in polynomial time.
Theorem 3 (Strong Duality). If (P) and (D) are both feasible, then both have optimal solutions
and they share the same optimal value. In addition, if (x*, y*) is a pair of solutions of the
primal-dual problems, it holds,
∀i ∈ I: x*_i > 0 ⟹ Σ_{j∈J} A_{i,j} y*_j = c_i,
∀j ∈ J: y*_j > 0 ⟹ Σ_{i∈I} A_{i,j} x*_i = b_j
Indeed, as the function f(x) := xᵀAy is linear and continuous in x, and the set ∆(I) is compact,
there always exists an extreme point of the feasible region that maximizes f. In addition,
as the extreme points of a simplex are the pure strategies, represented by the canonical vectors
e⃗_i, the result holds. Analogously we obtain,
Therefore, we focus on proving (3). First of all, we assume, without loss of generality, that A has
only positive entries. Indeed, notice that we can add λ > 0 to all the coordinates of A so that they become
positive, without affecting the equality (3), as the same constant λ is added to both sides. Consider the
following linear program,
(P)   min Σ_{i∈I} x_i
      s.t. Σ_{i∈I} A_{i,j} x_i ≥ 1, ∀j ∈ J
           x_i ≥ 0, ∀i ∈ I
We notice that we have used b = (1)_{j∈J}, c = (1)_{i∈I}, that is, both are vectors with ones in
every coordinate. The dual of (P) is given by,
(D)   max Σ_{j∈J} y_j
      s.t. Σ_{j∈J} A_{i,j} y_j ≤ 1, ∀i ∈ I
           y_j ≥ 0, ∀j ∈ J
Notice that (D) is feasible, as y ≡ 0 is a feasible solution. Similarly, as A has only positive
entries, (P) is feasible, since any x ≫ 0 (a vector with every coordinate large enough)
is a feasible solution. Therefore, by the strong duality theorem, there exist optimal
solutions (x*, y*) of (P) and (D) respectively, and w ∈ R such that,
x* ≥ 0, y* ≥ 0,
Σ_{i∈I} A_{i,j} x*_i ≥ 1, ∀j ∈ J
Σ_{j∈J} A_{i,j} y*_j ≤ 1, ∀i ∈ I
Σ_{i∈I} x*_i = Σ_{j∈J} y*_j = w
Note that w > 0, as the constraints of (P) force x* ≠ 0. Defining x̄ := x*/w and ȳ := y*/w, we obtain,
x̄ ≥ 0, ȳ ≥ 0,
Σ_{i∈I} x̄_i = Σ_{j∈J} ȳ_j = 1
In other words, x̄ and ȳ are probability distributions. We claim, finally, that these are optimal
strategies of the players and thus they achieve the value of the game. Indeed, notice that,
x̄ᵀA e⃗_j = Σ_{i∈I} x̄_i A_{i,j} ≥ 1/w, ∀j ∈ J
therefore,
max_{x∈∆(I)} min_{j∈J} xᵀA e⃗_j ≥ min_{j∈J} x̄ᵀA e⃗_j ≥ 1/w
Analogously, min_{y∈∆(J)} max_{i∈I} e⃗_iᵀAy ≤ 1/w. Thus, recalling (1) and (2),
as the maxmin of G is always less than or equal to the minmax of G, we conclude the equality
and the proof of the theorem.
Remark 4. Recalling that solving a linear program is polynomial, we conclude
that solving a finite zero-sum game, that is, computing its value and its optimal strategies, can
be done in polynomial time.
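As a sketch of this remark, the LP above can be fed to an off-the-shelf solver. Here scipy is used; the solver choice and the variable layout (x together with the value v) are assumptions of the example, not part of the course.

```python
# Value and optimal strategy of a finite zero-sum game via one linear program.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Maximize v subject to: (A^T x)_j >= v for all j, sum(x) = 1, x >= 0."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    c = np.zeros(n + 1); c[-1] = -1.0                      # minimize -v
    A_ub = np.hstack([-A.T, np.ones((m, 1))])              # v - (A^T x)_j <= 0
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # sum(x) = 1
    b_eq = np.ones(1)
    bounds = [(0, None)] * n + [(None, None)]              # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:-1], res.x[-1]

rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
x, v = solve_zero_sum(rps)
print(x, v)  # x close to [1/3, 1/3, 1/3], v close to 0
```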
Notice that to prove Von Neumann's theorem we only used the first part of the strong
duality theorem. The proof of the indifference principle comes from the second part.
Proof (Indifference principle). The proof is a direct application of the second part of the strong
duality theorem. Let (x̄, ȳ) be optimal strategies, so they achieve the value v of the game. As we
did for the minmax theorem, we can assume without loss of generality that v > 0, by taking A > 0.
We know that (x̄/v, ȳ/v) is a pair of solutions of the problems (P) and (D). Using the second part
of the strong duality theorem, it holds,
∀i ∈ I, (1/v)·x̄_i > 0 ⟹ Σ_{j∈J} A_{i,j}·(1/v)·ȳ_j = 1,
∀j ∈ J, (1/v)·ȳ_j > 0 ⟹ Σ_{i∈I} A_{i,j}·(1/v)·x̄_i = 1
As v > 0, we obtain,
∀i ∈ I, x̄_i > 0 ⟹ Σ_{j∈J} A_{i,j} ȳ_j = v,
∀j ∈ J, ȳ_j > 0 ⟹ Σ_{i∈I} A_{i,j} x̄_i = v
R P S
R 0 -1 1
P 1 0 -1
S -1 1 0
Solving a zero-sum game with our current tools can be done in two ways: either we guess
the value of the game and then use Proposition 1 to compute the optimal strategies, or
we guess the optimal strategies, compute the value of the game and prove that our guess was
correct. For a game like rock, paper, scissors, in which everything is symmetric, a good guess
for the value is the average of all payoffs. In this case, we propose v = 0 as the value
of the game. Let us compute the optimal strategies. Let x = (x1, x2, x3) be an optimal strategy
for player 1 and y = (y1, y2, y3) an optimal strategy for player 2. In particular, x and y are
probability distributions, so each coordinate is non-negative and they sum to one. Suppose
x1, x2, x3 > 0, so player 1 plays the three strategies with some positive
probability. From Proposition 1 it follows,
x1 > 0 ⟹ 0·y1 + (−1)·y2 + 1·y3 = 0
x2 > 0 ⟹ 1·y1 + 0·y2 + (−1)·y3 = 0        ⟹ y1 = y2 = y3
x3 > 0 ⟹ (−1)·y1 + 1·y2 + 0·y3 = 0
Since y1 + y2 + y3 = 1, we conclude that y = (1/3, 1/3, 1/3). In particular, we obtain that player
2 plays all her strategies with positive probability, so we can now repeat the computations for player
1. Analogously, we obtain that x = (1/3, 1/3, 1/3). In words, both players play the three
strategies with the same probability. It makes sense to obtain a symmetric equilibrium, in which both
agents play the same mixed strategy, as both of them have the same symmetric payoff matrix.
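The indifference system above is linear and can also be solved numerically; a minimal sketch with numpy, adding the normalization y1 + y2 + y3 = 1 as a fourth equation (the unknowns are y and the value v):

```python
# Solve the indifference system A y = v·1 together with sum(y) = 1.
import numpy as np

A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

# Unknown vector (y1, y2, y3, v): rows encode A y - v·1 = 0 and y1 + y2 + y3 = 1.
M = np.vstack([np.hstack([A, -np.ones((3, 1))]),
               np.array([[1.0, 1.0, 1.0, 0.0]])])
b = np.array([0.0, 0.0, 0.0, 1.0])
sol = np.linalg.solve(M, b)
print(sol)  # y = (1/3, 1/3, 1/3) and v = 0
```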
The pure strategies that are played with positive probability have a name.
Definition 10. Let x ∈ R^n_+ be a vector. We define the support of x as the set,
supp(x) := {i ∈ {1, ..., n} : x_i > 0}
That is, the set of all the strictly positive coordinates of the vector x.
As a final remark on zero-sum games, let us come back to the equilibrium of the rock, paper,
scissors game. Notice that, if (x̄, ȳ) is the equilibrium we computed,
g(x̄, e⃗_j) = 0, ∀j ∈ J,    and    g(e⃗_i, ȳ) = 0, ∀i ∈ I
That is, if one of the players plays the equilibrium, the opponent is indifferent between all the
options in his support, as he always gets the same payoff. This is not a coincidence, as we will
see later in the course. However, if my opponent plays her optimal strategy, although I am
indifferent, the best thing to do is to also play an optimal strategy. Otherwise, the opponent can
change strategy and decrease my payoff. For example, if my opponent plays rock, paper and
scissors with probability 1/3 each, although my payoff is the same when playing any of the three
pure strategies, if I decide to always play rock, my opponent could notice it and switch to playing
always paper, decreasing my payoff. Therefore, the best response to an optimal strategy is to
play an optimal strategy.
Two friends try to decide between going to a football match or to the theater. The first friend
prefers football over theater, while the second one prefers theater over football.
However, independently of the chosen event, both prefer going together to the same place rather
than going to different ones.
    F    T
F  2,1  0,0
T  0,0  1,2
To study this kind of game, we draw arrows indicating the improvements of the players' payoffs.
In this case we obtain,
[Figure: the same payoff matrix with deviation arrows; both (F, F) and (T, T) have only incoming arrows.]
An equilibrium will be any strategy profile in which no agent can improve his payoff by unilaterally
deviating. Looking at the matrix, an equilibrium is any cell that has only incoming
arrows. For our example, we find two equilibria: (F, F) and (T, T).
1. In a two-player zero-sum game, a strategy profile (s∗ , t∗ ) is a Nash equilibrium if and only if
it achieves the value of the game and (s∗ , t∗ ) are optimal strategies for the players.
2. Eliminating strictly dominated strategies does not change the equilibria of the game.
3. Therefore, IDSDS does not change the equilibria of the game.
When players in a game play a Nash equilibrium, each of them is maximizing his payoff with
respect to the strategies played by the other players. We say that players are best replying to
the strategies of the other players.
Definition 12. Let i ∈ N be a player. We define his best reply function as,
BR_i : S_{−i} = ∏_{i′≠i} S_{i′} → S_i,    BR_i(s_{−i}) := argmax_{s_i∈S_i} g_i(s_i, s_{−i})
That is, given a strategy profile of all the players but i, i's best reply function outputs
the best replies that i can make to maximize his payoff.
Finding a Nash equilibrium in a payoff matrix is equivalent to checking the best reply of each agent
while fixing the strategies played by the other players. As we did for the coordination game, we can
draw lines expressing the best reply of each player. Let us see another example.
L M R
T 1,8 4,-1 -1,2
M 2,7 6,0 2,1
B 3,3 6,2 0,1
Suppose that player 2 plays L. The best reply of player 1 is to play B. Suppose that player 1
plays M. Then, the best reply of player 2 is to play L. Doing the same for all possible cases,
we can highlight each of the best replies of both players. If we find a cell of the matrix totally
highlighted, the corresponding strategy profile is a Nash equilibrium. In this example, we find
that (B, L) is an equilibrium.
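The underlining procedure can be written as a short program; a sketch for bimatrix games (ties are kept, so every best reply is marked):

```python
# Pure Nash equilibria of a bimatrix game by marking mutual best replies.

def pure_nash(A, B):
    """Return all (i, j) where i is a best reply to j and j is a best reply to i."""
    n, m = len(A), len(A[0])
    best_rows = {j: {i for i in range(n) if A[i][j] == max(A[k][j] for k in range(n))}
                 for j in range(m)}
    best_cols = {i: {j for j in range(m) if B[i][j] == max(B[i][k] for k in range(m))}
                 for i in range(n)}
    return [(i, j) for i in range(n) for j in range(m)
            if i in best_rows[j] and j in best_cols[i]]

# The 3x3 example above (rows T, M, B; columns L, M, R):
A = [[1, 4, -1], [2, 6, 2], [3, 6, 0]]   # player 1
B = [[8, -1, 2], [7, 0, 1], [3, 2, 1]]   # player 2
print(pure_nash(A, B))  # [(2, 0)], i.e. the profile (B, L)
```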
Example 12 (Cournot's game). In the 19th century, Cournot defined his duopoly competition
model. Consider two firms i ∈ {1, 2}, each of which chooses a quantity q_i of a certain
good to produce (the same good for both firms). We assume that firms can produce up to a certain level a > 0 of the good,
so both firms have the same strategy set S1 = S2 = [0, a]. The cost of production is linear in the
produced quantity, so
C_i(q_i) = c q_i,
where c > 0 (the same production cost for both firms). The market price p is also assumed to be
linear in the total production level, so
p = max{a − (q1 + q2), 0}
Finally, once a firm i chooses its strategy (its level of production), its payoff is,
g_i(q1, q2) = p · q_i − C_i(q_i)
Finally, we assume a > c (why?). Let us compute the Nash equilibrium of this game using
the best reply functions. Suppose firm 2 produces s2. Firm 1's best reply (forgetting the maximum
in the market price for now) is obtained from,
∂g1/∂s1 = 0 ⟺ a − 2s1 − s2 − c = 0 ⟺ s*_1 = (a − c − s2)/2    (best reply function of firm 1)
Since firms have symmetric payoff functions, given s1 the best reply of firm 2 has to be,
s*_2 = (a − c − s1)/2    (best reply function of firm 2)
We obtain, therefore, a system of two equations in two variables. Solving it, we find that the
Nash equilibrium of the game is,
s*_1 = s*_2 = (a − c)/3
How can we see if this game has another Nash equilibrium? Let us plot the best reply functions.
[Figure: the best reply lines BR1 and BR2 in the (s1, s2) plane, both decreasing from (a − c)/2 to a − c, intersecting only at ((a − c)/3, (a − c)/3).]
Since the best reply functions have only one intersection point, we can conclude that there is only
one Nash equilibrium in the Cournot competition.
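A small numerical sketch of the same computation: iterating the two best reply functions converges to their unique intersection (the values of a and c below are illustrative):

```python
# Best-reply iteration for the Cournot duopoly: s_i -> (a - c - s_{-i}) / 2.

def cournot_fixed_point(a, c, iterations=100):
    br = lambda s: (a - c - s) / 2.0   # best reply of either firm
    s1, s2 = 0.0, 0.0                  # arbitrary starting productions
    for _ in range(iterations):
        s1, s2 = br(s2), br(s1)        # simultaneous best replies
    return s1, s2

a, c = 10.0, 1.0
s1, s2 = cournot_fixed_point(a, c)
print(s1, s2, (a - c) / 3)  # both quantities converge to (a - c)/3
```

The iteration contracts (each step halves the distance to the fixed point), which is why it converges from any starting point.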
The game Ḡ is called a finite game in mixed strategies. For simplicity, we keep writing g_i for
the mixed version of the payoff function of player i, instead of ḡ_i.
L M R
T 1,4 0,4 2,6
B 3,2 6,1 5,2
Given a mixed strategy profile (σ 1 , σ 2 ) ∈ Σ1 × Σ2 , the expected payoffs are given by,
Definition 14. A strategy profile σ* ∈ ∏_{i∈N} Σ_i is a Nash equilibrium in mixed strategies of the
game G if it is a Nash equilibrium of the extended game Ḡ. That is, if it satisfies,
∀i ∈ N, g_i(σ*_i, σ*_{−i}) ≥ g_i(τ_i, σ*_{−i}), ∀τ_i ∈ Σ_i
Theorem 4 (Nash). A finite game always has a Nash equilibrium in mixed strategies.
The proof of Theorem 4 uses the following result, known as Brouwer's fixed point theorem.
We will not prove Brouwer's theorem, but only use it to prove Theorem 4.
Proof of Theorem 4. Let Ḡ = (N, (∆(S_i))_{i∈N}, (g_i)_{i∈N}) be a finite game in mixed strategies. Consider
∆ = ∏_{i∈N} ∆(S_i) and define the gain function of player i ∈ N by Gain_i(σ, a) := max{0, g_i(a, σ_{−i}) −
g_i(σ)}, that is, the difference of payoff that player i gets when she deviates from σ_i to the pure strategy a ∈ S_i.
Then, consider the following function f : ∆ → ∆, defined coordinate by coordinate as,
f(σ)_i(s_i) := (σ_i(s_i) + Gain_i(σ, s_i)) / (1 + Σ_{t_i∈S_i} Gain_i(σ, t_i))
Since ∆ is convex and compact and f is a continuous function, by Brouwer's theorem there exists
σ* ∈ ∆ such that f(σ*) = σ*. We claim that σ* is a Nash equilibrium. Let i ∈ N be a fixed
player. We want to prove that for any s_i ∈ S_i, Gain_i(σ*, s_i) = 0, meaning that i is best replying
to σ*_{−i}. Assume that the gains are not all zero: there exist i ∈ N and s_i ∈ S_i
such that Gain_i(σ*, s_i) > 0. For this player, the fixed-point equation gives σ*_i(s_i)·Σ_{t_i∈S_i} Gain_i(σ*, t_i) = Gain_i(σ*, s_i), hence,
σ*_i(s_i) = Gain_i(σ*, s_i) / Σ_{t_i∈S_i} Gain_i(σ*, t_i)    (5)
Finally, we prove the following equality, as it will be useful at the end:
σ*_i(s_i)·[g_i(s_i, σ*_{−i}) − g_i(σ*)] = σ*_i(s_i)·Gain_i(σ*, s_i), ∀s_i ∈ S_i    (6)
When Gain_i(σ*, s_i) > 0, the equality holds by the definition of the function Gain.
Similarly, when Gain_i(σ*, s_i) = 0, from Equation (5) we obtain that σ*_i(s_i) = 0, and then both
sides of Equation (6) are equal to 0. Finally, we observe the following,
0 = g_i(σ*) − g_i(σ*)
  = Σ_{s_i∈S_i} σ*_i(s_i)·g_i(s_i, σ*_{−i}) − g_i(σ*)
  = Σ_{s_i∈S_i} σ*_i(s_i)·g_i(s_i, σ*_{−i}) − g_i(σ*)·Σ_{s_i∈S_i} σ*_i(s_i)
  = Σ_{s_i∈S_i} σ*_i(s_i)·[g_i(s_i, σ*_{−i}) − g_i(σ*)]
  = Σ_{s_i∈S_i} σ*_i(s_i)·Gain_i(σ*, s_i)                              (by (6))
  = Σ_{s_i∈S_i} (σ*_i(s_i))² · Σ_{t_i∈S_i} Gain_i(σ*, t_i) > 0          (by (5))
where the last inequality holds as both factors are strictly positive: Σ_{s_i}(σ*_i(s_i))² > 0 since σ*_i is
a probability distribution, and Σ_{t_i} Gain_i(σ*, t_i) > 0 by assumption. We obtain 0 > 0, a
contradiction with the fact that the gains are not all zero. We conclude that σ* is a Nash
equilibrium.
L R
T 2,1 0,0
B 0,0 1,3
We saw last time that (T, L) and (B, R) were both Nash equilibria. However, can we
find another one when considering the mixed extension of this game? Suppose that player 1
plays Top with probability x and Bottom with probability 1 − x; analogously, player 2 plays Left
with probability y and Right with probability 1 − y. Let us look at the best response functions of the
players,
The solution to the previous optimization problems depends on the strategy of the other player.
For BR1, for example, notice that,
g1(x, y) = 2xy + 1·(1 − x)(1 − y) = x·(3y − 1) + 1 − y
and therefore the optimal x* depends on the sign of 3y − 1. We obtain the following solution,
BR1(y):  x* = 1 if 3y − 1 > 0, i.e. y > 1/3;  x* = 0 if y < 1/3;  x* ∈ [0, 1] if y = 1/3
We can plot the best reply functions to find the Nash equilibria graphically.
[Figure: the best reply correspondences BR1 and BR2 in the unit square, intersecting at (0, 0), (3/4, 1/3) and (1, 1).]
The two best reply functions have three intersection points, each of them being a Nash equilib-
rium. The points (x, y) = (0, 0) and (x, y) = (1, 1) are the two already known Nash equilibria in
pure strategies. However, we obtain a third one (x, y) = (3/4, 1/3), being a Nash equilibrium in
mixed strategies.
Finally, notice the following,
g2 (x∗ , L) = 1 · x∗ + 0 · (1 − x∗ ) = x∗
g2 (x∗ , R) = 0 · x∗ + 3 · (1 − x∗ ) = 3 · (1 − x∗ )
and g2(x*, L) = g2(x*, R) if and only if x* = 3/4. Therefore, if player 1 plays his optimal mixed
strategy x*, player 2 is indifferent about what to play, as any strategy (pure or mixed)
gives her the same utility. This remark always holds. Moreover, we can use it to compute
Nash equilibria, as it imposes constraints on the strategies (the indifference principle!).
Let us apply the previous remark to the first coordination game we saw,
F T
F 2,1 0,0
T 0,0 1,2
Let x and y be the probability that player 1 and player 2 choose to go to the football, respectively.
We impose indifference on the payoffs,
g1 (F, y) = g1 (T, y) ⇔ 2y + 0 · (1 − y) = 0 · y + 1 · (1 − y) ⇔ y = 1/3
Analogously for player 2,
g2 (x, F ) = g2 (x, T ) ⇔ 1x + 0 · (1 − x) = 0 · x + 2 · (1 − x) ⇔ x = 2/3
We obtain that (x∗ , y ∗ ) = (2/3, 1/3) is also a Nash equilibrium, in this case in mixed strategies.
Intuitively, each player chooses his preferred option with higher probability, although with
some positive probability he also chooses the other option, as both players are better off
when choosing the same activity.
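A quick numerical check of this equilibrium (the payoff functions below are just the expected payoffs of the coordination game written explicitly):

```python
# Verify that (x, y) = (2/3, 1/3) is a Nash equilibrium of the coordination game.

def g1(x, y):  # player 1 plays F with probability x, player 2 plays F with probability y
    return 2 * x * y + 1 * (1 - x) * (1 - y)

def g2(x, y):
    return 1 * x * y + 2 * (1 - x) * (1 - y)

x, y = 2 / 3, 1 / 3
eq_payoffs = g1(x, y), g2(x, y)
# No pure deviation improves either player's payoff:
assert g1(1, y) <= g1(x, y) + 1e-12 and g1(0, y) <= g1(x, y) + 1e-12
assert g2(x, 1) <= g2(x, y) + 1e-12 and g2(x, 0) <= g2(x, y) + 1e-12
print(eq_payoffs)  # both players get 2/3, their indifference payoff
```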
Definition 15. Let x ∈ Rn+ be a vector. We define the support of x as the set,
supp(x) := {i ∈ {1, ..., n} : xi > 0}
That is, the set of all the strictly positive coordinates of the vector x.
Let us summarize in the following scheme how to compute the Nash equilibria of a finite game
with two players.
Nash equilibria computation
1. Find the pure Nash equilibria of the game by underlining in each column and row the best
reply of each player and looking for strategy profiles totally underlined.
2. Assume that agents play mixed strategies (x, y) ∈ ∆(S1) × ∆(S2) and impose the indifference
principle (be careful with the pure strategies that may not belong to the supports):
g1(s, y) = g1(t, y), ∀s, t ∈ S1 : x_s, x_t > 0
g2(x, s) = g2(x, t), ∀s, t ∈ S2 : y_s, y_t > 0
Solve the system of equations, taking care to compute coherent values for x and y. The
solutions correspond to the mixed Nash equilibria of the game.
3. An alternative to steps 1 and 2 is to compute the best reply correspondences
BR1(y) = argmax_{x∈∆(S1)} g1(x, y),    BR2(x) = argmax_{y∈∆(S2)} g2(x, y)
and plot them to obtain all their intersection points. This also works for counting
the number of Nash equilibria.
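The whole scheme can be automated by support enumeration. The sketch below is restricted to equal-size supports (which suffices for nondegenerate games): for each pair of supports we solve the indifference-plus-normalization systems and keep the coherent solutions.

```python
# Support enumeration for a two-player finite game (equal-size supports only).
import itertools
import numpy as np

def equilibria(A, B, tol=1e-9):
    A, B = np.asarray(A, float), np.asarray(B, float)
    n, m = A.shape
    out = []
    for k in range(1, min(n, m) + 1):
        for R in itertools.combinations(range(n), k):
            for C in itertools.combinations(range(m), k):
                # y on C equalizes player 1's payoffs on R (value v1);
                # x on R equalizes player 2's payoffs on C (value v2).
                My = np.block([[A[np.ix_(R, C)], -np.ones((k, 1))],
                               [np.ones((1, k)), np.zeros((1, 1))]])
                Mx = np.block([[B[np.ix_(R, C)].T, -np.ones((k, 1))],
                               [np.ones((1, k)), np.zeros((1, 1))]])
                rhs = np.append(np.zeros(k), 1.0)
                try:
                    yv, xu = np.linalg.solve(My, rhs), np.linalg.solve(Mx, rhs)
                except np.linalg.LinAlgError:
                    continue
                y, v1 = yv[:-1], yv[-1]
                x, v2 = xu[:-1], xu[-1]
                if (y < -tol).any() or (x < -tol).any():
                    continue  # incoherent: negative probabilities
                X, Y = np.zeros(n), np.zeros(m)
                X[list(R)], Y[list(C)] = x, y
                # No profitable deviation outside the supports.
                if (A @ Y <= v1 + tol).all() and (X @ B <= v2 + tol).all():
                    out.append((X, Y))
    return out

# The coordination game (rows/columns F, T) from above:
A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]
for X, Y in equilibria(A, B):
    print(np.round(X, 3), np.round(Y, 3))
```

On the coordination game it returns the three equilibria found above: (F, F), (T, T) and the mixed one, in line with Theorem 6 below.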
To end the study of finite games, we state a last theorem, which will not be proved here.
Theorem 6. Let G be a finite game and G be its mixed extension. The number of Nash
equilibria of G is “always” finite and odd.
There is often only a small difference between a problem in P and an NP-complete problem:
determining whether a graph can be colored with 2 colors is in P, but with 3 colors it is
NP-complete.
Concerning the computation of Nash equilibria, we obtain that this problem is combinatorial
by nature. Indeed, the indifference principle tells us that a mixed strategy profile is a Nash equilibrium
if and only if all pure strategies in its support are best responses. This result reveals the subtle
nature of a mixed Nash equilibrium: players combine pure best-response strategies when playing
in equilibrium. Finding a Nash equilibrium means, therefore, finding the right supports:
once this is done, the precise mixed strategies can be computed by solving a system of algebraic
equations. However, given a set of pure strategies for each player, the number of possible
supports of a mixed strategy profile is exponential.
Most of the algorithms proposed over the past half century for finding Nash equilibria are combinatorial
in nature, and work by seeking supports. Unfortunately, none of them is known to be
efficient, that is, to always succeed after only a polynomial number of steps.
Is the Nash equilibrium problem NP-complete? The difficulty of finding a Nash equilibrium gives
the intuition that this problem may be NP-complete. However, this is not the appropriate
class for it, as the existence of a solution is always guaranteed thanks to Nash's theorem. Indeed,
in 1994, Christos Papadimitriou defined a new class of computational problems: the PPAD class.
We can easily obtain NP-complete problems by slightly changing the problem: given a two-player
game in strategic form, does it have:
Let us prove (3) by reducing from the clique problem: given an undirected graph Gr = (V, E) and
an integer k, does there exist a clique of size k in Gr? That is, does there exist V′ ⊆ V with |V′| = k
such that {i, j} ∈ E for all distinct i, j ∈ V′?
Let Gr = (V, E) be a graph with V = {1, ..., n} and let k be a natural number. Let ε = 1/nk,
M = nk², r = 1 + ε/k, and consider the following two-person game G, where each player chooses a
pair in {1, 2} × {1, ..., n}:
g1((1, i), (1, j)) = g2((1, i), (1, j)) = 1 + ε if i = j;  1 if i ≠ j and {i, j} ∈ E;  0 otherwise
g1((2, i), (1, j)) = g2((1, i), (2, j)) = k if i = j;  0 if i ≠ j
g1((1, i), (2, j)) = g2((2, i), (1, j)) = −M if i = j;  0 if i ≠ j
g1((2, i), (2, j)) = g2((2, i), (2, j)) = 0
Considering ei,j = 1{(i,j)∈E}, the following block matrix expresses the payoffs of this game
(player 1 chooses a row (a, i), player 2 a column (b, j)):

P1 \ P2    (1, j)                                            (2, j)
(1, i)     (1 + ε, 1 + ε) if i = j;  (ei,j, ei,j) if i ≠ j   (−M, k) if i = j;  (0, 0) if i ≠ j
(2, i)     (k, −M) if i = j;  (0, 0) if i ≠ j                (0, 0)
Theorem 7. G has a Nash equilibrium with expected payoff of at least r for both players if and
only if Gr has a clique of size k.
4 Potential games
We have seen that any finite game has a Nash equilibrium in mixed strategies. The question
of which games always have a Nash equilibrium in pure strategies remains open in general. We
will study the case of potential games.
Potential games are a special class of N-player games in which players' unilateral deviations
can be centralized by a single function.
Definition 17. A game G = (N, (Si)i∈N, (gi)i∈N) is a potential game if there exists a function
ϕ : S → R such that, ∀i ∈ N, ∀s ∈ S, ∀ti ∈ Si,

gi(ti, s−i) − gi(si, s−i) = ϕ(ti, s−i) − ϕ(si, s−i)

The function ϕ is called a potential function. In the particular case in which Si = R and gi is
a differentiable function of si for every player, the property of a potential function can be
expressed as,

∂gi/∂si = ∂ϕ/∂si,  ∀i ∈ N
Example 15. Consider the following Cournot game,

gi(s1, ..., sn) = (A − Σ_{j=1}^n sj) si − C si = A si − si² − si Σ_{j≠i} sj − C si

where si is the production level of firm i and A, C ∈ R, with A > C. Take the function

ϕ(s1, ..., sn) = A Σ_{j=1}^n sj − Σ_{j=1}^n sj² − Σ_{i<j} si sj − C Σ_{j=1}^n sj

One checks that ∂ϕ/∂si = A − 2si − Σ_{j≠i} sj − C = ∂gi/∂si for every i, so ϕ is indeed a
potential function.
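The exactness of the potential can also be verified numerically: for any unilateral deviation, the change in gi should equal the change in ϕ. In the sketch below, the parameters A = 10, C = 2 and the production ranges are illustrative assumptions.

```python
import random

A, C = 10.0, 2.0   # illustrative Cournot parameters with A > C

def g(i, s):
    # Cournot payoff of firm i: (A - total production) * s_i - C * s_i
    return (A - sum(s)) * s[i] - C * s[i]

def phi(s):
    # candidate potential: A*sum(s_j) - sum(s_j^2) - sum_{i<j} s_i*s_j - C*sum(s_j)
    n = len(s)
    cross = sum(s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return A * sum(s) - sum(x * x for x in s) - cross - C * sum(s)

random.seed(0)
n = 4
s = [random.uniform(0, 2) for _ in range(n)]
max_gap = 0.0
for i in range(n):
    t = list(s)
    t[i] = random.uniform(0, 2)   # unilateral deviation of player i
    max_gap = max(max_gap, abs((g(i, t) - g(i, s)) - (phi(t) - phi(s))))
# max_gap stays at floating-point noise: phi is an exact potential
```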
However, this contradicts the fact that s∗ ∈ argmax_{s∈S} ϕ(s), as (ti, s∗−i) ∈ S attains a
strictly higher value of ϕ. Therefore, players cannot have profitable deviations and we conclude
that s∗ is a Nash equilibrium.
We conclude that potential games always admit a Nash equilibrium in pure strategies (assuming
that ϕ attains a maximum).
4.1 Best-reply dynamic
The existence of Nash equilibria is not the only nice result obtained from having a potential
function. Indeed, it also gives a way of computing a Nash equilibrium. More precisely, the
presence of a potential function guarantees the convergence of the best-reply dynamic to a
Nash equilibrium.
Algorithm 1: Best-reply dynamic
1 Input: G = (N, (Si)i∈N, (gi)i∈N) an N-player game, s ∈ S a strategy profile
2 repeat
3 for i ∈ N do
4 Fixing s−i compute ti ∈ argmaxsi ∈Si gi (si , s−i ) and replace si ← ti
5 until Convergence;
Intuitively, the best-reply dynamic consists in replacing each player's strategy by his best
reply to the strategies played by the other players. In general, this dynamic is not guaranteed
to converge, since agents may cycle endlessly between strategy profiles. However,
this is not the case for potential games, as the value of the potential function increases with each
iteration. Let us prove this result.
Theorem 9. Let G = (N, (Si)i∈N, (gi)i∈N) be a potential game with potential function ϕ.
Suppose that ϕ is bounded on Π_{i∈N} Si. Then, the best-reply dynamic converges to a Nash
equilibrium after finitely many iterations.
Proof. Let i ∈ N be a player and fix s−i. Let ti ∈ argmax_{si∈Si} gi(si, s−i) be his best reply. If
ti is different from si, the strategy currently played by i, then gi(ti, s−i) > gi(s), implying that
ϕ(ti, s−i) > ϕ(s) by the definition of the function ϕ. Therefore, at each iteration in which a player
changes his strategy when computing his best reply, he increases the value of the potential
function. Since ϕ is bounded on S, after finitely many iterations the dynamic must stop. Let s∗
be the output of the dynamic. Since every player is best-replying to the opponents' strategies,
s∗ is a Nash equilibrium of G.
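Algorithm 1 translates directly into code for finite games. The sketch below runs it on the two-player coordination game that appears later in TD Nº7 (payoffs (2,1), (0,0), (0,0), (1,2)); the function signature and input encoding are illustrative choices.

```python
def best_reply_dynamic(payoffs, sizes, s, max_iters=1000):
    """Iterate best replies until no player wants to change (Algorithm 1).

    payoffs[i]: function mapping a strategy profile (tuple) to player i's payoff.
    sizes[i]: number of pure strategies of player i.
    """
    s = tuple(s)
    for _ in range(max_iters):
        stable = True
        for i in range(len(s)):
            replies = {a: payoffs[i](s[:i] + (a,) + s[i + 1:]) for a in range(sizes[i])}
            best = max(replies, key=replies.get)
            if replies[best] > replies[s[i]]:   # move only on a strict improvement
                s = s[:i] + (best,) + s[i + 1:]
                stable = False
        if stable:
            return s   # every player is best-replying: a Nash equilibrium
    return None        # no convergence within max_iters (cannot happen here)

# coordination game: both (T, T) and (F, F) are pure Nash equilibria
g1 = [[2, 0], [0, 1]]
g2 = [[1, 0], [0, 2]]
payoffs = [lambda s: g1[s[0]][s[1]], lambda s: g2[s[0]][s[1]]]
eq = best_reply_dynamic(payoffs, [2, 2], (0, 1))
```

Started from the miscoordinated profile (0, 1), the dynamic reaches the equilibrium (1, 1) in one sweep, each move strictly increasing the potential.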
4.2 Congestion games

[Figure: a network with an origin node O and a destination node D.]
For each arc e ∈ E we have an increasing cost function ce : R+ → R, i.e. the more people use
the edge e, the higher its cost. We can define the following N-player game:
N = {1, 2, ..., n} is the set of players,
For each i ∈ N , Si = {Paths from O to D} = {OD-Routes},
A strategy profile s = (R1, R2, ..., Rn) contains the routes chosen by the players; s can
be seen as the flow of people on the network.
Given a strategy profile s, we can compute the number of players that choose the route R as
fR(s) = |{i ∈ N : si = R}|.
Finally, the payoff function of player i is given by gi(s) = −ci(s). In words, players seek to
minimize their travel costs.
Example 16 (Pigou). Consider the following routing game with n players: two parallel arcs
connect O to D, the top arc with cost cT(x) = 1 and the bottom arc with cost cB(x) = x/n.

At each arc the cost depends on the number x of players that choose it. Going by the top arc
has a fixed cost of 1 unit, while the cost of going by the bottom arc is equal to the proportion of
agents that choose it over the total number of agents n. We can find two equilibria:
1. All n players choose the bottom path, paying a cost of 1 unit each. Indeed, nobody has an
incentive to change paths.
2. One player goes by the top path, while the n − 1 others go by the bottom path. The players
on the bottom have no incentive to change, as they would go from paying (n − 1)/n to paying
1. Similarly, the player on the top path cannot change to the bottom and decrease his cost,
as the cost of the bottom path becomes 1 when he changes.
Let us focus on the total cost for a moment, that is, the sum of the costs of all players,
C(s) = Σ_{i∈N} ci(s). The first equilibrium s∗ has a total cost of C(s∗) = n. The second
equilibrium t∗ has a total cost of C(t∗) = 1 + Σ_{i=1}^{n−1} (n − 1)/n = 1 + (n − 1)²/n.
In particular we find that C(t∗) < C(s∗).
Can we find another distribution of the players with an even lower total cost? Let k be the
number of players that take the bottom path and n − k the number that take the top path. The
total cost of this strategy s(k) is given by,

C(s(k)) = (n − k) · 1 + k · (k/n) = n [1 − k/n + (k/n)²]
Minimizing C(s(k)) over k/n we find that its minimum is achieved at k/n = 1/2, that is, half
of the players taking each path. Under this strategy profile the total cost obtained is equal to
C(s(n/2)) = 3n/4. Note that both equilibria achieve strictly higher total costs.
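The computation above can be checked numerically. The sketch below, with an illustrative choice of n = 10 players, evaluates C(s(k)) for every split k and recovers both equilibrium costs and the social optimum:

```python
n = 10  # illustrative number of players

def total_cost(k):
    # k players on the bottom path (cost k/n each), n - k on the top path (cost 1 each)
    return (n - k) * 1 + k * (k / n)

# the two equilibria computed above
assert total_cost(n) == n                                        # C(s*) = n
assert abs(total_cost(n - 1) - (1 + (n - 1) ** 2 / n)) < 1e-9    # C(t*) = 1 + (n-1)^2/n

# the social optimum splits the players evenly between the two paths
best_k = min(range(n + 1), key=total_cost)
```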
The previous example recalls our discussion of the prisoner's dilemma. Should we seek
the selfish optimum or what is best for the entire society?
Example 17. The following routing game is called Braess's paradox. From O, a player can
reach D either through an upper node (a first arc of cost x/n followed by an arc of cost 1) or
through a lower node (a first arc of cost 1 followed by an arc of cost x/n), and an arc of cost 0
connects the upper node to the lower one.
The equilibrium of this game is to take the zig-zag path, which produces a total cost of 2n, as
each player has a cost of 2. Imagine next that we remove the arc with zero cost in the middle.
The equilibrium changes to the situation in which half of the players take the top path and the
other half the bottom path. Consequently, each player obtains a cost of 3/2 and the total cost
becomes 3n/2. Contrary to intuition, removing the free arc decreases the total cost when
players play in equilibrium.
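A quick numeric check of the paradox, with an illustrative number of players and the path costs described above:

```python
n = 100  # illustrative number of players

# with the zero-cost middle arc, every player takes the zig-zag path:
# cost per player = n/n + 0 + n/n = 2, so the total cost is 2n
cost_with_edge = n * (n / n + 0 + n / n)

# without the middle arc, half the players take each two-arc path:
# cost per player = (n/2)/n + 1 = 3/2, so the total cost is 3n/2
cost_without_edge = n * ((n / 2) / n + 1)
```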
We have said that congestion games are potential games. Let us prove it formally.
Theorem 10. Let G = (N, V, E, (ce )e∈E ) be a congestion game. Then, G is a potential game.
Proof. It is enough to find a potential function for this game. Let ϕ : R^E → R be given by,

ϕ(f) = Σ_{e∈E} Σ_{k=1}^{fe} ce(k)
where f ∈ R^E is the flow, given on each arc e ∈ E by fe(s) = Σ_{R : e∈R} fR(s) with
fR(s) = |{i ∈ N : si = R}|, and s is the strategy profile played by the N players. Fix the chosen
routes of all players and suppose that player i changes from route R to route R′. Let us compute
ci(f′) − ci(f) and ϕ(f′) − ϕ(f), where f is the flow before the change and f′ the flow after the change.

ci(f′) − ci(f) = Σ_{e∈R∩R′} 0 + Σ_{e∈R′\R} ce(fe + 1) − Σ_{e∈R\R′} ce(fe)

The common terms of ϕ(f′) and ϕ(f) cancel in the same way, yielding
ϕ(f′) − ϕ(f) = Σ_{e∈R′\R} ce(fe + 1) − Σ_{e∈R\R′} ce(fe) = ci(f′) − ci(f). Hence ϕ is a
potential function.
Corollary 1. Let G = (N, V, E, (ce)e∈E) be a congestion game. Then, G always has a Nash
equilibrium in pure strategies.
Proof. Since G is a potential game, any maximum of ϕ over the set of strategy profiles is a Nash
equilibrium. Since the set of feasible flows (strategy profiles) is finite (there are finitely many
ways of splitting the players over the graph), ϕ always attains a maximum over S. We conclude
that G always has a Nash equilibrium in pure strategies.
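The key step of the proof can be verified on Pigou's network, an assumed instance with n = 10 players, a top arc of cost 1 and a bottom arc of cost x/n: when one player switches paths, the change in ϕ equals the change in that player's own cost.

```python
n = 10  # players in Pigou's network: top arc cost 1, bottom arc cost x/n

def phi(f_top, f_bot):
    # potential: phi(f) = sum over arcs e of sum_{k=1}^{f_e} c_e(k)
    return sum(1 for k in range(1, f_top + 1)) + sum(k / n for k in range(1, f_bot + 1))

def cost(path, f_top, f_bot):
    # cost paid by a player on the given path under flow (f_top, f_bot)
    return 1 if path == "top" else f_bot / n

f_top, f_bot = 4, 6
# a player on the bottom path deviates to the top path
delta_cost = cost("top", f_top + 1, f_bot - 1) - cost("bottom", f_top, f_bot)
delta_phi = phi(f_top + 1, f_bot - 1) - phi(f_top, f_bot)
# delta_cost == delta_phi: the change in phi tracks the deviator's cost change
```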
4.3 Price of anarchy and price of stability in congestion games
Let us continue with the study of the total cost of a given flow in a routing game. Let G =
(N, V, E, (ce)e∈E) be a congestion game and let s ∈ S be a strategy profile, which induces a
flow f. Previously we defined the total cost of f as,

C(f) := Σ_{i∈N} ci(f)
The price of anarchy (PoA) of a game is the ratio between the worst total cost under equilibrium,
C+(EQ) := max{C(f∗) : f∗ is an equilibrium flow}, and the optimal total cost C(OPT). Consider
again Pigou's example, with a top arc of cost cT(x) = 1 and a bottom arc of cost cB(x) = x/n.
The maximum total cost under equilibrium is achieved when all players take the bottom path,
with a total cost of n, while the minimum possible total cost is 3n/4, achieved when half of the
players take each path. Therefore, the PoA of this game is equal to 4/3.
Theorem 11 (Roughgarden). For any network with a single O-D pair, if the costs are affine, i.e.
∀e ∈ E, ce(x) = ae x + be with ae, be ≥ 0, then 1 ≤ PoA ≤ 4/3.
Imagine the situation in which we can influence the players a bit and help them converge to
a good Nash equilibrium, so that even if they keep playing in equilibrium, they do not necessarily
reach the worst equilibrium cost. The ratio between the best Nash equilibrium and the social optimum
is known as the price of stability (PoS). Let C−(EQ) := min{C(f∗) : f∗ is an equilibrium flow}.
Then, the PoS of a game is defined as,

PoS(G) := C−(EQ) / C(OPT)

Notice that PoA ≥ PoS ≥ 1 always.
5 Repeated games
Repeated games represent dynamic interactions in discrete time. These games are played in
stages in which players simultaneously choose an action in their own action set. The selected
actions determine the players’ payoffs at that stage. Then, the players’ payoffs of the repeated
game are obtained as a combination of the players’ stage payoffs.
Repetition opens the door to new phenomena, as players may play dominated strategies (not an
equilibrium) in the stage games in order to obtain a higher payoff in the repeated game. In
one-shot games, that is, games played only once, agents have incentives to play Nash equilibria
or they may end up obtaining low payoffs, e.g. in a prisoner's dilemma with one of the players
deciding to cooperate while his partner takes the rational decision of confessing. The new facet
obtained from the repetition of the game may induce cooperation between agents at each stage,
as players can punish their opponents if the latter deviate from the cooperation path. In
the prisoner's dilemma example, the prisoner who cooperated and got a low payoff due to his
partner's confession knows he cannot trust him again, and confesses at every posterior stage.
The possibility of following cooperation paths increases the set of equilibrium payoffs, a result
that has been known for a long time although no one can be clearly credited with first proving
it (hence the name "folk theorem").
L R
T 1,0 0,0
B 0,0 0,1
G has two Nash equilibrium payoffs, (1, 0) and (0, 1). Notice that any sequence of strategy profiles
in GT in which the players play a Nash equilibrium of G at each stage is a Nash equilibrium
of GT. In the 2-stage game, for example, playing (T, L) at the first stage and (B, R) at the second
stage is a Nash equilibrium of G2. Moreover, this Nash equilibrium achieves the average payoff
(1/2, 1/2), therefore (1/2, 1/2) ∈ E2.
Remark 6. Repetition allows the convexification of the equilibrium payoffs.
Example 19. Consider the following stage game G,
C2 D2 E2
C1 3,3 0,4 -10,-10
D1 4,0 1,1 -10,-10
E1 -10,-10 -10,-10 -10,-10
Game G corresponds to a prisoner’s dilemma with an extra row and column in which each player
forces the outcome of the game, independently of the strategy chosen by the partner. The set of
Nash equilibrium payoffs of the stage game is E1 = {(1, 1), (−10, −10)}. Let us construct a Nash
equilibrium of the 2-stage game with payoff (2, 2):
1. In the first stage, player 1 plays C1 and player 2 plays C2 .
2. In the second stage, player 1 plays D1 if player 2 played C2 at stage 1, and he plays E1
(punishment) otherwise. Similarly, player 2 plays D2 if player 1 has played C1 at stage 1,
and he plays E2 (punishment) otherwise.
Neither player has an incentive to deviate from this strategy, as deviating yields a lower
payoff due to the punishment. We obtain, then, that (2, 2) ∈ E2. In the general T-stage game,
for any T ≥ 1, we can construct a Nash equilibrium that achieves the average payoff
((T − 1)/T)(3, 3) + (1/T)(1, 1).
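The average payoff of this family of equilibria can be checked with a one-line computation following the construction above:

```python
def avg_payoff(T):
    # cooperate (payoff 3) for T - 1 stages, then play the stage Nash (payoff 1) at stage T
    return ((T - 1) * 3 + 1) / T

# the 2-stage equilibrium constructed above has average payoff 2,
# and the average payoff tends to the cooperation payoff 3 as T grows
```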
Cooperation is not the only lesson we can draw from this second example. Unlike the first
example, in which agents pick a strategy per stage independently of the opponents' past choices,
players may determine their next action from the previous strategy profiles, as in the second
example. The sequence of strategy profiles played from the first stage up to the current stage t
is called a history of the game of length t. With this in mind, we give the formal model of a
T-stage game.
Definition 18. A history of length t is defined as a vector (s(1), ..., s(t)) ∈ S t , with s(k) being
the strategy profile played by the players at the stage k. The set of histories of length t is
Ht := S t = S × ... × S (t times). The set of all histories is denoted H := ∪Tt=0 Ht , where by
convention we say H0 = ∅.
Previously we remarked the fact that agents may determine their strategies from the past chosen
actions. In other words, players will pick a strategy from the observed history of the game.
The intuition for a behavior strategy is the following: Given a history of the game ht of any
length, player i observes ht and plays σi (ht ), that corresponds to a mixed strategy in ∆(Si ), in
his next stage.
Definition 20. Given T ∈ N, we define the T-stage game GT = (N, (Σi)i∈N, (γiT)i∈N), with,

γiT(σ) = Eσ[ (1/T) Σ_{t=1}^T gi(s(t)) ]

where σ is a behavior strategy profile that gives, for each stage t ∈ {1, ..., T} and each player
i ∈ N, the mixed strategy σi(h(t)) ∈ ∆(Si) to be played.
C2 D2
C1 3,3 0,4
D1 4,0 1,1
Let us show by induction that, without the presence of punishments, the only equilibrium payoff of
the T-stage game is the Nash equilibrium payoff of G. For T = 1, this is clear. Assume that, for
a fixed T ≥ 1, ET = {(1, 1)}, and consider a Nash equilibrium σ = (σ1(t), σ2(t))_{t=1}^{T+1} of the
(T+1)-stage repeated game. Notice that if we consider the truncated strategy σ′ = (σ1(t), σ2(t))_{t=2}^{T+1},
that is, the strategy starting from t = 2, we obtain a Nash equilibrium of the T-stage game.
Assuming that player 1 plays (x, 1 − x) and player 2 plays (y, 1 − y) in the first stage, and
using that the continuation yields average payoff (1, 1) by the induction hypothesis, the
equilibrium payoffs of both players after the T + 1 stages are,

g1^{T+1}(σ) = 1/(T+1) · (3xy + 4(1 − x)y + 1(1 − x)(1 − y)) + T/(T+1)
g2^{T+1}(σ) = 1/(T+1) · (3xy + 4x(1 − y) + 1(1 − x)(1 − y)) + T/(T+1)

Since σ is a Nash equilibrium, g1^{T+1}(σ) must be greater than or equal to the payoff that
player 1 would get if, for example, he played D1 at the first stage, that is,

g1^{T+1}(σ) ≥ 1/(T+1) · g1(D1, y) + T/(T+1)
⇔ 3xy + 4(1 − x)y + 1(1 − x)(1 − y) ≥ 4y + 1(1 − y)
⇔ 0 ≥ x ⇒ x = 0

Analogously, we can find that y = 0. This implies that both players play (D1, D2) at stage 1,
and therefore the Nash equilibrium σ of the (T + 1)-stage game achieves the average payoff (1, 1).
Remark 8. Repeating the prisoner's dilemma a finite number of times is not enough to obtain
cooperation between the players. What if we repeat it infinitely many times?
1. ∀ε > 0, σ is an ε-Nash equilibrium of any long enough finitely repeated game, i.e. ∃T0 , ∀T ≥
T0 , ∀i ∈ N ,∀τi ∈ Σi , γiT (τi , σ−i ) ≤ γiT (σ) + ε, and,
2. ((γiT (σ))i∈N )T has a limit γ(σ) ∈ Rn when T goes to infinity.
γ(σ) ∈ Rn is called a uniform equilibrium payoff of the uniform game. The set of uniform
equilibrium payoffs is denoted by E∞ .
Repeating the same game infinitely many times allows cooperation between the players in the
prisoner's dilemma. The issue when we consider only finitely many stages is that the agents, by
backward induction, realize that the most rational move is to play the Nash equilibrium
at each step. Indeed, suppose that both agents cooperate at each stage. Then, a rational
player should deviate at the last stage, so he obtains a higher payoff than by cooperating. Since
both agents reason in the same way, the two of them deviate to play the Nash equilibrium at the
last stage. Knowing this, we can forget the last stage and consider a repeated game with one
stage less. By the same argument, the players confess at the penultimate stage. By induction,
the prisoners end up confessing at every stage. This argument is not valid with an infinite number
of stages, as there is no "last" stage.
For expressing the formal result concerning the possible equilibrium payoffs of an infinitely
repeated game, we need some definitions.
Definition 22. We define the set of feasible payoffs as the convex hull of the stage payoffs,
conv(g(S)) = conv({g(s) : s ∈ S}).
Definition 23. For each player i ∈ N, the punishment level of i, or threat point, is

vi = min_{x−i ∈ Π_{j≠i} ∆(Sj)} max_{xi ∈ ∆(Si)} gi(xi, x−i)
vi is also called the independent minmax of player i. It represents the lowest payoff that the rest
of the players can force player i down to. In particular, no rational player should obtain less than
his punishment level. We define the set of individually rational payoffs by

IR := {u = (ui)i∈N ∈ Rn : ui ≥ vi, ∀i ∈ N}
Finally, we define the set of feasible and individually rational payoffs as E = conv(g(S)) ∩ IR.
Given the strategy of all players except i at stage t, we can always construct a strategy for
i such that he obtains at least his punishment level at that stage. Therefore, players cannot
obtain less than their punishment levels at any equilibrium. In consequence, ET and E∞ are
both included in E.
Let us illustrate all the definitions on the previous prisoner's dilemma: the punishment levels
are v1 = v2 = 1. Then, the feasible and individually rational payoffs are represented in the
following picture by the blue region,

[Figure: the set of feasible and individually rational payoffs in the (g1, g2) plane, with axis
ticks at 1 and 4.]
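The punishment levels v1 = v2 = 1 can be verified by brute force over a grid of mixed strategies (the grid resolution `steps` is an arbitrary choice):

```python
# brute-force check of the punishment level v1 in the prisoner's dilemma
# stage payoffs of player 1: g1(C,C)=3, g1(C,D)=0, g1(D,C)=4, g1(D,D)=1
def g1(x, y):
    # x = P(player 1 cooperates), y = P(player 2 cooperates)
    return 3 * x * y + 0 * x * (1 - y) + 4 * (1 - x) * y + 1 * (1 - x) * (1 - y)

steps = 200
grid = [k / steps for k in range(steps + 1)]
# v1 = min over player 2's mixed strategies of player 1's best reply payoff
v1 = min(max(g1(x, y) for x in grid) for y in grid)
# v1 == 1: player 2 punishes by defecting (y = 0), and the best player 1 can get is 1
```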
1. The set of uniform equilibrium payoffs is the set of feasible and IR payoffs: E = E∞.
2. If there exists u ∈ E1 such that ui > vi for each player i ∈ N, then ET → E as T → ∞.
TD Nº1 - Dominant strategies
Felipe Garrido-Lucero
Let G = (N, (Si )i∈N , (gi )i∈N ) be a game and si , ti ∈ Si be two strategies for player i.
si strictly dominates ti if, ∀s−i ∈ S−i : gi (si , s−i ) > gi (ti , s−i ).
si weakly dominates ti if, ∀s−i ∈ S−i : gi (si , s−i ) ≥ gi (ti , s−i ).
si is equivalent to ti if, ∀s−i ∈ S−i : gi (si , s−i ) = gi (ti , s−i ).
- G is solvable if the iterated deletion of strictly dominated strategies outputs a trivial game.
- A strategy profile s ∈ S is Pareto optimal if there is no s′ ∈ S such that gi(s′) ≥ gi(s), ∀i ∈ N,
and ∃i ∈ N, gi(s′) > gi(s).
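As an illustration of these definitions, here is a sketch of the iterated deletion of strictly dominated strategies for two-player games; the prisoner's dilemma payoffs used below are the standard ones from the lectures, and the function name is an illustrative choice.

```python
def idsds(g1, g2):
    """Iterated deletion of strictly dominated pure strategies in a 2-player game.

    g1[r][c], g2[r][c]: payoffs of the row and column player at profile (r, c).
    Returns the indices of the surviving rows and columns.
    """
    rows = list(range(len(g1)))
    cols = list(range(len(g1[0])))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            # r is strictly dominated if another row does strictly better vs every surviving column
            if any(all(g1[r2][c] > g1[r][c] for c in cols) for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in cols[:]:
            if any(all(g2[r][c2] > g2[r][c] for r in rows) for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# prisoner's dilemma: confessing strictly dominates cooperating for both players,
# so the game is solvable and only the profile (1, 1) survives
g1 = [[3, 0], [4, 1]]
g2 = [[3, 4], [0, 1]]
surviving = idsds(g1, g2)
```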
e f g h
A 6,3 4,4 4,1 3,0
B 5,4 6,5 0,2 5,1
C 5,0 3,2 6,1 4,0
D 2,0 2,3 3,3 6,1
L R
T 1,1 0,0
M 1,1 2,1
B 0,0 2,1
Show that we can obtain two different solutions using the iterated deletion of weakly
dominated strategies.
Q3. There are two players. Each player is given an unmarked envelope and asked to put in it
either nothing, or 300 euros, or 600 euros of his own money. A referee collects the envelopes,
opens them, gathers all the money, then adds 50% of that amount (using his own money)
and divides the total into two equal parts which he then distributes to the players.
1. Represent this game frame with two alternative tables: the first table showing in each
cell the amount of money distributed to each player, the second table showing the change
in wealth of each player (money received minus contribution).
2. Suppose that player 1 has some resentment towards the referee and ranks the outcomes
in terms of how much money the referee loses (the more, the better). Meanwhile, player
2 is selfish and greedy and ranks the outcomes in terms of her own net gain. Represent
the corresponding game using a table.
3. Is there a strict dominant strategy for both players?
Q4. Let G = (N, (Si )i∈N , (gi )i∈N ) be a game. Suppose that for any player i ∈ N there exists a
strictly dominant strategy s∗i ∈ Si . Prove, by giving a counterexample, that s∗ := (s∗i )i∈N
is not always Pareto optimal.
TD Nº2 - Zero-sum games
Felipe Garrido-Lucero
Q1. Let G = (I, J, g) be a zero-sum game. Let w1 , w2 be both values of G. Prove that w1 = w2 .
L M R
T 2 1 5
M -1 -1 -1
B 0 0 0
Compute the value of the game and the optimal strategies by,
L M R
T 3 6 5
M 5 2 6
B 1 0 3
1. Prove that this game does not have a value in pure strategies.
2. Compute the value of the game and the optimal mixed strategies. For this, it may help
you to check first the presence of dominated strategies.
Q4. Consider the following zero-sum game G = ([0, 1], [3, 4], g) where g(x, y) = |x − y|. Can you
find the value and optimal strategies for G ?
TD Nº3 - Zero-sum games
Felipe Garrido-Lucero
- Von Neumann minmax theorem. Let G = (∆(I), ∆(J), g) be a zero-sum game in mixed
strategies. It always holds,

max_{x∈∆(I)} min_{y∈∆(J)} g(x, y) = min_{y∈∆(J)} max_{x∈∆(I)} g(x, y)

- Indifference principle. Let (x̄, ȳ) be optimal strategies and v be the value of the game. Then,
every pure strategy in the support of x̄ satisfies g(i, ȳ) = v, and every pure strategy in the
support of ȳ satisfies g(x̄, j) = v.
L R
T a b
B c d
1. Suppose it holds c < a ≤ b < d. Prove that the value of this game is always a.
2. Consider that a > c, a > b, d > b and d > c. Show the value of the game is always,

v = (ad − bc) / (a + d − (b + c))
Q2. Let G = (I, J, g) be a zero-sum game. A saddle point is a strategy profile (s̄, t̄) ∈ S × T
such that for any (s, t) ∈ S × T, g(s, t̄) ≤ g(s̄, t̄) ≤ g(s̄, t).
Show that if G has a saddle point (s̄, t̄), then G has a value and the saddle point is a profile
of optimal strategies.
R P S
R 0 -1 1
P 1 0 -1
S -1 1 0
1. Prove that this game does not have a value in pure strategies.
2. Argue why this game has a value in mixed strategies. Compute it, as well as a pair
of optimal strategies.
TD Nº4 - Nash equilibrium
Felipe Garrido-Lucero
Let G = (N, (Si)i∈N, (gi)i∈N) be a game with N = {1, ..., n} the set of players, Si the strategy
set of player i, and gi : Π_{i∈N} Si → R the payoff function of player i.
A strategy profile s∗ ∈ S is a Nash equilibrium if for any i ∈ N ,
gi (s∗i , s∗−i ) = max gi (si , s∗−i )
si ∈Si
or equivalently considering the best reply functions of the players BRi : S−i → Si such that
BRi (s−i ) = argmaxsi ∈Si gi (si , s−i ) it holds that,
∀i ∈ N, s∗i ∈ BRi (s∗−i )
Q1. Let G = (N, (Si )i∈N , (gi )i∈N ) be a game and let s∗ = (s∗i )i∈N be a Nash equilibrium of G.
Show that none of the strategies in s∗ can be eliminated by the IDSDS.
Q2. Consider the following Prisoner’s dilemma:
x y z x y z
a 2,0,4 1,1,1 1,2,3 a 2,0,3 4,1,2 1,1,2
b 3,2,3 0,1,0 2,1,0 b 1,3,2 2,2,2 0,4,3
c 1,0,2 0,0,3 3,1,1 c 0,0,0 3,0,3 2,1,0
Q4. Two neighboring countries i = 1, 2 simultaneously choose how many resources (in hours)
to spend on recycling activities, ri. The average benefit πi for every dollar spent on recycling
is:

πi(ri, rj) = 10 − ri + rj/2,

and the cost per hour for each country is 4. Country i's average benefit is increasing
in the resources that the neighboring country j spends on recycling, because a clean
environment produces positive external effects on other countries.
1. Find each country’s best-response function, and compute the Nash equilibrium (r1∗ , r2∗ ).
2. Graph the best-response functions and indicate the pure strategy Nash equilibrium on
the graph.
3. On your previous figure, show how the equilibrium would change if the intercept of
one of the countries’ average benefit functions fell from 10 to some smaller number.
TD Nº5 - Nash equilibrium in mixed strategies
Felipe Garrido-Lucero
Consider G = (N, (Si)i∈N, (gi)i∈N) a finite game in which each player has a finite number of
pure strategies. For every i ∈ N consider,

Σi = ∆(Si) = { σi : σi(si) ≥ 0 for all si ∈ Si and Σ_{si∈Si} σi(si) = 1 }
Q1. Consider the game in which two firms simultaneously and independently decide whether to
lobby Congress in favor of a particular bill. When both firms (or neither of them) lobby,
Congress's decision is unaffected, so both firms get the same payoff. If, instead, only one
of them lobbies, it benefits from the entire policy.
Q2. Consider the following game with two players and three strategies per player.
L M R
T 3,2 4,3 1,4
M 1,3 7,0 2,1
B 2,2 8,-5 2,0
L R
T 6,0 0,6
B 3,2 6,0
1. Draw every player’s expected utility for a given strategy of his opponent.
2. What is every player’s expected payoff from playing her maxmin strategy?
3. Find every player’s Nash equilibrium strategy (pure and mixed) and their payoffs.
TD Nº6 - Nash equilibrium in mixed strategies
Felipe Garrido-Lucero
and plotting them to obtain all their intersection points. This also works for counting
the number of Nash equilibria.
L R
T 6,0 0,6
B 3,2 6,0
1. Draw every player’s expected utility for a given strategy of his opponent.
2. What is every player’s expected payoff from playing her maxmin strategy?
3. Find every player’s Nash equilibrium strategy (pure and mixed) and their payoffs.
Q2. Consider two candidates competing for office: Democrat (D) and Republican (R). For sim-
plicity, we assume that voters compare the two candidates according to only one dimension
(e.g. the budget share that each candidate promises to spend on education). Voters’ ideal
policies are uniformly distributed along the interval [0, 1], and each votes for the candidate
with a policy promise closest to the voter’s ideal. Candidates simultaneously and inde-
pendently announce their policy positions. A candidate’s payoff from winning is 1, and
from losing is -1. If both candidates receive the same number of votes, then a coin toss
determines the winner of the election.
1. Show there exists a unique pure Nash equilibrium.
2. Show that with three candidates (democrat, republican, and independent), no pure
strategy Nash equilibrium exists.
Q3. Consider two firms that compete to develop a new product. The benefit of being the
first company to produce the item is 36 million euros. Given x1, x2 the efforts made by
each firm, the probability that firm i is the first developer is xi/(x1 + x2). Assume
that both firms have a total production cost equal to their level of effort xi.
1. Compute each firm’s best-reply function.
2. Find a symmetric Nash equilibrium, i.e. x∗1 = x∗2 = x∗ .
TD Nº7 - Potential games
Felipe Garrido-Lucero
A game G = (N, (Si)i∈N, (gi)i∈N) is said to be a potential game if there exists a function
ϕ : S → R such that, ∀i ∈ N, ∀s ∈ S, ∀ti ∈ Si,

gi(ti, s−i) − gi(si, s−i) = ϕ(ti, s−i) − ϕ(si, s−i)
Q1. Consider the matching pennies game and apply the best-reply dynamic.
H T
H 1,-1 -1,1
T -1,1 1,-1
C B
C 1,1 4,0
B 0,4 3,3
Q3. Repeat the previous question for the following game of coordination,
T F
T 2,1 0,0
F 0,0 1,2
Q4. Suppose that ϕ1 and ϕ2 are two potential functions of the same game G = (N, (Si )i∈N , (gi )i∈N ).
Prove there exists a constant c ∈ R such that ϕ1 (s) − ϕ2 (s) = c, ∀s ∈ S.
TD Nº8 - Congestion games
Felipe Garrido-Lucero
Q1. Consider the following routing game with two players. The costs on each edge correspond
to one and two players using that edge, respectively. Player 1 wants to go from A to C
and player 2 from B to D.

[Figure: a network on nodes A, B, C, D with edge costs (2, 5), (2, 3), (4, 10) and (1, 3)
for one and two users.]
Q2. There are three machines 1, 2, 3 used by firms 1 and 2. Firm 1 can produce using machines 1
and 2 or 1 and 3. Firm 2 can produce using machines 1 and 2, 1 and 3, or 2 and 3. Costs for
using machine 1 are 5 and 6 respectively, corresponding to one and two users respectively.
For machine 2 the costs are 3 and 4, and for machine 3 costs are 2 and 5 respectively.
Q3. Consider the following routing game with two players 1 and 2, where 1 goes from A to C,
and 2 goes from C to A.

[Figure: a network containing nodes A and C with edge costs (1, 2), (2, 3) and (2, 8) for
one and two users.]
Then, we define the price of anarchy PoA and the price of stability PoS respectively as,

PoA(G) = C+(EQ) / C(OPT),    PoS(G) = C−(EQ) / C(OPT)
Q1. Consider the following network in which four players travel: player 1 goes from u to v,
player 2 goes from u to w, player 3 goes from v to w, and player 4 goes from w to v.
[Figure: a network on nodes u, v, w with cost functions x and 0 on its edges.]
Q2. Consider the non-linear version of Pigou's example, where p ≥ 1 is a fixed natural number
and n players travel from O to D: the top arc has cost cT(x) = 1 and the bottom arc has
cost cB(x) = (x/n)^p.
Q3. The price of anarchy can be defined for more games than just congestion games. Consider a
network formation game defined by a set of agents N and some value α ∈ R+. Suppose
that agents can create directed arcs between them. Given a set of arcs E, the cost of each
player is ci(E) = α · deg(i) + Σ_{j∈N} d(i, j), where deg(i) represents the number of outgoing
arcs from i and d(i, j) is the shortest distance between i and j in the undirected graph.
Consider a network formation game with five players.
Definition 24. For each player i ∈ N, his punishment level (or threat point) is,

vi = min_{x−i ∈ Π_{j≠i} ∆(Sj)} max_{xi ∈ ∆(Si)} gi(xi, x−i)

1. The set of uniform equilibrium payoffs is the set of feasible and IR payoffs: E = E∞.
2. If there exists u ∈ E1 such that ui > vi for each player i ∈ N, then ET → E as T → ∞.
C2 D2
C1 3,3 0,4
D1 4,0 1,1
Q2. Compute the set of feasible and individually rational payoffs in the following games:
L R L R L R
T 1,1 3,0 T 1,0 0,1 T 1,-1 -1,1
B 0,3 0,0 B 0,0 1,1 B -1,1 1,-1
Q3. Let G be a finite 2-player zero-sum game. What are the equilibrium payoffs of the finitely
repeated game GT and the uniform game G∞ ?