Lecture09 Notes

This lecture focuses on zero-sum games, where two decision-makers have opposing objectives, and explores concepts such as value, minimax, and maximin strategies. It introduces the formal model of zero-sum games, discusses security levels, and provides examples like Matching Pennies and Rock-Paper-Scissors. The lecture also examines the relationship between security levels and the implications of mixed strategies on these levels.

Uploaded by

rashmigupt1978
Copyright
© © All Rights Reserved

Game Theory

Lecture #9 – Zero-Sum Games

Focus of Lecture:

• Zero-Sum Games
• Value
• Minimax and Maximin

1 Introduction
The last lecture introduced the idea of strategic decision-making in uncertain environments.
Here, we focused on scenarios where an individual was tasked with making decisions to
maximize her utility (such as whether or not to bring an umbrella). A significant complication
that arose in this setting was that the individual was not able to fully assess the ramifications
of a decision, as her utility was dependent on both her choice and an environmental factor
that was out of her control. How should an individual assess the quality of a given decision
for such situations?
A central component of the decision-maker’s analysis is a predictive model for the environ-
ment’s behavior. For example, one could have a probabilistic forecast for the environment,
e.g., it is going to rain with 50% probability, and evaluate the quality of a decision through
an appropriately defined average or expectation. This may be a reasonable path forward in
scenarios where an individual has a reliable and comprehensive model of the environment.
For situations where such a model is not readily available, an alternative modeling choice is
that of a worst-case model. Here, an individual models the environment from a worst-case
perspective where she identifies the worst possible environmental choice for each of her pos-
sible decisions. Our last lecture demonstrated that worst-case models could be effective for
strategic decision-making provided that the individual was able to randomize her choices.
In this lecture we are going to start the transition from systems with a single strategic
decision-maker to systems with multiple strategic decision-makers. This lecture will focus
on a special class of such systems, termed zero-sum games, that involve two decision-makers
with opposed objectives. Here, we will continue to focus on worst-case models and ask
the following question: what happens if both decision-makers in a zero-sum game employ
worst-case models against each other?

2 Zero-Sum Games
In this section we introduce the formal model for zero-sum games. These strategic environ-
ments involve two decision-makers with diametrically opposed objectives. The specifics of
the model are as follows:
• Decision-makers: There are two decision-makers, i.e., $N = \{1, 2\}$. We will use the
terms decision-makers, players, and agents interchangeably throughout the text.
• Choice Sets: Each decision-maker $i \in N$ is associated with a given choice set $A_i$.
The set of joint choices is defined by $A = A_1 \times A_2$. We will use the terms choices and
actions interchangeably throughout the text.
• Utility Function: Each decision-maker $i \in N$ is associated with a given utility
function $U_i : A \to \mathbb{R}$ that defines her preference over the joint actions $A$. We will use
the terms utilities, payoffs, rewards, and objectives interchangeably throughout the
text.
• Zero-Sum Property: A zero-sum game imposes the constraint that for any choice
profile $a = (a_1, a_2) \in A$ we have $U_1(a) + U_2(a) = 0$. That is, the payoffs of the
decision-makers always sum to 0 for any joint choice $a \in A$.

In the following we look at two classic examples of zero-sum games and how we represent
the components highlighted above.

Example 2.1 (Matching Pennies) The game matching pennies involves two players each
with a choice set A1 = A2 = {H, T }, where we refer to H as heads and T as tails. The goal
of player 1 is to match the choice of player 2, while the goal of player 2 is to not match the
choice of player 1. The set of joint choices is given by A = {(H, H), (H, T ), (T, H), (T, T )},
where the first entry in each tuple corresponds to the choice of player 1 and the second entry
in each tuple corresponds to the choice of player 2. The following utility functions encode
the players’ preferences defined above where we associate a payoff of 1 if the player’s goal is
satisfied and −1 otherwise, i.e.,

$$\begin{aligned}
U_1(a_1 = H, a_2 = H) &= 1, & U_2(a_1 = H, a_2 = H) &= -1, \\
U_1(a_1 = H, a_2 = T) &= -1, & U_2(a_1 = H, a_2 = T) &= 1, \\
U_1(a_1 = T, a_2 = H) &= -1, & U_2(a_1 = T, a_2 = H) &= 1, \\
U_1(a_1 = T, a_2 = T) &= 1, & U_2(a_1 = T, a_2 = T) &= -1.
\end{aligned}$$

Rather than express each utility function exhaustively as above, here we can use a mild
variation of the payoff matrix presented in the previous lecture given by

$$\begin{array}{c|cc}
 & H & T \\ \hline
H & 1, -1 & -1, 1 \\
T & -1, 1 & 1, -1
\end{array}$$

The row indicates the decision of player 1, the column indicates the decision of player 2,
and the cell corresponding to a particular choice for each player, i.e., the intersection of the
chosen Row and Column, contains the payoffs to both player 1 (first number) and 2 (second
number) for that joint choice. In a zero-sum game, note that the terms in any cell add to 0;
hence, we will often only present the payoffs to player 1, which we will refer to as the row
player, i.e.,
$$\begin{array}{c|cc}
 & H & T \\ \hline
H & 1 & -1 \\
T & -1 & 1
\end{array}$$

Given this minimal representation, we will often refer to the row player as the maximizing
player, i.e., her goal is to maximize the cell number, and the col player as the minimizing
player, i.e., her goal is to minimize the cell number. Note that minimizing the cell number is
consistent with maximizing the negative of the cell number, which is the utility of col.
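
As a quick illustration of the model components above, the following Python sketch encodes Matching Pennies and checks the zero-sum property. It is only a sketch: the names `u1`, `u2`, `is_zero_sum`, and `row_matrix` are hypothetical choices made here and do not come from the lecture.

```python
# Matching Pennies, encoded with hypothetical helper names (not from the notes).
ACTIONS = ("H", "T")

def u1(a1, a2):
    """Player 1 wants to match: payoff 1 on a match, -1 otherwise."""
    return 1 if a1 == a2 else -1

def u2(a1, a2):
    """Player 2 wants to mismatch: payoff 1 on a mismatch, -1 otherwise."""
    return 1 if a1 != a2 else -1

def is_zero_sum(actions1, actions2, util1, util2):
    """Check U1(a) + U2(a) = 0 for every joint choice a = (a1, a2)."""
    return all(util1(a1, a2) + util2(a1, a2) == 0
               for a1 in actions1 for a2 in actions2)

# Minimal representation: only the row player's payoffs are stored.
# Rows index player 1's choice, columns index player 2's choice.
row_matrix = [[u1(a1, a2) for a2 in ACTIONS] for a1 in ACTIONS]

print(is_zero_sum(ACTIONS, ACTIONS, u1, u2))
print(row_matrix)
```

Because the game is zero-sum, `row_matrix` carries all the information in the two utility functions; the column player's payoff is recovered by negating each entry.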

Example 2.2 (Rock - Paper - Scissors) Consider the classic game Rock - Paper - Scissors,
which involves two players $N = \{1, 2\}$, each associated with a choice set
$A_1 = A_2 = \{R, P, S\}$, i.e., Rock, Paper, or Scissors. The set of joint choices is given by
$A = \{(R, R), (R, P), (R, S), \ldots, (S, R), (S, P), (S, S)\}$. If we associate a payoff of 1 to Win,
0 to Tie, and −1 to Loss, we have a zero-sum game with the payoff matrix

$$\begin{array}{c|ccc}
 & R & P & S \\ \hline
R & 0 & -1 & 1 \\
P & 1 & 0 & -1 \\
S & -1 & 1 & 0
\end{array}$$

As before, the row player is the maximizing player whose goal is to maximize the cell number
and the col player is the minimizing player whose goal is to minimize the cell number.
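
The Rock - Paper - Scissors matrix can also be generated from the rules rather than typed in by hand. The sketch below is one possible encoding; the `BEATS` dictionary and the function name `payoff` are assumptions introduced for illustration.

```python
# Rock - Paper - Scissors payoffs for the row player, built from the rules.
ACTIONS = ("R", "P", "S")

# Hypothetical encoding of the rules: each key beats its value.
BEATS = {"R": "S", "P": "R", "S": "P"}

def payoff(a1, a2):
    """Row player's payoff: 1 for a win, 0 for a tie, -1 for a loss."""
    if a1 == a2:
        return 0
    return 1 if BEATS[a1] == a2 else -1

# Rows index the row player's choice, columns the column player's choice.
matrix = [[payoff(a1, a2) for a2 in ACTIONS] for a1 in ACTIONS]

for label, row in zip(ACTIONS, matrix):
    print(label, row)
```

Since both players face the same rules, swapping their roles flips the sign of the payoff, i.e., `payoff(a1, a2) == -payoff(a2, a1)` for every pair of choices.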

3 Strategic Behavior in Zero-Sum Games


What choice should each player make in a zero-sum game? Answering this question requires
developing a reasonable model of the opposing player’s behavior which is a difficult task.
One model which seems reasonable in zero-sum games is a worst-case model as discussed
extensively in the last lecture. Analyzing the performance of this model requires analyzing
all the “what if” scenarios for each player, i.e., what would the opponent do if I play ai ?
The performance guarantees attainable through such worst-case models are referred to as
security levels, which we formally define below:

Definition 3.1 (Security Levels and Security Strategies) Consider a two-player zero-sum
game with action sets $A_1$ and $A_2$ and a utility function for row given by
$U : A_1 \times A_2 \to \mathbb{R}$. Note that the utility function for player 2 is given by $-U$. The
security level of player 1, i.e., the highest payoff she can guarantee when using a worst-case
model, is given by
$$\underline{v} = \max_{a_1 \in A_1} \min_{a_2 \in A_2} U(a_1, a_2). \tag{1}$$
We will refer to a security strategy of player 1 as any action $a_1^{ss} \in A_1$ that guarantees a
payoff of at least $\underline{v}$, i.e., $U(a_1^{ss}, a_2) \geq \underline{v}$ for all $a_2 \in A_2$. The security level of player 2, i.e.,
the lowest value of $U$ she can guarantee when using a worst-case model, is given by
$$\overline{v} = \min_{a_2 \in A_2} \max_{a_1 \in A_1} U(a_1, a_2). \tag{2}$$
We will refer to a security strategy of player 2 as any action $a_2^{ss} \in A_2$ that guarantees a
penalty of at most $\overline{v}$, i.e., $U(a_1, a_2^{ss}) \leq \overline{v}$ for all $a_1 \in A_1$.

Before proceeding, we will spend just a little time analyzing the max min and min max
operators given above. First, these operators have an order commitment that reads from
left to right. For example, in (1) player 1 first commits to a choice a1 ∈ A1 and then agent
2, observing this choice a1 , selects an action a2 with minimum utility value. Accordingly, we
can rewrite (1) as
$$\underline{v} = \max_{a_1 \in A_1} U(a_1, WC_2(a_1)),$$
where
$$WC_2(a_1) \in \arg\min_{a_2 \in A_2} U(a_1, a_2)$$
is a worst-case choice of $a_2$ given knowledge of $a_1$. Similarly, one can rewrite (2) as
$$\overline{v} = \min_{a_2 \in A_2} U(WC_1(a_2), a_2),$$
where
$$WC_1(a_2) \in \arg\max_{a_1 \in A_1} U(a_1, a_2).$$

The following example will shed some light on the computation of security strategies and
security levels.

Example 3.1 Consider a two-player game with payoff matrix given by

L R
T 3 0
B 1 2

Focusing on row, if row plays T, then the worst-case outcome is 0. Alternatively, if row
plays B, then the worst-case outcome is 1. Accordingly, row’s security strategy is B and
the security level is $\underline{v} = 1$. Focusing on col, if col plays L, then the worst-case outcome
(i.e., maximum penalty) is 3. Alternatively, if col plays R, then the worst-case outcome is
2. Accordingly, col’s security strategy is R and the security level is $\overline{v} = 2$.
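
The worst-case reasoning in Example 3.1 can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original notes: the function name `pure_security_levels` and the list-of-rows encoding of the payoff matrix are assumptions made here.

```python
def pure_security_levels(matrix):
    """Pure-strategy security levels for a row-player payoff matrix.

    Returns (v_low, row_index, v_high, col_index), where row_index and
    col_index are the first security strategies found for row and col.
    """
    # Worst case for row under each row: the smallest entry in that row.
    row_worst = [min(row) for row in matrix]
    # Worst case for col under each column: the largest entry in that column.
    col_worst = [max(col) for col in zip(*matrix)]
    v_low = max(row_worst)    # max min
    v_high = min(col_worst)   # min max
    return v_low, row_worst.index(v_low), v_high, col_worst.index(v_high)

# Example 3.1: rows are (T, B), columns are (L, R).
example_3_1 = [[3, 0],
               [1, 2]]
print(pure_security_levels(example_3_1))
```

For Example 3.1 the function returns row index 1 (B) with level 1 and column index 1 (R) with level 2, matching the hand computation above.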

Example 3.2 Consider an alternative two-player game with payoff matrix given by

L R
T 3 0
B 2 1

Focusing on row, if row plays T, then the worst-case outcome is 0. Alternatively, if row
plays B, then the worst-case outcome is 1. Accordingly, row’s security strategy is B and
the security level is $\underline{v} = 1$. Focusing on col, if col plays L, then the worst-case outcome
(i.e., maximum penalty) is 3. Alternatively, if col plays R, then the worst-case outcome is
1. Accordingly, col’s security strategy is R and the security level is $\overline{v} = 1$.
The above examples appear to demonstrate that $\underline{v} \leq \overline{v}$; however, whether or not this is true
in general will be addressed in the following section. In Example 3.2 in particular we had
$\underline{v} = \overline{v}$; this turns out to be an important concept in zero-sum games. Whenever it is the
case that a game has $\underline{v} = \overline{v}$, we will say that such a game has a value:

Definition 3.2 (Game Value) Consider a two-player zero-sum game with action sets $A_1$
and $A_2$ and a payoff matrix given by $U : A_1 \times A_2 \to \mathbb{R}$. If $\underline{v} = \overline{v}$, we say that the game has
a value, and that value is defined as $v := \underline{v} = \overline{v}$.

4 Comparing Security Levels


In this section we start to explore the relationship between $\underline{v}$ and $\overline{v}$. We begin by providing
a lemma pertaining to the min max and max min operators defined above. Informally, this
lemma demonstrates that the largest minimum (i.e., max min) is never larger than the
smallest maximum (i.e., min max).

Lemma 4.1 Let $F : X \times Y \to \mathbb{R}$ be any function where $X$ and $Y$ are finite discrete sets. Then
$$\max_{x \in X} \min_{y \in Y} F(x, y) \leq \min_{y \in Y} \max_{x \in X} F(x, y).$$

Proof 4.1 Suppose $(\tilde{x}, \tilde{y})$ and $(\tilde{x}', \tilde{y}')$ are the solutions associated with the max min and
min max problems defined above, i.e.,
$$F(\tilde{x}, \tilde{y}) = \max_{x \in X} \min_{y \in Y} F(x, y), \tag{3}$$
$$F(\tilde{x}', \tilde{y}') = \min_{y \in Y} \max_{x \in X} F(x, y). \tag{4}$$
Focusing on the pair $(\tilde{x}, \tilde{y}')$, we have that
$$F(\tilde{x}, \tilde{y}') \geq \min_{y \in Y} F(\tilde{x}, y) = F(\tilde{x}, \tilde{y}) = \max_{x \in X} \min_{y \in Y} F(x, y),$$
where the first equality comes from the definition of max min. Furthermore, we also have
that
$$F(\tilde{x}, \tilde{y}') \leq \max_{x \in X} F(x, \tilde{y}') = F(\tilde{x}', \tilde{y}') = \min_{y \in Y} \max_{x \in X} F(x, y),$$
where the first equality comes from the definition of min max. Combining these two sets of
inequalities gives us
$$\max_{x \in X} \min_{y \in Y} F(x, y) \leq F(\tilde{x}, \tilde{y}') \leq \min_{y \in Y} \max_{x \in X} F(x, y),$$
which completes the proof.
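
Lemma 4.1 can also be sanity-checked numerically. The sketch below draws random integer tables and confirms that max min never exceeds min max; it is a spot check under assumed table sizes and value ranges, not a substitute for the proof, and the function names are hypothetical.

```python
import random

def max_min(F):
    """max over rows x of (min over columns y of F[x][y])."""
    return max(min(row) for row in F)

def min_max(F):
    """min over columns y of (max over rows x of F[x][y])."""
    return min(max(col) for col in zip(*F))

def lemma_holds_on_random_tables(trials=1000, seed=0):
    """Check max min <= min max on randomly generated finite tables."""
    rng = random.Random(seed)
    for _ in range(trials):
        rows, cols = rng.randint(1, 5), rng.randint(1, 5)
        F = [[rng.randint(-10, 10) for _ in range(cols)] for _ in range(rows)]
        if max_min(F) > min_max(F):
            return False  # would be a counterexample to the lemma
    return True

print(lemma_holds_on_random_tables())
```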

Given this lemma, we can now state the following proposition which shows how security
levels compare in zero-sum games.
Proposition 4.1 Consider a two-player zero-sum game with action sets $A_1$ and $A_2$ and a
utility function for player 1 (row) given by $U : A_1 \times A_2 \to \mathbb{R}$. Then the security levels of
the two players satisfy $\underline{v} \leq \overline{v}$.

This proposition follows by applying Lemma 4.1 with $F = U$, $X = A_1$, and $Y = A_2$, and it
confirms our intuition from the previous section.

5 What about mixed strategies?


In this section we want to explore how the use of mixed strategies impacts the resulting
security levels and security strategies of the players. To that end, it is convenient to express
the payoff matrix as merely a matrix $M$ with rows $\mathcal{I}$, columns $\mathcal{J}$, and game matrix elements
$m_{ij}$, which correspond to the entries $M(i, j)$. Given this notation, the computation of the
security levels of the two players is now of the form
$$\underline{v} = \max_{i \in \mathcal{I}} \min_{j \in \mathcal{J}} m_{ij}, \tag{5}$$
$$\overline{v} = \min_{j \in \mathcal{J}} \max_{i \in \mathcal{I}} m_{ij}, \tag{6}$$
and we know from Proposition 4.1 that $\underline{v} \leq \overline{v}$. We will refer to $i^*$ as a maximizing security
strategy (or maximinimizer) if
$$\underline{v} \leq m_{i^* j} \quad \text{for all } j,$$
i.e., $i^*$ assures a payoff of at least $\underline{v}$. Similarly, we will refer to $j^*$ as a minimizing security
strategy (or minimaximizer) if
$$m_{i j^*} \leq \overline{v} \quad \text{for all } i,$$
i.e., $j^*$ assures a penalty of at most $\overline{v}$.
The matrix notation above allows us to easily assess the impact of probabilistic strategies
on the resulting security levels. Given a row strategy $p \in \Delta(\mathcal{I})$ and col strategy $q \in \Delta(\mathcal{J})$,
the expected utility to the row player is defined as
$$U(p, q) = \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} p_i \, q_j \, m_{ij} = p^T M q,$$
which is the expected utility to row given that each player is choosing independently according
to their respective mixed strategies. While the computation of the expected utility
looks complicated, it can be carried out using matrix multiplication as shown above. Accordingly,
one can re-define the security levels of the two players when allowing mixed strategies,
$$\underline{v} = \max_{p \in \Delta(\mathcal{I})} \min_{q \in \Delta(\mathcal{J})} p^T M q,$$
$$\overline{v} = \min_{q \in \Delta(\mathcal{J})} \max_{p \in \Delta(\mathcal{I})} p^T M q.$$
As before, we could easily show that $\underline{v} \leq \overline{v}$ and that the game has a value of $v^*$ if $\underline{v} = \overline{v} = v^*$.
Lastly, we will refer to $p^*$ as a maximizing security strategy (or maximinimizer) if
$$\underline{v} \leq p^{*T} M q \quad \text{for all } q,$$
and $q^*$ as a minimizing security strategy (or minimaximizer) if
$$p^T M q^* \leq \overline{v} \quad \text{for all } p.$$
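
The expected utility above is a plain double sum, which the following Python sketch spells out. The function name `expected_utility` and the use of exact fractions are choices made here for illustration, and the strategies in the usage lines are arbitrary examples rather than values taken from the notes.

```python
from fractions import Fraction

def expected_utility(p, M, q):
    """Expected payoff to row: sum over i, j of p[i] * q[j] * M[i][j].

    p is a probability vector over rows, q over columns, and M is the
    row player's payoff matrix given as a list of rows.
    """
    return sum(p[i] * q[j] * M[i][j]
               for i in range(len(p))
               for j in range(len(q)))

# Payoff matrix of Example 3.1: rows (T, B), columns (L, R).
M = [[3, 0],
     [1, 2]]

# A pure strategy is the special case of a degenerate distribution.
print(expected_utility((1, 0), M, (0, 1)))  # row plays T, col plays R

# An arbitrary pair of mixed strategies, kept exact with Fraction.
p = (Fraction(1, 4), Fraction(3, 4))
q = (Fraction(1, 2), Fraction(1, 2))
print(expected_utility(p, M, q))
```

Using `Fraction` instead of floats avoids rounding when comparing expected utilities for equality.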

Example 5.1 Recall the previous two-player zero-sum game with payoff matrix given by

L R
T 3 0
B 1 2

When restricting attention to pure strategies, i.e., non-probabilistic strategies, we found the
security levels were $\underline{v} = 1$ and $\overline{v} = 2$. Suppose row plays a strategy $(p, 1 - p)$, i.e., plays T
with probability $p$ and B with probability $1 - p$. Given this strategy, the expected utility of
row against each pure choice of col satisfies

$$U(p, L) = 3p + 1(1 - p) = 2p + 1,$$
$$U(p, R) = 0p + 2(1 - p) = 2 - 2p.$$

Plotting these expected utilities as in

[Figure: Expected Utility for ROW plotted against p ∈ [0, 1], with one line for COL playing L and one line for COL playing R.]

reveals a new security level of $\underline{v} = 1.5$ with a security strategy $p = 1/4$, the point where the
lower of the two lines is highest. Repeating this
analysis for col, if col is playing a strategy $(q, 1 - q)$, i.e., play L with probability $q$ and
R with probability $1 - q$, then the expected penalty of col satisfies

$$U(T, q) = 3q + 0(1 - q) = 3q,$$
$$U(B, q) = q + 2(1 - q) = 2 - q.$$

Plotting these expected penalties as in


[Figure: Expected Penalty for COL plotted against q ∈ [0, 1], with one line for ROW playing T and one line for ROW playing B.]

reveals a new security level of $\overline{v} = 1.5$ with a security strategy $q = 1/2$. Note that now over
mixed strategies we have $\underline{v} = \overline{v}$.
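
The graphical argument in this example can be mirrored in code for any payoff matrix with two rows. The sketch below is one possible implementation, not the method of the notes: it relies on the fact that the lower envelope of finitely many lines attains its maximum at an endpoint of [0, 1] or at a crossing of two lines, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def row_mixed_security(M):
    """Mixed security level for row in a game with exactly two rows.

    Returns (value, p), where p is the probability placed on the first row.
    """
    top, bottom = M

    def envelope(p):
        # Worst case over col's pure choices when row plays (p, 1 - p).
        return min(p * t + (1 - p) * b for t, b in zip(top, bottom))

    # Candidate maximizers: the endpoints plus every crossing inside [0, 1].
    candidates = {Fraction(0), Fraction(1)}
    for (t1, b1), (t2, b2) in combinations(zip(top, bottom), 2):
        denom = (t1 - b1) - (t2 - b2)
        if denom != 0:  # parallel lines never cross
            p = Fraction(b2 - b1, denom)
            if 0 <= p <= 1:
                candidates.add(p)
    best = max(candidates, key=envelope)
    return envelope(best), best

def col_mixed_security(M):
    """Mixed security level for col in a game with exactly two columns.

    Col minimizing M is the same as a row player maximizing -M transposed.
    Returns (value, q), where q is the probability placed on the first column.
    """
    negated_transpose = [[-M[i][j] for i in range(len(M))] for j in range(2)]
    value, q = row_mixed_security(negated_transpose)
    return -value, q

M = [[3, 0],
     [1, 2]]
print(row_mixed_security(M))
print(col_mixed_security(M))
```

Applied to the matrix of this example, both functions return the value 3/2, with p = 1/4 for row and q = 1/2 for col.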

Observe that $\underline{v} = \overline{v}$ in the example above, i.e., the game has a value. This is in contrast to
our earlier analysis of the same game in Example 3.1, where $\overline{v} > \underline{v}$; however, the computation in that example
involved pure strategies, i.e., non-probabilistic strategies, as opposed to mixed strategies.
The following famous theorem demonstrates that a game will always have a value when
considering mixed strategies.

Theorem 5.1 (Minimax Theorem) Consider any two-player zero-sum game with player
set $N = \{1, 2\}$, finite action sets $A_1$ and $A_2$, and payoff matrix $U : A \to \mathbb{R}$. The game always has
a value when considering mixed strategies, i.e.,
$$\underline{v} = \max_{p \in \Delta(A_1)} \min_{q \in \Delta(A_2)} U(p, q) = \min_{q \in \Delta(A_2)} \max_{p \in \Delta(A_1)} U(p, q) = \overline{v}. \tag{7}$$

The proof of this theorem involves judicious use of the “separating hyperplane” theorem,
which is beyond the scope of this class. Nonetheless, this remarkable theorem provides
several noteworthy results. First, it establishes that every zero-sum matrix game has a
value over mixed strategies. Second, in zero-sum games mixed strategies do often constitute
a reasonable prediction of behavior. Lastly, computing the players’ security strategies in
zero-sum games is relatively straightforward and can be done in a tractable fashion.
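
One standard way to make "tractable" concrete, which these notes do not spell out, is to write row's mixed security strategy as the solution of a linear program. The sketch below uses the matrix notation $m_{ij}$ of Section 5; the auxiliary variable $w$ is introduced here purely for illustration and stands for the payoff that row guarantees against every column.

```latex
\begin{aligned}
\max_{p,\, w} \quad & w \\
\text{subject to} \quad & \sum_{i \in \mathcal{I}} p_i \, m_{ij} \;\geq\; w
    && \text{for all } j \in \mathcal{J}, \\
& \sum_{i \in \mathcal{I}} p_i = 1, \\
& p_i \geq 0 && \text{for all } i \in \mathcal{I}.
\end{aligned}
```

It suffices to impose the guarantee against each pure column $j$ because $p^T M q$ is linear in $q$, so its minimum over mixed strategies $q$ is attained at a pure column. The optimal $w$ is then row's security level over mixed strategies, and an analogous program with the roles reversed gives col's.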

6 Conclusion
This lecture established our first significant result pertaining to strategic decision-making in
competitive environments. In particular, we saw that any zero-sum game possesses
a value over mixed strategies, i.e., the security levels of the two players are the same. This also implies a stability
condition regarding these security strategies. That is, while a security strategy optimizes
worst-case guarantees, it is also a best response to the opponent’s security strategy. The
following lectures will focus on exploring this stability condition beyond zero-sum games.
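
The stability condition can be checked directly on the game from Example 5.1. The sketch below is illustrative only: the helper names are hypothetical, and the test uses the fact that, because expected utility is linear in each player's own strategy, a strategy is a best response exactly when it does as well as the best pure reply.

```python
from fractions import Fraction

def expected_utility(p, M, q):
    """Expected payoff to row under mixed strategies p (rows) and q (columns)."""
    return sum(p[i] * q[j] * M[i][j]
               for i in range(len(p)) for j in range(len(q)))

def row_payoffs_against(q, M):
    """Row's expected payoff for each pure row when col plays q."""
    return [sum(M[i][j] * q[j] for j in range(len(q))) for i in range(len(M))]

def col_penalties_against(p, M):
    """Col's expected penalty for each pure column when row plays p."""
    return [sum(p[i] * M[i][j] for i in range(len(p)))
            for j in range(len(M[0]))]

def is_mutual_best_response(p, q, M):
    """True if p is a best response to q and q is a best response to p."""
    value = expected_utility(p, M, q)
    return (value == max(row_payoffs_against(q, M))
            and value == min(col_penalties_against(p, M)))

M = [[3, 0],
     [1, 2]]
p_star = (Fraction(1, 4), Fraction(3, 4))  # row's security strategy, Example 5.1
q_star = (Fraction(1, 2), Fraction(1, 2))  # col's security strategy, Example 5.1

print(is_mutual_best_response(p_star, q_star, M))
# The pure security strategies (B, R) from Example 3.1, for comparison.
print(is_mutual_best_response((0, 1), (0, 1), M))
```

The mixed security strategies pass the check, while the pure pair (B, R) does not: against B, col would rather switch to L.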
7 Exercises
1. A zero-sum game has a payoff matrix (for the row player) given by
$$\begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix}$$

(a) Compute the security strategies for both players using pure actions, and conclude
that the game does not have a value in this case.
(b) Repeat using mixed strategies, and compute the value of the game.
2. A zero-sum game has a payoff matrix (for the row player) given by
$$\begin{pmatrix} 0 & 3 & 1 \\ 4 & 1 & 2 \end{pmatrix}$$

(a) Compute the security strategies for both players using pure actions, and conclude
that the game does not have a value in this case.
(b) Repeat using mixed strategies for the row player.
(c) Compute the value of the game.
(d) What is the column player’s security level? Identify a security strategy for the
column player.
3. Consider the following resource allocation problem known as a Colonel Blotto game.

[Figure: two panels showing battlefields with valuations $v^1$ and $v^2$; the first is labeled Colonel 1 and Colonel 2, the second Colonel 1, Colonel 2-a, and Colonel 2-b.]

There are two Colonels, denoted by $N = \{1, 2\}$, and the number of assets (or soldiers)
available to each Colonel is $B_1 = 5$ and $B_2 = 3$. In this problem, each Colonel must
simultaneously decide how to allocate its assets over two distinct battlefields that have
valuations $v^1 \geq 0$ and $v^2 \geq 0$. A Colonel wins a battlefield when it allocates strictly
more assets to that battlefield than its opponent, and consequently the other Colonel
loses. The payoff to a Colonel is the total value of the battlefields it won minus the
total value of the battlefields it lost (ties give zero payoff to either side). For example,
if Colonel 1 allocates 4 assets to battlefield 1 and 1 asset to battlefield 2, which we
denote by $x_1 = (4, 1)$, and Colonel 2’s allocation is $x_2 = (1, 2)$, the payoff to Colonel
1 is $v^1 - v^2$ and the payoff to Colonel 2 is $v^2 - v^1$.
(a) What is the set of actions available to each Colonel?
(b) What is the payoff matrix of the Colonel Blotto game?
(c) Suppose $v^1 = v^2 = 1$. What are the security levels of each Colonel under pure
strategies?
(d) Extra Credit: What are the security levels of each Colonel under mixed strategies?
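
As a starting point for this exercise, the payoff rule can be written as a short function. This is only a sketch of the rule as stated in the problem: the function name `blotto_payoff` and the numeric battlefield values in the usage lines are hypothetical, chosen to illustrate the worked allocation and not to answer parts (a) through (d).

```python
def blotto_payoff(own, other, values):
    """Payoff to the Colonel playing `own` against `other`.

    own, other: tuples giving the assets placed on each battlefield.
    values: tuple of battlefield valuations, in the same order.
    """
    total = 0
    for mine, theirs, value in zip(own, other, values):
        if mine > theirs:
            total += value   # battlefield won
        elif mine < theirs:
            total -= value   # battlefield lost
        # ties contribute nothing to either side
    return total

# Worked allocation from the problem statement, with hypothetical
# valuations (3, 1) standing in for the two battlefield values.
x1, x2 = (4, 1), (1, 2)
values = (3, 1)
print(blotto_payoff(x1, x2, values))  # Colonel 1: first value minus second
print(blotto_payoff(x2, x1, values))  # Colonel 2: the negative of the above
```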
