Lecture Note 1 Introduction
2 Game Theory
The first pillar of the course will be game theory. Game theory is a tool for analyzing the behavior of rational agents whose payoffs depend on the actions of the other agents. The classic introductory example is rock-paper-scissors (RPS), which can be written as a bimatrix game:

            Rock      Paper     Scissors
Rock       (0, 0)    (−1, 1)    (1, −1)
Paper      (1, −1)   (0, 0)     (−1, 1)
Scissors   (−1, 1)   (1, −1)    (0, 0)

In this representation, Player 1 chooses a row to play, and Player 2 chooses a column to play. Player 1 tries to maximize the first value at the resulting entry in the bimatrix, while Player 2 tries to maximize the second value.
Here is an example of something that is not a Nash equilibrium: Player 1 always plays rock,
and Player 2 always plays scissors. In this case, Player 2 is not playing optimally given the strategy
of Player 1, since they could improve their payoff from −1 to 1 by switching to deterministically
playing paper. In fact, this argument works for any pair of deterministic strategies, and so we see
that there is no Nash equilibrium consisting of deterministic strategies.
Instead, RPS is an example of a game where we need randomization in order to arrive at a
Nash equilibrium. The idea is that each player gets to choose a probability distribution over their
actions instead (e.g. a distribution over rows for Player 1). Now, the value that a given player
receives under a pair of mixed strategies is their expected payoff given the randomized strategies.
In RPS, it’s easy to see that the unique Nash equilibrium is for each player to play each action with probability 1/3. Given this distribution, there is no other action that either player can switch
to and improve their utility. This is what we call a (mixed-strategy) Nash equilibrium.
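To make this concrete, we can verify the claim numerically. The snippet below (an illustrative sketch, not part of the original notes) computes each player's expected payoff for every pure deviation against the opponent's uniform strategy:

```python
import numpy as np

# RPS written as a single matrix A, where A[i, j] is Player 2's payoff
# and Player 1 receives -A[i, j]. Order: rock, paper, scissors.
A = np.array([[0., 1., -1.],
              [-1., 0., 1.],
              [1., -1., 0.]])

x = np.full(3, 1/3)  # Player 1's uniform mixed strategy
y = np.full(3, 1/3)  # Player 2's uniform mixed strategy

# Expected payoff of each pure action against the opponent's mix.
p1_deviations = -(A @ y)  # Player 1 maximizes -x^T A y
p2_deviations = A.T @ x   # Player 2 maximizes x^T A y
```

Every entry of both vectors equals 0, the equilibrium value, so no unilateral deviation improves either player's utility.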
The famous result of John Nash from 1951 is that every finite game has a Nash equilibrium.
The attentive reader may have noticed that the RPS game has a further property: whenever
one player wins, the other loses. More generally, a bimatrix game is a zero-sum game if it can be
represented in the following form:
    min_{x∈∆^n} max_{y∈∆^m} xᵀAy
where ∆^n, ∆^m are the n- and m-dimensional probability simplexes, respectively, and A contains the payoff entries to the y-player from the bimatrix representation. This is called a bilinear saddle-point problem. The key here is that we can now represent the game as a single matrix, where the x-player wishes to minimize the bilinear term xᵀAy and the y-player wishes to maximize it. Zero-sum matrix games are very special: they can be solved in polynomial time with a linear program whose size is linear in the matrix size.
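To illustrate (a sketch using a generic LP solver, not the exact formulation from the notes), the x-player's LP introduces a variable v that upper-bounds the y-player's best-response payoff, and minimizes it:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Solve min_x max_y x^T A y over probability simplexes via an LP.

    Variables are (x_1, ..., x_n, v); we minimize v subject to
    A^T x <= v * 1 (v bounds every pure-strategy payoff of the y-player),
    sum(x) = 1, and x >= 0.
    """
    n, m = A.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                  # objective: minimize v
    A_ub = np.hstack([A.T, -np.ones((m, 1))])    # A^T x - v <= 0
    b_ub = np.zeros(m)
    A_eq = np.zeros((1, n + 1))
    A_eq[0, :n] = 1.0                            # sum(x) = 1
    b_eq = np.ones(1)
    bounds = [(0, None)] * n + [(None, None)]    # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]                  # equilibrium x and game value

# Rock-paper-scissors; A[i, j] is the payoff to the y-player.
A = np.array([[0., 1., -1.],
              [-1., 0., 1.],
              [1., -1., 0.]])
x, value = solve_zero_sum(A)
```

For RPS this recovers the uniform strategy and a game value of 0; note that the LP has one constraint per column of A, so its size is indeed linear in the matrix size.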
A more exciting application of zero-sum games is using them to compute an optimal strategy for two-player poker (a.k.a. heads-up poker). In fact, as we will discuss later, this was the foundation for many recent “superhuman AI for poker” results [1, 13, 2, 4]. In order to model poker games
we will need a more expressive game class called extensive-form games (EFGs). These games are
played on trees, where players may sometimes have groups of nodes, called information sets, that
they cannot distinguish among. An example is shown in Figure 1.
EFGs can also be represented as a bilinear saddle-point problem:
    min_{x∈X} max_{y∈Y} xᵀAy
where X, Y are no longer probability simplexes, but more general convex polytopes that encode the sequential decision spaces of each player. This is called the sequence-form representation [17],
and we will cover that later. Like matrix games, zero-sum EFGs can be solved in polynomial time
with linear programming, and the LP has size linear in the game tree.
It turns out that in many practical scenarios, the LP for solving a zero-sum game ends up
being far too large to solve. This is especially true for EFGs, where the game tree can quickly
become extremely large if the game has almost any amount of depth. Instead, iterative methods
are used in practice. What is meant by iterative methods here is the class of algorithms that build a sequence of strategies x_0, x_1, . . . , x_T, y_0, y_1, . . . , y_T, where only a constant number of strategies is
Figure 1: A poker game where P1 is dealt Ace or King. “r,” “f,” and “c” stand for raise, fold, and check, respectively. Leaf values denote P1 payoffs. The shaded area denotes an information set: P2 does not know which of these nodes they are at, and must thus use the same strategy in both.
kept in memory, and only oracle access to Ay and Aᵀx is needed (this is different from writing down A explicitly!). Typically the average strategies x̄_T = (1/T) Σ_{t=1}^T x_t, ȳ_T = (1/T) Σ_{t=1}^T y_t converge to a Nash equilibrium. The reason these methods are preferred is two-fold: first, by never writing down A explicitly we save a lot of memory (now we just need enough memory to store the much smaller x, y strategy vectors); second, they avoid the expensive matrix inversions involved in the simplex algorithm and interior-point methods.
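As an illustrative sketch of this class of algorithms (regret matching, one popular member, shown here under the assumption that A is a payoff matrix to the y-player), note that each iteration touches A only through the products Ay and Aᵀx:

```python
import numpy as np

def match_regrets(regrets):
    """Play proportionally to positive regrets (uniform if none are positive)."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

def regret_matching_selfplay(A, T=50000):
    """Self-play on min_x max_y x^T A y. Each iteration uses only the
    matrix-vector products A @ y and A.T @ x; the *average* strategies
    converge to a Nash equilibrium at a rate of O(1/sqrt(T))."""
    n, m = A.shape
    reg_x, reg_y = np.zeros(n), np.zeros(m)
    x_sum, y_sum = np.zeros(n), np.zeros(m)
    # Start from arbitrary pure strategies so play is not already at equilibrium.
    x, y = np.eye(n)[0], np.eye(m)[0]
    for _ in range(T):
        loss_x = A @ y             # x-player wants these entries to be small
        gain_y = A.T @ x           # y-player wants these entries to be large
        value = x @ loss_x         # current expected payoff x^T A y
        reg_x += value - loss_x    # regret vs. each pure x-action
        reg_y += gain_y - value    # regret vs. each pure y-action
        x_sum += x
        y_sum += y
        x, y = match_regrets(reg_x), match_regrets(reg_y)
    return x_sum / T, y_sum / T

# Rock-paper-scissors; A[i, j] is the payoff to the y-player.
A = np.array([[0., 1., -1.],
              [-1., 0., 1.],
              [1., -1., 0.]])
x_bar, y_bar = regret_matching_selfplay(A)
```

The day-to-day strategies cycle, but the averages x̄_T, ȳ_T drift toward the uniform equilibrium, and at no point is A inverted or factorized.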
The algorithmic techniques we will learn in this section are largely centered around iterative
methods. First, we will do a quick introduction to online learning and online convex optimization.
We will learn about two classes of algorithms: ones that converge to an equilibrium at a rate of O(1/√T), which roughly correspond to saddle-point variants of gradient-descent-like methods, and ones that converge at a rate of O(1/T), which roughly correspond to saddle-point variants of accelerated gradient methods. Then we will also
look at the practical performance of these algorithms. Here we will see that the following quote is
very much true:
In theory, theory and practice are the same. In practice, they are not.
In the poker application, abstraction methods were used to create a small enough game that can be solved with iterative methods. We will also cover how these methods are used.
Killer applications of zero-sum games include poker (as we saw), other recreational two-player
games, and generative adversarial networks (GANs). Other applications that are, as yet, less proven in practice include robust sequential decision making (the adversary represents uncertainty) and security scenarios where we assume the world is adversarial.
3 Market Design
The second pillar of the notes will be market design. In market design we are typically concerned with how to design the rules of a game in order to achieve “good” outcomes.
Thus, game theory is a key tool in market design, since it will be our way of understanding what
outcomes we may expect to arise, given a set of market rules.
Market design is a huge area, and so it has many killer applications. The ones we will see in
this course include Internet ad auctions and how to fairly assign course seats to students. However, there are many others, such as how to price and assign rideshares at Lyft/Uber, how to assign NYC kids to schools, how to enable nationwide kidney exchanges, how to allocate spectrum, etc.
Consider an example with 2 students and 2 courses, A and B, each with a single seat. Student 1 arrives first and signs up for course B. Then Student 2 arrives and signs up for A. The
total welfare of this assignment is 5 + 2 = 7. This does not seem to be an efficient use of resources:
we can improve our solution by swapping the courses, since Student 1 gets the same utility as
before, and Student 2 improves their utility. This is what’s called a Pareto-improving allocation
because each student is at least as well off as before, and at least one student is strictly better off.
One desideratum for efficiency is that no such improvement should be possible; an allocation with this property is called Pareto efficient.
Let’s look at another example. Now we have 2 students and 4 courses, where each student takes 2 courses. Again, each course has only 1 seat.
Course A Course B Course C Course D
Student 1 10 10 1 1
Student 2 10 10 1 1
Now say that Student 1 shows up first, and signs up for A and B. Then Student 2 shows up and
signs up for C and D. Call this assignment x1 . Here we get that x1 is Pareto efficient, but it does
not seem fair. A fairer solution seems to be that each student gets a course with value 10 and a course with value 1; let x2 be such an allocation. One way to look at this improvement is through
the notion of envy: each student should like their own course schedule at least as well as that of
any other student. Under x1 Student 2 envies Student 1, whereas under x2 no student envies the
other. Fairness turns out to be a complicated idea, and we will see later that there are several
appealing notions that we may wish to strive for.
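These envy comparisons are easy to compute directly. The snippet below (illustrative; the bundle encodings are my own, chosen to match the story above) measures each student's envy under x1 and x2:

```python
import numpy as np

# Valuations from the table: both students value A and B at 10, C and D at 1.
values = np.array([[10., 10., 1., 1.],
                   [10., 10., 1., 1.]])

def envy(values, bundles):
    """envy[i, k] = how much more student i values student k's bundle
    than their own (zero or negative means no envy)."""
    utilities = np.array([[values[i, bundle].sum() for bundle in bundles]
                          for i in range(len(values))])
    own = np.diag(utilities)
    return utilities - own[:, None]

x1 = [[0, 1], [2, 3]]  # Student 1 takes A, B; Student 2 takes C, D
x2 = [[0, 2], [1, 3]]  # each student gets one value-10 and one value-1 course

envy_x1 = envy(values, x1)
envy_x2 = envy(values, x2)
```

Under x1, Student 2 envies Student 1 by 20 − 2 = 18, while under x2 all envy entries are zero: the allocation is envy-free.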
Instead of first-come-first-served, we can use ideas from market design to get a better mechanism.
The solution that we will learn about is based on a fake-money market: we give every student some
fixed budget of fake currency (aka funny money). Then, we treat the assignment problem as
a market problem under the assigned budgets, and ask for what is called a market equilibrium.
Briefly, a market equilibrium is a set of prices, one for each item, and an allocation of items to
buyers. The allocation must be such that every item is fully allocated, and every buyer is getting
an assignment that maximizes their utility given the prices and their budget. Given such a market
equilibrium, we then take the allocation from the equilibrium, throw away the prices (the money
was fake anyway!), and use that to perform our course allocation. This turns out to have a number
of attractive fairness and efficiency properties. This system is deployed at several business schools
such as Wharton (UPenn), Rotman (U Toronto), and Tuck (Dartmouth) [5, 6].
Of course, if we want to implement this protocol we need to be able to compute a market equilibrium. This turns out to be a rich research area: in the case of what is called a Fisher market, where each agent i has a linear utility function v_i ∈ R^m_+ over the m items in the market, there is a neat convex program that results in a market equilibrium [9]:
    max_{x≥0}  Σ_i B_i log(v_i · x_i)
    s.t.       Σ_i x_ij ≤ 1,  ∀j
Here x_ij is how much buyer i is allocated of item j, and B_i is buyer i's budget. Notice that we are simply maximizing the budget-weighted logarithmic utilities, with no prices! It turns out that the prices are the dual variables on the supply constraints. We will see some nice applications of convex duality and Fenchel conjugates in deriving this relationship. We will also see that this class of markets has a relationship to the types of auction systems that are used at Google and Facebook [7, 8].
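As an illustration (with made-up market data, and a generic solver in place of specialized methods), the convex program and the price recovery can be sketched as:

```python
import numpy as np
from scipy.optimize import minimize

# A toy Fisher market (hypothetical numbers): two buyers with equal
# budgets and mirrored linear valuations over two items.
B = np.array([1., 1.])            # budgets
v = np.array([[2., 1.],           # buyer 1's values for items 1, 2
              [1., 2.]])          # buyer 2's values
n_buyers, n_items = v.shape

def neg_objective(x_flat):
    """Negated Eisenberg-Gale objective: -sum_i B_i log(v_i . x_i)."""
    x = x_flat.reshape(n_buyers, n_items)
    return -np.sum(B * np.log(np.sum(v * x, axis=1)))

# Supply constraints: each item is allocated at most once in total.
constraints = [{"type": "ineq",
                "fun": lambda x_flat, j=j:
                    1.0 - x_flat.reshape(n_buyers, n_items)[:, j].sum()}
               for j in range(n_items)]

x0 = np.full(n_buyers * n_items, 0.45)   # interior starting point
res = minimize(neg_objective, x0, method="SLSQP",
               bounds=[(0, None)] * (n_buyers * n_items),
               constraints=constraints)
x = res.x.reshape(n_buyers, n_items)

# At equilibrium the price of item j is the dual of its supply constraint;
# for linear utilities it can be recovered as the largest budget-weighted
# marginal utility: p_j = max_i B_i * v_ij / (v_i . x_i).
utilities = np.sum(v * x, axis=1)
prices = np.max(B[:, None] * v / utilities[:, None], axis=0)
```

In this symmetric example each buyer ends up with the item they value most, both prices equal 1, and each buyer exactly spends their budget, as a market equilibrium requires.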
In the case of markets such as those for course seats, the problem is computationally harder and
requires combinatorial optimization. Current methods use a mixture of MIP and local search [6].
4 Target Audience
These notes are targeted at senior undergraduates, master’s, and Ph.D. students in operations
research and computer science. They assume a basic background in convex, linear, and integer
optimization. They also assume knowledge of basic computational complexity theory (e.g. that
mixed-integer programming is NP-hard). Most of these things can be learned alongside the course.
The course does not assume any background in game theory or mechanism design.
5 Acknowledgments
These lecture notes (the present one and the following ones) owe a large debt to several other
professors that have taught courses on Economics and Computation. In particular, Tim Roughgar-
den’s lecture notes [14] and video lectures, John Dickerson’s course at UMD1 , and Ariel Procaccia’s
course at CMU2 provided inspiration for course topics as well as presentation ideas.
I would also like to thank the following people for extensive feedback on the lecture notes: Ryan D’Orazio, for both finding mistakes and providing helpful suggestions on presenting Blackwell approachability; and Gabriele Farina, for discussions around several regret-minimization issues and for helping me develop much of my thinking on regret minimization in general.
I also thank the following people who pointed out mistakes and typos in the notes: Mustafa
Mert Çelikok, Ajay Sakarwal, Eugene Vinitsky.
1 https://www.cs.umd.edu/class/spring2018/cmsc828m/
2 http://www.cs.cmu.edu/~arielpro/15896s16/index.html
References
[1] Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold’em
poker is solved. Science, 347(6218):145–149, 2015.
[2] Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus
beats top professionals. Science, 359(6374):418–424, 2018.
[3] Noam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted re-
gret minimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33,
pages 1829–1836, 2019.
[4] Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365
(6456):885–890, 2019.
[5] Eric Budish. The combinatorial assignment problem: Approximate competitive equilibrium
from equal incomes. Journal of Political Economy, 119(6):1061–1103, 2011.
[6] Eric Budish, Gérard P Cachon, Judd B Kessler, and Abraham Othman. Course match: A
large-scale implementation of approximate competitive equilibrium from equal incomes for
combinatorial allocation. Operations Research, 65(2):314–336, 2016.
[7] Vincent Conitzer, Christian Kroer, Eric Sodomka, and Nicolás E Stier-Moses. Multiplica-
tive pacing equilibria in auction markets. In International Conference on Web and Internet
Economics, 2018.
[8] Vincent Conitzer, Christian Kroer, Debmalya Panigrahi, Okke Schrijvers, Eric Sodomka, Nico-
las E Stier-Moses, and Chris Wilkens. Pacing equilibrium in first-price auction markets. In
Proceedings of the 2019 ACM Conference on Economics and Computation. ACM, 2019.
[9] Edmund Eisenberg and David Gale. Consensus of subjective probabilities: The pari-mutuel
method. The Annals of Mathematical Statistics, 30(1):165–168, 1959.
[10] Fei Fang, Thanh H Nguyen, Rob Pickles, Wai Y Lam, Gopalasamy R Clements, Bo An, Amandeep Singh, Brian C Schwedock, Milind Tambe, and Andrew Lemieux. PAWS—a deployed game-theoretic application to combat poaching. AI Magazine, 38(1):23–36, 2017.
[11] Samid Hoda, Andrew Gilpin, Javier Pena, and Tuomas Sandholm. Smoothing techniques for
computing Nash equilibria of sequential games. Mathematics of Operations Research, 35(2):
494–512, 2010.
[12] Christian Kroer, Gabriele Farina, and Tuomas Sandholm. Solving large sequential games with
the excessive gap technique. In Advances in Neural Information Processing Systems, pages
864–874, 2018.
[13] Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508–513, 2017.
[14] Tim Roughgarden. Twenty lectures on algorithmic game theory. Cambridge University Press,
2016.
[15] Milind Tambe. Security and game theory: algorithms, deployed systems, lessons learned. Cambridge University Press, 2011.
[16] Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling. Solving heads-up limit
Texas hold’em. In Twenty-Fourth International Joint Conference on Artificial Intelligence,
2015.
[17] Bernhard von Stengel. Efficient computation of behavior strategies. Games and Economic
Behavior, 14(2):220–246, 1996.
[18] Haifeng Xu. The mysteries of security games: Equilibrium computation becomes combinatorial
algorithm design. In Proceedings of the 2016 ACM Conference on Economics and Computation,
pages 497–514. ACM, 2016.