Lecture Notes: Adversarial Search
Today’s topic – Adversarial Search – stems from looking at search in a competitive context. In
a competitive context, we have two (or more) agents with conflicting interests (adversaries).
Most commonly, adversarial search is studied in the setting of games with two opposing play-
ers, and as such this is also the setting we focus on in this course.
Within this games setting, we will begin by covering the MiniMax algorithm for computing
optimal moves. We will briefly discuss the complexity of games, and conclude that MiniMax is
infeasible for most problems. We will look at some options for making MiniMax more efficient,
specifically 𝛼 − 𝛽 pruning and evaluation functions. Finally, we will dip our toes into the topic
of approximate algorithms with Monte Carlo Tree Search.
1 Why Games?
Ever since the dawn of modern computing, games have been on the table as an interesting
objective. As early as the 1800s, Charles Babbage was talking about computers playing chess
and checkers. So, why?
One factor is the simple fact that it is hard for a computer to perform well at many games.
Beating humans at games has been a long-standing goal in artificial intelligence. Only
recently (relative to the entire history of computing) have computers started beating humans at
most board games, and even now complex games like Go require enormous computing power.
However, I’d argue that the most important factor is that games are (generally) well-defined,
and easy to formalize (i.e. easy to express in code and math). By developing robust theory
on well-defined games, we can often apply the same theory to less well-defined problems.
Many real-world problems can be fitted into this games framework through various degrees of
abstraction.
2 Types of Games
Games broadly exist in four distinct categories based on two properties: whether they are deterministic or stochastic, and whether they have perfect or imperfect information.
When we speak of deterministic vs. stochastic games, we’re interested in whether or not the
game contains random elements. Deterministic games are the ones where any action affects
the game in exactly one well-defined way, for example chess, Go, or tic-tac-toe. Stochastic
games on the other hand involve random elements. Typical random elements include cards
and dice. As such, monopoly, backgammon, bridge, and poker are examples of stochastic
games.
Perfect vs. imperfect information refers to how much each player knows about the game. In
a perfect information game, all players know the complete state of the game. For example in
chess, both players can see the entire board. Battleships is an example of an imperfect infor-
mation game, since each player cannot see the other’s ships.
3 Terminology
To avoid confusion, I will mostly keep the same terminology as AIMA. However, AIMA refers
to its players as MAX and MIN, which I find confusing. Instead, I prefer to simply refer to
them as Player A and Player B.
We also need a small amount of tree terminology, specifically I’ll refer to parents and successors.
To illustrate, consider the following tree:
          root
         /    \
        a      b
      / | \   / \
    aa  ab ac ba bb
In this tree:
• The parent of "a" is "root".
• The successors of "a" are ["aa", "ab", "ac"].
All of the algorithms we look at today can be expressed in the following general framework:

def adversarial_search(s):
    current_player = TO_MOVE(s)
    best_action = None
    best_score = -BIG_NUMBER   # BIG_NUMBER is just a value larger than any possible score
    for a in ACTIONS(s):
        new_s = RESULT(s, a)
        score = SCORE(new_s, current_player)
        if score > best_score:
            best_score = score
            best_action = a
    return best_action
Or, equivalently, in a more compact form:

def adversarial_search(s):
    actions = ACTIONS(s)
    scores = [SCORE(RESULT(s, a), TO_MOVE(s)) for a in actions]
    return actions[numpy.argmax(scores)]
Essentially, we pretend that we have a function SCORE(𝑠, 𝑝) that tells us how good or bad a
state 𝑠 is for player 𝑝. Using this, we simply try each possible action, get the score of the
resulting state, and return the action that gives us the highest score. Each algorithm we cover
today supplies its own version of the SCORE function.
In order to develop intuition surrounding this MiniMax algorithm, we need to consider the
goal of each player.
Player A wants to maximize UTILITY(𝑠, "PLAYER A").
Conversely, Player B wants to maximize UTILITY(𝑠, "PLAYER B").
Now consider a game where
UTILITY(𝑠, "PLAYER B") = -UTILITY(𝑠, "PLAYER A").
We call this a zero-sum game.
In this setting, Player B has the goal
max UTILITY(𝑠, "PLAYER B") ⟺ min UTILITY(𝑠, "PLAYER A").
In other words Player B wants to minimize the utility for Player A, and vice-versa.
With this we have one maximizing player and one minimizing player, which gives us the
MiniMax algorithm.
Note: not all games are zero-sum games, but the intuition still holds.
With this intuition in mind, the MiniMax algorithm is almost deceptively simple:
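A minimal sketch of what this looks like, reusing the abstract helpers (IS_TERMINAL, UTILITY, ACTIONS, RESULT) from the framework above:

def SCORE(s, current_player):
    # The current player has just (hypothetically) moved, so it is
    # the opponent's turn: start with the minimizing player.
    return min_player(s, current_player)

def min_player(s, current_player):
    if IS_TERMINAL(s): return UTILITY(s, current_player)
    return min(max_player(RESULT(s, a), current_player) for a in ACTIONS(s))

def max_player(s, current_player):
    if IS_TERMINAL(s): return UTILITY(s, current_player)
    return max(min_player(RESULT(s, a), current_player) for a in ACTIONS(s))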
It simply “tries” a move, and then pretends to hand the game over to the other player. And
then it goes back and forth with one player maximizing and the other minimizing until the
game is finished. Note that the SCORE() function starts by calling min_player().
This is because SCORE is evaluated on the state after the current player's (hypothetical) move:
at that point it is the opponent's turn, and since we are looking for the best move for the
current player, the current player is always the maximizing player and the opponent the minimizing one.
Let’s look at an example to really get into how this works. Consider the following game of
tic-tac-toe:
· X O
O X X
· O ·

(· marks an empty square)
Here it is X’s turn to play, and thus X is the maximizing player. If we expand the MiniMax
tree from this game position we get the following:
[Figure: the MiniMax tree expanded from this position down to the end of the game. At the root it is X's turn (Max), and X has three possible moves: the three remaining empty corners. Each of these leads to a Min node (O to move) with two replies, and each reply leaves a single square for X's final move. The terminal positions have UTILITY scores 0, 1, 0, 1, 0, 0 for X, the Max nodes above them have the same values, all three Min nodes have the value 0, and the root Max node therefore also has the value 0.]
First, note that when the game is over (at the leaf nodes), even though it is technically O’s
turn, we evaluate the UTILITY score for X since X is the maximizing player.
Second, MiniMax thinks it doesn’t matter where the player X puts its next piece (from the
starting state). What we observe here is the fact that MiniMax assumes that both players are
playing optimally. We, as humans, can clearly see that there are two possibilities for X to win,
but because MiniMax assumes that O also plays optimally, it assumes that O will block those
opportunities, which means that X cannot win either way. What’s particularly interesting
about this is that MiniMax only guarantees optimal play against an optimal opponent.
6 Search Complexity
For a simple game, like Tic-Tac-Toe, MiniMax is perfectly feasible. In total, there can be a
maximum of 9 moves before the game ends.
In the first iteration, there are 9 possible moves, then 8, then 7, …, finally just 1. As such, the
full search tree contains at most 9! = 362 880 complete move sequences (technically even fewer, since
the game stops early when a player wins). In other words, searching it exhaustively is perfectly
feasible on a modern computer.
Now there are two major problems with MiniMax. First, consider another variation of Tic-
Tac-Toe, where each player only gets 3 pieces. In this version, each player will first place their
three pieces. When all 6 pieces are on the board, the players will instead move one of their
existing pieces. In this version, the search tree can become infinitely deep, which immediately
makes MiniMax unusable.
Second, consider a more complex game like chess. Instead of 9 (or fewer) moves each turn,
there are an estimated 35 possible moves each turn (on average). We say that chess has a
branching factor of ≈ 35. In other words, each node we explore has on average 35
successor nodes. As such, in order to search to a depth 𝐷, we have to look at roughly 35^𝐷 nodes,
which quickly becomes too many for even the most powerful supercomputers (searching just 10
moves ahead already means roughly 35^10 ≈ 2.8 ⋅ 10^15 nodes). We need to be more efficient!
7 Evaluation Functions
Our first strategy to make MiniMax feasible is to simply stop the search before the game is
over. The easiest way to implement this is to stop searching at a fixed depth. For example
we can decide that we search 5 moves ahead, and then we stop. It is also possible to decide
when to stop based on some heuristic, for example if you find a game state that looks good
you might want to search a few more moves.
Stopping the search early is an easy way to prevent both our issues. However, this brings us
an entirely new problem – we can no longer rely on UTILITY(𝑠, 𝑝), since the game may
not be done. The solution: an evaluation function, which we’ll call EVAL(𝑠, 𝑝). This
brings us to a modified MiniMax algorithm (called H-MINIMAX in AIMA):
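Sketched in the same style as before, with a simple fixed depth limit standing in for a more general cutoff test (the DEPTH_LIMIT constant is an assumption for this sketch):

DEPTH_LIMIT = 5   # assumed constant: how many moves ahead we search

def SCORE(s, current_player):
    return min_player(s, current_player, depth=1)

def min_player(s, current_player, depth):
    if IS_TERMINAL(s): return UTILITY(s, current_player)
    if depth >= DEPTH_LIMIT: return EVAL(s, current_player)
    return min(max_player(RESULT(s, a), current_player, depth + 1) for a in ACTIONS(s))

def max_player(s, current_player, depth):
    if IS_TERMINAL(s): return UTILITY(s, current_player)
    if depth >= DEPTH_LIMIT: return EVAL(s, current_player)
    return max(min_player(RESULT(s, a), current_player, depth + 1) for a in ACTIONS(s))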
Here I’ve added a simple depth check and the EVAL(𝑠, 𝑝) function; it is otherwise identical
to the original MiniMax.
An EVAL function preserves the optimality of MiniMax if and only if it preserves the ordering
of the MiniMax scores. More formally, optimality is preserved if and only if, for all states 𝑠1 and 𝑠2:

EVAL(𝑠1, 𝑝) ≥ EVAL(𝑠2, 𝑝) ⟺ MINIMAX(𝑠1, 𝑝) ≥ MINIMAX(𝑠2, 𝑝).
        Max: ?
       /      \
  Min: 1      Min: 2
   /  \        /  \
  1    2      2    4
Here, the numbers represent the MiniMax scores in each node. If we start a MiniMax search
at the root node, it would choose to go to the right. Now, consider this tree with EVAL scores
instead:
        Max: ?
       /      \
  Min: 1      Min: 20
   /  \        /   \
  1    20    20     400
In this tree as well, a MiniMax search would still choose to go to the right, since the ordering
of all the scores is preserved.
I should note, however, that it is usually impossible to design an EVAL function that fulfills
this requirement. In other words, MiniMax with EVAL is typically not optimal.
This discussion on optimality has also given us insight into designing an EVAL function:
there is (usually) no “correct” EVAL function. All we can do is try our best to emulate the
MiniMax score. The most common strategy is to make a heuristic score that reflects how likely
we think it is for 𝑝 to win given the state 𝑠.
As an example, consider again the simple game of Tic-Tac-Toe. In Tic-Tac-Toe, not all squares
are equal. Each square is part of a different number of winning lines: 4 for the centre, 3 for
each corner, and 2 for each edge:

3 2 3
2 4 2
3 2 3
Now we can simply sum up the score for each of our pieces, and we'll have a decent EVAL
function. The exact values are not important, only their ordering. We could, for example,
subtract 2 from every square:

1 0 1
0 2 0
1 0 1

and we would get an equivalent EVAL function, since the nodes would still be ordered in the
same way.
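To make this concrete, here is a possible implementation of the square-weight EVAL function (a sketch; the board representation, a flat list of 9 cells containing 'X', 'O', or None, is an assumption):

SQUARE_WEIGHTS = [3, 2, 3,
                  2, 4, 2,
                  3, 2, 3]

def EVAL(board, player):
    # Sum the weights of the squares occupied by the player's own pieces.
    # (Subtracting the opponent's squares as well is a common refinement.)
    return sum(w for cell, w in zip(board, SQUARE_WEIGHTS) if cell == player)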
Before I finish this section I quickly want to note that most modern EVAL functions are
learned neural networks. Having a neural network learn a good EVAL function takes away
the human error in designing a good function, and typically yields significantly better results
in practice.
8 𝛼 − 𝛽 Pruning
Evaluation functions are great and all, but what if we don’t want to compromise optimality?
Is there a way we could apply a full MiniMax search to larger problems? When doing a Min-
iMax search, do we really have to expand all nodes in the game tree?
           Max: ?
         /        \
    Min: 3        Min: ?
   /   |  \      /  |  \
  3   12   8    2   𝑥1  𝑥2

This tree has been searched from left to right, and the nodes 𝑥1 and 𝑥2 have not yet been expanded. Even so, we already know that:
• the empty min-node will have a MiniMax score ≤ 2, since we have already found a
successor with score 2.
• the root node will have a MiniMax score ≥ 3, since we have already found a successor
with score 3.
           Max: ≥ 3
         /         \
    Min: 3         Min: ≤ 2
   /   |  \       /  |  \
  3   12   8     2   𝑥1  𝑥2
These two observations together mean that the MiniMax search will never choose to go to
the right, regardless of the values in the 𝑥1 and 𝑥2 nodes. In other words, as soon as we
see the score of 2, we can stop searching that branch – we don’t have to expand the 𝑥1 or 𝑥2
nodes. This is the principle behind 𝛼 − 𝛽 pruning. We use the variables 𝛼 and 𝛽 to keep track
of the best score found so far for the maximizing player (𝛼) and for the minimizing player (𝛽)
along the current search path:
def SCORE(s, current_player, alpha):
    # It is the opponent's turn after the current player's move, so start with
    # the minimizing player. Beta starts at BIG_NUMBER (effectively +infinity).
    return min_player(s, current_player, alpha, BIG_NUMBER)

def min_player(s, current_player, alpha, beta):
    if IS_TERMINAL(s): return UTILITY(s, current_player)
    best_score = BIG_NUMBER
    for a in ACTIONS(s):
        best_score = min(best_score, max_player(RESULT(s, a), current_player, alpha, beta))
        beta = min(beta, best_score)
        if best_score <= alpha: break   # the maximizer already has a better option: prune
    return best_score

def max_player(s, current_player, alpha, beta):
    if IS_TERMINAL(s): return UTILITY(s, current_player)
    best_score = -BIG_NUMBER
    for a in ACTIONS(s):
        best_score = max(best_score, min_player(RESULT(s, a), current_player, alpha, beta))
        alpha = max(alpha, best_score)
        if best_score >= beta: break   # the minimizer already has a better option: prune
    return best_score
Note that this SCORE function requires the 𝛼-parameter as well, which necessitates a minor
change at the top-level in the adversarial_search() function (simply passing along
best_score in the alpha parameter).
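Concretely, the top level could look something like this (a sketch):

def adversarial_search(s):
    current_player = TO_MOVE(s)
    best_action = None
    best_score = -BIG_NUMBER
    for a in ACTIONS(s):
        # best_score doubles as alpha: branches that cannot beat it are pruned.
        score = SCORE(RESULT(s, a), current_player, best_score)
        if score > best_score:
            best_score = score
            best_action = a
    return best_action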
One caveat about 𝛼−𝛽 pruning is that move-ordering is important. Consider again this game
tree:
           Max: ?
         /        \
    Min: 3        Min: ?
   /   |  \      /  |  \
  3   12   8    2   𝑥1  𝑥2
The only reason we can skip the nodes 𝑥1 and 𝑥2 is because we saw the 2-node first. Instead,
consider if the tree was ordered like this:
           Max: 3
         /        \
    Min: 3        Min: 2
   /   |  \      /  |  \
  3   12   8    6   8   2
Here we do have to look at all the nodes, since we don’t discover the 2-node until the very end
of our search. If that 2-node were a 4 (or higher) instead, the min-node would have a score of 4,
and the search would choose the path to the right. In other words, 𝛼 − 𝛽 pruning does not
guarantee a faster search. However, with optimal move-ordering it allows you to search up
to twice the depth using the same amount of computation as plain MiniMax.
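One common way to get a good (though rarely optimal) move ordering is to sort the moves by a cheap heuristic, for example the EVAL function from the previous section; a sketch:

def ordered_actions(s, current_player):
    # Search the most promising moves first, so that good alpha/beta bounds are
    # established early and more of the tree can be pruned.
    # (For a Min node you would sort ascending instead.)
    return sorted(ACTIONS(s),
                  key=lambda a: EVAL(RESULT(s, a), current_player),
                  reverse=True)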
9 Monte Carlo Tree Search
Who cares about optimality anyway? While 𝛼−𝛽 pruning can allow us to search larger game
trees, it’s still entirely insufficient for most larger games. What else can we do?
Well, we can decide that we don’t care about looking at all possible nodes. As long as we look
at enough nodes, we’ll still get an idea of how good a move is. This is the idea behind Monte
Carlo search.
Monte Carlo is the name for a broad family of algorithms that rely on random sampling to
estimate quantities that would be too expensive to compute exactly. So here, in the adversarial
search domain, we'll rely on random sampling to get an idea of what the best next move is.
Essentially, we perform a number of trial games and average their respective UTILITY scores. Each
trial game is played out by simply selecting random moves until the game is over. This will
give us an estimate of how good the state 𝑠 is. The larger we make NUMBER_OF_TRIALS,
the better our estimate will be.
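A minimal sketch of this simple ("flat") Monte Carlo SCORE function, reusing the abstract helpers from before (NUMBER_OF_TRIALS is just a constant we pick):

import random

NUMBER_OF_TRIALS = 1000   # more trials give a better (but slower) estimate

def SCORE(s, current_player):
    total = 0
    for _ in range(NUMBER_OF_TRIALS):
        trial = s
        # Play random moves until the game is over.
        while not IS_TERMINAL(trial):
            trial = RESULT(trial, random.choice(ACTIONS(trial)))
        total += UTILITY(trial, current_player)
    # The average utility over all trials is our estimate of how good s is.
    return total / NUMBER_OF_TRIALS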
For many games this simple version will result in optimal or near-optimal play. However,
there is a more “official” version of Monte Carlo search that uses some extra tricks:
• It adds a selection policy and playout policy in order to guide the search to focus more
on moves that seem good (while trying to balance exploration/exploitation).
• It “remembers” which nodes it has already looked at, in order to avoid unnecessary
re-computations.
This is the version that most people refer to when talking about Monte Carlo Tree Search, or
MCTS for short.
One major positive about Monte Carlo search is that it is extremely flexible. For example,
AlphaGo – the famous Go program that beat Lee Sedol – uses a version of Monte Carlo Tree
Search with neural networks to guide the search.
10 Lookup
As a final strategy for this lecture, we’ll quickly consider lookups, otherwise known as “Hey,
I’ve seen this before, I know what to do!”.
Lookups can be utilised in various ways. In chess, for example, it is common to hand-craft
lookup tables from well-known openings and endgames. For less-studied games it is possible
to simply “remember” the MiniMax score for any given state by storing it in a dictionary.
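For the dictionary variant, a minimal sketch (STATE_KEY, which turns a state into something hashable, is an assumed helper):

transposition_table = {}

def SCORE_with_lookup(s, current_player):
    key = (STATE_KEY(s), current_player)
    if key not in transposition_table:
        # Only compute the (expensive) MiniMax score the first time we see a state.
        transposition_table[key] = SCORE(s, current_player)
    return transposition_table[key]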
11 Stochastic Games
Stochastic games are easy to deal with in principle. In the game tree, we can represent random
elements using CHANCE nodes:
[Figure: a game tree with a Max root of value 3.9. One of its successors is a chance node, whose branches have probabilities 0.9 and 0.1 and lead to leaves with values 3 and 12; the other leaves shown in the figure have values 8, 2, 6, and 8.]
Here, the chance node represents a random element with a 90% probability of going left and a
10% probability of going right. The MiniMax score of the chance node is simply the expected
value of its successors: 0.9 ⋅ 3 + 0.1 ⋅ 12 = 3.9.
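In code, a chance node simply replaces the min/max over successors with an expected value; a minimal sketch (OUTCOMES(𝑠), yielding (probability, successor) pairs, is an assumed helper, and here I assume it is the maximizing player's turn after the random event):

def chance_node(s, current_player):
    # Expected value over the random outcomes of state s.
    return sum(p * max_player(s2, current_player) for p, s2 in OUTCOMES(s))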
One noteworthy thing about stochastic game trees is that they impose stricter requirements
on EVAL functions. Recall that in a deterministic game tree, an EVAL function preserves
optimality as long as it preserves ordering. With a stochastic tree, an EVAL function is instead
required to be a positive linear transformation of the MiniMax score:

EVAL(𝑠, 𝑝) = 𝑎 ⋅ MINIMAX(𝑠, 𝑝) + 𝑏, for some constants 𝑎 > 0 and 𝑏.
In practice this doesn’t change anything though, since it’s typically impossible to design an
optimal EVAL function anyway.
12 Imperfect Information
Imperfect information is much harder to deal with. Consider a game of poker, where we don’t
know the opponent’s hand. One way to handle such a scenario is to run MiniMax for each
possible combination of cards, but that quickly becomes incredibly computationally intensive.
Another strategy is to encode belief : “Given the cards on the table, how likely do I think it is
that I win?”.