COE206 L5 Adversarial Search
Intelligence
Mustafa MISIR
http://mustafamisir.github.io
http://memoryrlab.github.io
L5: Adversarial Search
Game Playing 1
1
MIT 6.034 Artificial Intelligence (Fall 2010) - Search: Games, Minimax, and Alpha-Beta - https://www.youtube.com/watch?v=STjW3eH0Cik
Outline
- Formal Definition
- Optimal Decision in Games (MiniMax)
- Alpha-Beta Pruning
- Stochastic Games
- Partially Observable Games
Games
Games – Utility
Games – Types
Games – Types, e.g.
Games AI – History 2
2
image source: https://www.andreykurenkov.com/writing/ai/a-brief-history-of-game-ai/
Adversarial Search 3
Competitive environments, in which
the agents’ goals are in conflict,
give rise to adversarial search
problems, often known as games.
3
involving or characterized by conflict or opposition
Games – Game Tree, e.g. Tic Tac Toe 4
4
A (partial) game tree for the game of tic-tac-toe. The top node is the initial state, and MAX moves first, placing an X in an empty square. We show part of the
tree, giving alternating moves by MIN (O) and MAX (X), until we eventually reach terminal states, which can be assigned utilities according to the rules of the game.
Games – Game Tree, e.g. Tic Tac Toe
Optimal Decisions in Games
In adversarial search, MAX cannot simply commit to a fixed
sequence of moves:
- MIN has something to say about it.
- MAX must therefore find a strategy, which specifies MAX’s
  move in the initial state, then MAX’s moves in the states
  resulting from every possible response by MIN, then MAX’s
  moves in the states resulting from every possible response by
  MIN to those moves, and so on.
Optimal Decisions in Games – MiniMax 5
- The possible moves for MAX at the root node are labeled a1,
  a2, and a3.
- The possible replies to a1 for MIN are b1, b2, b3, and so on.
- The utilities of the terminal states range from 2 to 14.
5
Even a simple game like tic-tac-toe is too complex to draw the entire game tree on one page, so we will switch to a trivial game. The △ nodes are MAX
nodes, in which it is MAX’s turn to move, and the ▽ nodes are MIN nodes. The terminal nodes show the utility values for MAX; the other nodes are labeled with their
minimax values. MAX’s best move at the root is a1, because it leads to the state with the highest minimax value, and MIN’s best reply is b1, because it leads to the state
with the lowest minimax value.
Optimal Decisions in Games – MiniMax
Optimal Decisions in Games – MiniMax 6
6
The minimax decision algorithm returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the
assumption that the opponent plays to minimize utility.
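The minimax decision described above can be sketched in a few lines of Python. This is a minimal illustration, not the pseudocode from the slide: the nested-dictionary tree, the function names, and the reply labels other than a1..a3 and b1..b3 are assumptions, and the leaf utilities are illustrative values chosen to lie between 2 and 14.

```python
# Hypothetical two-ply game tree: inner nodes map a move label to a subtree,
# leaves are utilities from MAX's point of view (illustrative values, 2..14).
TREE = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}

def minimax_value(node, is_max):
    """Return the minimax value of a node (depth-first, full exploration)."""
    if not isinstance(node, dict):      # terminal state: return its utility
        return node
    values = [minimax_value(child, not is_max) for child in node.values()]
    return max(values) if is_max else min(values)

def minimax_decision(tree):
    """Return MAX's root move whose resulting MIN node has the highest value."""
    return max(tree, key=lambda move: minimax_value(tree[move], False))

print(minimax_decision(TREE))       # a1
print(minimax_value(TREE, True))    # 3
```

With these assumed leaves, the three MIN nodes have values 3, 2, and 2, so MAX picks a1, which agrees with the best move named in the example above.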
Optimal Decisions in Games – MiniMax
7
In two-player, zero-sum games, the two-element vector can be reduced to a single value because the values are always opposite.
8
A game tree with three players (A, B, C). Each node is labeled with values from the viewpoint of each player. The best move is marked at the root.
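The vector-valued backup for more than two players can be sketched as follows. This is an illustration under assumptions: the function name `maxn`, the list/tuple tree encoding, and all utility vectors are hypothetical and are not taken from the figure.

```python
def maxn(node, player, n_players=3):
    """Back up utility vectors: the player to move picks the child vector
    that is best in that player's own component."""
    if isinstance(node, tuple):          # terminal: one utility per player
        return node
    nxt = (player + 1) % n_players       # players A, B, C move in turn
    return max((maxn(child, nxt, n_players) for child in node),
               key=lambda vec: vec[player])

# Hypothetical three-ply tree: A moves at the root, then B, then C.
# Lists are internal nodes, tuples are terminal utility vectors (vA, vB, vC).
tree = [
    [[(1, 2, 6), (4, 2, 3)], [(6, 1, 2), (7, 4, 1)]],
    [[(5, 1, 1), (2, 5, 2)], [(7, 7, 1), (5, 4, 5)]],
]

print(maxn(tree, 0))   # (2, 5, 2)
```

In the two-player zero-sum case the two components always sum to a constant, which is why a single number per node is enough there.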
Optimal Decisions in Games – MiniMax, e.g.
Optimal Decisions in Games – Alliances
Optimal Decisions in Games – Alliances
If the game is not zero-sum, then collaboration can also occur
with just two players.
- e.g. there is a terminal state with utilities
  ⟨vA = 1000, vB = 1000⟩ and 1000 is the highest possible
  utility for each player.
- The optimal strategy is for both players to do everything
  possible to reach this state; that is, the players will
  automatically cooperate to achieve a mutually desirable goal.
MiniMax – Properties
Performs a complete depth-first exploration of the game tree
- Completeness: Yes 9
9
Is the algorithm guaranteed to find a solution when there is one?
10
How long does it take to find a solution?
11
How much memory is needed to perform the search?
12
Does the strategy find the optimal solution?
Alpha–Beta Pruning – Improving MiniMax 14
consideration:
- Prunes away branches that cannot possibly influence the final
  decision.
13
selective removal of certain parts of a plant, such as branches, buds, or roots
14
https://en.wikipedia.org/wiki/Alpha-beta_pruning
Alpha–Beta Pruning 15
15
Alpha-beta pruning can be applied to trees of any depth, and it is often possible to prune entire subtrees rather than just leaves.
Alpha–Beta Pruning
Alpha–Beta Pruning 16
16
These routines are the same as the MINIMAX functions, except for the two lines in each of MIN-VALUE and MAX-VALUE that maintain α and β (and the
bookkeeping to pass these parameters along).
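A minimal Python sketch of the idea is given below. It is not the slide's pseudocode: it merges the MAX-VALUE and MIN-VALUE routines into one function, and the tree encoding, the leaf values, and the `visited` list (used only to show which leaves are actually examined) are assumptions made for illustration.

```python
import math

# Hypothetical two-ply tree; leaves are utilities for MAX (illustrative values).
TREE = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value with alpha-beta pruning.
    alpha: best value found so far for MAX along the path to the root.
    beta:  best value found so far for MIN along the path to the root."""
    if not isinstance(node, dict):
        if visited is not None:
            visited.append(node)         # record every leaf we evaluate
        return node
    if is_max:
        value = -math.inf
        for child in node.values():
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:            # MIN above will never allow this
                return value
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node.values():
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        if value <= alpha:               # MAX above already has something better
            return value
        beta = min(beta, value)
    return value

seen = []
print(alphabeta(TREE, True, visited=seen))  # 3
print(seen)                                 # [3, 12, 8, 2, 14, 5, 2]
```

On this assumed tree the root value is the same as plain minimax would return, but only 7 of the 9 leaves are evaluated: after seeing the leaf 2 under a2, the remaining two replies are pruned because MAX already has a move worth 3.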
Alpha–Beta Pruning
Alpha–Beta Pruning
Alpha–Beta Pruning, e.g. 17
17
Alpha–Beta Pruning example by John Levine (U. Strathclyde): https://www.youtube.com/watch?v=zp3VMe0Jpf8
Alpha–Beta Pruning – Properties
18
How long does it take to find a solution?
Imperfect Real-Time Decisions
Imperfect Real-Time Decisions – Evaluation Functions
Suggestion: cut off the search earlier and apply a heuristic
evaluation function to states in the search, effectively turning
nonterminal nodes into terminal leaves.
Imperfect Real-Time Decisions – Evaluation Functions
An evaluation function returns an estimate of the expected
utility of the game from a given position, just as the heuristic
functions discussed before return an estimate of the distance to
the goal.
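Cutting off the search and falling back on an evaluation function can be sketched as below. Everything here is an assumption made for illustration: the function name `h_minimax`, the fixed depth limit used as the cutoff test, the toy stone-taking game, and the particular heuristic.

```python
def h_minimax(state, is_max, depth, limit,
              successors, is_terminal, utility, evaluate):
    """Depth-limited minimax: at `limit` plies, nonterminal states are
    scored with `evaluate` instead of being expanded further."""
    if is_terminal(state):
        return utility(state, is_max)
    if depth >= limit:                   # cutoff test: treat node as a leaf
        return evaluate(state, is_max)
    values = [h_minimax(s, not is_max, depth + 1, limit,
                        successors, is_terminal, utility, evaluate)
              for s in successors(state)]
    return max(values) if is_max else min(values)

# Toy game (hypothetical): a pile of stones, each player removes 1 to 3,
# and whoever takes the last stone wins. Values are from MAX's viewpoint.
def successors(n):
    return [n - k for k in (1, 2, 3) if n - k >= 0]

def is_terminal(n):
    return n == 0

def utility(n, is_max):
    # The pile is empty, so the player who just moved took the last stone.
    return -1 if is_max else 1

def evaluate(n, is_max):
    # Heuristic guess: the player to move is in trouble when n % 4 == 0.
    to_move_wins = (n % 4 != 0)
    return 0.5 if to_move_wins == is_max else -0.5

args = (successors, is_terminal, utility, evaluate)
print(h_minimax(10, True, 0, 2, *args))    # 0.5 (estimate after 2 plies)
print(h_minimax(10, True, 0, 100, *args))  # 1   (full search to the end)
```

The shallow search returns an estimate (0.5) rather than a true utility; how good the resulting move is depends entirely on the quality of the evaluation function.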
Imperfect Real-Time Decisions – Evaluation Functions
Imperfect Real-Time Decisions – Cutting Off Search
Imperfect Real-Time Decisions – Forward Pruning
19
https://en.wikipedia.org/wiki/Beam_search
Stochastic Games
Stochastic Games
Although White knows what his or her own legal moves are, White
does not know what Black is going to roll and thus does not know
what Black’s legal moves will be.
- This means that White cannot construct a standard game tree
  of the sort used for tic-tac-toe.
There are 36 ways to roll two dice, each equally likely; but because
a 6–5 is the same as a 5–6, there are only 21 distinct rolls.
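The dice count can be checked with a short enumeration. The variable names are illustrative; the sketch also makes explicit that the 21 distinct rolls are not equally likely, since a double can occur in only one way while any other roll can occur in two.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered outcomes of two six-sided dice (each has probability 1/36).
ordered = list(product(range(1, 7), repeat=2))

# Merge rolls that differ only in order, e.g. (6, 5) and (5, 6).
distinct = Counter(tuple(sorted(roll)) for roll in ordered)
probs = {roll: Fraction(count, len(ordered)) for roll, count in distinct.items()}

print(len(ordered), len(distinct))    # 36 21
print(probs[(6, 6)], probs[(5, 6)])   # 1/36 1/18
```

These per-roll probabilities are the weights that label the branches of a chance node.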
Stochastic Games 20
20
Chance nodes are shown as circles. The branches leading from each chance node denote the possible dice rolls; each branch is labeled with the roll and its
probability.
Stochastic Games
We still want to pick the move that leads to the best position,
but positions no longer have definite minimax values.
- Instead, we can only calculate the expected value of a
  position: the average over all possible outcomes of the
  chance nodes, weighted by their probabilities.
- This leads us to generalize the minimax value for
  deterministic games to an expecti-minimax value for games
  with chance nodes.
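A minimal sketch of the expecti-minimax value is shown below. The tuple-based tree encoding and the small coin-flip style tree are assumptions made for illustration; they are not taken from the slides' figures.

```python
def expectiminimax(node):
    """Value of a node in a tree with MAX, MIN, and chance nodes.
    Encoding (assumed): a number is a terminal utility;
    ("max", [children]) and ("min", [children]) are player nodes;
    ("chance", [(probability, child), ...]) is a chance node."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                 # probability-weighted average
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node type: {kind}")

# Hypothetical tree: MAX moves, a fair coin is flipped, then MIN replies.
tree = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))]),
    ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, -2]))]),
])

print(expectiminimax(tree))   # 3.0
```

Here the first move is worth 0.5 * 2 + 0.5 * 4 = 3 and the second 0.5 * 0 + 0.5 * (-2) = -1, so MAX prefers the first move.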
Stochastic Games – Coin Flipping, e.g.
Stochastic Games – Evaluation Functions
The presence of chance nodes means that one has to be more
careful about what the evaluation values mean.
- With an evaluation function that assigns the values [1, 2, 3,
  4] to the leaves, move a1 is best.
- With values [1, 20, 30, 400], move a2 is best.
The program behaves totally differently if we change the scale of
some evaluation values, even though the ordering of the leaves is
unchanged!
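The effect can be reproduced numerically. The sketch below rests on assumptions that are not stated in the text above: each move leads to a chance node with outcome probabilities 0.9 and 0.1, a1 reaches the second and third smallest leaves, and a2 reaches the smallest and largest.

```python
# Each move leads to a chance node: a list of (probability, leaf index) pairs.
# Probabilities and the leaf-to-move assignment are assumed for illustration.
MOVES = {
    "a1": [(0.9, 1), (0.1, 2)],
    "a2": [(0.9, 0), (0.1, 3)],
}

def expected_values(values):
    """Expected value of each move for a given list of leaf evaluations."""
    return {move: sum(p * values[i] for p, i in outcomes)
            for move, outcomes in MOVES.items()}

def best_move(values):
    ev = expected_values(values)
    return max(ev, key=ev.get)

print(best_move([1, 2, 3, 4]))        # a1
print(best_move([1, 20, 30, 400]))    # a2
```

Under these assumptions a1 scores 2.1 against 1.3 with the first scale, but 21 against 40.9 with the second. Both scales order the leaves identically, so to avoid this sensitivity the evaluation function needs to be (a positive linear transformation of) the expected utility of a position, not just any order-preserving score.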
Partially Observable Games
In deterministic partially observable games, uncertainty about
the state of the board arises entirely from lack of access to the
choices made by the opponent.
- e.g. the game of Kriegspiel, a partially observable variant of
  chess in which pieces can move but are completely invisible to
  the opponent.
Partially Observable Games – Card Games
Card games provide many examples of stochastic partial
observability, where the missing information is generated
randomly.
- In many games, cards are dealt randomly at the beginning of
  the game, with each player receiving a hand that is not
  visible to the other players, e.g. bridge, whist, hearts, and
  some forms of poker.