Lecture 13: Adversarial Search
Algorithms or Game-Playing Algorithms
Key Points from Last Lecture
Outline of today’s lecture
• Games
• Types of Games
• Minimax
• α-β pruning
• Stochastic Games
Game Playing State-of-the-Art
• Checkers: Chinook ended the 40-year reign of human world champion Marion
Tinsley in 1994. Used an endgame database defining perfect play for all
positions involving 8 or fewer pieces on the board, a total of
443,748,401,247 positions. Checkers is now solved!
• Chess: Deep Blue defeated human world champion Garry Kasparov in a
six-game match in 1997. Deep Blue examined 200 million positions per
second and used very sophisticated evaluation and undisclosed methods for
extending some lines of search up to 40 ply. Current programs are even
better, if less historic.
• Othello (also known as Reversi): Human champions refuse to compete
against computers, which are too good.
• Go: Human champions are beginning to be challenged by machines, though
the best humans still beat the best machines. In Go, the branching factor
b > 300, so most programs use pattern knowledge bases to suggest plausible
moves, along with aggressive pruning.
Game Search
• Game-playing programs have been developed by AI researchers
since the beginning of the modern AI era (chess, checkers in
1950s)
• Game Search
– Sequences of player’s decisions we control
– Decision of other player(s) we do not control
• Contingency problem: many possible opponent’s moves
must be “covered” by the solution
– Introduces uncertainty to the game since we do not know what the
opponent will do
• Rational opponent: maximizes its own utility function
Types of Game Problems
• Adversarial
– Win of one player is a loss of the other
– Focus of this course
• Cooperative
– Players have common interests and utility
function
• A spectrum of others in between
Typical AI “Games”:
• Deterministic and Fully Observable
Environment
• Two agents with turn-taking for actions
• Zero-sum (adversarial)
• Abstract (robotic soccer notable exception)
– state easy to represent, few action choices,
well-defined goals
– hard to solve
Types of Games
                        Deterministic          Chance
Perfect information     Tic-Tac-Toe, Chess     Backgammon
Imperfect information   Stratego               Poker, Bridge
Deterministic Single-Player
• Deterministic, single player,
perfect information:
– Know the rules
– Know what actions do
– Know when you win
– E.g. Freecell, 8-Puzzle, Rubik’s
cube
• … it’s just search!
• Slight reinterpretation:
– Each node stores a value: the
best outcome it can reach
– This is the maximal outcome of
its children (the max value)
– Note that we don’t have path
sums as before (utilities at end)
• After search, can pick move that
leads to best node
[Figure: search tree whose terminal states are labeled lose, win, lose]
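The reinterpretation above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the tree is assumed to be encoded as nested Python lists, a leaf is a number giving the final utility, and the names `best_outcome`, `best_move`, `LOSE`, and `WIN` are hypothetical.

```python
def best_outcome(node):
    """Value of a node in a single-player game: the best outcome reachable from it."""
    # A leaf (terminal state) carries its utility directly.
    if not isinstance(node, list):
        return node
    # An internal node stores the maximal value among its children.
    return max(best_outcome(child) for child in node)


def best_move(node):
    """Index of the child that leads to the best reachable outcome."""
    return max(range(len(node)), key=lambda i: best_outcome(node[i]))


# Hypothetical tree (assumption): three moves from the root,
# and only the second move can reach a win.
LOSE, WIN = 0, 1
tree = [[LOSE, LOSE], [LOSE, WIN], [LOSE]]
root_value = best_outcome(tree)
chosen = best_move(tree)
```

Utilities appear only at the leaves, so nothing is summed along the path; the value is simply propagated upward.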
Deterministic Two-Player
• E.g. tic-tac-toe, chess,
checkers
• Zero-sum games
– One player maximizes result
– The other minimizes result
• Minimax search
– A state-space search tree
– Players alternate
– Choose move to position with
highest minimax value = best
achievable utility against best
play
[Figure: game tree with a max level above a min level and leaf values 8, 2, 5, 6]
Game Search
• Problem Formulation
– Initial state: initial board position + information about
whose move it is
– Successors: legal moves a player can make
– Goal (terminal test): determines when the game is over
– Utility function: measures the outcome of the game and
its desirability
• Search objective
– Find the sequence of player’s decisions (moves)
maximizing its utility
– Consider the opponent’s moves and their utility
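As one possible illustration of this formulation, the four components can be written out for tic-tac-toe. The encoding below (a 9-cell tuple plus the player to move, with X playing as MAX) and all function names are assumptions made for this sketch, not part of the lecture.

```python
from typing import List, Optional, Tuple

Board = Tuple[str, ...]    # 9 cells, each "X", "O", or " "
State = Tuple[Board, str]  # (board, player to move)

# Index triples that form a winning line on the 3x3 board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def initial_state() -> State:
    """Initial state: empty board plus whose move it is."""
    return (" ",) * 9, "X"


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def successors(state: State) -> List[State]:
    """Successors: one state per legal move of the player to move."""
    board, player = state
    if winner(board) is not None:
        return []  # game already decided, no legal moves
    nxt = "O" if player == "X" else "X"
    return [(board[:i] + (player,) + board[i + 1:], nxt)
            for i, cell in enumerate(board) if cell == " "]


def is_terminal(state: State) -> bool:
    """Terminal test: someone has won or the board is full."""
    board, _ = state
    return winner(board) is not None or " " not in board


def utility(state: State) -> int:
    """Utility of a terminal state from X's point of view: +1 win, 0 draw, -1 loss."""
    w = winner(state[0])
    return 0 if w is None else (1 if w == "X" else -1)
```

Calling `successors(initial_state())` yields the nine possible opening positions; `utility` is meant to be called only on states for which `is_terminal` is true.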
Game Tree
• Initial State and Legal Moves for Each
Side
Game Tree
(2-player, deterministic, turns)
• MAX and MIN are the 2 players
• MAX goes first
• Players then take turns
• MAX has 9 possible legal first
moves (ignoring symmetry)
• Utility of terminal states (when
game is over) is from MAX’s point
of view
• Points are awarded to both
players at the end of the game
• -1 is a loss
• 0 is a draw
• 1 is a win
Minimax Algorithm
• How do we deal with the contingency
problem?
– Assuming that the opponent is rational and
always optimizes its behavior (opposite to us), we
consider the opponent’s best response
– Then the minimax algorithm determines the best
move
Minimax
• Finds an optimal (contingent) strategy, assuming perfect play
for deterministic games
• Idea: choose move to position with highest MINIMAX VALUE
= best achievable payoff against best play
• MINIMAX-VALUE(n) =
– UTILITY(n) if n is a terminal state
– max_s MINIMAX-VALUE(s) if n is a MAX node
– min_s MINIMAX-VALUE(s) if n is a MIN node
(where s ranges over the successors of n)
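The recursive definition above translates almost line for line into code. The following is a minimal sketch under stated assumptions: the game tree is given explicitly as nested Python lists, a leaf is a number (the utility from MAX's point of view), and MAX and MIN alternate by level. The function names and the example tree are illustrative, not taken from the lecture.

```python
def minimax_value(node, is_max):
    """MINIMAX-VALUE of a node in an explicit game tree."""
    # Terminal state: return its utility.
    if not isinstance(node, list):
        return node
    # Recurse into the successors; the other player moves next.
    values = [minimax_value(child, not is_max) for child in node]
    # MAX node takes the maximum, MIN node takes the minimum.
    return max(values) if is_max else min(values)


def minimax_decision(root):
    """Index of MAX's move at the root with the highest minimax value."""
    return max(range(len(root)), key=lambda i: minimax_value(root[i], False))


# Hypothetical two-ply tree (assumption): MAX moves at the root,
# MIN replies, then the game ends with the listed utilities.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = minimax_value(tree, True)
move = minimax_decision(tree)
```

Here the three MIN nodes back up 3, 2, and 2, so the root value is 3 and MAX's best move is the first one.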
Minimax Example
Properties of minimax
• Complete? Yes (if tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m), where b is the branching factor and m is the
maximum depth of the tree
• Space complexity? O(bm) (depth-first exploration)
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games
→ exact solution completely infeasible
• Do we really need to explore every path?
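A quick back-of-the-envelope calculation shows why exhaustive search is infeasible for chess: a tree with branching factor b and depth m has on the order of b^m nodes. The search speed below is an assumption borrowed from the Deep Blue figure quoted earlier (200 million positions per second); the variable names are illustrative.

```python
import math

b = 35   # approximate branching factor for chess
m = 100  # approximate game length in plies

nodes = b ** m                 # exact value; Python integers are unbounded
exponent = m * math.log10(b)   # log10 of the node count, about 154.4

# Assumed search speed (Deep Blue's quoted rate) and seconds in a year.
positions_per_second = 200_000_000
seconds_per_year = 365 * 24 * 3600

# log10 of the number of years needed to visit every node once.
log10_years = (exponent
               - math.log10(positions_per_second)
               - math.log10(seconds_per_year))
```

The tree has roughly 10^154 nodes, and even at the assumed speed a full traversal would take on the order of 10^138 years.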
Solutions to the Complexity Problem
• Dynamic pruning of redundant branches of the
search tree
– Some branches will never be played by rational players
since they include sub-optimal decisions (for either player)
• Identify a provably suboptimal branch of the search tree before it is
fully explored
• Eliminate the suboptimal branch
– Procedure: Alpha-Beta Pruning
• Early cutoff of the search tree
– Use imperfect minimax value estimate of non-terminal
states
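The pruning idea can be sketched as follows. This is one common way to write alpha-beta, offered as an illustration and not as the lecture's own procedure: `alpha` is the best value MAX can already guarantee on the current path, `beta` is the best value MIN can already guarantee, and a branch is abandoned as soon as `alpha >= beta`. The nested-list tree encoding and the `visited` list used to record evaluated leaves are assumptions of this sketch.

```python
import math


def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of a node, skipping branches that cannot affect the result."""
    # Terminal state: record the visit (if requested) and return the utility.
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN already has a better option elsewhere: prune
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # MAX already has a better option elsewhere: prune
    return value


# Hypothetical two-ply tree (assumption): MAX at the root, MIN below.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
root_value = alphabeta(tree, True, visited=visited)
```

On this tree the root value is 3, the same as plain minimax, but only 7 of the 9 leaves are evaluated: once the second MIN node sees the leaf 2, it can never beat the 3 already secured by MAX, so leaves 4 and 6 are skipped. The amount of pruning depends on the order in which moves are examined.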
Multiplayer Games
• Many popular games allow more than two players.
• First, we need to replace the single value for each node with a vector of
values.
For example, in a three-player game with players A, B, and C, a vector
⟨vA, vB, vC⟩ is associated with each node.
• For terminal states, this vector gives the utility of the state from each player’s
viewpoint.
• (In two-player, zero-sum games, the two-element vector can be reduced to a
single value because the values are always opposite.)
• The simplest way to implement this is to have the UTILITY function return a
vector of utilities.
Multiplayer Games
• Each node must hold a vector of values
For example, for three players A, B, and C: ⟨vA, vB, vC⟩
• The backed-up vector at node n will always be the one that maximizes the
payoff of the player choosing at n
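The vector backup rule can be sketched as below. The encoding is an assumption for illustration: a terminal state is a tuple of utilities (one entry per player), an internal node is a list of children, and players move in fixed rotation. The function name `maxn` and the example tree are hypothetical.

```python
def maxn(node, player, num_players):
    """Back up utility vectors: each node takes the child vector
    that is best for the player choosing at that node."""
    # Terminal state: the UTILITY function returns a vector of utilities.
    if isinstance(node, tuple):
        return node
    next_player = (player + 1) % num_players
    child_vectors = [maxn(child, next_player, num_players) for child in node]
    # Pick the vector maximizing this player's own component
    # (ties go to the first such child, an arbitrary choice).
    return max(child_vectors, key=lambda v: v[player])


# Hypothetical three-player tree (assumption): A (index 0) moves at the
# root, B (index 1) moves next, and leaves hold (vA, vB, vC).
tree = [
    [(1, 2, 6), (4, 3, 3)],
    [(6, 1, 2), (7, 4, 1)],
    [(5, 1, 1), (5, 5, 5)],
]
root_vector = maxn(tree, player=0, num_players=3)
```

B's choices back up (4, 3, 3), (7, 4, 1), and (5, 5, 5) to the root, and A then selects (7, 4, 1) because it has the largest first component.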