Game-Playing & Adversarial Search
(Please read lecture topic material before and after each lecture on that topic)
Overview
• Alpha-Beta Pruning
– The presence of an adversary can be exploited to gain an advantage in search!
• Practical Considerations
– Redundant path elimination, look-up tables, etc.
• Expectiminimax (5.5)
Types of Games
[Table of game types, classified by determinism and information; battleship and Kriegspiel are examples of deterministic games of imperfect information.]
• Utility values for each agent are the opposite of the other
– This creates the adversarial situation
• Compare to, e.g., “Prisoner’s Dilemma” (p. 666-668, R&N 3rd ed.)
– “Deterministic, NON-turn-taking, NON-zero-sum game of IMperfect
information”
Game tree (2-player, deterministic, turns)
• Search – no adversary
– Solution is (heuristic) method for finding goal
– Heuristics and CSP techniques can find optimal solution
– Evaluation function: estimate of cost from start to goal through given node
– Examples: path planning, scheduling activities
• Games – adversary
– Solution is strategy
• strategy specifies move for every possible opponent reply.
– Time limits force an approximate solution
– Evaluation function: evaluate “goodness” of game position
– Examples: chess, checkers, Othello, backgammon
Games as Search
• MAX moves first; MAX and MIN then alternate turns until the game is over
– Winner gets reward, loser gets penalty.
– “Zero sum” means the sum of the reward and the penalty is a constant.
Minimax is designed to find the optimal strategy for MAX, and hence the best move:
Minimax maximizes MAX's utility under the worst-case outcome (i.e., optimal play by MIN); a code sketch follows the complexity summary below.
• Complete?
– Yes (if tree is finite).
• Optimal?
– Yes (against an optimal opponent).
– Against a sub-optimal opponent? No. (Why not?)
• Time complexity?
– O(b^m)
• Space complexity?
– O(bm) (depth-first search, generate all actions at once)
– O(m) (backtracking search, generate actions one at a time)
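A minimal minimax sketch in Python (illustrative, not from the slides; the tree encoding — leaves as numbers, internal nodes as lists of children — is an assumption for brevity):

    def minimax(node, is_max):
        # Leaves are numbers (utility for MAX); internal nodes are lists of children.
        if not isinstance(node, list):
            return node
        values = [minimax(child, not is_max) for child in node]
        # MAX backs up the largest child value, MIN the smallest.
        return max(values) if is_max else min(values)

    # Hypothetical 2-ply tree: MAX root, three MIN children, three leaves each.
    print(minimax([[3, 12, 8], [2, 4, 6], [14, 5, 2]], True))  # -> 3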
• Tic-Tac-Toe
– b ≈ 5 legal actions per state on average, total of 9 plies in game.
• “ply” = one action by one player, “move” = two plies.
– 5^9 = 1,953,125
– 9! = 362,880 (Computer goes first)
– 8! = 40,320 (Computer goes second)
⇒ exact solution quite reasonable
• Chess
– b ≈ 35 (approximate average branching factor)
– d ≈ 100 (depth of game tree for “typical” game)
– b^d ≈ 35^100 ≈ 10^154 nodes!!
⇒ exact solution completely infeasible
• An Evaluation Function:
– Estimates how good the current board configuration is for a player.
– Typically, evaluate how good it is for the player, how good it is for the
opponent, then subtract the opponent’s score from the player’s.
– Often called “static” because it is called on a static board position.
– Othello: Number of white pieces - Number of black pieces
– Chess: Value of all white pieces - Value of all black pieces
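As an illustration, the Othello-style material count above might be coded like this (a sketch; the flat-string board encoding and piece symbols are assumptions):

    def evaluate(board, player, opponent):
        # Static evaluation of a fixed position:
        # player's piece count minus opponent's piece count.
        return sum(1 for cell in board if cell == player) \
             - sum(1 for cell in board if cell == opponent)

    # Hypothetical board: 'W' = white piece, 'B' = black piece, '.' = empty.
    print(evaluate("WWB.WB..", player="W", opponent="B"))  # -> 3 - 2 = 1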
Backup Values
Another Alpha-Beta Example
[Sequence of tree diagrams stepping alpha-beta through a 2-ply tree. The root's value interval starts at (−∞, +∞). The first MIN child narrows to (−∞, 3] and settles at [3, 3], so the root becomes [3, +∞). The second MIN child reaches (−∞, 2]: this node is worse for MAX, so its remaining children are pruned. As the third MIN child's leaves are examined, the root's interval narrows from [3, 14] to [3, 5] and finally to [3, 3].]
• Prune whenever α ≥ β.
– Prune below a Max node whose alpha value becomes greater than
or equal to the beta value of its ancestors.
• Max nodes update alpha based on children’s returned values.
– Prune below a Min node whose beta value becomes less than or
equal to the alpha value of its ancestors.
• Min nodes update beta based on children’s returned values.
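These two rules map directly onto code. A sketch (Python, same nested-list tree encoding as before; the Revisited example that follows walks through this same tree):

    def alphabeta(node, alpha, beta, is_max):
        # Leaves are numbers; internal nodes are lists of children.
        if not isinstance(node, list):
            return node
        if is_max:
            value = float("-inf")
            for child in node:
                value = max(value, alphabeta(child, alpha, beta, False))
                alpha = max(alpha, value)          # MAX updates alpha from children
                if alpha >= beta:
                    break                          # prune remaining children
        else:
            value = float("inf")
            for child in node:
                value = min(value, alphabeta(child, alpha, beta, True))
                beta = min(beta, value)            # MIN updates beta from children
                if alpha >= beta:
                    break                          # prune remaining children
        return value

    # The tree from the walkthrough below; the second MIN node's last two
    # leaves (4 and 6 here, chosen arbitrarily) are pruned and never examined.
    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(alphabeta(tree, float("-inf"), float("inf"), True))  # -> 3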
Alpha-Beta Example Revisited
α, β passed to kids: α = −∞, β = +∞
Alpha-Beta Example (continued)
Root: α = −∞, β = +∞
MIN child: α = −∞, β = 3
MIN updates β based on kids.
Alpha-Beta Example (continued)
Root: α = −∞, β = +∞
MIN child: α = −∞, β = 3
MIN updates β based on kids. No change.
Alpha-Beta Example (continued)
3 is returned as node value.
Alpha-Beta Example (continued)
Root: α = 3, β = +∞
α, β passed to kids: α = 3, β = +∞
Alpha-Beta Example (continued)
Root: α = 3, β = +∞
MIN updates β based on kids: α = 3, β = 2
Alpha-Beta Example (continued)
Root: α = 3, β = +∞
α = 3 ≥ β = 2, so prune.
Alpha-Beta Example (continued)
Root: α = 3, β = +∞
α, β passed to kids: α = 3, β = +∞
Alpha-Beta Example (continued)
Root: α = 3, β = +∞
MIN updates β based on kids: α = 3, β = 14
Alpha-Beta Example (continued)
Root: α = 3, β = +∞
MIN updates β based on kids: α = 3, β = 5
Alpha-Beta Example (continued)
Root: α = 3, β = +∞
2 is returned as node value.
Alpha-Beta Example (continued)
2 < α = 3, so no change; the root's minimax value is 3.
Effectiveness of Alpha-Beta Search
• Worst-Case
– branches are ordered so that no pruning takes place. In this case
alpha-beta gives no improvement over exhaustive search
• Best-Case
– each player’s best move is the left-most child (i.e., evaluated first)
– with perfect ordering, alpha-beta examines only O(b^(m/2)) nodes, so it can search roughly twice as deep in the same time
– in practice, performance is closer to best-case than worst-case
– E.g., sort moves by the remembered move values found last time.
– E.g., expand captures first, then threats, then forward moves, etc.
– E.g., run Iterative Deepening search, sort by value from the last iteration.
First Example
[Tree diagram: MAX root, MIN level, MAX level; leaf values, left to right: 3 4 1 2 7 8 5 6. Which nodes can alpha-beta prune?]
Answer to Example
Answer: NONE! Because the most favorable nodes for both are
explored last (i.e., in the diagram, are on the right-hand side).
Second Example
(the exact mirror image of the first example)
[Tree diagram: same structure; leaf values, left to right: 6 5 8 7 2 1 3 4. Which nodes can alpha-beta prune?]
Answer to Second Example
Answer: LOTS! Because the most favorable nodes for both are
explored first (i.e., in the diagram, are on the left-hand side).
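A quick way to verify these answers is to instrument alpha-beta with a leaf counter (a sketch; it assumes the binary MAX/MIN/MAX tree structure shown in the diagrams):

    def ab_count(node, alpha, beta, is_max, counter):
        if not isinstance(node, list):
            counter[0] += 1                        # a leaf actually evaluated
            return node
        value = float("-inf") if is_max else float("inf")
        for child in node:
            v = ab_count(child, alpha, beta, not is_max, counter)
            if is_max:
                value, alpha = max(value, v), max(alpha, v)
            else:
                value, beta = min(value, v), min(beta, v)
            if alpha >= beta:
                break                              # prune remaining children
        return value

    def leaves_examined(leaves):
        a, b, c, d, e, f, g, h = leaves
        # Depth-3 tree: MAX root, two MIN nodes, four MAX nodes over leaf pairs.
        tree = [[[a, b], [c, d]], [[e, f], [g, h]]]
        counter = [0]
        ab_count(tree, float("-inf"), float("inf"), True, counter)
        return counter[0]

    print(leaves_examined([3, 4, 1, 2, 7, 8, 5, 6]))  # -> 8 (nothing pruned)
    print(leaves_examined([6, 5, 8, 7, 2, 1, 3, 4]))  # -> 5 (three leaves pruned)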
Iterative (Progressive) Deepening
• Search to depth 1, then depth 2, then depth 3, etc.; when time runs out, return the best move from the deepest completed iteration. Values from earlier iterations can be reused to order moves (see above).
• Look-up tables catch redundant paths (transpositions): different move orders reaching the same position. Example:
1. P-QR4 P-QR4; 2. P-KR4 P-KR4
leads to the same position as
1. P-QR4 P-KR4; 2. P-KR4 P-QR4
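A sketch of how the two ideas combine: iterative deepening on the outside, a transposition (look-up) table inside so a position reached by different move orders is searched only once. The toy game and its evaluation are assumptions for illustration:

    # Transposition table: (position, depth, side-to-move) -> backed-up value.
    table = {}

    def dl_value(pos, depth, is_max, successors, evaluate):
        key = (pos, depth, is_max)
        if key in table:
            return table[key]                      # redundant path: reuse value
        kids = successors(pos)
        if depth == 0 or not kids:
            value = evaluate(pos)                  # static evaluation at horizon
        elif is_max:
            value = max(dl_value(k, depth - 1, False, successors, evaluate)
                        for k in kids)
        else:
            value = min(dl_value(k, depth - 1, True, successors, evaluate)
                        for k in kids)
        table[key] = value
        return value

    def iterative_deepening(pos, max_depth, successors, evaluate):
        value = evaluate(pos)
        for d in range(1, max_depth + 1):          # deepen until "time runs out"
            value = dl_value(pos, d, True, successors, evaluate)
        return value

    def succ(p):  # hypothetical moves: positions are sorted multisets of moves,
        # so playing "a" then "b" transposes to the same position as "b" then "a".
        return ["".join(sorted(p + m)) for m in "ab"] if len(p) < 4 else []

    def ev(p):    # hypothetical evaluation function
        return p.count("a") - p.count("b")

    print(iterative_deepening("", 4, succ, ev))    # -> 0 under best play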
The State of Play
• Checkers:
– Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994.
• Chess:
– Deep Blue defeated human world champion Garry Kasparov in a six-
game match in 1997.
• Othello:
– human champions refuse to compete against computers: they are too
good.
• Go:
– human champions refuse to compete against computers: they are too bad.
– b > 300 (!)