Lecture 5 - Adversarial Search
◉ https://int8.io/monte-carlo-tree-search-beginners-guide/
Plan
◉ Game AI
◉ Alpha-Beta Pruning
◉ https://www.ibm.com/ibm/history/ibm100/us/en/icons/deepblue/
AI for Go
◉ Go originated in China over 3,000 years ago. Winning this
board game requires multiple layers of strategic thinking.
https://deepmind.com/research/case-studies/
More Games…
◉ Poker AI: Libratus (CMU, 2017), Pluribus (CMU, 2019), DeepStack (University of Alberta)
…
◉ Axes:
• Deterministic or stochastic?
• One, two, or more players?
• Zero sum?
• Perfect information (can you see the state)?
• For a MAX node, the backed-up value is the maximum of the values of its children (i.e., the best for MAX)
• For a MIN node, the backed-up value is the minimum of the values of its children (i.e., the best for MIN)
The Minimax Procedure
◉ Time complexity: Minimax performs a depth-first search of the game tree, so its time complexity is O(b^m), where b is the branching factor of the game tree and m is its maximum depth.
◉ Space complexity: Like DFS, the space complexity of the minimax algorithm is O(bm).
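A minimal Python sketch of the procedure (the state interface – is_terminal(), utility(), successors() – is an illustrative assumption, not taken from the slides):

def minimax_value(state, maximizing):
    # Backed-up value of `state` via depth-first search:
    # MAX nodes take the maximum of their children, MIN nodes the minimum.
    if state.is_terminal():
        return state.utility()  # utility from MAX's point of view
    values = [minimax_value(s, not maximizing) for s in state.successors()]
    return max(values) if maximizing else min(values)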
What if MIN does not play optimally?
◉ Definition of optimal play for MAX assumes
MIN plays optimally:
• Maximizes worst-case outcome for MAX.
• (Classic game theoretic strategy)
◉ But if MIN does not play optimally, what
will happen?
What if MIN does not play optimally?
◉ MAX will do even better.
◉ Basic idea: “If you have an idea which is surely bad, don’t take the time to see how truly
awful it is” ~ Pat Winston
Alpha-Beta Pruning
◉ During Minimax, keep track of two additional values:
• α: MAX's current lower bound on MAX's outcome
• β: MIN's current upper bound on MAX's outcome
◉ MAX will never allow a move that could lead to a worse score (for MAX) than α
◉ MIN will never allow a move that could lead to a better score (for MAX) than β
◉ Initially, α = -∞ and β = +∞
◉ As the search tree is traversed, the window of possible utility values shrinks as α increases and β decreases
◉ Whenever the ranges of α and β no longer overlap (α ≥ β), the current node is a dead end and can be pruned
When to Prune
◉ Prune below a MAX node when its α value becomes ≥ the β value of its MIN ancestors.
• MAX nodes update α based on children's returned values.
• Idea: player MIN at the node above won't pick that value anyway; he can force a worse value (for MAX).
◉ Prune below a MIN node when its β value becomes ≤ the α value of its MAX ancestors.
• MIN nodes update β based on children's returned values.
• Idea: player MAX at the node above won't pick that value anyway; she can do better.
Example
◉ The MAX player makes the first move from node A, where α = -∞ and β = +∞. These values of α and β are passed down from A to B, and then from B to D.
Example
◉ At node D it is MAX's turn, so α = max(2, 3) = 3; this is the value of α at node D, and the value of node D is also 3. The algorithm then backtracks to node B, where it is MIN's turn, so β changes: β = min(+∞, 3) = 3. Hence at node B, α = -∞ and β = 3.
◉ In the next step, the algorithm traverses the other successor of node B, node E, and the values α = -∞ and β = 3 are passed down as well.
Example
◉ At node E, MAX takes its turn: α = max(-∞, 5) = 5, so at node E α = 5 and β = 3. Since α ≥ β, the right successor of E is pruned and the algorithm does not traverse it; the value of node E is 5.
Example
◉ Next, the algorithm backtracks from node B to node A. At node A: α = max(-∞, 3) = 3 and β = +∞; these values are passed to node C, so at node C, α = 3 and β = +∞, and they are then passed on to node F. At node F, comparing against its left and right children, α remains 3, and the value of node F becomes 1.
Example
◉ Node F returns the value 1 to node C, where α = 3 and β = +∞. Here β changes: comparing with 1, β = min(+∞, 1) = 1. Now at C, α = 3 and β = 1, so α ≥ β and the next child of C, node G, is pruned. The algorithm never examines the subtree under G.
Example
◉ C now returns the value 1 to A. The best value for A is max(3, 1) = 3. Hence the optimal value for the maximizer is 3 in this example.
Alpha-Beta Implementation
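The implementation figure from this slide is not reproduced here; below is a minimal Python sketch under the same assumed state interface as before, following the α/β updates and pruning rules described above:

import math

def alphabeta(state, alpha, beta, maximizing):
    # Returns the same value as minimax, but prunes branches that
    # fall outside the (alpha, beta) window.
    if state.is_terminal():
        return state.utility()
    if maximizing:
        value = -math.inf
        for s in state.successors():
            value = max(value, alphabeta(s, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # MIN above will never allow this branch
                break
        return value
    else:
        value = math.inf
        for s in state.successors():
            value = min(value, alphabeta(s, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:   # MAX above will never allow this branch
                break
        return value

# Initial call, matching the slides: alphabeta(root, -math.inf, math.inf, True)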
Move Ordering in Alpha-Beta Pruning
◉ The effectiveness of alpha-beta pruning is highly dependent on the order in which each node is
examined.
◉ Worst ordering:
• In some cases, alpha-beta pruning does not prune any of the leaves of the tree and behaves exactly like the minimax algorithm.
• This happens when the best move always lies on the right side of the tree. The time complexity for such an ordering is O(b^m).
◉ Ideal ordering:
• The ideal ordering for alpha-beta pruning occurs when the best moves lie on the left side of the tree.
• Since we apply DFS, the left side of the tree is searched first, so alpha-beta can go twice as deep as the minimax algorithm in the same amount of time. The complexity with ideal ordering is O(b^(m/2)) (best-case analysis of alpha-beta pruning).
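A hypothetical sketch of move ordering: score successors with a cheap heuristic (here, `heuristic` is an assumed scoring function, not from the slides) and visit the most promising moves first, pushing alpha-beta toward its best case:

def ordered_successors(state, maximizing, heuristic):
    # Visit likely-best moves first: descending heuristic scores for MAX,
    # ascending for MIN, so the (alpha, beta) window tightens early.
    return sorted(state.successors(), key=heuristic, reverse=maximizing)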
Test Example…
[Figure: three-ply game tree (MAX-MIN-MAX) with leaf values 5 6 / 3 4 1 2 7 8]
Test Example…
Which nodes can be pruned?
[Figure: the same game tree, with layers labeled MAX, MIN, MAX and leaf values 5 6 / 3 4 1 2 7 8]
Answer: NONE! Because the most favorable nodes for both are
explored last (i.e., in the diagram, are on the right-hand side).
Test Example 2…
Which nodes can be pruned?
[Figure: three-ply game tree (MAX-MIN-MAX) with leaf values 3 4 / 6 5 8 7 2 1]
Test Example 2…
Which nodes can be pruned?
[Figure: the same game tree, with layers labeled MAX, MIN, MAX and leaf values 6 5 8 7 2 1 3 4]
Answer: LOTS! Because the most favorable nodes for both are
explored first (i.e., in the diagram, are on the left-hand side).
Resource Limits
◉ Problem: In realistic games, cannot search to leaves!
◉ Example:
• Suppose we have 100 seconds and can explore 10K nodes/sec
• So can check 1M nodes per move
• α-β reaches about depth 8 – decent chess program
◉ Guarantee of optimal play is gone
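The standard remedy, sketched below under the same assumed state interface (`eval_fn` is a hypothetical heuristic evaluator), is to cut off the search at a fixed depth and substitute a heuristic evaluation for the true utility:

def depth_limited_value(state, depth, maximizing, eval_fn):
    if state.is_terminal():
        return state.utility()
    if depth == 0:
        return eval_fn(state)  # heuristic estimate replaces the true utility
    values = (depth_limited_value(s, depth - 1, not maximizing, eval_fn)
              for s in state.successors())
    return max(values) if maximizing else min(values)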
https://www.alphagomovie.com/
AlphaZero, MuZero, and More…
◉ From a helicopter view, Monte Carlo Tree Search has one main purpose: given a game state, choose the most promising next move.
Monte Carlo Tree Search
1. Selection
Start from root R and select successive child nodes until a leaf node L is reached. The root is the current
game state and a leaf is any node that has a potential child from which no simulation (playout) has yet
been initiated.
2. Expansion
Unless L ends the game decisively (e.g. win/loss/draw) for either player, create one (or more) child
nodes and choose node C from one of them. Child nodes are any valid moves from the game position
defined by L.
3. Simulation
Complete one random playout from node C. This step is sometimes also called playout or rollout. A
playout may be as simple as choosing uniform random moves until the game is decided (for example in
chess, the game is won, lost, or drawn).
4. Backpropagation
Use the result of the playout to update information in the nodes on the path from C to R.
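A minimal sketch of one iteration of these four steps (the Node fields and the game-state methods – legal_moves(), apply(), is_terminal(), result() – are illustrative assumptions; best_uct_child is sketched on the UCT slide below):

import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.untried = list(state.legal_moves())  # moves not yet expanded
        self.Q = 0.0  # total simulation reward
        self.N = 0    # total number of visits

def mcts_iteration(root):
    node = root
    # 1. Selection: descend while the node is fully expanded and non-terminal
    while not node.untried and node.children:
        node = best_uct_child(node)  # see the UCT slide below
    # 2. Expansion: create one child C for an untried move
    if node.untried:
        move = node.untried.pop()
        child = Node(node.state.apply(move), parent=node)
        node.children.append(child)
        node = child
    # 3. Simulation: random playout from C until the game is decided
    state = node.state
    while not state.is_terminal():
        state = state.apply(random.choice(state.legal_moves()))
    reward = state.result()  # e.g., +1 win / 0 draw / -1 loss
    # 4. Backpropagation: update statistics on the path from C back to R
    # (a single-perspective reward is used here for simplicity)
    while node is not None:
        node.N += 1
        node.Q += reward
        node = node.parent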
Choosing the Most Promising Move: Monte Carlo
◉ In the Monte Carlo Tree Search algorithm, the most promising move is computed in a slightly different fashion.
◉ For every node on the backpropagation path, certain statistics are computed and updated.
Node’s Statistics
◉ Backpropagation updates the total simulation reward Q(v) and the total number of visits N(v) for all nodes v on the backpropagation path:
• Q(v) – total simulation reward, e.g., the sum of the simulation results that passed through the considered node.
• N(v) – total number of visits, i.e., how many times a node has been on the backpropagation path.
◉ Upper Confidence Bound applied to trees (UCT) is a function that lets us choose the next node among visited nodes to traverse through – the core function of Monte Carlo Tree Search. For a child v_i of the current node v:

UCT(v_i, v) = Q(v_i)/N(v_i) + c · √( ln N(v) / N(v_i) )

where the first term (the average simulation reward) is the exploitation component and the second term is the exploration component.
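A sketch of this selection rule matching the formula above (c = √2 is a common textbook choice, assumed here; Node is the class from the earlier MCTS sketch):

import math

def best_uct_child(parent, c=math.sqrt(2)):
    def uct(child):
        exploit = child.Q / child.N  # average simulation reward
        explore = c * math.sqrt(math.log(parent.N) / child.N)
        return exploit + explore
    return max(parent.children, key=uct)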
◉ The general idea of simulating moves into the future, observing the outcome, and using the outcome to determine which moves are good ones is one kind of reinforcement learning (which we will cover in future lectures).
Search with Uncertainty
Stochastic Games
◉ What if we don’t know what the result of an action will be? E.g.,
• In solitaire, the shuffle is unknown
◉ Quiz: what would Pacman's computation look like if we assumed that the ghosts were doing 1-ply minimax and taking the result 80% of the time, otherwise moving randomly?
Expectimax for Pacman
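The Pacman diagram from this slide is not reproduced; below is a minimal expectimax sketch (state interface assumed as before; outcomes() is a hypothetical method returning (probability, successor) pairs, which could encode the quiz's 80%/20% ghost model):

def expectimax_value(state):
    if state.is_terminal():
        return state.utility()
    if state.to_move() == 'MAX':
        return max(expectimax_value(s) for s in state.successors())
    # Chance node: probability-weighted average over possible outcomes
    return sum(p * expectimax_value(s) for p, s in state.outcomes())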
Stochastic Two-Player: Backgammon
◉ The goal of the game is to move all one’s pieces
off the board.
• Black moves clockwise toward 25, and
White moves counterclockwise toward 0.
• A piece can move to any position unless
multiple opponent pieces are there; if there is
one opponent, it is captured and must
start over.
◉ Generalization of minimax:
• Terminals have utility tuples
• Node values are also utility tuples
• Each player maximizes its own component
• Can give rise to cooperation and competition dynamically… (see the sketch below)
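A sketch of this tuple-valued generalization (sometimes called max^n; the state interface – utility_tuple(), and to_move() returning a player index – is an assumption for illustration):

def maxn_value(state):
    if state.is_terminal():
        return state.utility_tuple()  # one utility component per player
    i = state.to_move()  # index of the player choosing at this node
    return max((maxn_value(s) for s in state.successors()),
               key=lambda t: t[i])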
Utilities
◉ Utilities are functions from outcomes (states of the world) to real numbers that
describe an agent’s preferences
◉ For average-case expectimax reasoning, we need magnitudes to be meaningful (we’ll talk more about
utilities in the future)
Summary
◉ In two-player, discrete, deterministic, turn-taking zero-sum games with perfect information, the minimax
algorithm can select optimal moves by a depth-first enumeration of the game tree.
◉ The alpha–beta search algorithm computes the same optimal move as minimax, but achieves much
greater efficiency by eliminating subtrees that are provably irrelevant.
◉ Usually, it is not feasible to consider the whole game tree (even with alpha–beta), so we need to cut the
search off at some point and apply a heuristic evaluation function that estimates the utility of a state.
◉ An alternative called Monte Carlo tree search (MCTS) evaluates states not by applying a heuristic
function, but by playing out the game all the way to the end and using the rules of the game to see who
won. Since the moves chosen during the playout may not have been optimal moves, the process is
repeated multiple times and the evaluation is an average of the results.
◉ Games of chance can be handled by expectiminimax, an extension to the minimax algorithm that
evaluates a chance node by taking the average utility of all its children, weighted by the probability of
each child.
Thanks!
Q&A