2024 Mth058 Lecture06 Mcts
TREE SEARCH
Monte Carlo tree search
Game tree and search strategies
• A game tree represents a hierarchy of moves in a game.
• Each node of a game tree represents a particular state in a game.
• A move makes a transition from a node to its successors.
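A minimal sketch of these three points, assuming a hypothetical toy game that is not part of the lecture: a pile of stones from which each move removes 1 or 2. The names `Node`, `legal_moves`, `build_tree` and `count_nodes` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the game tree: a game state plus its successor nodes."""
    stones: int                      # toy state: stones left in the pile (assumed game)
    move: Optional[int] = None       # move that led to this state (None for the root)
    children: List["Node"] = field(default_factory=list)


def legal_moves(stones):
    # Assumed rule of the toy game: a move removes 1 or 2 stones.
    return [m for m in (1, 2) if m <= stones]


def build_tree(node):
    """Attach every successor reachable by one move, recursively."""
    for m in legal_moves(node.stones):
        child = Node(stones=node.stones - m, move=m)
        node.children.append(child)
        build_tree(child)
    return node


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children)


root = build_tree(Node(stones=3))
print(count_nodes(root))                   # 7 states in the full tree
print([c.stones for c in root.children])   # [2, 1]
```

Building the whole tree is only feasible for tiny games; this is why the search strategies that follow explore it selectively.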
• Many AI problems can be cast as search problems, which
are solved by finding the best plan, model or function.
• A search algorithm finds the best path to win the game.
MCTS: Expansion step
2. Expansion: expand the tree by adding one (or more)
children, according to the available actions.
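The expansion step might be sketched as follows, assuming the same kind of hypothetical toy game (the state is a number of stones and an action removes 1 or 2). The `Node` class, its `untried` list and `expand` are illustrative names, not the lecture's implementation.

```python
def available_actions(state):
    # Assumed toy rule: remove 1 or 2 stones, never more than remain.
    return [a for a in (1, 2) if a <= state]


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action                      # action that led to this node
        self.children = []
        self.untried = available_actions(state)   # actions with no child yet


def expand(node):
    """Add one child to the tree for an action not yet tried from `node`."""
    action = node.untried.pop()
    child = Node(node.state - action, parent=node, action=action)
    node.children.append(child)
    return child


root = Node(3)
first = expand(root)    # adds the child reached by removing 2 stones
second = expand(root)   # adds the child reached by removing 1 stone
```

Adding one child per iteration keeps the tree small; adding all children at once is the "(or more)" variant.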
MCTS: Simulation step
3. Simulation: run a simulated play from the new node(s)
using the default policy to produce an outcome.
• When the search ends, the result is the action 𝑎 that leads to the best child of the root node 𝑣0.
MCTS: Child selection
• An action 𝑎 of the root 𝑡0 is selected by some mechanism.
• Max child: select the root child with the highest reward.
• Robust child: select the most visited root child.
• Max-Robust child: select the root child with both the highest
visit count and the highest reward.
• If no such child exists, continue searching until an acceptable visit count is achieved.
• Secure child: select the child which maximizes a lower
confidence bound.
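The four criteria can be compared on a small example. This is a sketch under stated assumptions: "reward" is taken to be the mean reward per visit, every child has at least one visit, and the lower confidence bound is assumed to be `mean - a / sqrt(visits)` with a hypothetical constant `a`. The `Child` record and the numbers are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    action: str
    reward: float   # total reward accumulated through this child
    visits: int     # visit count (assumed >= 1 here)


def max_child(children):
    # Highest reward, interpreted here as mean reward per visit (assumption).
    return max(children, key=lambda c: c.reward / c.visits)


def robust_child(children):
    # Most visited child.
    return max(children, key=lambda c: c.visits)


def max_robust_child(children):
    # The child that wins on both counts, or None if no such child exists
    # (the caller would then keep searching).
    best = max_child(children)
    return best if best is robust_child(children) else None


def secure_child(children, a=1.0):
    # Assumed lower confidence bound: mean - a / sqrt(visits).
    return max(children, key=lambda c: c.reward / c.visits - a / math.sqrt(c.visits))


children = [Child("a1", 9.0, 10), Child("a2", 3.0, 3), Child("a3", 30.0, 50)]
print(max_child(children).action)      # a2 (mean 1.0)
print(robust_child(children).action)   # a3 (50 visits)
print(max_robust_child(children))      # None
print(secure_child(children).action)   # a1
```

The example is chosen so that the criteria disagree: a high mean from few visits (`a2`) is exactly what the robust and secure criteria are meant to guard against.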
MCTS: Applications
Roll out
• Randomly pick an action at each step and simulate it until the game ends, then return the reward of the terminal state. Averaged over many rollouts, this reward estimates the value of the starting state.
Loop (until a terminal state is reached):
    if Si is a terminal state:
        return Value(Si)
    Ai = random(available_actions(Si))
    Si = Simulate(Si, Ai)
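The loop above could be made runnable as follows. The game is a hypothetical stand-in: the state is `(stones_left, moves_made)`, an action removes 1 or 2 stones, and the terminal value is assumed to be 1.0 when an odd number of moves was made (the first player took the last stone) and 0.0 otherwise. All helper names are illustrative.

```python
import random


def available_actions(state):
    stones, _ = state
    return [a for a in (1, 2) if a <= stones]


def is_terminal(state):
    stones, _ = state
    return stones == 0


def value(state):
    # Assumed reward: 1.0 if the first player made the last move, else 0.0.
    _, moves = state
    return 1.0 if moves % 2 == 1 else 0.0


def simulate(state, action):
    stones, moves = state
    return (stones - action, moves + 1)


def rollout(state, rng=random):
    """Play uniformly random actions until a terminal state; return its value."""
    while True:
        if is_terminal(state):
            return value(state)
        action = rng.choice(available_actions(state))
        state = simulate(state, action)


rng = random.Random(0)   # seeded only to make the sketch reproducible
results = [rollout((5, 0), rng) for _ in range(1000)]
average = sum(results) / len(results)   # Monte Carlo estimate of the start state's value
```

A single rollout returns one terminal reward; the average over many rollouts is what the tree statistics accumulate.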
An example: Iteration 1
• Let’s start with an initial state 𝑆0.
• The actions 𝑎1 and 𝑎2 lead to states 𝑆1 and 𝑆2; each node stores a total score 𝑡 and a number of visits 𝑛 (both initially 0).
[Figure: initial state]
• The selected leaf node has not been visited before, so do a rollout all the way down to a terminal state.
• Suppose the value of this rollout is 20 (an arbitrary example value).
[Figure: rollout from S1]
• The value 20 is backed up all the way to the root.
• So now, 𝑡 = 20 and 𝑛 = 1 for nodes 𝑆1 and 𝑆0 .
• That is the end of the first iteration.
[Figure: the tree after backpropagation]
An example: Iteration 2
$$\mathrm{UCB1}(S_1) = 20 + 2\sqrt{\frac{\ln 2}{1}} \approx 21.67, \qquad \mathrm{UCB1}(S_2) = 10 + 2\sqrt{\frac{\ln 2}{1}} \approx 11.67$$
• A rollout is run from the selected leaf node to a terminal state to obtain a value, which is then backpropagated.
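The two UCB1 values in this example can be checked numerically. The sketch assumes the form used in the example, average value plus 2·sqrt(ln N / n), with N = 2 visits at the parent and n = 1 visit for each child; the function name `ucb1` is illustrative.

```python
import math


def ucb1(total, visits, parent_visits, c=2.0):
    """Average value plus an exploration bonus that shrinks with visits."""
    return total / visits + c * math.sqrt(math.log(parent_visits) / visits)


print(round(ucb1(20, 1, 2), 2))   # 21.67
print(round(ucb1(10, 1, 2), 2))   # 11.67
```

Both children have the same exploration bonus here because they have been visited equally often, so the node with the higher average is selected.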
A deeper insight into MCTS
Simulation / Playout
• Simulation (or playout) is a sequence of moves that starts at the current node and ends at a terminal node.
• During the simulation, the rollout policy function consumes a
game state 𝒔𝒊 and produces the next move/action 𝒂𝒊 .
• The default rollout policy is uniform random: every legal move is equally likely.
• In practice, it is designed to be fast so that many simulations can be played quickly.
Simulation in AlphaGo and AlphaZero
• In AlphaGo, the evaluation of the leaf $S_L$ is defined as
$$V(S_L) = (1 - \alpha)\, v_0(S_L) + \alpha z_L$$
• $z_L$: a standard rollout evaluation with a custom fast rollout policy, which is a shallow softmax neural network with handcrafted features.
• $v_0$: a Value Network that evaluates positions with a 13-layer CNN, trained on 30 million distinct positions extracted from self-play games.
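The mixing rule is a plain convex combination of the two estimates. In this sketch the function name, argument names and the default `alpha=0.5` are illustrative assumptions; the inputs stand in for the value-network output and the rollout outcome.

```python
def leaf_evaluation(value_net_estimate, rollout_outcome, alpha=0.5):
    """V(S_L) = (1 - alpha) * v0(S_L) + alpha * z_L."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * value_net_estimate + alpha * rollout_outcome


mixed = leaf_evaluation(0.2, 1.0, alpha=0.5)          # halfway between the two estimates
only_network = leaf_evaluation(0.2, 1.0, alpha=0.0)   # ignores the rollout
only_rollout = leaf_evaluation(0.2, 1.0, alpha=1.0)   # ignores the value network
```

Setting `alpha` to 0 or 1 recovers a pure value-network or pure rollout evaluation, which makes the role of the parameter easy to see.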
Node expansion on the game tree
• A node is considered visited if a playout has been started from that node, i.e., it has been evaluated at least once.
Backpropagation
• Backpropagation carries simulation results up to the root.
• For every node on the path, certain statistics are updated.
Statistics in a node 𝑣
• 𝑄(𝑣): the total simulation reward.
• In its simplest form, it is the sum of the simulation results that passed through the node.
• 𝑁(𝑣): the total number of visits.
• That is, a counter of how many times the node has been on the backpropagation path.
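A minimal sketch of how these two statistics might be stored and updated, assuming a single-agent reward (a two-player game would typically flip the sign or perspective of the result at alternating levels). The `Node` class is illustrative.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.Q = 0.0   # total simulation reward
        self.N = 0     # total number of visits


def backpropagate(node, result):
    """Walk from the evaluated node up to the root, updating Q and N."""
    while node is not None:
        node.N += 1
        node.Q += result
        node = node.parent


root = Node()
s1 = Node(parent=root)
s2 = Node(parent=root)
backpropagate(s1, 20)   # one rollout through s1 worth 20
backpropagate(s2, 10)   # one rollout through s2 worth 10
print(root.Q, root.N)   # 30.0 2
```

The root accumulates every result because it lies on every backpropagation path.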
Exploration – Exploitation Dilemma
$$\mathrm{UCT}(v_i, v) = \frac{Q(v_i)}{N(v_i)} + C\sqrt{\frac{\ln N(v)}{N(v_i)}}$$
• $C$ is a tunable bias parameter ($C = 1/\sqrt{2}$ for rewards in $[0, 1]$).
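The effect of the exploration term can be seen on a small example. This sketch assumes children are given as `(name, Q, N)` tuples and that an unvisited child gets an infinite score so that it is tried first; both are illustrative conventions, and the numbers are made up.

```python
import math


def uct(q_child, n_child, n_parent, c=1 / math.sqrt(2)):
    if n_child == 0:
        return math.inf   # assumption: unvisited children are selected first
    exploitation = q_child / n_child
    exploration = c * math.sqrt(math.log(n_parent) / n_child)
    return exploitation + exploration


def best_uct(children, n_parent, c=1 / math.sqrt(2)):
    # children: list of (name, Q, N) tuples
    return max(children, key=lambda ch: uct(ch[1], ch[2], n_parent, c))


visited = [("v1", 7.0, 10), ("v2", 1.0, 2)]
print(best_uct(visited, 12, c=0.0)[0])   # v1: pure exploitation picks the higher mean
print(best_uct(visited, 12)[0])          # v2: the rarely visited child gets a larger bonus
print(best_uct(visited + [("v3", 0.0, 0)], 12)[0])   # v3: unvisited
```

With `c=0` the rule is greedy; increasing `c` shifts selection toward children with few visits.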
MCTS pseudo-code

def monte_carlo_tree_search(root):
    # Iterate while the time / computation budget lasts.
    while resources_left(time, computational_power):
        leaf = traverse(root)  # leaf = unvisited node
        simulation_result = rollout(leaf)
        backpropagate(leaf, simulation_result)
    return best_child(root)

def traverse(node):
    # Descend with UCT while every child of the node has been visited.
    while fully_expanded(node):
        node = best_uct(node)
    # Fall back to the node itself in case no children are present
    # (the node is terminal).
    return pick_unvisited(node.children) or node

def rollout(node):
    while non_terminal(node):
        node = rollout_policy(node)
    return result(node)

def rollout_policy(node):
    return pick_random(node.children)

def backpropagate(node, result):
    # Update the node first so that the root's statistics are updated too.
    node.stats = update_stats(node, result)
    if is_root(node):
        return
    backpropagate(node.parent, result)

def best_child(node):
    # Pick the child with the highest number of visits.
    return max(node.children, key=number_of_visits)