Monte Carlo tree search (MCTS) is a heuristic search algorithm that combines tree search and random sampling to find optimal decisions. MCTS iteratively builds a search tree through selection, expansion, simulation, and backpropagation steps within a computational budget. At each iteration, it selects a node for expansion, simulates a random playout from that node to a terminal state, and backs up the result to update node values. This guides the tree growth towards more promising areas of the search space. MCTS has been successfully applied to many complex games and planning problems.


MONTE CARLO TREE SEARCH

Nguyễn Ngọc Thảo
[email protected]
Outline
• An introduction to MCTS
• A complete walkthrough with an example
• A deeper insight into MCTS
Monte Carlo tree search
Game tree and search strategies
• A game tree represents a hierarchy of moves in a game.
• Each node of a game tree represents a particular state in a game.
• A move makes a transition from a node to one of its successors.

Game tree and search strategies
• Many AI problems can be cast as search problems, which
are solved by finding the best plan, model or function.
• A search algorithm finds the best path to win the game.

• Minimax search for two-player adversarial games
• Best-first search for single-player games
Minimax and Expectimax
• Minimax explores all the nodes available, which is infeasible for complex games in a finite amount of time.
• It does not handle imperfect-information or stochastic games either.
• Expectimax generalizes minimax to stochastic games, as sketched below.
• Pruning is harder because of chance nodes.
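A minimal sketch of the expectimax recursion in Python; the state interface used here (is_terminal, utility, is_chance, outcomes, children) is assumed for illustration and is not part of the lecture.

def expectimax(state, maximizing):
    # Hedged sketch: the min step of minimax is kept for the opponent,
    # and chance nodes are evaluated by their expected value.
    if state.is_terminal():
        return state.utility()
    if state.is_chance():
        # Chance node: weight each outcome by its probability.
        return sum(p * expectimax(s, maximizing) for s, p in state.outcomes())
    values = [expectimax(s, not maximizing) for s in state.children()]
    return max(values) if maximizing else min(values)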
Monte Carlo tree search (MCTS)
• MCTS is a heuristic search method that links the precision of
tree search with the generality of random sampling.
• It finds optimal decisions in a domain by taking random
samples in the decision space to grow the search tree.

• The basic MCTS process: a tree is built in an incremental and asymmetric manner.
MCTS: Fundamental idea
• MCTS progressively builds a partial game tree, guided by
the results of previous exploration on the same tree.

• The longer the tree grows, the more accurate the estimates become.
• The tree grows in an asymmetric manner, heading towards more promising moves.
• The tree is used to estimate the values of moves via random simulation → the policy is adjusted towards a best-first strategy.
MCTS: Steps in an iteration
• MCTS iteratively grows a search tree within some predefined computational budget (e.g., time, memory, or iterations).
• Node: a state of the domain.
• Link: an action leading to a subsequent state.
• Each node contains statistics describing at least a reward value and the number of visits (see the sketch below).
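A minimal sketch of that per-node bookkeeping, with class and field names of my own choosing (not from the lecture):

class Node:
    # Hedged sketch: the statistics kept for each tree node.
    def __init__(self, state, parent=None, action=None):
        self.state = state          # a state of the domain
        self.parent = parent
        self.action = action        # the link: action leading to this state
        self.children = []
        self.total_reward = 0.0     # accumulated simulation reward
        self.visits = 0             # number of times this node was visited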
MCTS: Selection step
1. Selection: recursively apply a child selection policy (the tree policy), from the root until the most urgent expandable node is reached.
• Starting at the root node 𝑡0 , we reach the node 𝑡𝑛 .
MCTS: Expansion step
2. Expansion: expand the tree by adding one (or more)
children, according to the available actions.

• An unvisited action 𝑎 from state 𝑠 is selected, leading to a state 𝑠′, and a new leaf node 𝑡𝑙 is added to the tree.
MCTS: Simulation step
3. Simulation: run a simulated play from the new node(s)
using the default policy to produce an outcome.

• A simulation is run from the node 𝑡𝑙 to produce a reward value ∆, which may be
• a discrete (win/draw/loss) result or a continuous reward value for simple domains, or
• a vector of reward values, one per agent, for more complex multiagent domains.
MCTS: Backpropagation step
4. Backpropagation: back up the simulation result through the
selected nodes to update their statistics.

• The reward value ∆ is backed up to update the nodes along the path.
• For each node, its visit count is incremented and its average reward (or Q-value) is updated according to ∆ (a sketch follows below).
• As soon as the computation budget is reached or the search is interrupted, it terminates.
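A minimal sketch of that backup, reusing the hypothetical Node fields introduced earlier (the average reward is recovered as total_reward / visits):

def backpropagate(node, delta):
    # Hedged sketch: walk from the newly evaluated node back to the root,
    # incrementing visit counts and accumulating the simulation reward.
    while node is not None:
        node.visits += 1
        node.total_reward += delta
        node = node.parent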
Monte Carlo tree search (MCTS)
• Tree policy: select or create a leaf node from the nodes already in the tree (selection and expansion).
• Default policy: play out the domain from a given non-terminal state to produce a value estimate (simulation).
• The search finally returns the action 𝑎 that leads to the best child of the root node 𝑣0 .
MCTS: Child selection
• An action 𝑎 of the root 𝑡0 is selected by one of the following mechanisms (sketched in code below).
• Max child: select the root child with the highest reward.
• Robust child: select the most visited root child.
• Max-Robust child: select the root child with both the highest visit count and the highest reward.
• If no such child exists, continue the search until an acceptable visit count is achieved.
• Secure child: select the child that maximizes a lower confidence bound.
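A minimal sketch of the max, robust, and secure criteria, using the hypothetical Node fields from earlier; the lower-confidence-bound form used for the secure child is one common choice, not necessarily the lecture's.

import math

def max_child(root):
    # Highest average reward.
    return max(root.children, key=lambda c: c.total_reward / c.visits)

def robust_child(root):
    # Highest visit count.
    return max(root.children, key=lambda c: c.visits)

def secure_child(root, bias=1.0):
    # Maximizes a lower confidence bound on the average reward.
    def lcb(c):
        return c.total_reward / c.visits - bias * math.sqrt(math.log(root.visits) / c.visits)
    return max(root.children, key=lcb)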
MCTS: Applications

• Real-time games and nondeterministic games
• Physics simulations
• Bus regulation problem
• Travelling Salesman Problem
A complete walkthrough with an example
Upper confidence bound value
• The upper confidence bound for a node is defined as

UCB1 = 𝑉𝑖 + 𝐶 · √(ln 𝑁 / 𝑛𝑖)

• 𝑉𝑖 : the value estimate of node 𝑖, which is the average reward of all nodes beneath this node
• 𝐶 : a tunable bias parameter (in this example, 𝐶 = 2)
• 𝑁 : the number of times the parent node has been visited
• 𝑛𝑖 : the number of times the current node 𝑖 has been visited
• UCB1 can serve as a tree policy.
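A minimal helper implementing this formula (the function name and argument order are my own), with a check against the walkthrough numbers that appear later:

import math

def ucb1(value_estimate, visits, parent_visits, c=2.0):
    # Hedged sketch of the UCB1 score above; unvisited nodes get infinity
    # so they are always tried first.
    if visits == 0:
        return float("inf")
    return value_estimate + c * math.sqrt(math.log(parent_visits) / visits)

# With C = 2, N = 2 and n_i = 1 (iteration 3 of the walkthrough):
# ucb1(20, 1, 2) ≈ 21.67 and ucb1(10, 1, 2) ≈ 11.67.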
Rollout
• Randomly pick an action at each step and simulate that action, receiving a reward when the game ends.

Loop:
    if Si is a terminal state:
        return Value(Si)
    Ai = random(available_actions(Si))
    Si = Simulate(Si, Ai)
Until a terminal state is reached.
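The same loop as runnable Python, assuming the domain supplies the helpers is_terminal, value, available_actions, and simulate (all hypothetical names):

import random

def rollout(state, is_terminal, value, available_actions, simulate):
    # Hedged sketch: play uniformly random actions until a terminal state,
    # then return that state's value as the simulation reward.
    while not is_terminal(state):
        action = random.choice(available_actions(state))
        state = simulate(state, action)
    return value(state)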
An example: Iteration 1
• Let’s start with an initial state 𝑆0 .
• The actions 𝑎1 and 𝑎2 lead to states 𝑆1 and 𝑆2 , each with a total score 𝑡 and a visit count 𝑛.
• Q: How do we choose between the two child nodes?
• A: Compute the UCB1 values for both child nodes and take whichever node maximizes the UCB1 value.
→ Simply take the first node when no node has been visited yet.
An example: Iteration 1
• The leaf node taken (𝑆1 ) has not been visited before.
→ Do a rollout all the way down to a terminal state.
• Let's say the value of this rollout is 20 (just an example).
An example: Iteration 1
• The value 20 is backed up all the way to the root.
• So now, 𝑡 = 20 and 𝑛 = 1 for nodes 𝑆1 and 𝑆0 .
• That's the end of the first iteration.
An example: Iteration 2

• The UCB1 values are 20 for 𝑆1 and infinity for 𝑆2 → visit 𝑆2 next.
• Roll out from 𝑆2 to get a value of 10.
• The value 10 is then backed up to the root → the total value at the root node is now 30.
An example: Iteration 3

UCB1(𝑆1) = 20 + 2·√(ln 2 / 1) ≈ 21.67        UCB1(𝑆2) = 10 + 2·√(ln 2 / 1) ≈ 11.67

• 𝑆1 has a higher UCB1 value → the expansion will be done here.
• We do a rollout from 𝑆3 and get a value of 0 at the leaf node.
An example: Iteration 4
• 𝑆1 has a higher UCB1 value:

UCB1(𝑆1) = 20 + 2·√(ln 3 / 2) ≈ 21.48
UCB1(𝑆2) = 10 + 2·√(ln 3 / 1) ≈ 12.10

• The UCB1 values are 0 for 𝑆3 and infinity for 𝑆4 → visit 𝑆4 next.
• A rollout is done from the new leaf node to get a value, which is then backpropagated.
A deeper insight into MCTS
Simulation / Playout
• Simulation (or playout) is a sequence of moves that starts at the current node and ends in a terminal node.
• It is an approximate evaluation of a tree node, computed by running a (more or less random) game starting at that node.
Simulation / Playout
• During the simulation, the rollout policy function consumes a
game state 𝒔𝒊 and produces the next move/action 𝒂𝒊 .
• The default rollout policy is uniformly random.
• In practice, it is designed to be fast so that many simulations can be played quickly.
• A simulation always results in an evaluation.
• For games this is a win, loss, or draw, but in general any value is a legitimate result of a simulation.
Simulation in AlphaGo and AlphaZero
• In AlphaGo, the evaluation of the leaf 𝑆𝐿 is defined as (sketched in code below)
𝑉(𝑆𝐿) = (1 − 𝛼)·𝑣0(𝑆𝐿) + 𝛼·𝑧𝐿
• 𝑧𝐿 : a standard rollout evaluation with a custom fast rollout policy, which is a shallow softmax neural network with handcrafted features.
• 𝑣0 : a value network that evaluates positions with a 13-layer CNN, trained on 30 million distinct positions extracted from self-play games.
• In AlphaZero, a 19-layer residual CNN rates the node directly:
𝑉(𝑆𝐿) = 𝑓0(𝑆𝐿)
• It outputs both a position evaluation and a move probability vector.
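The AlphaGo blend is a one-line convex combination; a sketch with value_network and fast_rollout as stand-ins for the two evaluators (not AlphaGo's real API):

def alphago_leaf_value(s_leaf, value_network, fast_rollout, alpha=0.5):
    # Hedged sketch of V(S_L) = (1 - alpha) * v0(S_L) + alpha * z_L;
    # alpha = 0.5 is only an illustrative default.
    return (1 - alpha) * value_network(s_leaf) + alpha * fast_rollout(s_leaf)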
Node expansion on the game tree
• A node is considered visited if a playout has been started in that node, i.e., it has been evaluated at least once.
• A node is fully expanded if all its children are visited.
Backpropagation
• Backpropagation carries simulation results up to the root.
• For every node on the path, certain statistics are updated.

• A node's statistics reflect the results of simulations started in all its descendants.
Statistics in a node 𝑣
• 𝑄(𝑣): the total simulation reward
• In its simplest form, it is the sum of the simulation results that passed through the node.
• 𝑁(𝑣): the total number of visits
• That is, a counter of how many times the node has been on the backpropagation path.
• 𝑄(𝑣) indicates how promising the node is, and 𝑁(𝑣) indicates how intensively it has been explored.
• These values are maintained for every visited node.
Exploration – Exploitation Dilemma

• Nodes with a high reward are good candidates to follow (exploitation), but those with a low number of visits may be interesting too, because they have not been explored well (exploration).
Upper Confidence Bound for trees
• Upper Confidence Bound for trees (UCT) lets us choose the next node to traverse through among the visited nodes.

UCT(𝑣𝑖 , 𝑣) = 𝑄(𝑣𝑖)/𝑁(𝑣𝑖) + 𝐶 · √(ln 𝑁(𝑣) / 𝑁(𝑣𝑖))

• 𝐶 is a tunable bias parameter (𝐶 = 1/√2 for rewards in [0, 1]).
• There is an essential balance between the first (exploitation) and second (exploration) terms.
• The exploration term ensures that each child has a nonzero probability of selection.
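Using the hypothetical Node fields from earlier, UCT-based child selection is a small function (a sketch, not the lecture's code); it assumes every child has been visited at least once, i.e., the node is fully expanded.

import math

def best_uct_child(node, c=1 / math.sqrt(2)):
    # Hedged sketch: pick the child maximizing exploitation + exploration.
    def uct(child):
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=uct)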
MCTS pseudo-code

def monte_carlo_tree_search(root):
    while resources_left(time, computational_power):
        leaf = traverse(root)  # leaf = an unvisited node
        simulation_result = rollout(leaf)
        backpropagate(leaf, simulation_result)
    return best_child(root)

def traverse(node):
    # Selection: follow the tree policy while the node is fully expanded.
    while fully_expanded(node):
        node = best_uct(node)
    # Expansion: pick an unvisited child, or return the node itself
    # in case no children are present / the node is terminal.
    return pick_unvisited(node.children) or node

def rollout(node):
    # Simulation: play the default policy until a terminal node is reached.
    while non_terminal(node):
        node = rollout_policy(node)
    return result(node)

def rollout_policy(node):
    # Default policy: uniformly random moves.
    return pick_random(node.children)

def backpropagate(node, result):
    # Backpropagation: update statistics along the path back to the root.
    node.stats = update_stats(node, result)
    if is_root(node):
        return
    backpropagate(node.parent, result)

def best_child(node):
    # Pick the child with the highest number of visits.
    return max(node.children, key=number_of_visits)
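To see all four steps end to end, here is a small self-contained sketch on a toy single-player game (start at 0, add 1 or 2 per move, reward 1 for landing exactly on 10); the game, the class, and every parameter value are my own illustration, not from the lecture.

import math
import random

TARGET = 10  # toy game: add 1 or 2 per move, win by reaching exactly 10

def is_terminal(state):
    return state >= TARGET

def reward(state):
    return 1.0 if state == TARGET else 0.0

def actions(state):
    return [1, 2]

def step(state, action):
    return state + action

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.untried = [] if is_terminal(state) else list(actions(state))
        self.total_reward, self.visits = 0.0, 0

def uct_select(node, c=math.sqrt(2)):
    # Tree policy: UCT over the children of a fully expanded node.
    return max(node.children,
               key=lambda ch: ch.total_reward / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one child for an untried action.
        if node.untried:
            a = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(step(node.state, a), parent=node, action=a)
            node.children.append(child)
            node = child
        # 3. Simulation: uniformly random rollout from the new node.
        state = node.state
        while not is_terminal(state):
            state = step(state, random.choice(actions(state)))
        delta = reward(state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += delta
            node = node.parent
    # Final move choice: the robust child (most visited root child).
    return max(root.children, key=lambda ch: ch.visits).action

if __name__ == "__main__":
    print(mcts(0))  # prints the first move chosen after the search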
List of references
• Browne, Cameron B., et al. "A survey of Monte Carlo tree search methods." IEEE Transactions on Computational Intelligence and AI in Games 4.1 (2012): 1–43.
• mctspy: a Python implementation of the Monte Carlo tree search algorithm (link)
• Introduction to Monte Carlo Tree Search: The Game-Changing Algorithm behind DeepMind's AlphaGo (link)
• Monte Carlo tree search: beginner's guide (link)
• Bruno Bouzy. Monte-Carlo Tree Search (MCTS) for Computer Go. Lecture notes for the AOA class, Université Paris Descartes (link)
