Tutorial - AlphaGo
Bo An
18/3/2016
The game of Go
AlphaGo vs European Champion Fan Hui (2-dan professional rank)
October 5–9, 2015
Computer Go AI - Definition
[Figure: Computer Go AI defined as a function from a board state s to an action a, yielding the next state s'; the lookahead proceeds over depths d = 1, 2, …]
Computer Go AI – An Implementation Idea?
How about simulating all possible board positions?
Expand the game tree over depths d = 1, 2, 3, …, up to a maximum depth D on the order of 19 × 19 = 361.
Process each simulation until the game ends, then report the win / lose result: e.g., it wins 13 times if the next stone gets placed here, … 37,839 times there, … 431,320 times elsewhere.
Choose the “next action / stone” that has the most win-counts in the full-scale simulation.
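A minimal sketch of this brute-force idea (all names here are illustrative, not AlphaGo's code; the rollout is stubbed with a coin flip so the sketch runs):

    import random

    def simulate_to_end(board, move):
        # Stand-in rollout: a real engine would play `move` and then random
        # legal moves until the game ends, reporting who won. A coin flip
        # keeps this sketch self-contained.
        return random.random() < 0.5

    def best_move_by_win_count(board, legal_moves, n_simulations=100_000):
        # Count wins per candidate move across many rollouts, then pick
        # the move with the most win-counts.
        wins = {m: 0 for m in legal_moves}
        for _ in range(n_simulations):
            m = random.choice(legal_moves)
            if simulate_to_end(board, m):
                wins[m] += 1
        return max(legal_moves, key=lambda m: wins[m])

The problem, of course, is that the number of positions explodes far too quickly for this to be feasible on a 19 × 19 board.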
AlphaGo: Key Ideas
Reducing the Search Space
1. Reducing “action candidates” (Breadth Reduction)
2. Position evaluation ahead of time (Depth Reduction)
1. If there is a model that can tell you that certain moves are not common / probable (e.g., rarely played by experts), remove those candidates from the search (breadth reduction).
2. Instead of simulating until the maximum depth, use a function V(s), the “board evaluation of state s”, that can measure the value of a position in advance (depth reduction), e.g., V = 10.
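A minimal sketch of how the two reductions plug into a depth-limited search; policy, V, legal_moves, and play are assumed callables, not AlphaGo's actual interfaces:

    def reduced_search(state, depth, policy, V, legal_moves, play,
                       top_k=5, max_depth=3):
        # Depth reduction: stop early and trust the board evaluation V(s)
        # instead of simulating to the end of the game.
        moves = legal_moves(state)
        if depth == max_depth or not moves:
            return V(state)
        # Breadth reduction: keep only the top_k moves the model considers
        # probable; uncommon candidates are removed from the search.
        probs = policy(state)                  # dict: move -> probability
        candidates = sorted(moves, key=lambda a: probs.get(a, 0.0),
                            reverse=True)[:top_k]
        # Negamax over the pruned, depth-limited tree.
        return max(-reduced_search(play(state, a), depth + 1, policy, V,
                                   legal_moves, play, top_k, max_depth)
                   for a in candidates)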
1. Reducing “action candidates”
Model the next move as a probability distribution P(a | s): the probability of taking action a in board state s.
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
[Figure: a prediction model is trained on consecutive expert positions (s1 → s2, s2 → s3, s3 → s4) to predict the expert's next move.]
1. Reducing “action candidates”
The prediction model is a function f: s → a. There are 19 × 19 = 361 possible actions, each with a different probability.
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
The prediction model is a deep learning model: a 13-layer convolutional neural network (CNN).
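A rough PyTorch sketch of such a policy network; the input planes, filter width, and layer sizes below are illustrative choices loosely following the published description, not a guaranteed reproduction of AlphaGo's exact architecture:

    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        # 13 convolutional layers: one 5x5 input layer, eleven 3x3 hidden
        # layers, and a final 1x1 layer scoring each of the 19x19 points.
        def __init__(self, in_planes=48, width=192):
            super().__init__()
            layers = [nn.Conv2d(in_planes, width, 5, padding=2), nn.ReLU()]
            for _ in range(11):
                layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
            layers += [nn.Conv2d(width, 1, 1)]
            self.net = nn.Sequential(*layers)

        def forward(self, s):                    # s: (batch, in_planes, 19, 19)
            logits = self.net(s).flatten(1)      # (batch, 361)
            return torch.softmax(logits, dim=1)  # one probability per move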
Convolutional Neural Network (CNN)
Go: abstraction is the key to winning
CNN: abstraction is its forte
1. Reducing “action candidates”
Training: the CNN learns to predict the expert's next move from a large database of professional game positions.
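As a sketch, this supervised step is a standard cross-entropy loop; expert_batches is a hypothetical iterator of (board tensor, expert move index) pairs, not a name from the original:

    import torch
    import torch.nn.functional as F

    def train_sl(model, expert_batches, lr=3e-3):
        # model(s) returns raw scores (logits) over the 361 moves, e.g. the
        # PolicyNet above without its final softmax; cross-entropy pushes it
        # to rank the expert's actual move `a` highest.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for s, a in expert_batches:        # a: expert move index, 0..360
            loss = F.cross_entropy(model(s), a)
            opt.zero_grad()
            loss.backward()
            opt.step()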
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Training:
1. Reducing “action candidates”
Expert Moves
Updated Model Updated
Updated
Updated
Updated Model
Model
Model
Model
Imitator
ver 1.3
Model
1.1 VS ver1,000,000
1,000,000
ver 46235.2
ver 1.3
ver 1.5
3204.1 ver ver 2.0
1.7
It uses the same topology as the expert moves imitator model, and just uses the updated parameters
Return:
The finalboard
modelpositions,
wins 80%win/lose infowhen
of the time
playing against the first model
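A rough REINFORCE-style sketch of this self-play update; the helper play_one_game and the reward convention are assumptions for illustration, not the tutorial's exact procedure:

    import torch

    def train_rl(model, opponent, play_one_game, lr=1e-4, games=10_000):
        # play_one_game(model, opponent) is an assumed helper returning the
        # states `model` saw, the move indices it chose (as a LongTensor),
        # and the final result z: +1 for a win, -1 for a loss.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(games):
            states, moves, z = play_one_game(model, opponent)
            log_probs = torch.log_softmax(model(states), dim=1)
            chosen = log_probs[torch.arange(len(moves)), moves]
            loss = -(z * chosen).sum()   # reinforce moves from won games
            opt.zero_grad()
            loss.backward()
            opt.step()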
2. Board Evaluation
Training: a value network V(s) is trained on the board positions and win / lose results returned by self-play, so it can predict the game outcome from a single position.
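A minimal sketch of that regression, assuming positions and outcomes come from the self-play records above (hypothetical names):

    import torch
    import torch.nn.functional as F

    def train_value(value_net, positions, outcomes, lr=1e-3, epochs=10):
        # positions: board tensors from self-play; outcomes: matching game
        # results (+1 win, -1 lose). tanh keeps V(s) in [-1, 1] and MSE
        # regresses it toward the actual outcome.
        opt = torch.optim.SGD(value_net.parameters(), lr=lr)
        for _ in range(epochs):
            v = torch.tanh(value_net(positions)).squeeze(-1)
            loss = F.mse_loss(v, outcomes)
            opt.zero_grad()
            loss.backward()
            opt.step()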
Reducing the Search Space
1. Reducing “action candidates” (Breadth Reduction) → Policy Network
2. Board evaluation ahead of time (Depth Reduction) → Value Network
Looking Ahead (Monte Carlo Tree Search)
The policy network proposes candidate moves, the value network and rollouts evaluate leaf positions, and the resulting statistics are backed up the search tree.
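A minimal sketch of the selection rule used while looking ahead: each child stores a prior P from the policy network, a visit count N, and an averaged value Q; these field names and the constant c_puct are illustrative:

    import math

    def select_action(node, c_puct=5.0):
        # Pick the child maximizing Q + u: Q is the averaged evaluation
        # from previous visits, and the exploration bonus u follows the
        # policy network's prior P but shrinks as the visit count N grows,
        # so the search gradually trusts its own statistics.
        total_n = sum(child.N for child in node.children.values())

        def score(item):
            move, child = item
            u = c_puct * child.P * math.sqrt(total_n) / (1 + child.N)
            return child.Q + u

        return max(node.children.items(), key=score)[0]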
Results
AlphaGo
Lee Sedol 9-dan vs AlphaGo: energy consumption
AlphaGo
Scaling CPU / GPU resources to virtually unlimited levels
Discussions
AlphaGo's weakness: making the game state complicated
……
What is the next step? Poker, Mahjong, RoboCup

                             Chess       RoboCup
  Environment                Static      Dynamic
  Information accessibility  Complete    Incomplete
  Sensor readings            Symbolic    Non-symbolic
  Control                    Central     Distributed