Exploring GnuGo's Evaluation Function with a SVM
Christopher Fellows, Yuri Malitsky, Gregory Wojtaszczyk
SVM Training
Training is done on a collection of board positions evaluated by GnuGo to produce a set of characteristic values for the non-zero-valued moves. First, these values are normalized for each game by subtracting the values of the correct move from those of every other move. The SVM optimization technique then finds the hyperplane that separates the correct moves from all others.
Each game can potentially have up to 381 data points associated with it, resulting in exceedingly large data sets for only a handful of games. SVM-light (Joachims 1999) is a program specifically designed for learning tasks with many training examples. The overall optimization problem solved by the program is presented below:
\[
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\xi_i
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1,\dots,n
\]
In our case, each x_i is the vector of characteristic values for one move. A positive y_i denotes the correct move, while all other moves are labeled negative. To ensure that the correct separation is found, we approximate a hard margin by setting C large (C = 1000). The resulting w vector is then used as the weights.

                             GnuGo       SVM     Cut/Connect        Escape Capture
                                                Solved +Unsolved    Solved +Unsolved
Number of Games                187       187        50        59        15        21
Territorial Value                1     14.47      15.1      20.0      0.43      3.97
Strategical Value                1     14.74      14.2      20.3      0.02      -2.8
Max Pos Shape *                         9.43      4.05      44.5      0.75      -2.5
Max Neg Shape *                        -8.93      -8.3      3.40      -0.3         0
Num Pos Shape *                         3.75      2.58       -33      0.42      21.8
Num Neg Shape *                        -4.46      -4.2       -60      -0.2         0
Follow-up Value $                       1.85         0      5.62         0      2.03
Influence Follow-up $                   7.35      4.79      6.54      -0.3      2.16
Reverse Follow-up $                     9.46         0         0      0.01      5.03
Secondary Value               0.05         0         0         0         0      40.3
Num Connected *                         2.74      1.67      10.1      0.02      8.02
Min Value #                             5.22      0.22      21.3      0.01      -0.1
Max Value #                             0.03      0.01      0.10         0         0

Table 2: Comparison of GnuGo's values and those calculated by the SVM. The Cut/Connect and Escape Capture columns show the weights after training on solved problems and after unsolved problems were added.
* Values used in a non-linear combination (e.g., 1.05^x)
$ Values used in a min-max combination (e.g., min(min(x+y+z, x+2y), t+x))
# Values used in if-else statements

Results
For generating our training set, we turn to the Computer Go Test Collection (Mueller 1995), which is maintained by the Computer Go Group at the University of Alberta. The advantage of this collection is that it is subdivided into ten problem types that can occur in a game, including endgame, life-death, escape capture, and low liberties.
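The training pipeline described in the SVM Training section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the per-move feature vectors, move labels, and every function name are hypothetical, and the optimization itself would be done by SVM-light (e.g., `svm_learn` with `-c 1000` for the large C above) rather than by this snippet. It shows the per-problem normalization, the SVM-light example format, the linear-kernel identity w = Σ α_i y_i x_i, and how a learned w would pick and score moves.

```python
from typing import Dict, List, Sequence

# Hypothetical data layout (not from the paper): a problem maps each
# candidate move label, e.g. "D4", to its vector of GnuGo characteristic
# values (territorial value, strategical value, shape terms, ...).
Problem = Dict[str, List[float]]


def normalize(problem: Problem, correct: str) -> Problem:
    """Subtract the correct move's values from every move's values."""
    ref = problem[correct]
    return {move: [v - r for v, r in zip(vals, ref)] for move, vals in problem.items()}


def to_svmlight_lines(problem: Problem, correct: str) -> List[str]:
    """One SVM-light example per move: +1 for the correct move, -1 otherwise.

    Feature indices start at 1 and increase, as the SVM-light format
    requires; zero-valued features are written out rather than skipped.
    """
    lines = []
    for move, vals in normalize(problem, correct).items():
        label = "+1" if move == correct else "-1"
        feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(vals, start=1))
        lines.append(f"{label} {feats} # {move}")
    return lines


def linear_weights(coefs: Sequence[float], support_vectors: List[List[float]]) -> List[float]:
    """For a linear kernel, w = sum_i (alpha_i * y_i) * x_i over the support vectors."""
    w = [0.0] * len(support_vectors[0])
    for c, x in zip(coefs, support_vectors):
        for j, xj in enumerate(x):
            w[j] += c * xj
    return w


def best_move(problem: Problem, w: Sequence[float]) -> str:
    """Return the move with the highest linear score w . x (first one wins ties)."""
    return max(problem, key=lambda m: sum(wi * xi for wi, xi in zip(w, problem[m])))


def count_solved(problems: List[Problem], answers: List[str], w: Sequence[float]) -> int:
    """Count problems whose top-scoring move is the known correct move."""
    return sum(best_move(p, w) == a for p, a in zip(problems, answers))


if __name__ == "__main__":
    toy = {"A": [3.0, 1.0], "B": [1.0, 2.0], "C": [0.5, 0.5]}  # "A" is correct
    print("\n".join(to_svmlight_lines(toy, "A")))
    print(best_move(toy, [1.0, 0.5]))
```

Running the toy example prints the three SVM-light lines `+1 1:0 2:0 # A`, `-1 1:-2 2:1 # B` and `-1 1:-2.5 2:-0.5 # C`, followed by `A`. Because the normalization subtracts the same vector from every candidate in a problem, it shifts every score w·x by the same constant, so `best_move` picks the same move whether it is given raw or normalized values.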
Of the 542 problems present, GnuGo currently solves 380. Training, however, is done only on the 187 correctly solved problems that require playing the actual correct move, rather than merely avoiding a provably bad one. For the resulting 7,000 data points, the SVM finds a perfect separation (Table 2).
To show that these weights were not over-fitted, they were tested on 2,000 positions compiled from a collection of professional title matches. Of these, GnuGo solves 316, while the trained weights solve 314 of those same positions and no others. This result shows that the learned weights perform close to GnuGo's evaluation function.
To explore the scope of the characteristic values, unsolved problems were added to the training set. In all cases, the SVM was able to improve, though to a varying degree for different puzzle types. Table 2 shows results from two problem categories: cut/connect and escape capture. For each category, the first column presents the weights from the previous training on GnuGo-solved problems, while the second presents a new, extended set resolved with the SVM approach. Note the magnitude of the change in some of the weights; for example, the Max Pos Shape weight for cut/connect rises from 4.05 to 44.5. It is likely that the static values are what constrain the learning and must be modified.
Future Work
The project establishes a framework connecting the GnuGo engine, an SVM training approach, and a collection of expert games. Using a linear kernel, it successfully approximates the hand-tuned heuristics of GnuGo, leading to a nearly identical level of play. We expect to employ this approach as a platform for developing a more advanced evaluation function based on non-linear kernels and an optimal combination of heuristic and mathematical models.
Acknowledgments
We would like to thank Prof. Joachims for suggesting the investigation of the SVM approach.
References
Bump, D., et al. 2005. GnuGo. http://www.gnu.org/software/gnugo/gnugo.html
Joachims, T. 1999. SVM-light. http://svmlight.joachims.org/
Mueller, M. 1995. Computer Go Test Collection. http://www.cs.ualberta.ca/~games/go/cgtc/