Exploring GnuGo's Evaluation Function with a SVM

Christopher Fellows, Yuri Malitsky, Gregory Wojtaszczyk

Computer Science, Cornell University

Abstract

While computers have defeated the best human players in many classic board games, progress in Go remains elusive. The large branching factor in the game makes traditional adversarial search intractable, while the complex interaction of stones makes it difficult to assign a reliable evaluation function. This is why most existing programs rely on hand-tuned heuristics and pattern matching techniques. Yet none of these solutions performs better than an amateur player. Our work introduces a composite approach, aiming to integrate the strengths of the proven heuristic algorithms, the AI-based learning techniques, and the knowledge derived from expert games. Specifically, this paper presents an application of the Support Vector Machine (SVM) for training the GnuGo evaluation function.

Introduction

Go is a deterministic, perfect information, zero sum game between two players. The rules are simple: the two players alternate placing black and white stones on a 19x19 grid board, trying to surround as much territory (empty intersections) as possible. The only time a stone is removed from the board is when it is completely surrounded by the opposing player's pieces. The game ends when both players pass on successive turns. However, despite the simplicity of the rules, the game is very difficult to master.

To date, the best programs have not yet reached the strength of amateur dan players. The large branching factor makes searching ahead infeasible, while the complicated net of influences among the stones makes it difficult to create a reliable evaluation function. Yet these complications only make Go an ideal testing ground for AI techniques.

In this paper we experimented with training an evaluation function with supervised learning. As a basis, the publicly available GnuGo (Bump et al. 2005) was used, currently one of the top-ranking Go programs, which relies mainly on hand-tuned heuristics and pattern matching techniques. For the learning algorithm, we turned to Thorsten Joachims' SVM-light (Joachims 1999) because of the program's ability to handle large data sets and its easy access to controlling parameters.

The results suggest that the SVM could be used for analysis and selection of the optimal evaluation function based on the combination of heuristic and mathematical models.

GnuGo's Evaluation Function

GnuGo (Bump et al. 2005) is a publicly available Go program that is currently ranked as the fourth best engine. To select the played move, the program follows a multistage approach. First, an inner representation of the board, consisting of the chains of connected stones, is compiled. These chains are then analyzed by a variety of pattern matching algorithms called move generators. If the generators determine that a move fulfills a certain criterion, like finishing a known combination or attacking an opponent's chain of stones, the move is assigned a corresponding reason.

When the generators are finished, each move's reasons are evaluated to produce the characteristic values shown in Table 1. The values are then evaluated and combined using a set of hand-tuned rules, and the move with the highest resulting value is played. Our goal is to determine the scope of positions these hand-tuned rules can cover.

Value                      Description
Territorial Value          The total amount of territory the player is expected to gain by playing this move
Strategical Value          Measure of the effect the move has on the safety of all chains of stones on the board
Max Pos Shape              How good the best shape formed by this move is, as determined by the pattern matching code
Max Neg Shape              How bad the worst shape formed by this move is, as determined by the pattern matching code
Num Pos Shape              Number of good shapes formed by this move, as determined by the pattern matching code
Num Neg Shape              Number of bad shapes formed by this move, as determined by the pattern matching code
Follow-up Value            Value which may accrue if the player gets two moves in a row in a local area
Influence Follow-up Value  How much territory can be gained if the player gets two consecutive moves in a local area
Reverse Follow-up Value    The value the opposing player gains if allowed two consecutive moves in a local area
Secondary Value            Given for move reasons which by themselves are not sufficient to play the move
Min Value                  The minimum value this move can get, as determined by the pattern matching code
Max Value                  The maximum value this move can get, as determined by the pattern matching code

Table 1: Description of GnuGo's characteristic values
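
The final selection step described above, combining a move's characteristic values and playing the highest-valued move, can be sketched in Python. The rule set below is purely illustrative: `MoveValues`, `combine`, and `select_move` are hypothetical names, only a subset of the characteristic values is used, and the formula is not GnuGo's actual hand-tuned logic. It only shows the general shape such rules might take (a linear part, a non-linear shape factor, and min/max bounds).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MoveValues:
    """A subset of the characteristic values for one candidate move."""
    territorial: float = 0.0
    strategical: float = 0.0
    max_pos_shape: float = 0.0
    max_neg_shape: float = 0.0
    followup: float = 0.0
    min_value: float = 0.0
    max_value: float = 0.0


def combine(v: MoveValues) -> float:
    """Hypothetical hand-tuned rule set (an assumption, not GnuGo's formula)."""
    # Linear part: territory plus strategic effect.
    score = v.territorial + v.strategical
    # Non-linear part: scale by an exponential shape factor.
    score *= 1.05 ** (v.max_pos_shape + v.max_neg_shape)
    # Min-style rule: the follow-up bonus is capped by the territorial value.
    score += min(v.followup, v.territorial)
    # If-else rules: enforce the floor, and the cap when one is set.
    if score < v.min_value:
        score = v.min_value
    if v.max_value > 0 and score > v.max_value:
        score = v.max_value
    return score


def select_move(candidates: Dict[str, MoveValues]) -> str:
    """Return the name of the candidate with the highest combined value."""
    return max(candidates, key=lambda name: combine(candidates[name]))


# Made-up candidates on two made-up board points.
candidates = {
    "D4": MoveValues(territorial=10.0, strategical=2.0),
    "Q16": MoveValues(territorial=5.0, strategical=1.0),
}
chosen = select_move(candidates)
```

Replacing `combine` with a weighted sum of the characteristic values gives the linear form that the SVM experiments in this paper learn.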
Copyright 2006, American Association for Artificial Intelligence
(www.aaai.org). All rights reserved.

SVM Training

Training is done on a collection of board positions evaluated by GnuGo to produce a set of the characteristic values of the non-zero valued moves. First, these values are normalized for each game by subtracting the values of the correct move from all others. The SVM optimization technique then finds the hyperplane that separates the correct moves from all others.

Each game can potentially have up to 381 data points associated with it, resulting in exceedingly large data sets for only a handful of games. SVM-light (Joachims 1999) is a program specifically designed for handling learning tasks with many training examples. The overall optimization problem solved by the program, in its standard soft-margin form, is presented below:

$$\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i}\xi_i \quad\text{subject to}\quad y_i\,(w\cdot x_i + b)\ \ge\ 1-\xi_i,\qquad \xi_i\ \ge\ 0.$$

In our case, x is the vector of characteristic values for each move. A positive y denotes the correct move, while all other moves are classified as negative. To make sure that we find the correct separation, we approximate a hard margin by setting C to be large (one thousand). The resulting w vector is then used as the weights.

Results

For generating our training set, we turn to the Computer Go Test Collection (Mueller 1995), which is maintained by the Computer Go Group at the University of Alberta. The advantage of this collection is that it is subdivided into ten problem types that can occur in a game, including endgame, life-death, escape capture, low liberties, etc.

                                            Cut/Connect      Escape/Capture
Value                 GnuGo     SVM   Solved  +Unsolved   Solved  +Unsolved
Number of Games         187     187       50         59       15         21
Territorial Value         1   14.47     15.1       20.0     0.43       3.97
Strategical Value         1   14.74     14.2       20.3     0.02       -2.8
Max Pos Shape             *    9.43     4.05       44.5     0.75       -2.5
Max Neg Shape             *   -8.93     -8.3       3.40     -0.3          0
Num Pos Shape             *    3.75     2.58        -33     0.42       21.8
Num Neg Shape             *   -4.46     -4.2        -60     -0.2          0
Follow-up Value           $    1.85        0       5.62        0       2.03
Influence Follow-up       $    7.35     4.79       6.54     -0.3       2.16
Reverse Follow-up         $    9.46        0          0     0.01       5.03
Secondary Value        0.05       0        0          0        0       40.3
Num Connected             *    2.74     1.67       10.1     0.02       8.02
Min Value                 #    5.22     0.22       21.3     0.01       -0.1
Max Value                 #    0.03     0.01       0.10        0          0

Table 2: Comparison of GnuGo's values and those calculated by the SVM. The Cut/Connect and Escape/Capture columns show the weights after training on solved problems and after unsolved problems were added.

* values used in a non-linear combination (e.g. 1.05^x)
$ min-max combination (e.g. min(min(x+y+z, x+2y), t+x))
# values used in if-else statements
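
The SVM column of Table 2 is a weight vector w produced by the procedure in the SVM Training section. A minimal Python sketch of that procedure follows, under several assumptions: the function names and the toy data are hypothetical, the correct move is taken to become the zero vector after normalization, and plain full-batch subgradient descent stands in for SVM-light's own solver.

```python
from typing import List, Tuple

Vector = List[float]


def normalize_position(moves: List[Vector], correct: int) -> List[Tuple[Vector, int]]:
    """Subtract the correct move's values from every move in one position.

    Assumption: the correct move itself becomes the zero vector (label +1)
    and every other move becomes a difference vector (label -1).
    """
    ref = moves[correct]
    examples = []
    for i, values in enumerate(moves):
        diff = [v - r for v, r in zip(values, ref)]
        examples.append((diff, 1 if i == correct else -1))
    return examples


def train_linear_svm(examples, C=1000.0, lr=0.1, epochs=200):
    """Subgradient descent on ||w||^2 / (2C) + sum of hinge losses.

    This is the soft-margin objective divided by C, so it has the same
    minimizer. A fixed step size is used, so the result is approximate.
    """
    dim = len(examples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [wj / C for wj in w]
        grad_b = 0.0
        for x, y in examples:
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # this example violates the margin
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj
                grad_b -= y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def best_move(moves: List[Vector], w: Vector) -> int:
    """Index of the move with the highest weighted sum of its values."""
    scores = [sum(wj * xj for wj, xj in zip(w, values)) for values in moves]
    return scores.index(max(scores))


# Toy data: two positions, two made-up characteristic values per move,
# each paired with the index of the correct move.
positions = [
    ([[3.0, 1.0], [1.0, 1.0], [2.0, 0.0]], 0),
    ([[0.0, 2.0], [2.0, 2.0], [1.0, 3.0]], 1),
]
training_set = [ex for moves, correct in positions
                for ex in normalize_position(moves, correct)]
w, b = train_linear_svm(training_set)
```

Because every example is a difference from the correct move, separating the zero vector from the other examples amounts to requiring that w score the correct move above each alternative, which is why w can be reused directly as evaluation weights in `best_move`.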
Of the 542 problems present, GnuGo currently solves 380. Training, however, is done only on the 187 correctly solved problems requiring the actual correct move instead of avoiding the provably bad one. For the resulting 7,000 data points, the SVM finds a perfect separation (Table 2). To show that these weights were not over-fitted, they were tested on 2,000 positions compiled from a collection of professional title matches. Of these, GnuGo solves 316, while the trained weights solve 314 of those 316 and no additional positions. This result shows that the learned weights perform close to GnuGo's evaluation function.

To explore the scope of the characteristic values, unsolved problems were added to the training set. In all cases, the SVM was able to improve, but to a varying degree for different puzzle types. Table 2 shows results from two problem type categories: cut connect and escape capture. For each category, the first column presents the weights from the previous training based on GnuGo-solved problems, while the second presents the weights obtained after the unsolved problems were added. The thing to note is the magnitude of the change of some of the weights. It is likely that the static values are the ones that are constraining the learning and must be modified.

Future Work

The project establishes a framework connecting the GnuGo engine, an SVM training approach, and a collection of expert games. Using a linear kernel, it successfully approximates the hand-tuned heuristics of GnuGo, leading to a nearly identical level of play. We expect to employ this approach as a platform for developing a more advanced evaluation function based on non-linear kernels and an optimal combination of heuristic and mathematical models.

Acknowledgments

We would like to thank Prof. Joachims for suggesting the investigation of the SVM approach.

References

Bump, D. et al. 2005. GnuGo. http://www.gnu.org/software/gnugo/gnugo.html
Joachims, T. 1999. SVM-light. http://svmlight.joachims.org/
Mueller, M. 1995. Computer Go Test Collection. http://www.cs.ualberta.ca/~games/go/cgtc/
