Novel Regression-Based Chess Evaluation Function Challenges Conventional Evaluation Function
Abstract
21st-century chess engines use an evaluation function to determine the best move in a given board state. This evaluation function uses weighted metrics created by chess experts. Without these experts, building a world-class evaluation function is difficult. In this paper, we present a method to create a chess evaluation function that does not require chess expertise. The evaluation function that we present uses 55 metrics that are based only on piece weights and attacking relationships in a position. The weight of each metric was obtained by applying a least squares regression to evaluations of 6588 board positions produced by a well-known chess engine, Rybka. We replaced the evaluation function in Stockfish, an open source chess engine, with our regression-based evaluation function and played against other computer bots on the Internet Chess Club (ICC). The piece weights determined through our regression model differed from the conventional weights of these pieces, suggesting that the conventional piece weights used to evaluate board states may not be entirely accurate. After playing 31 games, our engine obtained a rating of 2346, which is better than 99.5% of active players on the ICC. Based on the engine's success, it appears that someone with little knowledge of chess can in fact engineer a chess engine's evaluation function by using regression analysis. The same analysis may be tested in fields like medicine to weight risk factors for diseases like heart disease, cancer, and diabetes.
1. Introduction
21st-century chess engines use an evaluation function to find the best move in a board state. These evaluation functions use a set of weighted metrics created by chess experts. Without access to experts, creating a chess engine is difficult. In this project, we attempt to create, without a team of experts, a chess engine that performs at a reasonable playing level.
Figure 1: In this example, only three metrics are used to evaluate the board state. The first metric, with weight a, is the number of white bishops minus black bishops on the board; the second, with weight b, is the number of white rooks minus black rooks on the board; and the third, with weight c, is the number of white bishops attacking black rooks minus black bishops attacking white rooks. The first metric value is +1, because White has one bishop on the board and Black has zero. The second metric value is -1, because White has zero rooks on the board and Black has one. The third metric value is +1, because White has one bishop attacking a black rook, while Black has none. From White's perspective, the evaluation of this board state is (+1)a + (-1)b + (+1)c, or a - b + c.

In Figure 1, each metric value was multiplied by the corresponding weight, and the products were summed over the three metrics. Equation 1 generalizes an evaluation function for n metrics:

$$e = \sum_{i=1}^{n} w_i m_i$$

Equation 1: $m_i$ is the value of metric i; $w_i$ is the weight of metric i; $e$ is the evaluation of the board state.
every possible board state n moves after the current board state using the engine's evaluation function and determines which move would result in the best scenario given perfect play from the opponent. The Minimax Algorithm has runtime complexity $O(2^n)$, where n is the depth of evaluation. Because the Minimax Algorithm is so expensive, most modern chess engines use an alpha-beta algorithm to prune branches in the tree of possible continuations. The algorithm stops evaluating a possible move as soon as one continuation has been found that proves the move to be worse than a previously examined move. Although the pruned tree still has exponential runtime complexity, fewer branches of the tree are examined, so each branch can be explored to a greater depth.
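The pruning rule described above can be illustrated with a minimal sketch on a toy game tree. This is not Stockfish's search; the names `alphabeta`, `toy_children`, and `toy_evaluate` are hypothetical, and the tree is a nested list whose integer leaves stand in for evaluated board states.

```python
import math


def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax value of `state`, skipping branches that cannot change the result."""
    successors = children(state)
    if depth == 0 or not successors:
        return evaluate(state)
    if maximizing:
        best = -math.inf
        for child in successors:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:  # the opponent already has a better option elsewhere
                break
        return best
    best = math.inf
    for child in successors:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, children, evaluate))
        beta = min(beta, best)
        if alpha >= beta:  # this move is already worse than a previous move
            break
    return best


# Toy tree (an assumption for illustration): two moves, each with two replies.
TOY_TREE = [[3, 5], [2, 9]]
visited = []  # leaves that were actually evaluated


def toy_children(state):
    return state if isinstance(state, list) else []


def toy_evaluate(state):
    visited.append(state)
    return state


best_score = alphabeta(TOY_TREE, 2, -math.inf, math.inf, True,
                       toy_children, toy_evaluate)
print(best_score, visited)
```

In this toy tree the leaf 9 is never evaluated: once the reply worth 2 is found, the second move is known to be worse than the first, whose guaranteed value is 3.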
2. Methods
2.1 Collecting Board States and Evaluations in Rybka
We used the tool AutoIT [6] to collect p = 6588 board states and evaluations from Deep Rybka 4 [7], a world champion chess engine. The AutoIT script performs the following tasks:
1. Choose a random board state from Rybka's database of 1.5 million games
2. Use Rybka's Copy Position to export the board state as a FEN string
3. Copy the FEN string to a data file
4. Start Rybka's analysis engine
5. Wait 30 seconds for the evaluation value to converge
6. Match the evaluation of the board state with the FEN string obtained in Step 3
7. Repeat the process for another random board state
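Once a board state is stored as a FEN string, the piece-count metrics can be read straight from its first field. The following is a minimal sketch under that assumption, not the script used in the paper; `material_metrics` and the example position are hypothetical. In FEN, uppercase letters denote white pieces and lowercase letters denote black pieces.

```python
from collections import Counter

# FEN piece letters for the five piece-count metrics (kings are ignored).
PIECE_NAMES = {"p": "pawn", "n": "knight", "b": "bishop", "r": "rook", "q": "queen"}


def material_metrics(fen):
    """Return white-minus-black piece counts from a FEN string."""
    placement = fen.split()[0]  # first FEN field: piece placement
    counts = Counter(ch for ch in placement if ch.isalpha())
    return {name: counts[sym.upper()] - counts[sym]
            for sym, name in PIECE_NAMES.items()}


# Hypothetical position: White has a bishop, Black has a rook.
example_fen = "4k2r/8/8/8/8/8/8/2B1K3 w - - 0 1"
metrics = material_metrics(example_fen)
print(metrics)
```

The attacking-relationship metrics need move generation and are not covered by this sketch.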
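$$
\begin{pmatrix}
m_{11} & m_{12} & \cdots & m_{1n} \\
m_{21} & m_{22} & \cdots & m_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
m_{p1} & m_{p2} & \cdots & m_{pn}
\end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}
=
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_p \end{pmatrix}
$$

Equation 2: $m_{pn}$ is the value of the nth metric in the pth board state, $w_n$ is the unknown metric weight, and $e_p$ is the known evaluation of the pth board state.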
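Because there are far more board states than metrics, Equation 2 generally has no exact solution, and a least squares fit picks the weights that minimize the sum of squared errors. One standard way to compute such a fit is to solve the normal equations $M^T M w = M^T e$. The sketch below does this in plain Python on made-up data; it is an illustration, not the tool or procedure used in the paper, and `least_squares` and the toy matrix are hypothetical.

```python
def least_squares(M, e):
    """Solve the normal equations (M^T M) w = M^T e by Gaussian elimination."""
    n = len(M[0])
    A = [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * ep for row, ep in zip(M, e)) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("metrics are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back-substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / A[i][i]
    return w


# Toy data (an assumption): 5 board states, 3 metrics, evaluations generated
# from the weights [3.0, 5.0, 1.5] so the fit can be checked by eye.
toy_metrics = [[1, 0, 0], [0, 1, 0], [1, -1, 1], [0, 0, 1], [-1, 1, 0]]
toy_evals = [3.0, 5.0, -0.5, 1.5, 2.0]
weights = least_squares(toy_metrics, toy_evals)
print(weights)
```

With the paper's data, the matrix would have p = 6588 rows and n = 55 columns; a numerical library routine would normally be preferred at that size.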
// create a matrix of metric values
// with p rows and n columns
metricValues[p][n];
// the first FEN string to be read
boardState = 0;
while (boardState < p) {
    read boardState
    for (metricNumber = 0; metricNumber < n; metricNumber++) {
        metricValues[boardState][metricNumber] = <compute metric value of metric>
    }
    // next boardState
    boardState++;
}

Figure 2: Pseudo code describing how the matrix of metric values is created using the database of board states.
errors between our evaluations and the database evaluations and automate the weights of our metrics. The table in the Appendix displays the 55 metrics, their weights, and the method of computing each metric value. We replaced the evaluation function in Stockfish, an open source chess engine, with our weighted metrics. Figure 3 shows how our modified version of Stockfish evaluates a board state.

weights[n] = <import weights>
metricValues[n];
for (metricNumber = 0; metricNumber < n; metricNumber++) {
    metricValues[metricNumber] = <compute metric value of the metric>
}
eval = <dot product of metricValues and weights>

Figure 3: Pseudo code describing how a board state is evaluated. A list of metric values is computed and dotted with a list of weights to obtain the evaluation.
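The dot product in Figure 3 can be sketched in a few lines. The example reuses the three metric values from Figure 1 together with three regression weights from the Appendix (bishop 2.38974, rook 3.35919, and side-to-move Bishop X Rook 1.79835). Treating the bishop's attack as a side-to-move relationship is an assumption made for illustration, and `evaluate` is a hypothetical name, not a Stockfish function.

```python
def evaluate(metric_values, weights):
    """Evaluation of a board state: the dot product of metric values and weights."""
    if len(metric_values) != len(weights):
        raise ValueError("need exactly one weight per metric")
    return sum(m * w for m, w in zip(metric_values, weights))


# Figure 1's metric values: +1 bishop, -1 rook, +1 bishop attacking a rook.
metric_values = [1, -1, 1]
# Appendix weights: bishop, rook, Bishop X Rook (side to move, assumed).
weights_subset = [2.38974, 3.35919, 1.79835]

score = evaluate(metric_values, weights_subset)
print(round(score, 4))
```

The full evaluation function applies the same sum across all 55 metrics.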
Weight (non-regression) | How metric value is computed
1 | Number of white pawns - number of black pawns
3 | Number of white knights - number of black knights
3 | Number of white bishops - number of black bishops
5 | Number of white rooks - number of black rooks
9 | Instance of white queen - instance of black queen

Table 1: 5 metrics and their known weights for the non-regression-based evaluation function
Graph 1: Illustrates the difference between expected piece weights and regression-based piece weights when the weight of a pawn is normalized (set to 1). The expected weight is larger than the regression-based weight for each piece, and the magnitude of the difference increases with the weight of the piece.

Our attacking relationship metrics can be divided into 4 categories:
1. Side to move's lower weighted piece attacking a higher weighted piece;
2. Side to move's higher weighted piece attacking a lower weighted piece;
3. Defending side's lower weighted piece attacking a higher weighted piece;
4. Defending side's higher weighted piece attacking a lower weighted piece.

Table 2 shows the average weights for these four metric categories:

               | Lower Weighted -> Higher Weighted | Higher Weighted -> Lower Weighted
Side to Move   | 2.62 (1)                          | 0.11 (2)
Defending Side | 0.39 (3)                          | 0. (4)

Table 2: Shows the average weights of metrics in each attacking relationship category. Illustrates the importance of attacking relationships in the 1st category (side to move's lower weighted pieces attacking the opponent's higher weighted pieces).
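The category 1 average can be recomputed from the side-to-move weights listed in the Appendix. The sketch below assumes that "lower weighted" is judged by the regression-based piece values, so a knight (2.27615) counts as lower weighted than a bishop (2.38974); the paper does not state this explicitly, and the variable names are hypothetical.

```python
PIECES = ["pawn", "knight", "bishop", "rook", "queen"]
PIECE_VALUE = [1.03723, 2.27615, 2.38974, 3.35919, 6.64456]

# Side-to-move weights from the Appendix; rows are attackers, columns are
# attacked pieces, both in the order given by PIECES.
SIDE_TO_MOVE = [
    [-1.08354, 1.33703, 1.74922, 1.02793, 4.7279],
    [0.0956523, -0.228428, 0.760167, 1.02338, 5.36788],
    [0.150581, 0.297229, 0.0591714, 1.79835, 4.54408],
    [0.188971, 0.426902, 0.760438, 0.528506, 3.86158],
    [0.196079, -0.00773957, 0.453631, -0.279459, 0.18281],
]

# Category 1: the attacker is worth less than the piece it attacks.
category_1 = [
    SIDE_TO_MOVE[a][t]
    for a in range(len(PIECES))
    for t in range(len(PIECES))
    if PIECE_VALUE[a] < PIECE_VALUE[t]
]
category_1_average = sum(category_1) / len(category_1)
print(len(category_1), round(category_1_average, 2))
```

Under that assumption, ten relationships fall into category 1 and their average is about 2.62, the value shown in Table 2.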
There is a piece weight difference between the two pieces in an attacking relationship. Going by the weights in the Appendix, the piece weight difference for a pawn attacking a queen is 6.64 (the weight of a queen) - 1.04 (the weight of a pawn), or 5.60. Graph 2 highlights the correlation between the regression-based weight of the attacking relationship and the piece weight difference for attacking relationships in the 1st category.
Graph 2: Illustrates the strong positive correlation (R^2 = 0.82) between piece weight difference and regression-based weight for attacking relationships in the 1st category.

The regression-based engine obtained an ICC rating of 2346, which is better than 99.5% of active players on the ICC, while the non-regression-based engine obtained an ICC rating of 2213, which is better than 98.6% of active players on the ICC. Graph 3 depicts trends in score due to opponent rating.
[Graph 3: bar chart. Vertical axis: Average Score (0 to 0.5). Horizontal axis: Opponent Playing Level (Under 2400, 2400, Over 2400). Series: Regression, Non Regression.]

Graph 3: Illustrates the average score of both the regression-based and non-regression-based engines against opponents rated under 2400, at 2400, and above 2400. For both engines, the average score decreased as the opponent's playing level increased. The regression-based engine scored higher against opponents in all categories, especially opponents rated above 2400, against whom it scored an average of 0.36 versus the 0.00 scored by the non-regression-based engine.
4. Discussion
4.1 Results from the Games
Our regression-based model played better than 99.5% of people on the ICC, while the non-regression-based engine played better than 98.6% of people on the ICC. The regression-based engine held its ground against higher rated players much more consistently than the non-regression-based engine. Also, the regression-based engine both lost to and won against players of a higher caliber than the non-regression-based engine did. Both the regression-based and non-regression-based engines performed at a reasonable playing level, which suggests that someone with little knowledge of chess can make a decent chess evaluation function. The regression-based engine outperformed the non-regression-based engine, so using a regression to model metric weights was more effective than using basic chess conventions to model these weights.
correlation between the weights of an attacking relationship and the difference between the weights of the two pieces.
4.2.2 Correlation between Piece Weights and Attacking Relationship Weights in 1st Category
The metric weights in the Appendix mostly conformed to the expected metric weights. For the side to move, a large positive weight was given to lesser-weighted pieces attacking pieces with larger weights. After the exchange of pieces, the evaluation of the board state should favor the attacking side, which traded off a lesser piece for the opponent's better piece. As expected, the magnitude of the regression-based metric weights strongly correlated (R^2 = 0.82) with the piece weight difference. The strong correlation suggests that we could have used piece weight difference instead of a regression to weight attacking relationships in the 1st category, the most significant category.
regression piece weights should match those shown in Table 1. While the regression weighted each piece in this increasing order of magnitude, the weights did not conform to this ratio.
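The ratio comparison can be sketched by dividing each regression-based piece weight from the Appendix by the pawn weight, which is the normalization used in Graph 1, and comparing the result with the conventional 1, 3, 3, 5, 9 weights from Table 1. The dictionary names below are hypothetical.

```python
CONVENTIONAL = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}
REGRESSION = {
    "pawn": 1.03723,
    "knight": 2.27615,
    "bishop": 2.38974,
    "rook": 3.35919,
    "queen": 6.64456,
}

# Normalize so that a pawn is worth exactly 1.
normalized = {piece: w / REGRESSION["pawn"] for piece, w in REGRESSION.items()}

# Gap between the conventional weight and the normalized regression weight.
gaps = {piece: CONVENTIONAL[piece] - normalized[piece] for piece in CONVENTIONAL}

for piece in CONVENTIONAL:
    print(f"{piece:7s} conventional={CONVENTIONAL[piece]} "
          f"regression={normalized[piece]:.2f} gap={gaps[piece]:.2f}")
```

After normalization, every non-pawn piece comes out below its conventional weight, with the largest gap for the queen.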
weighted metrics, the hypothesis that the existing 1, 3, 3, 5, 9 weight ratio is not optimal would be supported.
weight and discrepancy between the weights of the pieces, we must collect more board states in the midst of a piece exchange.
6. Acknowledgements
Thanks to my mentor, Dr. Peter Danzig, for his guidance throughout the development of the paper, Mr. Richard Page, whose class project inspired me to pursue this project and who worked with me over the summer on my project goals, and Mr. Christopher Spenner for his advice in writing this paper. Also thanks to the very helpful support staff at stockfishchess.org.
7. References
[1] A short history of computer chess. Accessed November 11, 2012. http://www.chessbase.com/columns/column.asp?pid=102.
[2] FEN Standard. Accessed November 11, 2012. http://www.chessville.com/Reference_Center/FEN_Description.htm.
[3] Shannon, Claude E. "Programming a Computer for Playing Chess." Philosophical Magazine, November 8, 1949. Accessed November 11, 2012. http://vision.unipv.it/IA1/ProgrammingaComputerforPlayingChess.pdf.
[4] Minimax search and Alpha-Beta Pruning. Last modified 2002. Accessed November 11, 2012. http://www.cs.cornell.edu/courses/cs312/2002sp/lectures/rec21.htm.
[5] Romstad, Tord, Marco Costalba, Joona Kiiski, Daylen Yang, Salvo Spitaleri, and Jim Ablett. Stockfish. Version 2.3.1. 2012. Stockfish. Accessed November 11, 2012. http://stockfishchess.org/.
[6] AutoIT. Version 3.3.8.1. 2012. autoitscript. Accessed November 11, 2012. http://www.autoitscript.com/site/.
[7] Rajlich, Vasik. Rybka. Version 4. 2010. CD-ROM.
[8] Papoulis, Athanasios, and S. Unnikrishna Pillai. Probability, Random Variables and Stochastic Processes. 4th ed. New York, NY: McGraw-Hill, 2002.
[9] Wolfram, Stephen. Mathematica. Version 8.0.4. Champaign, IL: Wolfram, 2011. CD-ROM.
[10] BlitzIn. Version 3.0.5. 2011. ICC. Accessed November 11, 2012. http://www.chessclub.com/download-software.
[11] Rating Estimator. Last modified August 4, 2012. Accessed November 11, 2012. http://www.uschess.org/content/view/9177/679/.
8. Appendix
Metric | Regression (weight) | How metric value is computed

Side to Move
Pawn X Pawn | -1.08354 | Instances of pawns attacking pawns
Pawn X Knight | 1.33703 | Instances of pawns attacking knights
Pawn X Bishop | 1.74922 | Instances of pawns attacking bishops
Pawn X Rook | 1.02793 | Instances of pawns attacking rooks
Pawn X Queen | 4.7279 | Instances of pawns attacking queen
Knight X Pawn | 0.0956523 | Instances of knights attacking pawns
Knight X Knight | -0.228428 | Instances of knights attacking knights
Knight X Bishop | 0.760167 | Instances of knights attacking bishops
Knight X Rook | 1.02338 | Instances of knights attacking rooks
Knight X Queen | 5.36788 | Instances of knights attacking queen
Bishop X Pawn | 0.150581 | Instances of bishops attacking pawns
Bishop X Knight | 0.297229 | Instances of bishops attacking knights
Bishop X Bishop | 0.0591714 | Instances of bishops attacking bishops
Bishop X Rook | 1.79835 | Instances of bishops attacking rooks
Bishop X Queen | 4.54408 | Instances of bishops attacking queen
Rook X Pawn | 0.188971 | Instances of rooks attacking pawns
Rook X Knight | 0.426902 | Instances of rooks attacking knights
Rook X Bishop | 0.760438 | Instances of rooks attacking bishops
Rook X Rook | 0.528506 | Instances of rooks attacking rooks
Rook X Queen | 3.86158 | Instances of rooks attacking queen
Queen X Pawn | 0.196079 | Instances of queen attacking pawns
Queen X Knight | -0.00773957 | Instances of queen attacking knights
Queen X Bishop | 0.453631 | Instances of queen attacking bishops
Queen X Rook | -0.279459 | Instances of queen attacking rooks
Queen X Queen | 0.18281 | Instance of queen attacking queen

Defending Side
Pawn X Pawn | -1.14423 | Instances of pawns attacking pawns
Pawn X Knight | -0.0870649 | Instances of pawns attacking knights
Pawn X Bishop | 0.230158 | Instances of pawns attacking bishops
Pawn X Rook | 0.863338 | Instances of pawns attacking rooks
Pawn X Queen | 0.363773 | Instances of pawns attacking queen
Knight X Pawn | 0.0560502 | Instances of knights attacking pawns
Knight X Knight | -0.281736 | Instances of knights attacking knights
Knight X Bishop | 0.180345 | Instances of knights attacking bishops
Knight X Rook | -0.10701 | Instances of knights attacking rooks
Knight X Queen | 0.859854 | Instances of knights attacking queen
Bishop X Pawn | 0.120128 | Instances of bishops attacking pawns
Bishop X Knight | 0.1253 | Instances of bishops attacking knights
Bishop X Bishop | -0.422367 | Instances of bishops attacking bishops
Bishop X Rook | 0.233419 | Instances of bishops attacking rooks
Bishop X Queen | 0.529539 | Instances of bishops attacking queen
Rook X Pawn | 0.109619 | Instances of rooks attacking pawns
Rook X Knight | 0.573432 | Instances of rooks attacking knights
Rook X Bishop | 0.421293 | Instances of rooks attacking bishops
Rook X Rook | 0.119157 | Instances of rooks attacking rooks
Rook X Queen | 0.878515 | Instances of rooks attacking queen
Queen X Pawn | 0.039703 | Instances of queen attacking pawns
Queen X Knight | 0.020088 | Instances of queen attacking knights
Queen X Bishop | 0.499491 | Instances of queen attacking bishops
Queen X Rook | -0.0405067 | Instances of queen attacking rooks
Queen X Queen | -0.18281 | Instance of queen attacking queen

Piece Value
Pawn | 1.03723 | Number of white pawns - number of black pawns
Knight | 2.27615 | Number of white knights - number of black knights
Bishop | 2.38974 | Number of white bishops - number of black bishops
Rook | 3.35919 | Number of white rooks - number of black rooks
Queen | 6.64456 | Instance of white queen - instance of black queen