Large-scale Analysis of Chess Games with Chess Engines: A Preliminary Report
Mathieu Acher, François Esnault
ISRN INRIA/RT--479--FR+ENG
TECHNICAL
REPORT
ISSN 0249-0803
N° 479
April 2016
Project-Teams DiverSE
Abstract: The strength of chess engines, together with the availability of numerous chess games,
has attracted the attention of chess players, data scientists, and researchers during the last
decades. State-of-the-art engines now provide an authoritative judgement that can be used in
many applications like cheating detection, intrinsic ratings computation, skill assessment, or the
study of human decision-making. A key issue for the research community is to gather a large
dataset of chess games together with the judgement of chess engines. Unfortunately the analysis
of each move takes a lot of time. In this paper, we report our effort to analyse almost 5 million
chess games with a computing grid. During summer 2015, we processed 270 million unique played
positions using the Stockfish engine at a fairly high depth (20). We populated a database of
1+ terabytes of chess evaluations, representing an estimated 50 years of computation on
a single machine. Our effort is a first step towards the replication of research results, the supply
of open data and procedures for exploring new directions, and the investigation of software
engineering/scalability issues when computing billions of moves.
Key-words: chess game, data analysis, artificial intelligence
RESEARCH CENTRE
RENNES – BRETAGNE ATLANTIQUE
1 Introduction
Millions of chess games have been recorded from the very beginning of chess history to the latest
tournaments of top chess players. Meanwhile chess engines have continuously improved, to the
point that they can not only beat world chess champions but also provide an authoritative
assessment [7–9, 14]. The strength of chess engines, together with the availability of numerous chess
games, has attracted the attention of chess players, data scientists, and researchers during the last three
decades. For instance, professional players use chess engines on a daily basis to seek strong novelties;
chess players in general compare the moves they played with the evaluation of a chess engine to
determine whether they missed an opportunity or blundered at some point.
From a scientific point of view, numerous aspects of the game of chess have been considered, whether
for quantifying the complexity of a position, assessing the skills, ratings, or styles of (famous) chess
players [2, 3, 5, 11, 12, 15], or studying the chess engines themselves [4, 6]. Questions like "Who
are the best chess players in history?" can potentially be given a precise answer using the objective
(and hopefully optimal) judgement of a chess engine. So far numerous applications have been
considered, such as methods for detecting cheaters [1, 10], the computation of an intrinsic rating, or
the identification of key moments at which chess players blunder [13].
A key issue for the research community is to gather a large dataset of chess games together with
the judgement of chess engines [2]. To do so, scientists typically need to analyze millions of
games, moves, and combinations with chess engines. Unfortunately this still requires a lot of
computation, since (1) there are numerous games and moves to consider and (2) chess engines typically
need seconds to explore the space of combinations thoroughly enough to provide a precise evaluation
of a given position. As a result, and because of limited storage or computing power, chess
engines have been run on a limited number of games or with specific parameters that reduce the
amount of computation.
Our objective is to propose an open infrastructure for the large-scale analysis of chess games.
We hope to consider more players, games, moves, chess engines, parameters (e.g., the depth used by
a chess engine), and methods for processing the overall data. By gathering a rich and large
collection of chess engines’ evaluations, we aim to (1) replicate state-of-the-art research results [2]
(e.g., on cheat detection or intrinsic ratings); (2) provide open data and procedures for exploring
new directions; (3) investigate software engineering/scalability issues when computing millions of
moves; (4) organize a community of potential contributors for fun and profit.
∗ Inria/IRISA, University of Rennes 1, France
In this paper, we report our recent effort to analyse almost 5 million chess games with a
computing grid. During summer 2015, we processed 270 million unique played positions using
the Stockfish [14] chess engine at a fairly high depth (20). Overall we populated a database of
1+ terabytes of chess evaluations, representing an estimated 50 years of computation on
a single machine.
Data analysts or scientists can use the dataset as well as the procedures to gather novel insights,
revisit existing work, or address novel issues. The lessons and numbers of our experience report can
also be of interest for launching other large-scale analyses of chess games with other chess engines,
games, and settings.
1 https://github.com/jvarsoke/ictk
2 https://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/
3 http://www.randalolson.com/2014/05/24/chess-tournament-matches-and-elo-ratings/
The number of plies in some games can be suspicious or of limited interest:
• Landa,K (2678) - Grall,G (1812) 1. e4 e5 2. Bc4 Bc5 3. Qh5 Nf6 4. Qxf7# 1-0 (2007)
• Strekelj,V (1843) - Kristovic,M (2328) 1. f3 e5 2. Kf2 d5 3. Kg3 Bc5 4. Nc3 Qg5# 0-1 (2011)
• Vera Gonzalez,J (1551) — Hernandez Carmenates,Hold (2573): 1-0 (Elo difference: 1022)
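Very short games such as the first two above can be flagged automatically by counting plies. The following is a minimal sketch, not the procedure used in this report: it assumes plain PGN movetext without comments or variations, and the function names and the threshold of 10 plies are hypothetical choices made for illustration.

```python
import re

# Tokens that are not moves: move numbers ("1.", "12...") and game results.
_MOVE_NUMBER = re.compile(r"^\d+\.+$")
_RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def count_plies(movetext: str) -> int:
    """Count half-moves in simple PGN movetext (no comments, no variations)."""
    tokens = movetext.split()
    return sum(
        1 for t in tokens if not _MOVE_NUMBER.match(t) and t not in _RESULTS
    )


def is_suspiciously_short(movetext: str, min_plies: int = 10) -> bool:
    """Flag games shorter than min_plies (hypothetical threshold)."""
    return count_plies(movetext) < min_plies


landa_grall = "1. e4 e5 2. Bc4 Bc5 3. Qh5 Nf6 4. Qxf7# 1-0"
strekelj_kristovic = "1. f3 e5 2. Kf2 d5 3. Kg3 Bc5 4. Nc3 Qg5# 0-1"
print(count_plies(landa_grall), count_plies(strekelj_kristovic))
```

A real pipeline would more likely rely on a PGN library or on pgn-extract to handle comments, annotations, and variations.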
Figure 3: Elo difference and games won (similar to the probability graph/outcomes of the Elo rating system)
(a) White players, Elo rating, and games won (b) Games won, color, and Elo
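For reference, the outcomes predicted by the Elo rating system can be computed with the standard logistic expected-score formula. The sketch below uses that textbook formula (with the usual scale of 400 points); it is an illustration, not code from this report, and the function name is ours. Note that the expected score counts a draw as half a point, so it is not strictly a win probability.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of a player rated elo_diff points above the opponent.

    Standard Elo logistic formula: E = 1 / (1 + 10^(-d / 400)).
    A negative elo_diff gives the expected score of the weaker player.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for diff in (0, 100, 200, 400, -1022):
    print(f"Elo difference {diff:+5d}: expected score {expected_score(diff):.4f}")
```

With a deficit of 1022 points, as in the Vera Gonzalez game listed earlier, the expected score of the weaker player is below 0.3%, which is why such results stand out.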
• the distribution of Elo ratings and difference of Elo ratings between two opponents
We also observe some differences; we comment on two of them here. First, the number of moves
as a function of the Elo rating difference (see Figure 2, page 5): Olson’s dataset does not exhibit an
increasing curve between 0 and 200 points of Elo difference. Moreover, the number of plies is slightly
higher all along the graph. Second, the proportion of games won by white players depending on
the date (see Figure 7a, page 9). We concur with the conclusion that having the white pieces is an
advantage in all periods. However, we also observe a linear decrease that is not apparent in Olson’s
dataset, especially for older periods.
Our conclusion is that there is no fundamental difference, i.e., the number of games considered
in the two datasets can explain such differences.
(a) Years of games (b) Ply per game depending on the date
(a) Percentage of games won by white players depending on the date (b) Percentage of games won by color depending on the date
2.8 Summary
Appendix B provides further results related to kinds of moves (promotion rates, queenside castling
rates, etc.). Other information can be extracted from the dataset as well. Our results so far
suggest that (1) the database contains numerous interesting games with rather strong players; (2)
header information such as Elo ratings or game results is coherent. Though some games
can certainly be removed or corrected, we consider the dataset to be representative of existing chess
databases and consistent with chess practices and trends. We also obtain properties similar to those
of other datasets or databases. Finally, the number of games (almost 5 million) is significant.
moves/lines. When k=1, Stockfish returns the best line and evaluation. Increasing multipv is possible and is of
practical interest (see, e.g., [2]), but it also has a computational and storage cost.
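As an illustration of the settings discussed here (multipv=1, depth 20), the sketch below builds the UCI commands that would be sent to an engine for one position and extracts the score from an engine "info" line. It is a minimal sketch under assumptions: the commands follow the standard UCI protocol, but the helper names are hypothetical, the sample line is hand-written for the example rather than actual Stockfish output, and nothing here is taken from the scripts used in this report.

```python
import re
from typing import List, Optional, Tuple

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def uci_commands(fen: str, depth: int = 20, multipv: int = 1) -> List[str]:
    """UCI commands asking an engine to analyse one position."""
    return [
        f"setoption name MultiPV value {multipv}",
        f"position fen {fen}",
        f"go depth {depth}",
    ]


_SCORE = re.compile(r"\bscore (cp|mate) (-?\d+)")


def parse_score(info_line: str) -> Optional[Tuple[str, int]]:
    """Return ("cp", centipawns) or ("mate", moves), or None if no score."""
    match = _SCORE.search(info_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# Hand-written sample in the shape of a UCI "info" line (not real output).
sample = "info depth 20 seldepth 27 multipv 1 score cp 31 nodes 1500000 pv e2e4 e7e5"
print(uci_commands(START_FEN))
print(parse_score(sample))
```

With multipv=k greater than 1, the engine emits one such line per candidate line at each depth, which is where the extra storage cost comes from.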
Structuring data. We used a simple database schema (see Appendix A, page 13). PGN
header information is stored and structured for retrieving games, positions, players, etc. With
each position (FEN), we associate the score and log (multipv=1) computed by Stockfish. We developed
several proofs of concept to validate the schema: https://github.com/ChessAnalysis/chess-analysis-database.
For instance, we can gather all positions of a game and depict the evolution of the scores.
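The kind of query described above can be sketched with an in-memory SQLite database. The schema below is hypothetical and deliberately simplified: the table and column names are our assumptions, not the schema of Appendix A, and the scores are placeholder values rather than Stockfish evaluations. The one design point it illustrates is that each unique FEN is stored once and shared by all games that reach it.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a simplified, hypothetical schema (not the report's schema)."""
    conn.executescript(
        """
        CREATE TABLE game (id INTEGER PRIMARY KEY, white TEXT, black TEXT, result TEXT);
        CREATE TABLE position (fen TEXT PRIMARY KEY, score_cp INTEGER, log TEXT);
        CREATE TABLE game_position (
            game_id INTEGER REFERENCES game(id),
            ply INTEGER,
            fen TEXT REFERENCES position(fen),
            PRIMARY KEY (game_id, ply)
        );
        """
    )


def score_evolution(conn: sqlite3.Connection, game_id: int):
    """Return [(ply, score_cp), ...] for one game, ordered by ply."""
    return conn.execute(
        "SELECT gp.ply, p.score_cp FROM game_position AS gp "
        "JOIN position AS p ON p.fen = gp.fen "
        "WHERE gp.game_id = ? ORDER BY gp.ply",
        (game_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO game VALUES (1, 'White', 'Black', '1-0')")
after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq e3 0 1"
after_e5 = "rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPPPPPP/RNBQKBNR w KQkq e6 0 2"
# Placeholder scores, for illustration only.
conn.executemany(
    "INSERT INTO position VALUES (?, ?, ?)",
    [(after_e4, 30, ""), (after_e5, 25, "")],
)
conn.executemany(
    "INSERT INTO game_position VALUES (?, ?, ?)",
    [(1, 1, after_e4), (1, 2, after_e5)],
)
demo_evolution = score_evolution(conn, 1)
print(demo_evolution)
```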
Distributing the computation. Our experiments suggested that it takes about 6 seconds to
analyze a FEN on a basic machine. Using a single machine was simply not an option since we
have to analyze 270 million positions. It would require 270 × 10^6 × 6 = 1620 × 10^6 seconds, i.e., 450K
hours, about 18.75K days, or around 50 years of computation. The third step of our process was thus to
distribute the computation on a cluster of machines. We used IGRIDA (footnote 6), a computing grid available
to research teams at IRISA / INRIA, in Rennes. The computing infrastructure has 125 computing
nodes and 1500 cores.
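The single-machine estimate above can be checked with a few lines of arithmetic. The sketch uses only the two figures given in the text (270 million positions, about 6 seconds each); the variable names and the 365-day year are our choices.

```python
POSITIONS = 270 * 10**6          # unique positions to analyse
SECONDS_PER_POSITION = 6         # measured on a basic machine (approximate)

total_seconds = POSITIONS * SECONDS_PER_POSITION
total_hours = total_seconds / 3600
total_days = total_hours / 24
total_years = total_days / 365   # assumes 365-day years

print(f"{total_seconds:,} s = {total_hours:,.0f} h "
      f"= {total_days:,.0f} days = {total_years:.1f} years")
```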
Computational and storage cost. We split the FEN positions in order to distribute the
computation across different nodes. We processed them in batch mode (without user intervention), and the
analysis was incremental. On average, we used 200+ cores around the clock for 2 months. We gathered
around 1.5 terabytes of data (FEN logs).
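Splitting the positions into batches and resuming incrementally can be sketched as follows. This is an illustration under assumptions, not the scripts run on the grid: the function names, the batch size, and the use of an in-memory set to track already analysed positions are all hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence, Set


def pending(fens: Iterable[str], done: Set[str]) -> List[str]:
    """Keep only positions that have not been analysed yet (incremental run)."""
    return [fen for fen in fens if fen not in done]


def split_into_batches(fens: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size positions."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(fens), batch_size):
        yield list(fens[start:start + batch_size])


# Toy identifiers standing in for FEN strings.
all_fens = ["fen-a", "fen-b", "fen-c", "fen-d", "fen-e"]
already_done = {"fen-b"}
batches = list(split_into_batches(pending(all_fens, already_done), batch_size=2))
print(batches)
```

Each batch would then be submitted as one job; because finished positions are filtered out first, an interrupted run can be restarted without recomputing them.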
6 http://igrida.gforge.inria.fr/
References
[1] David J. Barnes and Julio Hernandez-Castro. On the limits of engine analysis for cheating
detection in chess. Computers and Security, 48:58 – 73, 2015.
[2] T. Biswas, G. Haworth, and K. Regan. A comparative review of skill assessment: Performance,
prediction and profiling. In 14th Advances in Computer Games conference, 2015.
[3] Tamal T. Biswas and Kenneth W. Regan. Quantifying depth and complexity of thinking and
knowledge. In ICAART 2015 - Proceedings of the International Conference on Agents and
Artificial Intelligence, Volume 2, pages 602–607, 2015.
[4] Diogo R. Ferreira. The impact of search depth on chess playing strength. ICGA Journal,
36(2), June 2013.
[5] Matej Guid, Aritz Pérez, and Ivan Bratko. How trustworthy is Crafty’s analysis of world chess
champions. ICGA Journal, 31(3):131–144, 2008.
[6] Guy McCrossan Haworth et al. Gentlemen, stop your engines! ICGA Journal, 30(3):150–156,
2007.
[7] Feng-Hsiung Hsu. Behind Deep Blue: Building the computer that defeated the world chess
champion. Princeton University Press, 2002.
[8] Matthew Lai. Giraffe: Using deep reinforcement learning to play chess. CoRR, abs/1509.01549,
2015.
[9] CCRL 40/40 Rating List. http://www.computerchess.org.uk/ccrl/4040/, 2015.
A Database schema
Figure 8: There are only a few checkmates during games because players typically resign when they
realize that defeat is near. The slight increase is surprising and deserves further investigation (e.g., a
possible explanation is the inclusion of recent rapid games such as Blitz in the dataset).
Figure 9: The ratio of captured pieces during a game has slowly decreased
RESEARCH CENTRE
RENNES – BRETAGNE ATLANTIQUE
Campus universitaire de Beaulieu
35042 Rennes Cedex

Publisher
Inria
Domaine de Voluceau - Rocquencourt
BP 105 - 78153 Le Chesnay Cedex
inria.fr
ISSN 0249-0803