Doing Your Own Engine Testing
How can you tell which chess engine is strongest? Or whether a proposed modification really makes the engine stronger? Or whether using the Syzygy tablebases adds any value? Many of your questions can be answered by looking at other people's engine tests, either on the Stockfish and Lc0 Discord channels or by observing the results of broadcast engine vs. engine matches (like Chess.com, TCEC, and Navs "Engine Battle" Twitch site). These tests are often run on computers with extremely powerful (and expensive!) CPUs and GPUs. For example, the Navs tests are done using two Nvidia 4090 GPUs, with a total MSRP of $3,200, for running Lc0, and an AMD 3990X with 128 threads, which costs around $4,000, for running Stockfish. These tests and matches can take days (or weeks) to complete, but they typically produce reliable and repeatable results.
However, there may also be times when you want to try and test something on your own computer. You may want to see if results others are getting can be reproduced on your own machine. Or you may want to try a new version of Lc0 that has not been tested in conditions that match how you would actually use it. Or do your own test of Syzygy's usefulness.
The good news is that it is easy to set up your own engine vs. engine tests. The bad news is that it takes a lot of time to run a test long enough to produce statistically significant results.
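To get a feel for what "statistically significant" means here, the sketch below turns a win/draw/loss count into an Elo difference with an approximate 95% error margin. It uses the standard logistic Elo formula and treats every game as independent, which is a simplifying assumption (games played in pairs from the same opening are not strictly independent). The function name and the example numbers are made up for illustration.

```python
import math

def elo_estimate(wins, draws, losses, z=1.96):
    """Return (elo_diff, margin) from the first engine's point of view.

    z=1.96 corresponds to an approximate 95% confidence interval.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")

    # Average score per game: win = 1, draw = 0.5, loss = 0.
    score = (wins + 0.5 * draws) / games

    # Variance of the per-game score around that average.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(variance / games)

    def to_elo(s):
        # Clamp so that 0% or 100% scores do not divide by zero.
        s = min(max(s, 1e-6), 1.0 - 1e-6)
        return -400.0 * math.log10(1.0 / s - 1.0)

    elo = to_elo(score)
    margin = (to_elo(score + z * stderr) - to_elo(score - z * stderr)) / 2.0
    return elo, margin

# Hypothetical result: 60 wins, 100 draws, 40 losses over 200 games.
elo, margin = elo_estimate(60, 100, 40)
print(f"Elo difference: {elo:+.1f} +/- {margin:.1f}")
```

If the margin is larger than the Elo difference itself, the test has not yet shown that either engine is stronger, which is why small tests are rarely conclusive.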
Cute Chess
tar -xzf ./cutechess-cli-1.3.1-linux64.tar.gz
This will create a directory "cutechess-cli" where you executed the command above. You can move the GUI version you downloaded to that directory as well.
You can run the GUI (./Cute_Chess-1.3.1-x86_64.AppImage) if you want to set up a test and watch the moves being played out. However, the GUI is also useful for setting up an "engines.json" file that you can use with the terminal client. In the GUI, go to settings, add your engines, and configure them (e.g., threads, hash, Syzygy table path, etc.). This will automatically create an "engines.json" file in your ~/.config/cutechess/ directory. Copy this file to the cutechess-cli directory you created above.
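Before starting a long test, it can help to check which engine names the copied file actually contains, since those are the names you pass to cutechess-cli. The sketch below assumes that "engines.json" is a JSON array of objects and that each object has a "name" field; that matches the files I have seen the GUI write, but treat the layout as an assumption and inspect your own file. The helper name is made up for illustration.

```python
import json

def engine_names(path):
    """List the engine names found in a Cute Chess engines.json file.

    Assumes the file is a JSON array of objects that each carry a
    "name" key; entries without one are skipped.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return [e["name"] for e in entries if isinstance(e, dict) and "name" in e]

if __name__ == "__main__":
    # Run from the cutechess-cli directory that holds the copied file.
    for name in engine_names("engines.json"):
        print(name)
```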
Running the cutechess-cli client in the terminal is straightforward. To get a feel for all of the parameters you can set, run './cutechess-cli --help'.
Opening Books
For engine testing you will want to start from unbalanced opening positions in which one side already has an advantage. Balanced openings will produce too many draws, making it difficult to get statistically significant results. Stefan Pohl has created a series of opening books with increasing degrees of imbalance that you can download. Each book is quite large. For example, the UHO_XXL_2022_+110_+139.pgn file has approximately 253,000 openings in which White has an opening advantage between 110 and 139 centipawns (a 100-centipawn advantage means one side is up approximately one pawn). Telling cutechess-cli to sample these randomly is useful in case you need to interrupt and resume your test, as the chance that your resumed games will replay an opening you have already used is quite low given the large number of openings in the book.
Stefan's books are not the only opening books, however. Others include the opening books used in prior TCEC tournaments.
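How low is "quite low"? A rough calculation, assuming the resumed run draws distinct openings uniformly at random and independently of the interrupted run (an assumption about the sampling, not a statement of how cutechess-cli is implemented), looks like this. The function name and the 40/60 split are made up for illustration.

```python
def repeat_probability(book_size, already_played, remaining):
    """Chance that at least one of `remaining` randomly drawn openings
    was already among the `already_played` openings of the first run."""
    if already_played + remaining > book_size:
        return 1.0  # not enough unplayed openings left to avoid a repeat
    p_no_repeat = 1.0
    for i in range(remaining):
        # Probability that the next draw is an opening not yet played,
        # given that all earlier draws were also new.
        p_no_repeat *= (book_size - already_played - i) / (book_size - i)
    return 1.0 - p_no_repeat

# Hypothetical case: 40 openings were played before the interruption
# and 60 remain, with a book of about 253,000 openings.
print(f"{repeat_probability(253_000, 40, 60):.2%}")
```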
Running A Test From The Terminal
./cutechess-cli -tournament gauntlet -pgnout results.pgn -wait 1000 -event 'sf vs sf-noSyzygy' -tb /run/media/hugh/data/dtz -resign movecount=3 score=300 -draw movenumber=20 movecount=6 score=25 -concurrency 1 -openings file=UHO_XXL_2022_+110_+139.pgn format=pgn policy=round order=random -repeat -recover -rounds 100 -games 2 -engine conf=sf tc=G/120+3 timemargin=200 -engine conf=sf-noSyzygy tc=G/120+3 timemargin=200
Here we use the Syzygy tablebases to help adjudicate wins and draws (-tb /run/media/hugh/data/dtz), set the conditions under which a game is adjudicated as a draw (-draw) or as a resignation (-resign), and specify our opening book, the number of rounds, and the number of games per round. With -repeat, each opening is played twice, with the engines switching colors for the second game, before moving on to the next opening. Our engine parameters have already been set up in "engines.json" in the same directory where we are running cutechess-cli, so we can refer to the engines by the names used in the JSON file. We set the time control with tc (there are many options here; G/120+3 says each engine needs to complete all of its moves in two minutes, but gets 3 more seconds for each move it makes). It's also useful to set a timemargin (in milliseconds) to prevent a timeout loss caused by the overhead required to initiate each engine.
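The time control and the number of games together determine how long the test can run. The sketch below gives a rough upper bound for the command above, assuming both engines use all of their time and that a game lasts a hypothetical 60 moves per side. It ignores the wait between games and engine startup, and adjudication by -resign, -draw, and the tablebases will usually end games sooner. The function names are made up for illustration.

```python
def max_game_seconds(base_seconds, increment_seconds, moves_per_side):
    """Longest a game can last if both sides use all of their time."""
    per_side = base_seconds + increment_seconds * moves_per_side
    return 2 * per_side

def max_match_hours(rounds, games_per_round, base_seconds,
                    increment_seconds, moves_per_side, concurrency=1):
    """Upper bound on wall-clock hours for the whole test."""
    total_games = rounds * games_per_round
    total_seconds = total_games * max_game_seconds(
        base_seconds, increment_seconds, moves_per_side)
    # Games running in parallel divide the wall-clock time.
    return total_seconds / concurrency / 3600

# -rounds 100 -games 2 -concurrency 1 with tc=G/120+3,
# assuming 60 moves per side (an illustrative guess).
hours = max_match_hours(100, 2, 120, 3, 60, concurrency=1)
print(f"Up to about {hours:.1f} hours")
```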
Once you invoke the command, the tournament starts and cutechess-cli will keep you informed of progress.
While your tournament/test is running, you can examine the finished games in your results.pgn file.
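If you would rather have a running score than read through the games, a short script can total the points per engine from the standard PGN tags (White, Black, Result). This is a minimal sketch that assumes every game in the file starts with an [Event ...] tag, and it skips unfinished games whose result is "*". The function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches a PGN tag pair such as [White "sf"] on its own line.
TAG = re.compile(r'^\[(\w+)\s+"([^"]*)"\]\s*$', re.MULTILINE)

# Points awarded to (White, Black) for each finished result.
POINTS = {"1-0": (1.0, 0.0), "0-1": (0.0, 1.0), "1/2-1/2": (0.5, 0.5)}

def engine_scores(pgn_text):
    """Return {engine name: total points} over all finished games."""
    scores = defaultdict(float)
    # Split the file just before each [Event ...] tag, one chunk per game.
    for chunk in re.split(r'(?m)^(?=\[Event )', pgn_text):
        tags = dict(TAG.findall(chunk))
        result = tags.get("Result")
        if result not in POINTS:
            continue  # empty chunk or unfinished game
        white_points, black_points = POINTS[result]
        scores[tags.get("White", "?")] += white_points
        scores[tags.get("Black", "?")] += black_points
    return dict(scores)

if __name__ == "__main__":
    with open("results.pgn", encoding="utf-8") as f:
        for name, points in engine_scores(f.read()).items():
            print(f"{name}: {points}")
```

Because the engines switch colors from game to game, totals are keyed on the engine name rather than on White or Black.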