Tricky Positions Engines Don't Get Right (At First...)

Chess engines sometimes fail to find the best line. That is not surprising given the vast number of possible chess lines. A crude estimate of the number of unique chess games is 1x10^120. For comparison, an upper estimate of the number of stars in the Milky Way galaxy is 4x10^11.

Chess engines deal with this enormous complexity by discarding moves they judge to have little merit quite early in the search process, a technique called "pruning". In rare cases, they discard a move too early.


In this position (FEN: 8/q2pB3/p2N2r1/p1N5/p1p5/k7/P1P1PPPP/K7 w - - 0 1), my computer running Stockfish chooses the move Nd3 after 4 minutes of evaluation time, reaching a depth of 63. It also judges the position to be a draw, with an engine score of 0.0.
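If you want to double-check the position before feeding it to an engine, the FEN string can be unpacked with a few lines of Python. This is only a minimal sketch: the helper names `parse_fen_board` and `render` are my own, and it reads just the piece-placement field, not the castling or en passant fields.

```python
FEN = "8/q2pB3/p2N2r1/p1N5/p1p5/k7/P1P1PPPP/K7 w - - 0 1"

def parse_fen_board(fen):
    """Map squares such as 'a1' to piece letters (uppercase = White)."""
    placement = fen.split()[0]          # first field: piece placement
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("FEN must describe exactly 8 ranks")
    board = {}
    for rank_index, rank_text in enumerate(ranks):
        rank_number = 8 - rank_index    # FEN lists rank 8 first
        file_index = 0
        for ch in rank_text:
            if ch.isdigit():
                file_index += int(ch)   # a digit means that many empty squares
            else:
                if file_index > 7:
                    raise ValueError(f"rank {rank_number} is too long")
                board["abcdefgh"[file_index] + str(rank_number)] = ch
                file_index += 1
        if file_index != 8:
            raise ValueError(f"rank {rank_number} does not have 8 squares")
    return board

def render(board):
    """Return a plain-text diagram with rank 8 at the top."""
    rows = []
    for rank_number in range(8, 0, -1):
        squares = " ".join(board.get(f + str(rank_number), ".") for f in "abcdefgh")
        rows.append(f"{rank_number} {squares}")
    rows.append("  a b c d e f g h")
    return "\n".join(rows)

board = parse_fen_board(FEN)
print(render(board))
```

The diagram makes it easy to see that the White pawn in question sits on c2 and that the White knights are on c5 and d6.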

Nd3 is not the correct answer. How do we find the correct move, pawn c2 to c3?

MultiPV Analysis Is The First Step

Essentially we need to "check" an engine's answer. The first step is to ask Stockfish to give us its 5 best move ideas, instead of concentrating on what it thinks is the single best answer. We are trying to counter its tendency to quickly prune answers and focus on only one path.

When we do this, we see that the move c2 to c3 appears, but again Stockfish thinks it's just a draw and thus on par with the move Nd3. It also identifies some additional moves that are clearly worse for White.
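Most chess GUIs have a setting for this, but if you talk to the engine directly, MultiPV is an ordinary UCI option. The sketch below builds the commands and pulls the ranked first moves out of the engine's `info` lines. The helper names, the depth of 30, and the `SAMPLE_LINES` are my own illustrative assumptions; the sample lines are hand-written stand-ins, not real Stockfish output.

```python
FEN = "8/q2pB3/p2N2r1/p1N5/p1p5/k7/P1P1PPPP/K7 w - - 0 1"

def multipv_commands(fen, lines=5, depth=30):
    """UCI commands that request the engine's `lines` best moves."""
    return [
        f"setoption name MultiPV value {lines}",
        f"position fen {fen}",
        f"go depth {depth}",
    ]

def parse_info_line(line):
    """Return rank, score, and first move from a UCI 'info' line, else None."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        return None
    if not all(key in tokens for key in ("multipv", "score", "pv")):
        return None
    score_at = tokens.index("score")
    pv_at = tokens.index("pv")
    if pv_at + 1 >= len(tokens):
        return None
    return {
        "rank": int(tokens[tokens.index("multipv") + 1]),
        "score_type": tokens[score_at + 1],   # "cp" or "mate"
        "score": int(tokens[score_at + 2]),
        "move": tokens[pv_at + 1],            # first move of the line
    }

def top_moves(lines):
    """Keep the most recent report for each MultiPV rank, best rank first."""
    latest = {}
    for line in lines:
        parsed = parse_info_line(line)
        if parsed:
            latest[parsed["rank"]] = parsed
    return [latest[rank] for rank in sorted(latest)]

# Hand-written stand-ins for engine output (illustrative only).
SAMPLE_LINES = [
    "info depth 30 multipv 1 score cp 0 pv c5d3",
    "info depth 30 multipv 2 score cp 0 pv c2c3",
    "info depth 30 currmove c5d3 currmovenumber 1",
]

for command in multipv_commands(FEN):
    print(command)
for entry in top_moves(SAMPLE_LINES):
    print(entry)
```

Note that equal scores, as in the stand-in lines, tell you nothing about which move is really better. That is exactly the situation described above.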

Forward Analysis Is The Next Step

The next step is to take our chess engine's two best ideas and then get its two best counter-moves for each of those ideas. Then we evaluate how the original two best ideas hold up against those counter-moves.

While Stockfish's original best idea of Nd3 still evaluates to a draw with a centipawn score of 2, Stockfish now realizes that the move pawn to c3 lets White checkmate Black in 21 moves!
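The procedure is mechanical enough to script. Here is a rough sketch of the idea, not of any particular GUI's implementation. It assumes a hypothetical `analyse(moves, multipv)` callable that searches the position reached after `moves` and returns `(move, score)` pairs, best first, scored for the side to move. The `toy_analyse` stub and every number in `TOY_SCORES` are invented stand-ins so the example runs without an engine.

```python
def forward_analysis(analyse, width=2):
    """Re-score the top `width` root moves against the top `width` replies."""
    results = {}
    for move, initial_score in analyse([], width):
        worst = None
        for reply, _ in analyse([move], width):
            # After move + reply it is the root side's turn again, so this
            # score is already from the root side's point of view.
            follow_up = analyse([move, reply], 1)
            if not follow_up:
                continue
            score = follow_up[0][1]
            # The opponent picks the reply that is worst for us.
            if worst is None or score < worst:
                worst = score
        results[move] = {"initial": initial_score, "after_replies": worst}
    return results

# Invented stand-in data: moves after the first are placeholders, and the
# scores only mimic the pattern "both look drawn until you step forward".
TOY_SCORES = {
    (): [("c5d3", 0), ("c2c3", 0)],
    ("c5d3",): [("reply_a", 0), ("reply_b", 0)],
    ("c2c3",): [("reply_c", 0), ("reply_d", 0)],
    ("c5d3", "reply_a"): [("next", 2)],
    ("c5d3", "reply_b"): [("next", 5)],
    ("c2c3", "reply_c"): [("next", 900)],
    ("c2c3", "reply_d"): [("next", 950)],
}

def toy_analyse(moves, multipv):
    """Look up canned results instead of running a real engine."""
    return TOY_SCORES[tuple(moves)][:multipv]

for move, summary in forward_analysis(toy_analyse).items():
    print(move, summary)
```

With a real engine behind `analyse`, each call starts a fresh search from a position two plies further along, which gives the engine a chance to revisit lines it pruned from the root.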

Check With Another Engine

While Stockfish has been the strongest engine for a while, Lc0 is quite strong (and improving) and sometimes has insights Stockfish doesn't have (and vice versa). In this case, Lc0 does in fact come up with the correct move on its first try.
Lc0 evaluates c3 to have a 95% probability of winning.
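If you collect each engine's first choice, spotting a disagreement is trivial to automate. A small sketch, with `cross_check` being a name I made up and the moves written in UCI coordinates (Nd3 is c5d3, c3 is c2c3):

```python
def cross_check(best_moves):
    """Group engines by preferred move and flag any disagreement."""
    by_move = {}
    for engine, move in best_moves.items():
        by_move.setdefault(move, []).append(engine)
    return {"agree": len(by_move) <= 1, "by_move": by_move}

# First choices reported in this post.
first_pass = {"Stockfish": "c5d3", "Lc0": "c2c3"}

report = cross_check(first_pass)
if not report["agree"]:
    print("Engines disagree, dig deeper:", report["by_move"])
```

A disagreement does not tell you which engine is right, only that the position deserves a closer look.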

A Cautionary Tale

Hopefully this example makes you a little less likely to blindly trust a chess engine's answer. In this example there were really no warning signs: Stockfish originally evaluated the position as a tame draw, and its evaluation was stable as evaluation depth increased.

These kinds of mis-evaluations don't happen very often, but the point is you can't know when they will. The solution is to check, check, check!
