Dev Build 20181101-1500

You are viewing an old NCM Stockfish dev build test. The most recent dev build tests use Stockfish 15 as the baseline.


NCM plays 20,000 games between each Stockfish dev build and Stockfish 14. This yields an approximate Elo difference and gives a measure of confidence in each dev build's strength.
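As a rough illustration of how a win/loss/draw record turns into an approximate Elo difference, the sketch below applies the standard logistic Elo model to the score fraction, counting a draw as half a point. It is an assumption that NCM's "Standard Elo" column uses exactly this formula; the pentanomial and gamepair figures rely on other models and are not covered. The helper name elo_from_wld is hypothetical.

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Approximate Elo difference from a W/L/D record (logistic model).

    Hypothetical helper for illustration; assumes the standard formula
    Elo = -400 * log10(1 / score - 1), with a draw worth half a point.
    """
    games = wins + losses + draws
    if games <= 0:
        raise ValueError("no games played")
    # Score fraction from the tested engine's point of view.
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# W/L/D from the 5+0.05, 8-thread test quoted in the commit message below.
print(f"{elo_from_wld(2573, 2532, 8539):.2f}")  # prints 1.04
```

This reproduces the "Elo 1.04" point estimate reported for that test; the error bars and LOS require a separate variance estimate.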

Summary

Host	Duration	Avg Base NPS	Games	WLD	Standard Elo	Ptnml(0-2)	Gamepair Elo

Test Detail

ID	Host	Base NPS	Games	WLD	Standard Elo	Ptnml(0-2)	Gamepair Elo	CLI	PGN

Commit

Commit ID	3f1eb85a1ceb1b408f8f51cb82064b69e095399d
Author	Joost VandeVondele
Date	2018-11-01 15:00:56 UTC
Fix issues from using adjustedDepth too broadly

The recently committed Fail-High patch (081af9080542a0d076a5482da37103a96ee15f64) made a number of changes beyond adjusting the search depth on fail high, some of them with undesirable side effects:

1) The depth in the PV output could decrease, confusing GUIs and players alike, as described in issue #1787. The printed depth is a convention anyway, so let's consider adjustedDepth an implementation detail and continue to print rootDepth. Depth, nodes, time and move quality all increase as we compute more. (Fixing this output has no effect on play.)

2) Fixes the go depth output (now based on rootDepth again, no effect on play), also reported in issue #1787.

3) The depth lastBestDepth is used to compute how long a move has been stable. A new move found during a fail high is incorrectly considered stable if this is based on adjustedDepth instead of rootDepth (this changes time management). Reverting this passed STC and LTC:

STC LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 82982 W: 17810 L: 17808 D: 47364
http://tests.stockfishchess.org/tests/view/5bd391a80ebc595e0ae1e993

LTC LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 109083 W: 17602 L: 17619 D: 73862
http://tests.stockfishchess.org/tests/view/5bd40c820ebc595e0ae1f1fb

4) In the thread voting scheme, the rank of the fail-high thread is now artificially low. This is incorrect, since the quality of the move is much better than adjustedDepth suggests (e.g. if it takes 10 iterations to find VALUE_KNOWN_WIN, it has very low depth). Further evidence comes from a test showing that the move of highest depth is not better than that of the last PV (which potentially has a much lower adjustedDepth). This test failed SPRT[0, 5]:

http://tests.stockfishchess.org/tests/view/5bd37a120ebc595e0ae1e7c3
LLR: -2.95 (-2.94,2.94) [0.00,5.00] Total: 10609 W: 2266 L: 2345 D: 5998

A 5+0.05 test with 8 threads, still running at the time of writing (more than 10,000 games), shows a positive Elo estimate (strong enough to pass [-3, 1] bounds, though possibly not [0, 4]):

http://tests.stockfishchess.org/tests/view/5bd421be0ebc595e0ae1f315
LLR: -0.13 (-2.94,2.94) [0.00,4.00] Total: 13644 W: 2573 L: 2532 D: 8539
Elo 1.04 [-2.52,4.61] / LOS 71%

Thus, restore the old behavior as a bugfix, keeping the core idea of the fail-high patch as the scheme for resolving fail highs. This is non-functional for bench, but changes search through time management and in the threaded case.

Bench: 3556672
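Point 3 can be illustrated with a toy model. The sketch below is hypothetical: the StabilityTracker class, its fields and the depths used are invented for illustration and are not Stockfish's actual code. It records the depth at which the current best move first appeared and measures stability as the number of plies since then. Because adjustedDepth is lower than rootDepth after a fail high, recording it makes a move found in this very iteration look stable immediately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StabilityTracker:
    """Toy model of best-move stability for time management (hypothetical)."""

    use_adjusted_depth: bool
    last_best_move: Optional[str] = None
    last_best_move_depth: int = 0

    def update(self, best_move: str, root_depth: int, adjusted_depth: int) -> int:
        """Record an iteration's best move and return its stability in plies."""
        if best_move != self.last_best_move:
            # A new best move: remember the depth at which it appeared.
            # The buggy variant stores the lowered adjustedDepth instead.
            self.last_best_move = best_move
            self.last_best_move_depth = (
                adjusted_depth if self.use_adjusted_depth else root_depth
            )
        # Plies the move has survived, measured against the iteration depth.
        return root_depth - self.last_best_move_depth


# Hypothetical scenario: a new best move appears at rootDepth 20,
# while the fail-high adjustment has lowered the search depth to 17.
fixed = StabilityTracker(use_adjusted_depth=False)
buggy = StabilityTracker(use_adjusted_depth=True)
print(fixed.update("e2e4", root_depth=20, adjusted_depth=17))  # prints 0
print(buggy.update("e2e4", root_depth=20, adjusted_depth=17))  # prints 3
```

With rootDepth, a move that has just appeared starts at zero stability; with adjustedDepth it already looks three plies old, which is the kind of distortion that feeds into time management.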