NCM plays each Stockfish dev build 20,000 times against Stockfish 14. This yields an approximate Elo difference and establishes confidence in the strength of the dev builds.
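
As a rough illustration of how a win/loss/draw record turns into an Elo difference, here is a minimal sketch using the standard logistic Elo formula. It assumes the "Standard Elo" column is derived this way, which the page does not state, and it is not NCM's actual code. The function name `elo_from_wld` and the sample counts are hypothetical.

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Estimate the Elo difference implied by a win/loss/draw record.

    Uses the standard logistic model: expected score = 1 / (1 + 10^(-elo/400)).
    This is an illustrative assumption, not NCM's published method.
    """
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")

    # Score fraction: a win counts 1 point, a draw counts half a point.
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")

    # Invert the logistic expected-score formula to recover the Elo gap.
    return -400.0 * math.log10(1.0 / score - 1.0)


# Made-up record for a 20,000-game match (not an NCM result).
example_elo = elo_from_wld(wins=6000, losses=4000, draws=10000)
print(f"Estimated Elo difference: {example_elo:+.1f}")
```

A positive value means the dev build scored better than the baseline. With 20,000 games the estimate is fairly tight, but it is still a sample and carries an error margin.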
Host | Duration | Avg Base NPS | Games | WLD | Standard Elo | Ptnml(0-2) | Gamepair Elo |
---|---|---|---|---|---|---|---|
ID | Host | Base NPS | Games | WLD | Standard Elo | Ptnml(0-2) | Gamepair Elo | CLI | PGN |
---|---|---|---|---|---|---|---|---|---|
Commit ID | 54cf226604cfc9d17f432fa0b5bca56277e5561c |
---|---|
Author | FauziAkram |
Date | 2024-11-13 19:09:13 UTC |
Revert VLTC regression from #5634
https://tests.stockfishchess.org/tests/view/671bf61b86d5ee47d953cf23
And thanks to @xu-shawn for suggesting running a VLTC regress test since
depth modifications affect scaling. Also, the LTC was showing a slight
regress after 680+k games ~= -0.34 , for reference:
https://tests.stockfishchess.org/tests/view/67042b1f86d5ee47d953be7c
closes https://github.com/official-stockfish/Stockfish/pull/5663
Bench: 1307308
|