NCM plays each Stockfish dev build 20,000 times against Stockfish 15. This yields an approximate Elo difference and establishes confidence in the strength of the dev builds.
Commit ID | e2226cbb20342bc23ed42cfe538b4e3ca3697291 |
---|---|
Author | Marco Costalba |
Date | 2015-02-22 11:59:34 UTC |
Use only 'level' as late join metric
It seems the other metrics are useless; this allows us
to simplify the code and prune useless stuff.
STC 20K games 4 threads
ELO: -0.76 +-2.8 (95%) LOS: 29.9%
Total: 20000 W: 3477 L: 3521 D: 13002
STC 10K games 16 threads
ELO: 1.36 +-3.9 (95%) LOS: 75.0%
Total: 10000 W: 1690 L: 1651 D: 6659
bench: 8253813
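The `ELO`, `+-` and `LOS` figures in the commit message can be approximately reproduced from the `W`/`L`/`D` totals alone. The sketch below is an illustration, not the tool that produced the numbers: it assumes the usual logistic Elo model, a normal approximation for the 95% interval, and a likelihood of superiority computed from decisive games only. The function names `score_to_elo` and `summarize` are hypothetical.

```python
import math


def score_to_elo(score: float) -> float:
    """Convert an average score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def summarize(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, margin, los) for a W/L/D record.

    Assumption: z = 1.96 gives a two-sided 95% interval under a normal
    approximation of the mean score.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games

    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + losses * (0.0 - score) ** 2
        + draws * (0.5 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)

    elo = score_to_elo(score)
    # Half-width of the interval after mapping both score bounds to Elo.
    upper = score_to_elo(score + z * stderr)
    lower = score_to_elo(score - z * stderr)
    margin = (upper - lower) / 2.0

    # Likelihood of superiority: normal CDF of (W - L) / sqrt(W + L).
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo, margin, los


if __name__ == "__main__":
    records = [
        ("STC 20K games 4 threads", (3477, 3521, 13002)),
        ("STC 10K games 16 threads", (1690, 1651, 6659)),
    ]
    for label, record in records:
        elo, margin, los = summarize(*record)
        print(f"{label}: ELO {elo:+.2f} +-{margin:.1f} (95%) LOS {los:.1%}")
```

Under these assumptions both runs come out close to the quoted values. In each case the error margin is larger than the measured Elo difference, which is consistent with treating the change as a strength-neutral simplification.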