Dev Build 20180330-0848

You are viewing an old NCM Stockfish dev build test. The most recent dev build tests use Stockfish 15 as the baseline.


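NCM plays each Stockfish dev build for 20,000 games against Stockfish 7. The results yield an approximate Elo difference relative to Stockfish 7, and the large number of games keeps the error margin small enough to give confidence in the measured strength of each dev build.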

Summary

Host       Duration  Avg Base NPS  Games  Wins  Losses  Draws  Elo
ncm-et-3   06:02:07       2001281   2386  1095      85   1206  +156.94 ± 9.54
ncm-et-4   06:00:36       1991918   2365  1079      53   1233  +161.42 ± 9.3
ncm-et-5   01:47:10       2006101    705   326      31    348  +154.89 ± 17.92
ncm-et-6   01:49:30       2026472    728   330      28    370  +153.37 ± 17.26
ncm-et-7   01:47:06       2024508    705   330      16    359  +166.4 ± 17.32
ncm-et-8   01:49:23       1923781    685   303      19    363  +153.27 ± 17.18
ncm-et-9   06:02:13       2001891   2386  1119      80   1187  +162.12 ± 9.63
ncm-et-10  06:04:10       1971643   2393  1126      60   1207  +166.43 ± 9.47
ncm-et-11  01:49:09       2015314    720   318      17    385  +154.72 ± 16.59
ncm-et-12  01:49:07       2018967    728   344      20    364  +166.25 ± 17.32
ncm-et-13  06:00:02       1978950   2382  1040      79   1263  +148.62 ± 9.24
ncm-et-14  01:49:11       2020430    726   331      20    375  +159.09 ± 16.97
ncm-et-15  06:06:28       1978033   2388  1095      72   1221  +159.1 ± 9.43
ncm-et-16  01:45:52       1984247    703   315      22    366  +154.19 ± 17.21
Total                              20000  9151     602  10247  +158.7 ± 3.25
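The Elo figures above appear consistent with the standard logistic conversion from match score, with the ± margin matching a 95% normal-approximation interval: the Total row's 9151 wins, 602 losses and 10247 draws reproduce +158.7 ± 3.25, and ncm-et-3's record reproduces +156.94 ± 9.54. The sketch below assumes that model; it is a reconstruction for illustration, not NCM's published code, and the function name `elo_from_wld` is hypothetical.

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int, z: float = 1.96) -> tuple[float, float]:
    """Estimate an Elo difference and a symmetric confidence margin from W/L/D.

    Assumes the logistic Elo model and a normal approximation to the
    per-game score (win = 1, draw = 0.5, loss = 0). z = 1.96 gives ~95%.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n                   # mean points per game
    elo = -400.0 * math.log10(1.0 / score - 1.0)       # logistic Elo model

    # Per-game variance of the score around its mean.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)                        # std. error of the mean score

    # Delta method: d(Elo)/d(score) = 400 / (ln 10 * score * (1 - score)).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo, z * stderr * slope


if __name__ == "__main__":
    elo, margin = elo_from_wld(9151, 602, 10247)       # "Total" row of the summary
    print(f"{elo:+.2f} ± {margin:.2f}")
```

Converting the margin through the Elo curve's slope (the delta method) is a common shortcut; computing Elo at score ± z·stderr and halving the difference gives practically the same result at these sample sizes.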

Test Detail

ID Host Started (UTC) Duration Base NPS Games Wins Losses Draws Elo
51906 ncm-et-13 2018-08-31 14:28 00:24:44 1931053 158 81 3 74 +187.93 ± 38.71
51905 ncm-et-9 2018-08-31 14:28 00:25:12 1990349 165 74 5 86 +154.78 ± 35.61
51904 ncm-et-4 2018-08-31 14:27 00:25:17 1990981 164 92 1 71 +217.29 ± 39.44
51903 ncm-et-3 2018-08-31 14:27 00:26:11 1990664 173 82 3 88 +171.31 ± 34.98
51902 ncm-et-15 2018-08-31 14:26 00:26:32 1949746 168 73 7 88 +144.25 ± 35.42
51901 ncm-et-10 2018-08-31 14:25 00:27:12 1919622 172 89 3 80 +190.85 ± 37.2
51900 ncm-et-9 2018-08-31 13:10 01:16:49 1989559 500 234 17 249 +161.49 ± 21.05
51899 ncm-et-13 2018-08-31 13:09 01:17:31 1931451 500 219 15 266 +150.51 ± 20.1
51898 ncm-et-10 2018-08-31 13:09 01:14:54 1980442 500 255 14 231 +182.61 ± 21.98
51897 ncm-et-4 2018-08-31 13:09 01:16:53 1988770 500 209 17 274 +140.62 ± 19.74
51896 ncm-et-3 2018-08-31 13:09 01:16:13 1990507 500 224 17 259 +153.02 ± 20.52
51895 ncm-et-15 2018-08-31 13:09 01:15:40 1981381 500 236 17 247 +163.21 ± 21.16
51600 ncm-et-15 2018-08-28 22:41 01:17:15 1952686 500 233 14 253 +163.21 ± 20.75
51599 ncm-et-9 2018-08-28 22:39 01:14:40 1987982 500 237 20 243 +161.49 ± 21.45
51598 ncm-et-10 2018-08-28 22:39 01:18:04 1909316 500 226 9 265 +161.49 ± 19.95
51597 ncm-et-13 2018-08-28 22:38 01:14:46 1966472 500 219 22 259 +144.72 ± 20.64
51596 ncm-et-3 2018-08-28 22:38 01:15:39 1989086 500 239 14 247 +168.4 ± 21.08
51595 ncm-et-4 2018-08-28 22:38 01:16:33 1989243 500 224 10 266 +158.93 ± 19.93
51594 ncm-et-10 2018-08-28 21:22 01:15:48 1980601 500 234 14 252 +164.07 ± 20.81
51593 ncm-et-13 2018-08-28 21:21 01:15:46 1991455 500 208 16 276 +140.62 ± 19.61
51592 ncm-et-3 2018-08-28 21:21 01:15:45 1989716 500 208 22 270 +135.76 ± 20.08
51591 ncm-et-4 2018-08-28 21:21 01:15:12 1988769 500 242 10 248 +174.55 ± 20.89
51590 ncm-et-9 2018-08-28 21:21 01:16:17 1990348 500 235 14 251 +164.93 ± 20.86
51589 ncm-et-15 2018-08-28 21:21 01:18:19 1949386 500 224 13 263 +156.39 ± 20.19
13760 ncm-et-16 2018-03-30 19:44 00:29:50 2018396 203 100 6 97 +174.13 ± 33.93
13759 ncm-et-8 2018-03-30 19:44 00:29:45 1917962 185 90 4 91 +174.93 ± 34.7
13758 ncm-et-4 2018-03-30 19:44 00:30:14 1995451 201 90 5 106 +156.76 ± 31.84
13757 ncm-et-7 2018-03-30 19:43 00:30:59 2023609 205 87 1 117 +155.34 ± 29.15
13756 ncm-et-5 2018-03-30 19:43 00:31:08 2023283 205 98 11 96 +157.4 ± 34.57
13755 ncm-et-10 2018-03-30 19:41 00:32:55 2019370 221 91 9 121 +135.37 ± 29.91
13754 ncm-et-3 2018-03-30 19:41 00:33:05 2025410 213 104 10 99 +164.64 ± 33.98
13753 ncm-et-11 2018-03-30 19:40 00:33:27 2017096 220 94 7 119 +145.31 ± 30.05
13752 ncm-et-14 2018-03-30 19:40 00:33:29 2021977 226 88 11 127 +123.3 ± 29.17
13751 ncm-et-13 2018-03-30 19:40 00:33:34 2026882 224 103 9 112 +155.39 ± 31.56
13750 ncm-et-15 2018-03-30 19:40 00:33:40 2018558 220 100 4 116 +162.51 ± 30.25
13749 ncm-et-9 2018-03-30 19:40 00:33:50 2025901 221 108 9 104 +167.52 ± 33.02
13748 ncm-et-6 2018-03-30 19:39 00:35:03 2026390 228 114 11 103 +169.17 ± 33.41
13747 ncm-et-12 2018-03-30 19:39 00:34:52 2016609 228 111 4 113 +176.9 ± 30.97
13746 ncm-et-16 2018-03-30 18:27 01:16:02 1950099 500 215 16 269 +146.36 ± 19.97
13745 ncm-et-4 2018-03-30 18:26 01:16:27 1998297 500 222 10 268 +157.24 ± 19.83
13744 ncm-et-7 2018-03-30 18:26 01:16:07 2025408 500 243 15 242 +171.02 ± 21.38
13743 ncm-et-5 2018-03-30 18:26 01:16:02 1988919 500 228 20 252 +153.86 ± 20.97
13742 ncm-et-13 2018-03-30 18:25 01:13:41 2026390 500 210 14 276 +143.89 ± 19.55
13741 ncm-et-10 2018-03-30 18:25 01:15:17 2020511 500 231 11 258 +164.07 ± 20.39
13740 ncm-et-3 2018-03-30 18:24 01:15:14 2022304 500 238 19 243 +163.21 ± 21.43
13739 ncm-et-15 2018-03-30 18:24 01:15:02 2016446 500 229 17 254 +157.24 ± 20.78
13738 ncm-et-11 2018-03-30 18:24 01:15:42 2013532 500 224 10 266 +158.93 ± 19.93
13737 ncm-et-14 2018-03-30 18:24 01:15:42 2018883 500 243 9 248 +176.33 ± 20.86
13736 ncm-et-12 2018-03-30 18:23 01:14:15 2021325 500 233 16 251 +161.49 ± 20.92
13735 ncm-et-9 2018-03-30 18:23 01:15:25 2027210 500 231 15 254 +160.64 ± 20.73
13734 ncm-et-6 2018-03-30 18:23 01:14:27 2026554 500 216 17 267 +146.36 ± 20.1
13733 ncm-et-8 2018-03-30 18:23 01:19:38 1929601 500 213 15 272 +145.54 ± 19.79
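The per-host Summary rows appear to be aggregated from these Test Detail runs. For ncm-et-3, the six runs (IDs 51903, 51896, 51596, 51592, 13754, 13740) add up to 2386 games and 06:02:07 of play, and the plain, unweighted mean of their Base NPS rounds to 2001281. A minimal sketch of that aggregation, assuming the rows are already parsed; the `Run` type and the `summarize_host`, `to_seconds` and `to_hms` helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Run:
    duration: str   # "HH:MM:SS"
    base_nps: int
    wins: int
    losses: int
    draws: int


def to_seconds(hms: str) -> int:
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: int) -> str:
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def summarize_host(runs: list[Run]) -> dict:
    """Aggregate one host's runs the way the Summary table appears to."""
    total_secs = sum(to_seconds(r.duration) for r in runs)
    return {
        "duration": to_hms(total_secs),
        # Unweighted mean over runs (not weighted by games played), rounded.
        "avg_base_nps": round(sum(r.base_nps for r in runs) / len(runs)),
        "games": sum(r.wins + r.losses + r.draws for r in runs),
        "wins": sum(r.wins for r in runs),
        "losses": sum(r.losses for r in runs),
        "draws": sum(r.draws for r in runs),
    }


# The six ncm-et-3 runs from the Test Detail table.
ncm_et_3 = [
    Run("00:26:11", 1990664, 82, 3, 88),     # 51903
    Run("01:16:13", 1990507, 224, 17, 259),  # 51896
    Run("01:15:39", 1989086, 239, 14, 247),  # 51596
    Run("01:15:45", 1989716, 208, 22, 270),  # 51592
    Run("00:33:05", 2025410, 104, 10, 99),   # 13754
    Run("01:15:14", 2022304, 238, 19, 243),  # 13740
]

if __name__ == "__main__":
    print(summarize_host(ncm_et_3))
```

Whether NCM rounds or truncates the NPS average is not stated; for this host both give the same value.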

Commit

Commit ID c8ef80f466a95ee54e032b289094db0f22a2b956
Author Ondrej Mosnáček
Date 2018-03-30 08:48:57 UTC
Use per-thread dynamic contempt

We now use per-thread dynamic contempt. This patch has the following effects:

* for Threads=1: **non-functional**
* for Threads>1:
  * with MultiPV=1: **no regression, little to no Elo gain**
  * with MultiPV>1: **clear improvement over master**

First, I tried testing at standard MultiPV=1 play with [0,5] bounds. This yielded two yellow tests and one red test:

5+0.05, Threads=5:
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 82689 W: 16439 L: 16190 D: 50060
http://tests.stockfishchess.org/tests/view/5aa93a5a0ebc5902952892e6

5+0.05, Threads=8:
LLR: -2.96 (-2.94,2.94) [0.00,5.00]
Total: 27164 W: 4974 L: 4983 D: 17207
http://tests.stockfishchess.org/tests/view/5ab2639b0ebc5902a6fbefd5

5+0.5, Threads=16:
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 41396 W: 7127 L: 7082 D: 27187
http://tests.stockfishchess.org/tests/view/5ab124220ebc59029516cb62

Then, I tested with Skill Level=17 (implicitly MultiPV=4), showing a clear improvement:

5+0.05, Threads=5:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 3498 W: 1316 L: 1135 D: 1047
http://tests.stockfishchess.org/tests/view/5ab4b6580ebc5902932aeca2

Next, I tested the patch with MultiPV=1 again, this time checking for non-regression ([-3, 1]):

5+0.5, Threads=5:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 65575 W: 12786 L: 12745 D: 40044
http://tests.stockfishchess.org/tests/view/5ab4e8500ebc5902932aecb3

Finally, I ran some tests with a fixed number of games, checking whether reverting dynamic contempt gains more Elo with Skill Level=17 (i.e. MultiPV) than applying the "prevScore" fix and this patch. These tests showed that this patch gains 15 Elo when playing with Skill Level=17:

5+0.05, Threads=3, "revert dynamic contempt" vs. "WITHOUT this patch":
ELO: -11.43 +-4.1 (95%) LOS: 0.0%
Total: 20000 W: 7085 L: 7743 D: 5172
http://tests.stockfishchess.org/tests/view/5ab636450ebc590295d88536

5+0.05, Threads=3, "revert dynamic contempt" vs. "WITH this patch":
ELO: -26.42 +-4.1 (95%) LOS: 0.0%
Total: 20000 W: 6661 L: 8179 D: 5160
http://tests.stockfishchess.org/tests/view/5ab62e680ebc590295d88524

---

***FAQ***

**Why should this be committed?**
I believe that the gain for multi-thread MultiPV search is a sufficient justification for this otherwise neutral change. I also believe this implementation of dynamic contempt is more logical, although this may be just my opinion.

**Why is per-thread contempt better at MultiPV?**
A likely explanation for the gain in MultiPV mode is that during search each thread independently switches between rootMoves and, via the shared contempt score, skews the other threads' evaluation.

**Why were the tests done with Skill Level=17?**
This was originally suggested by @Hanamuke. The idea is that with Skill Level, Stockfish sometimes also plays moves it considers slightly sub-optimal, so the test checks the quality of all moves offered by the MultiPV search.

**Why are the Elo differences so huge?**
This is most likely because of the nature of Skill Level mode: since it is slower and weaker than normal mode, bugs in evaluation have a much greater effect.

---

Closes https://github.com/official-stockfish/Stockfish/pull/1515.

No functional change in single-thread mode.
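The FAQ attributes the MultiPV gain to a shared-state effect: when contempt is a single value that every search thread recomputes from its own latest root score, one thread can end up evaluating with contempt derived from another thread's score (for example, a thread currently searching a weaker root move). The toy model below illustrates only that interleaving. It is not Stockfish's code: the threads are simulated by a fixed sequence of steps, and the class names, `dynamic_contempt`, and its constants are hypothetical.

```python
def dynamic_contempt(base: int, prev_score: int) -> int:
    """Hypothetical dynamic-contempt rule: shift contempt toward the sign of
    the previous root score, saturating for large |prev_score|.
    The constants are illustrative only, not Stockfish's."""
    return base + int(80 * prev_score / (abs(prev_score) + 200))


class SharedContempt:
    """One contempt value for all threads: the last writer wins."""

    def __init__(self, base: int):
        self.base = base
        self.value = base

    def update(self, thread_id: int, prev_score: int) -> None:
        self.value = dynamic_contempt(self.base, prev_score)

    def get(self, thread_id: int) -> int:
        return self.value


class PerThreadContempt:
    """Each thread keeps and reads its own contempt value."""

    def __init__(self, base: int, n_threads: int):
        self.base = base
        self.values = [base] * n_threads

    def update(self, thread_id: int, prev_score: int) -> None:
        self.values[thread_id] = dynamic_contempt(self.base, prev_score)

    def get(self, thread_id: int) -> int:
        return self.values[thread_id]


def contempt_seen_by_thread0(store, score0: int, score1: int) -> int:
    # Simulated interleaving: thread 0 updates from its own root score,
    # thread 1 then updates from a different root move's score,
    # and only afterwards does thread 0 read contempt for evaluation.
    store.update(0, score0)
    store.update(1, score1)
    return store.get(0)


if __name__ == "__main__":
    base, score0, score1 = 20, 150, -300   # thread 1 is on a worse root move
    print("shared:    ", contempt_seen_by_thread0(SharedContempt(base), score0, score1))
    print("per-thread:", contempt_seen_by_thread0(PerThreadContempt(base, 2), score0, score1))
```

With the shared store, thread 0 evaluates with the contempt computed from thread 1's score; with per-thread storage it keeps the value derived from its own score. With a single thread, the two stores behave identically, which mirrors the commit's "no functional change in single-thread mode".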
Copyright 2011–2024 Next Chess Move LLC