Dev Builds » 20240528-1634


NCM plays each Stockfish dev build 20,000 times against Stockfish 15. This yields an approximate Elo difference and establishes confidence in the strength of the dev builds.

Summary

| Host | Duration | Avg Base NPS | Games | W | L | D | Standard Elo | Ptnml(0-2) | Gamepair Elo |
|---|---|---|---|---|---|---|---|---|---|
| ncm-dbt-01 | 06:52:42 | 584762 | 3998 | 1450 | 608 | 1940 | +74.28 ± 5.02 | 1, 138, 890, 958, 12 | +153.74 ± 11.39 |
| ncm-dbt-02 | 06:51:52 | 585446 | 4018 | 1513 | 614 | 1891 | +79.07 ± 5.04 | 1, 138, 841, 1019, 10 | +165.37 ± 11.74 |
| ncm-dbt-03 | 06:50:54 | 585547 | 4000 | 1488 | 538 | 1974 | +84.12 ± 5.04 | 1, 119, 828, 1033, 19 | +175.44 ± 11.82 |
| ncm-dbt-04 | 06:51:33 | 570895 | 4000 | 1487 | 610 | 1903 | +77.43 ± 4.94 | 3, 114, 898, 973, 12 | +161.49 ± 11.3 |
| ncm-dbt-05 | 06:52:48 | 584089 | 3984 | 1498 | 594 | 1892 | +80.23 ± 4.87 | 2, 101, 891, 987, 11 | +168.1 ± 11.33 |
| Total | | | 20000 | 7436 | 2964 | 9600 | +79.02 ± 2.23 | 8, 610, 4348, 4970, 64 | +164.76 ± 5.15 |
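
As a sanity check, the two Elo totals in the summary can be reproduced from the raw counts with the usual logistic Elo formula. The sketch below is an assumption about how the columns are derived, not NCM's documented method, and the function names are made up for illustration. It treats a game pair as lost when it scores 0 or 0.5 points, drawn at 1 point, and won at 1.5 or 2 points.

```python
import math
from typing import Sequence


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def standard_elo(wins: int, losses: int, draws: int) -> float:
    """Elo from per-game results, counting a draw as half a point."""
    games = wins + losses + draws
    return elo_from_score((wins + 0.5 * draws) / games)


def gamepair_elo(ptnml: Sequence[int]) -> float:
    """Elo from game pairs; ptnml holds pair counts for 0, 0.5, 1, 1.5 and 2 points."""
    pair_losses = ptnml[0] + ptnml[1]
    pair_draws = ptnml[2]
    pair_wins = ptnml[3] + ptnml[4]
    pairs = pair_losses + pair_draws + pair_wins
    return elo_from_score((pair_wins + 0.5 * pair_draws) / pairs)


# Totals row of the summary table.
total_standard = standard_elo(wins=7436, losses=2964, draws=9600)
total_gamepair = gamepair_elo([8, 610, 4348, 4970, 64])
print(f"{total_standard:+.2f} {total_gamepair:+.2f}")
```

Both values should land within a few hundredths of the +79.02 and +164.76 shown in the totals row. Rows with very few games may differ slightly from this simple formula.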

Test Detail

| ID | Host | Base NPS | Games | W | L | D | Standard Elo | Ptnml(0-2) | Gamepair Elo |
|---|---|---|---|---|---|---|---|---|---|
| 386864 | ncm-dbt-02 | 586097 | 18 | 7 | 3 | 8 | +78.47 ± 60.12 | 0, 0, 5, 4, 0 | +165.92 ± 155.98 |
| 386863 | ncm-dbt-05 | 583908 | 484 | 186 | 83 | 215 | +75.08 ± 13.92 | 0, 14, 112, 115, 1 | +156.17 ± 32.06 |
| 386862 | ncm-dbt-04 | 570388 | 500 | 193 | 72 | 235 | +85.78 ± 15.21 | 1, 17, 96, 132, 4 | +178.11 ± 34.94 |
| 386861 | ncm-dbt-01 | 581943 | 498 | 194 | 81 | 223 | +80.23 ± 13.04 | 0, 9, 119, 120, 1 | +168.31 ± 30.72 |
| 386860 | ncm-dbt-03 | 585759 | 500 | 189 | 75 | 236 | +80.63 ± 14.66 | 0, 19, 100, 129, 2 | +167.53 ± 34.22 |
| 386859 | ncm-dbt-02 | 583237 | 500 | 182 | 75 | 243 | +75.52 ± 14.79 | 0, 21, 103, 124, 2 | +155.54 ± 33.71 |
| 386858 | ncm-dbt-05 | 586012 | 500 | 188 | 65 | 247 | +87.26 ± 13.8 | 0, 12, 105, 131, 2 | +183.51 ± 33.24 |
| 386857 | ncm-dbt-04 | 570509 | 500 | 185 | 78 | 237 | +75.52 ± 14.65 | 1, 16, 111, 119, 3 | +155.54 ± 32.36 |
| 386856 | ncm-dbt-01 | 584201 | 500 | 176 | 69 | 255 | +75.52 ± 14.51 | 0, 20, 104, 125, 1 | +157.24 ± 33.54 |
| 386855 | ncm-dbt-03 | 585885 | 500 | 183 | 72 | 245 | +78.43 ± 14.17 | 0, 17, 106, 126, 1 | +164.07 ± 33.19 |
| 386854 | ncm-dbt-02 | 585042 | 500 | 182 | 77 | 241 | +74.06 ± 15.15 | 0, 26, 93, 131, 0 | +155.54 ± 35.33 |
| 386853 | ncm-dbt-05 | 585295 | 500 | 192 | 70 | 238 | +86.52 ± 14.38 | 0, 15, 101, 131, 3 | +179.9 ± 34.03 |
| 386852 | ncm-dbt-04 | 570268 | 500 | 184 | 84 | 232 | +70.43 ± 13.91 | 0, 17, 117, 115, 1 | +145.54 ± 31.41 |
| 386851 | ncm-dbt-01 | 586139 | 500 | 185 | 75 | 240 | +77.7 ± 14.58 | 0, 17, 110, 119, 4 | +157.24 ± 32.52 |
| 386850 | ncm-dbt-03 | 586139 | 500 | 189 | 72 | 239 | +82.83 ± 14.0 | 0, 14, 107, 127, 2 | +172.78 ± 32.96 |
| 386849 | ncm-dbt-02 | 585590 | 500 | 201 | 69 | 230 | +93.95 ± 14.53 | 0, 15, 91, 141, 3 | +198.34 ± 35.92 |
| 386848 | ncm-dbt-05 | 581985 | 500 | 189 | 74 | 237 | +81.37 ± 14.11 | 1, 12, 110, 125, 2 | +171.02 ± 32.42 |
| 386847 | ncm-dbt-03 | 584748 | 500 | 184 | 71 | 245 | +79.9 ± 14.21 | 0, 17, 104, 128, 1 | +167.53 ± 33.52 |
| 386846 | ncm-dbt-04 | 570268 | 500 | 191 | 76 | 233 | +81.37 ± 13.2 | 0, 11, 113, 126, 0 | +172.78 ± 31.83 |
| 386845 | ncm-dbt-01 | 581818 | 500 | 178 | 67 | 255 | +78.44 ± 13.73 | 0, 15, 109, 126, 0 | +165.8 ± 32.64 |
| 386844 | ncm-dbt-02 | 586604 | 500 | 186 | 73 | 241 | +79.9 ± 13.62 | 1, 11, 112, 126, 0 | +171.02 ± 32.04 |
| 386843 | ncm-dbt-05 | 582987 | 500 | 192 | 76 | 232 | +82.1 ± 13.98 | 1, 13, 105, 131, 0 | +176.33 ± 33.3 |
| 386842 | ncm-dbt-04 | 572114 | 500 | 186 | 63 | 251 | +87.26 ± 13.17 | 0, 10, 107, 133, 0 | +187.16 ± 32.81 |
| 386841 | ncm-dbt-01 | 585126 | 500 | 177 | 92 | 231 | +59.64 ± 14.46 | 0, 24, 118, 107, 1 | +121.46 ± 31.39 |
| 386840 | ncm-dbt-02 | 582402 | 500 | 191 | 72 | 237 | +84.3 ± 14.33 | 0, 17, 98, 134, 1 | +178.11 ± 34.58 |
| 386839 | ncm-dbt-03 | 585000 | 500 | 186 | 60 | 254 | +89.48 ± 14.0 | 0, 12, 103, 132, 3 | +187.16 ± 33.6 |
| 386838 | ncm-dbt-05 | 582736 | 500 | 185 | 87 | 228 | +68.99 ± 13.71 | 0, 14, 127, 106, 3 | +138.99 ± 29.78 |
| 386837 | ncm-dbt-04 | 570949 | 500 | 189 | 81 | 230 | +76.25 ± 14.1 | 0, 16, 112, 120, 2 | +157.24 ± 32.18 |
| 386836 | ncm-dbt-03 | 584453 | 500 | 187 | 60 | 253 | +90.22 ± 14.31 | 0, 12, 104, 129, 5 | +185.33 ± 33.42 |
| 386835 | ncm-dbt-02 | 584411 | 500 | 189 | 75 | 236 | +80.63 ± 14.09 | 0, 14, 111, 122, 3 | +165.8 ± 32.28 |
| 386834 | ncm-dbt-01 | 584034 | 500 | 175 | 77 | 248 | +68.99 ± 13.71 | 0, 17, 118, 115, 0 | +143.89 ± 31.26 |
| 386833 | ncm-dbt-04 | 570509 | 500 | 178 | 71 | 251 | +75.52 ± 13.34 | 0, 12, 120, 117, 1 | +157.24 ± 30.75 |
| 386832 | ncm-dbt-05 | 583698 | 500 | 193 | 68 | 239 | +88.74 ± 12.2 | 0, 4, 117, 129, 0 | +190.85 ± 30.63 |
| 386831 | ncm-dbt-03 | 581444 | 500 | 185 | 64 | 251 | +85.78 ± 14.07 | 0, 14, 103, 131, 2 | +179.9 ± 33.65 |
| 386830 | ncm-dbt-02 | 586266 | 500 | 181 | 90 | 229 | +63.94 ± 14.23 | 0, 21, 118, 110, 1 | +130.94 ± 31.35 |
| 386829 | ncm-dbt-01 | 584958 | 500 | 178 | 72 | 250 | +74.79 ± 14.9 | 1, 18, 108, 120, 3 | +153.86 ± 32.88 |
| 386828 | ncm-dbt-02 | 589368 | 500 | 194 | 80 | 226 | +80.63 ± 13.49 | 0, 13, 110, 127, 0 | +171.02 ± 32.42 |
| 386827 | ncm-dbt-04 | 572155 | 500 | 181 | 85 | 234 | +67.55 ± 13.94 | 1, 15, 122, 111, 1 | +140.62 ± 30.61 |
| 386826 | ncm-dbt-05 | 586097 | 500 | 173 | 71 | 256 | +71.88 ± 13.81 | 0, 17, 114, 119, 0 | +150.51 ± 31.88 |
| 386825 | ncm-dbt-01 | 589882 | 500 | 187 | 75 | 238 | +79.17 ± 14.48 | 0, 18, 104, 126, 2 | +164.07 ± 33.53 |
| 386824 | ncm-dbt-03 | 590953 | 500 | 185 | 64 | 251 | +85.78 ± 14.65 | 1, 14, 101, 131, 3 | +179.9 ± 34.03 |
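
The ± margins on Standard Elo appear consistent with a 95% interval computed from the pentanomial (game-pair) distribution rather than from individual games. The following is a minimal sketch under that assumption; the function names, the normal approximation, and the choice of z = 1.96 are illustrative and not taken from NCM.

```python
import math
from typing import Sequence, Tuple


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def pentanomial_elo(ptnml: Sequence[int], z: float = 1.96) -> Tuple[float, float]:
    """Return (elo, margin) from pair counts for 0, 0.5, 1, 1.5 and 2 points.

    Assumes the interval mean ± half_width stays strictly inside (0, 1).
    """
    pair_scores = (0.0, 0.25, 0.5, 0.75, 1.0)  # average per-game score of each bucket
    pairs = sum(ptnml)
    mean = sum(n * s for n, s in zip(ptnml, pair_scores)) / pairs
    variance = sum(n * (s - mean) ** 2 for n, s in zip(ptnml, pair_scores)) / pairs
    half_width = z * math.sqrt(variance / pairs)  # normal approximation on the mean score
    low = elo_from_score(mean - half_width)
    high = elo_from_score(mean + half_width)
    return elo_from_score(mean), (high - low) / 2.0


# Test 386862: Ptnml(0-2) = 1, 17, 96, 132, 4
elo, margin = pentanomial_elo([1, 17, 96, 132, 4])
print(f"{elo:+.2f} ± {margin:.2f}")
```

For test 386862 this lands close to the listed +85.78 ± 15.21. Using pairs as the unit accounts for the correlation between the two games played from the same opening, which is presumably why the table reports Ptnml alongside W/L/D.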

Commit

Commit ID a169c78b6d3b082068deb49a39aaa1fd75464c7f
Author Tomasz Sobczyk
Date 2024-05-28 16:34:15 UTC
Improve performance on NUMA systems

Allow for NUMA memory replication for NNUE weights. Bind threads to ensure execution on a specific NUMA node.

This patch introduces NUMA memory replication, currently only utilized for the NNUE weights. Along with it comes all machinery required to identify NUMA nodes and bind threads to specific processors/nodes. It also comes with small changes to Thread and ThreadPool to allow easier execution of custom functions on the designated thread. Old thread binding (WinProcGroup) machinery is removed because it's incompatible with this patch. Small changes to unrelated parts of the code were made to ensure correctness, like some classes being made unmovable, raw pointers replaced with unique_ptr, etc.

Windows 7 and Windows 10 are partially supported. Windows 11 is fully supported. Linux is fully supported, with explicit exclusion of Android. No additional dependencies.

-----------------

A new UCI option `NumaPolicy` is introduced. It can take the following values:

```
system - gathers NUMA node information from the system (lscpu or windows api), for each thread binds it to a single NUMA node
none - assumes there is 1 NUMA node, never binds threads
auto - this is the default value, depends on the number of set threads and NUMA nodes, will only enable binding on multinode systems and when the number of threads reaches a threshold (dependent on node size and count)
[[custom]] -
  // ':'-separated numa nodes
  // ','-separated cpu indices
  // supports "first-last" range syntax for cpu indices, for example '0-15,32-47:16-31,48-63'
```

Setting `NumaPolicy` forces recreation of the threads in the ThreadPool, which in turn forces the recreation of the TT.

The threads are distributed among NUMA nodes in a round-robin fashion based on fill percentage (i.e. it will strive to fill all NUMA nodes evenly). Threads are bound to NUMA nodes, not specific processors, because that's our only requirement and the OS can schedule them better.

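
To make the custom `NumaPolicy` syntax and the fill-based distribution more concrete, here is a small Python sketch. It is an illustration only: Stockfish implements this in C++, and the function names, the tie-breaking rule (lowest node index wins), and the lack of input validation are assumptions made for this example.

```python
from typing import List


def parse_custom_numa_policy(spec: str) -> List[List[int]]:
    """Parse e.g. '0-15,32-47:16-31,48-63' into one sorted CPU list per node."""
    nodes = []
    for node_spec in spec.split(":"):      # ':' separates NUMA nodes
        cpus = set()
        for part in node_spec.split(","):  # ',' separates CPU indices or ranges
            first, sep, last = part.partition("-")
            if sep:                        # "first-last" range, inclusive on both ends
                cpus.update(range(int(first), int(last) + 1))
            else:
                cpus.add(int(first))
        nodes.append(sorted(cpus))
    return nodes


def distribute_threads(node_sizes: List[int], thread_count: int) -> List[int]:
    """Assign each thread to the node with the lowest fill fraction so far."""
    assigned = [0] * len(node_sizes)
    placement = []
    for _ in range(thread_count):
        # Fill fraction = threads already assigned / processors in the node.
        node = min(range(len(node_sizes)), key=lambda i: assigned[i] / node_sizes[i])
        assigned[node] += 1
        placement.append(node)
    return placement


nodes = parse_custom_numa_policy("0-15,32-47:16-31,48-63")
placement = distribute_threads([len(cpus) for cpus in nodes], thread_count=6)
print(len(nodes), placement)
```

With two equally sized nodes the threads simply alternate between them. Over UCI, such a value would be sent with the standard `setoption name NumaPolicy value 0-15,32-47:16-31,48-63` form.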
Special care is taken that maximum memory usage on systems that do not require memory replication stays as previously, that is, unnecessary copies are avoided.

On Linux the process' processor affinity is respected. This means that if you for example use taskset to restrict Stockfish to a single NUMA node then the `system` and `auto` settings will only see a single NUMA node (more precisely, the processors included in the current affinity mask) and act accordingly.

-----------------

We can't ensure that a memory allocation takes place on a given NUMA node without using libnuma on Linux, or using appropriate custom allocators on Windows (https://learn.microsoft.com/en-us/windows/win32/memory/allocating-memory-from-a-numa-node), so to avoid complications the current implementation relies on first-touch policy. Due to this we also rely on the memory allocator to give us a new chunk of untouched memory from the system. This appears to work reliably on Linux, but results may vary.

macOS is not supported, because AFAIK it's not affected, and implementation would be problematic anyway.

Windows is supported since Windows 7 (https://learn.microsoft.com/en-us/windows/win32/api/processtopologyapi/nf-processtopologyapi-setthreadgroupaffinity). Until Windows 11/Server 2022 NUMA nodes are split such that they cannot span processor groups. This is because before Windows 11/Server 2022 it's not possible to set thread affinity spanning processor groups. The splitting is done manually in some cases (required after Windows 10 Build 20348). Since Windows 11/Server 2022 we can set affinities spanning processor groups, so this splitting is not done and the behaviour is pretty much like on Linux.

Linux is supported, **without** libnuma requirement. `lscpu` is expected.

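
The effect of the affinity mask on what `system` and `auto` can see may be easier to follow with a toy model. The sketch below intersects each node's processors with the allowed set and drops nodes that end up empty. The topology, the function name, and the data layout are hypothetical and are not Stockfish's actual data structures.

```python
from typing import Dict, Set


def visible_nodes(node_cpus: Dict[int, Set[int]], affinity: Set[int]) -> Dict[int, Set[int]]:
    """Keep only CPUs in the affinity mask and drop nodes left with no CPUs."""
    restricted = {node: cpus & affinity for node, cpus in node_cpus.items()}
    return {node: cpus for node, cpus in restricted.items() if cpus}


# Hypothetical two-node machine with 4 processors per node.
topology = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

# Comparable to launching under `taskset -c 0-3`: only node 0's processors are allowed.
single_node = visible_nodes(topology, affinity={0, 1, 2, 3})
print(single_node)
```

In this model a process restricted to processors 0-3 sees one node only, so no cross-node binding would be needed. On Linux, Python exposes the current mask through `os.sched_getaffinity(0)`, which could be passed as `affinity` when experimenting.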
-----------------

Passed 60+1 @ 256t 16000MB hash:
https://tests.stockfishchess.org/tests/view/6654e443a86388d5e27db0d8
```
LLR: 2.95 (-2.94,2.94) <0.00,10.00>
Total: 278 W: 110 L: 29 D: 139
Ptnml(0-2): 0, 1, 56, 82, 0
```

Passed SMP STC:
https://tests.stockfishchess.org/tests/view/6654fc74a86388d5e27db1cd
```
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 67152 W: 17354 L: 17177 D: 32621
Ptnml(0-2): 64, 7428, 18408, 7619, 57
```

Passed STC:
https://tests.stockfishchess.org/tests/view/6654fb27a86388d5e27db15c
```
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 131648 W: 34155 L: 34045 D: 63448
Ptnml(0-2): 426, 13878, 37096, 14008, 416
```

fixes #5253

closes https://github.com/official-stockfish/Stockfish/pull/5285

No functional change

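
The `(-2.94,2.94)` pair next to each LLR value matches the stopping bounds of a Wald sequential probability ratio test with error rates of 5% each. The commit message does not state those rates, so α = β = 0.05 is an assumption here, chosen because it reproduces the printed bounds. A minimal sketch:

```python
import math
from typing import Tuple


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> Tuple[float, float]:
    """Wald SPRT stopping bounds on the log-likelihood ratio (LLR)."""
    lower = math.log(beta / (1.0 - alpha))   # stop and reject at or below this LLR
    upper = math.log((1.0 - beta) / alpha)   # stop and accept at or above this LLR
    return lower, upper


lower, upper = sprt_bounds()
print(f"({lower:.2f},{upper:.2f})")  # prints "(-2.94,2.94)"
```

Under this reading, a test passes once its LLR reaches the upper bound, which fits the reported values of 2.95 and 2.94.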
Copyright 2011–2024 Next Chess Move LLC