Dev Builds » 20220210-1854

You are viewing an old NCM Stockfish dev build test. The most recent dev build tests, which use Stockfish 15 as the baseline, can be found on the NCM dev builds page.

NCM plays each Stockfish dev build 20,000 times against Stockfish 7. This yields an approximate Elo difference and establishes confidence in the strength of the dev builds.
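The Elo figures in the tables below can be reproduced from the raw win/loss/draw counts. NCM does not document its exact formula, but the standard logistic Elo estimate with a normal-approximation 95% interval reproduces the combined summary row. A minimal sketch (the helper name `elo_estimate` is ours):

```python
import math

def elo_estimate(wins, losses, draws):
    """Estimate the Elo difference and a 95% error margin from a match result.

    Uses the standard logistic model: score s -> Elo = 400*log10(s/(1-s)),
    with the margin propagated from the per-game score variance.
    """
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n              # average score per game
    elo = 400 * math.log10(s / (1 - s))
    # Per-game variance of the score (outcomes are 1, 0.5, 0)
    var = (wins * 1.0 + draws * 0.25) / n - s * s
    se = math.sqrt(var / n)                   # standard error of s
    # Propagate through d(Elo)/ds; 1.96 sigma ~ 95% confidence
    margin = 1.96 * se * 400 / (math.log(10) * s * (1 - s))
    return elo, margin

# Combined summary row: 20000 games, 17245 wins, 32 losses, 2723 draws
elo, margin = elo_estimate(17245, 32, 2723)   # -> about +450.22 +/- 6.52
```

The per-host margins on the site differ from this linearized estimate by a few hundredths of an Elo point, so NCM may compute the interval slightly differently (e.g. from the interval endpoints rather than the derivative), but the combined row matches to two decimals.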

Summary

Host Duration Avg Base NPS Games Wins Losses Draws Elo
ncm-et-3 11:24:24 1960465 3344 2849 7 488 +436.28 ± 15.42
ncm-et-4 11:24:24 1960242 3348 2916 9 423 +460.72 ± 16.59
ncm-et-9 11:24:14 1960698 3330 2891 3 436 +459.29 ± 16.32
ncm-et-10 11:24:08 1956385 3310 2843 5 462 +445.92 ± 15.85
ncm-et-13 11:24:06 1959712 3344 2883 4 457 +450.62 ± 15.94
ncm-et-15 11:24:18 1961573 3324 2863 4 457 +449.5 ± 15.94
Total 20000 17245 32 2723 +450.22 ± 6.52

Test Detail

ID Host Started (UTC) Duration Base NPS Games Wins Losses Draws Elo
156971 ncm-et-10 2022-02-13 04:08 01:04:32 1952980 310 271 0 39 +469.24 ± 55.94
156970 ncm-et-15 2022-02-13 04:06 01:07:22 1964013 324 280 0 44 +455.03 ± 52.45
156969 ncm-et-9 2022-02-13 04:05 01:07:56 1965074 330 280 0 50 +434.54 ± 48.99
156968 ncm-et-13 2022-02-13 04:03 01:10:01 1956065 344 289 1 54 +421.01 ± 47.16
156967 ncm-et-3 2022-02-13 04:02 01:11:10 1963687 344 289 1 54 +421.01 ± 47.16
156966 ncm-et-4 2022-02-13 03:59 01:13:46 1934353 348 304 1 43 +464.15 ± 53.24
156965 ncm-et-10 2022-02-13 02:24 01:43:52 1957708 500 429 0 71 +446.7 ± 40.89
156964 ncm-et-9 2022-02-13 02:23 01:41:16 1966309 500 446 1 53 +494.03 ± 47.75
156963 ncm-et-15 2022-02-13 02:23 01:42:09 1959694 500 442 1 57 +481.1 ± 45.96
156962 ncm-et-13 2022-02-13 02:22 01:39:56 1968468 500 442 1 57 +481.1 ± 45.96
156961 ncm-et-3 2022-02-13 02:19 01:42:32 1950129 500 426 1 73 +436.43 ± 40.36
156960 ncm-et-4 2022-02-13 02:16 01:42:13 1968931 500 422 0 78 +429.05 ± 38.92
156959 ncm-et-15 2022-02-13 00:41 01:41:20 1960770 500 431 0 69 +452.04 ± 41.5
156958 ncm-et-10 2022-02-13 00:40 01:42:59 1953738 500 424 2 74 +429.05 ± 40.11
156957 ncm-et-9 2022-02-13 00:40 01:42:51 1957252 500 434 0 66 +460.32 ± 42.48
156956 ncm-et-13 2022-02-13 00:38 01:43:44 1958326 500 433 0 67 +457.52 ± 42.15
156955 ncm-et-3 2022-02-13 00:37 01:40:48 1965535 500 430 2 68 +444.09 ± 41.92
156954 ncm-et-4 2022-02-13 00:34 01:41:19 1961015 500 446 1 53 +494.03 ± 47.75
156953 ncm-et-15 2022-02-12 22:56 01:43:57 1956639 500 432 0 68 +454.76 ± 41.82
156952 ncm-et-3 2022-02-12 22:55 01:41:07 1965690 500 418 1 81 +417.32 ± 38.22
156951 ncm-et-13 2022-02-12 22:55 01:42:15 1959700 500 416 0 84 +415.04 ± 37.43
156950 ncm-et-10 2022-02-12 22:55 01:44:42 1954837 500 425 1 74 +433.94 ± 40.08
156949 ncm-et-9 2022-02-12 22:54 01:44:40 1949787 500 431 1 68 +449.35 ± 41.89
156948 ncm-et-4 2022-02-12 22:52 01:42:00 1968932 500 441 2 57 +474.93 ± 45.97
156947 ncm-et-13 2022-02-12 21:13 01:41:15 1959695 500 436 2 62 +460.32 ± 43.99
156946 ncm-et-15 2022-02-12 21:13 01:42:55 1967395 500 431 1 68 +449.35 ± 41.89
156945 ncm-et-3 2022-02-12 21:12 01:42:35 1958794 500 433 1 66 +454.76 ± 42.55
156944 ncm-et-9 2022-02-12 21:11 01:42:14 1963083 500 435 0 65 +463.15 ± 42.83
156943 ncm-et-4 2022-02-12 21:11 01:40:30 1962472 500 440 0 60 +477.98 ± 44.67
156942 ncm-et-10 2022-02-12 21:10 01:44:27 1954352 500 440 1 59 +474.93 ± 45.13
156941 ncm-et-13 2022-02-12 19:29 01:43:51 1957102 500 435 0 65 +463.15 ± 42.83
156940 ncm-et-9 2022-02-12 19:28 01:42:28 1962616 500 428 1 71 +441.5 ± 40.95
156939 ncm-et-3 2022-02-12 19:28 01:43:20 1956347 500 417 0 83 +417.31 ± 37.67
156938 ncm-et-15 2022-02-12 19:28 01:44:25 1955572 500 427 1 72 +438.95 ± 40.66
156937 ncm-et-4 2022-02-12 19:27 01:42:35 1960462 500 427 2 71 +436.43 ± 40.99
156936 ncm-et-10 2022-02-12 19:27 01:42:05 1959088 500 424 1 75 +431.48 ± 39.8
156935 ncm-et-15 2022-02-12 17:45 01:42:10 1966931 500 420 1 79 +421.93 ± 38.73
156934 ncm-et-13 2022-02-12 17:45 01:43:04 1958629 500 432 0 68 +454.76 ± 41.82
156933 ncm-et-9 2022-02-12 17:45 01:42:49 1960771 500 437 0 63 +468.95 ± 43.54
156932 ncm-et-4 2022-02-12 17:45 01:42:01 1965534 500 436 3 61 +457.52 ± 44.35
156931 ncm-et-10 2022-02-12 17:45 01:41:31 1961997 500 430 0 70 +449.35 ± 41.19
156930 ncm-et-3 2022-02-12 17:45 01:42:52 1963073 500 436 1 63 +463.16 ± 43.6

Commit

Commit ID cb9c2594fcedc881ae8f8bfbfdf130cf89840e4c
Author Tomasz Sobczyk
Date 2022-02-10 18:54:31 UTC
Update architecture to "SFNNv4". Update network to nn-6877cd24400e.nnue.

Architecture:

The diagram of the "SFNNv4" architecture: https://user-images.githubusercontent.com/8037982/153455685-cbe3a038-e158-4481-844d-9d5fccf5c33a.png

The most important architectural changes are the following:

* The 1024x2 [activated] neurons are pairwise, elementwise multiplied (not quite pairwise due to implementation details, see diagram), which introduces a non-linearity that exhibits benefits similar to the previously tested sigmoid activation (quantmoid4) while being slightly faster.
* The following layer therefore has 2x fewer inputs, which we compensate for by giving it 2x more outputs. It is possible that reducing the number of outputs might be beneficial (we had it as low as 8 before). The layer is now 1024->16.
* The 16 outputs are split into 15 and 1. The 1-wide output is added to the network output (after some necessary scaling due to quantization differences). The 15-wide part is activated and follows the usual path through a set of linear layers. The additional 1-wide output is at least neutral, has shown a slightly positive trend in training compared to networks without it (all 16 outputs through the usual path), and may allow an additional stage of lazy evaluation to be introduced in the future.

Additionally, the inference code was rewritten and no longer uses a recursive implementation. This was necessitated by splitting the 16-wide intermediate result in two, which was impossible to do in the old implementation without ugly hacks. This is hopefully for the better overall.

First session:

The first session trained a network from scratch (random initialization). The exact trainer used was slightly different (older) from the one used in the second session, but this should not have a measurable effect. The purpose of this session is to establish a strong network base for the second session; small deviations in strength do not harm learnability in the second session.
The training was done using the following command:

python3 train.py \
    /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
    --gpus "$3," \
    --threads 4 \
    --num-workers 4 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --gamma=0.992 \
    --lr=8.75e-4 \
    --max_epochs=400 \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

Every 20th net was saved and its playing strength measured against a baseline at 25k nodes per move with pure NNUE evaluation (modified binary). The exact setup is not important as long as it is consistent; the purpose is to sift good candidates from bad ones.

The dataset can be found at https://drive.google.com/file/d/1UQdZN_LWQ265spwTBwDKo0t1WjSJKvWY/view

Second session:

The second training session started from the best network (as determined by strength testing) from the first session. It is important that training is resumed from a .pt model and NOT a .ckpt model; the conversion can be performed directly using serialize.py.

The LR schedule was modified to use gamma=0.995 instead of gamma=0.992 and LR=4.375e-4 instead of LR=8.75e-4 to flatten the LR curve and allow for longer training. Training then ran for 800 epochs instead of 400 (though it is possibly mostly noise after around epoch 600).
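The effect of the flatter LR schedule can be seen by evaluating the decay directly. This assumes gamma acts as a per-epoch exponential decay factor (lr_n = lr0 * gamma**n, as in PyTorch's ExponentialLR); the helper name `lr_at_epoch` is ours:

```python
def lr_at_epoch(lr0, gamma, n):
    # Exponential decay: learning rate after n epochs.
    return lr0 * gamma ** n

# First session:  lr0=8.75e-4,  gamma=0.992, 400 epochs.
# Second session: lr0=4.375e-4, gamma=0.995, 800 epochs.
# gamma=0.995 shrinks the LR by only 0.5% per epoch instead of 0.8%,
# keeping it useful over the longer 800-epoch run.
first_end = lr_at_epoch(8.75e-4, 0.992, 400)
second_at_400 = lr_at_epoch(4.375e-4, 0.995, 400)
second_end = lr_at_epoch(4.375e-4, 0.995, 800)
```

Note that although the second schedule is flatter per epoch, over 800 epochs its final LR ends up lower than the first session's final LR, consistent with the observation that progress is mostly noise after around epoch 600.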
The training was done using the following command:

python3 train.py \
    /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
    /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
    --gpus "$3," \
    --threads 4 \
    --num-workers 4 \
    --batch-size 16384 \
    --progress_bar_refresh_rate 20 \
    --random-fen-skipping 3 \
    --features=HalfKAv2_hm^ \
    --lambda=1.0 \
    --gamma=0.995 \
    --lr=4.375e-4 \
    --max_epochs=800 \
    --resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp295/nn-epoch399.pt \
    --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$run_id

In particular, note that we now use lambda=1.0 instead of lambda=0.8 (previous nets), because tests show that the WDL skipping introduced by vondele performs better with lambda=1.0. Nets were saved every 20th epoch. In total 16 runs were made with these settings, and the best nets were chosen according to playing strength at 25k nodes per move with pure NNUE evaluation; these are the 4 nets that were put on fishtest.

The dataset can be found either at ftp://ftp.chessdb.cn/pub/sopel/data_sf/T60T70wIsRightFarseerT60T74T75T76.binpack in its entirety (the download may be painfully slow because it is hosted in China), or it can be assembled as follows:

Get the https://github.com/official-stockfish/Stockfish/blob/5640ad48ae5881223b868362c1cbeb042947f7b4/script/interleave_binpacks.py script.
Download T60T70wIsRightFarseer.binpack from https://drive.google.com/file/d/1_sQoWBl31WAxNXma2v45004CIVltytP8/view
Download farseerT74.binpack from http://trainingdata.farseer.org/T74-May13-End.7z
Download farseerT75.binpack from http://trainingdata.farseer.org/T75-June3rd-End.7z
Download farseerT76.binpack from http://trainingdata.farseer.org/T76-Nov10th-End.7z

Run:

python3 interleave_binpacks.py T60T70wIsRightFarseer.binpack farseerT74.binpack farseerT75.binpack farseerT76.binpack T60T70wIsRightFarseerT60T74T75T76.binpack

Tests:

STC: https://tests.stockfishchess.org/tests/view/6203fb85d71106ed12a407b7
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 16952 W: 4775 L: 4521 D: 7656
Ptnml(0-2): 133, 1818, 4318, 2076, 131

LTC: https://tests.stockfishchess.org/tests/view/62041e68d71106ed12a40e85
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 14944 W: 4138 L: 3907 D: 6899
Ptnml(0-2): 21, 1499, 4202, 1728, 22

closes https://github.com/official-stockfish/Stockfish/pull/3927

Bench: 4919707
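The pairwise-multiply non-linearity and the 15+1 output split described in the commit message can be sketched in plain NumPy. This is an illustrative float-domain sketch, not the engine's quantized implementation: the weight names are ours, the clipped-ReLU range is assumed to be [0, 1], and the quantization scaling of the 1-wide skip output is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def crelu(x):
    # Clipped ReLU (range [0, 1] assumed in float terms).
    return np.clip(x, 0.0, 1.0)

def pairwise_mult(acc):
    # Activate 1024 neurons, then multiply the two 512-wide halves
    # elementwise: the non-linearity introduced by SFNNv4.
    a = crelu(acc)
    return a[:512] * a[512:]

# Feature-transformer output: 1024 neurons per perspective (x2 perspectives),
# random values standing in for real accumulator contents.
acc_us = rng.normal(size=1024)
acc_them = rng.normal(size=1024)

# Both perspectives after pairwise multiplication: 2x512 = 1024 inputs,
# half as many as naive concatenation would give.
x = np.concatenate([pairwise_mult(acc_us), pairwise_mult(acc_them)])

# The 1024->16 layer (random illustrative weights).
W1, b1 = rng.normal(size=(16, 1024)), rng.normal(size=16)
y = W1 @ x + b1

# Split 16 -> 15 + 1: the 1-wide part is added directly to the output
# (quantization scaling omitted); the 15-wide part is activated and
# continues through the remaining linear layers.
skip = y[15]
hidden = crelu(y[:15])
W2, b2 = rng.normal(size=(1, 15)), rng.normal(size=1)
out = (W2 @ hidden + b2)[0] + skip
```

Because each product multiplies two values already clipped to [0, 1], the 1024 inputs to the 1024->16 layer also lie in [0, 1], which is what keeps this non-linearity cheap to quantize compared to a sigmoid.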
Copyright 2011–2024 Next Chess Move LLC