Update default net to nn-a3d1bfca1672.nnue
faster permutation of master net weights
Activation data taken from https://drive.google.com/drive/folders/1Ec9YuuRx4N03GPnVPoQOW70eucOKngQe?usp=sharing
Permutation found using https://github.com/Ergodice/nnue-pytorch/blob/836387a0e5e690431d404158c46648710f13904d/ftperm.py
See also https://github.com/glinscott/nnue-pytorch/pull/254
The algorithm greedily selects 2- and 3-cycles that can be permuted to increase the number of runs of zeroes. The percent of zero runs from the master net increased from 68.46 to 70.11 from 2-cycles and only increased to 70.32 when considering 3-cycles. Interestingly, allowing both halves of L1 to intermix when creating zero runs can give another 0.5% zero-run density increase with this method.
Measured speedup:
```
CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor
Result of 50 runs
base (./stockfish.master ) = 1561556 +/- 5439
test (./stockfish.patch ) = 1575788 +/- 5427
diff = +14231 +/- 2636
speedup = +0.0091
P(speedup > 0) = 1.0000
```
closes https://github.com/official-stockfish/Stockfish/pull/4640
No functional change