HQ attacks for AVX2
Passed STC (https://tests.stockfishchess.org/tests/view/6a1157dc818cacc1db0ac172):
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 29792 W: 7759 L: 7465 D: 14568
Ptnml(0-2): 75, 3206, 8033, 3514, 68
Also passed STC after cleanups: https://tests.stockfishchess.org/tests/view/6a121beb818cacc1db0ac35e
vondele's local test:
Result of 100 runs
base (./stockfish.master ) = 1136025 +/- 2816
test (./stockfish.patch2 ) = 1171370 +/- 2848
diff = +35346 +/- 3275
speedup = +0.0311
P(speedup > 0) = 1.0000
The idea is to compute hyperbola quintessence attacks in parallel. AVX2 has no efficient bit reversal, so hyperbola quintessence is used only for file and bishop attacks; rank attacks are computed separately with a lookup table. That LUT is much smaller than the magic bitboard tables, which is likely why this appears to be faster than standard magics.
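For readers unfamiliar with the technique, here is a minimal scalar sketch of the split described above. It is not the code from this patch: it handles one line at a time instead of using AVX2 lanes, it computes line masks on the fly rather than precomputing them, and all names (byteswap64, ray, line_mask, hq_line, RANK_ATTACKS, rank_attacks, rook_attacks, bishop_attacks) as well as the 8 x 64 table layout are assumptions made for illustration. It assumes the usual a1 = 0, h8 = 63 square mapping. A byte swap can stand in for bit reversal on files and diagonals because such a line has at most one square per rank; a rank lies within a single byte, so it would need a true bit reversal and is served from a small table instead.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;

// Portable byte swap: reverses the order of the eight ranks.
inline Bitboard byteswap64(Bitboard b) {
    b = ((b >> 8) & 0x00FF00FF00FF00FFULL) | ((b & 0x00FF00FF00FF00FFULL) << 8);
    b = ((b >> 16) & 0x0000FFFF0000FFFFULL) | ((b & 0x0000FFFF0000FFFFULL) << 16);
    return (b >> 32) | (b << 32);
}

// Slow reference: squares reached from sq by stepping (df, dr) until the
// board edge or the first blocker; the blocker square is included.
inline Bitboard ray(int sq, int df, int dr, Bitboard occ) {
    Bitboard b = 0;
    int f = (sq & 7) + df, r = (sq >> 3) + dr;
    while (f >= 0 && f < 8 && r >= 0 && r < 8) {
        Bitboard s = 1ULL << (r * 8 + f);
        b |= s;
        if (occ & s) break;
        f += df;
        r += dr;
    }
    return b;
}

// Whole line through sq along (df, dr) in both directions, excluding sq.
// A real implementation would precompute these masks per square.
inline Bitboard line_mask(int sq, int df, int dr) {
    return ray(sq, df, dr, 0) | ray(sq, -df, -dr, 0);
}

// Hyperbola quintessence for a file or diagonal: subtract the slider bit
// from the masked occupancy in normal and byte-swapped order, then combine.
inline Bitboard hq_line(int sq, Bitboard occ, Bitboard mask) {
    assert(sq >= 0 && sq < 64);
    Bitboard s   = 1ULL << sq;
    Bitboard fwd = occ & mask;        // blockers on this line, slider excluded
    Bitboard rev = byteswap64(fwd);
    fwd -= s;                         // borrow runs up to the first blocker above
    rev -= byteswap64(s);             // same, for the opposite direction
    return (fwd ^ byteswap64(rev)) & mask;
}

// Rank attacks: [file][6 inner occupancy bits] -> 8 attack bits.
// 8 * 64 one-byte entries = 512 bytes in this hypothetical layout.
using RankTable = std::array<std::array<std::uint8_t, 64>, 8>;

inline RankTable make_rank_table() {
    RankTable t{};
    for (int file = 0; file < 8; ++file)
        for (int occ6 = 0; occ6 < 64; ++occ6) {
            Bitboard occ = Bitboard(occ6) << 1;  // inner files b..g on rank 1
            t[file][occ6] = std::uint8_t(ray(file, 1, 0, occ) | ray(file, -1, 0, occ));
        }
    return t;
}

static const RankTable RANK_ATTACKS = make_rank_table();

inline Bitboard rank_attacks(int sq, Bitboard occ) {
    assert(sq >= 0 && sq < 64);
    int      shift = sq & 56;                              // first square of the rank
    unsigned occ6  = unsigned(occ >> (shift + 1)) & 63;    // files b..g of that rank
    return Bitboard(RANK_ATTACKS[sq & 7][occ6]) << shift;
}

inline Bitboard rook_attacks(int sq, Bitboard occ) {
    return hq_line(sq, occ, line_mask(sq, 0, 1)) | rank_attacks(sq, occ);
}

inline Bitboard bishop_attacks(int sq, Bitboard occ) {
    return hq_line(sq, occ, line_mask(sq, 1, 1)) | hq_line(sq, occ, line_mask(sq, 1, -1));
}
```

In this sketch, rook_attacks(0, 0) yields the a-file and first rank without a1, and the results can be cross-checked against the slow ray() walk for any occupancy. The three hq_line computations (file, diagonal, anti-diagonal) are independent of each other, which is what makes them a natural fit for separate SIMD lanes.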
closes https://github.com/official-stockfish/Stockfish/pull/6845
No functional change