Use tiling to speed up accumulator refreshes and updates
Perform the update and refresh operations tile by tile in a local
array of vectors. By selecting the array size carefully, we
achieve that the compiler keeps the whole array in vector registers.
Idea and original implementation by @sf-x.
STC: https://tests.stockfishchess.org/tests/view/5f623eec912c15f19854b855
LLR: 2.94 (-2.94,2.94) {-0.25,1.25}
Total: 4872 W: 623 L: 477 D: 3772
Ptnml(0-2): 14, 350, 1585, 450, 37
LTC: https://tests.stockfishchess.org/tests/view/5f62434e912c15f19854b860
LLR: 2.94 (-2.94,2.94) {0.25,1.25}
Total: 25808 W: 1565 L: 1401 D: 22842
Ptnml(0-2): 23, 1186, 10332, 1330, 33
closes https://github.com/official-stockfish/Stockfish/pull/3130
No functional change