Use 128 bit multiply for TT index
Remove super cluster stuff from TT and just use a 128 bit multiply.
STC https://tests.stockfishchess.org/tests/view/5ee719b3aae8aec816ab7548
LLR: 2.94 (-2.94,2.94) {-1.50,0.50}
Total: 12736 W: 2502 L: 2333 D: 7901
Ptnml(0-2): 191, 1452, 2944, 1559, 222
LTC https://tests.stockfishchess.org/tests/view/5ee732d1aae8aec816ab7556
LLR: 2.93 (-2.94,2.94) {-1.50,0.50}
Total: 27584 W: 3431 L: 3350 D: 20803
Ptnml(0-2): 173, 2500, 8400, 2511, 208
Scheme back to being derived from https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
Also the default optimized version of the index calculation now uses fewer instructions.
https://godbolt.org/z/Tktxbv
Might benefit from mulx (requires -mbmi2)
closes https://github.com/official-stockfish/Stockfish/pull/2744
bench: 4320954