Optimize find_nnz() by reducing the size of lookup_indices
https://tests.stockfishchess.org/tests/view/67896b688082388fa0cbfdee
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 452800 W: 118213 L: 117300 D: 217287
Ptnml(0-2): 1638, 50255, 121864, 50842, 1801
It's faster to shrink lookup_indices[] to 8 bit and zero extend to 16 bit using _mm_cvtepu8_epi16() than to read the larger 16 bit version. I suspect that having the constants available at compile time isn't too valuable and can be simplified back to generating at initialization time since this version also almost passed. https://tests.stockfishchess.org/tests/view/67863057460e2910c51de7e0 I will try that as a follow up.
closes https://github.com/official-stockfish/Stockfish/pull/5793
No functional change