Speed up max_piece_type()
Write the code in a way that allows the compiler to unroll the loop.
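The commit does not include the diff. The sketch below is a hypothetical illustration of the general idea, not the actual max_piece_type() code. The `PieceType` enum, the `byType` array and both function names are assumptions made for this example. The first variant walks down from the queen and stops at a data-dependent point, so the trip count is unknown at compile time. The second runs over a fixed range with no early exit, which typically lets the compiler unroll it fully (and often turn the body into conditional moves). Both return the same result.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;

// Hypothetical piece-type numbering for this sketch only.
enum PieceType : int {
    NO_PIECE_TYPE = 0, PAWN = 1, KNIGHT, BISHOP, ROOK, QUEEN, KING, PIECE_TYPE_NB = 7
};

using ByType = std::array<Bitboard, PIECE_TYPE_NB>;

// Early-exit variant: loop length depends on the data, which hinders unrolling.
int max_piece_type_loop(const ByType& byType, Bitboard mask) {
    int pt = QUEEN;
    while (pt > NO_PIECE_TYPE && !(byType[pt] & mask))
        --pt;
    return pt;  // highest type in PAWN..QUEEN present in mask, else NO_PIECE_TYPE
}

// Fixed trip count (PAWN..QUEEN), no break: a good candidate for full unrolling.
int max_piece_type_unrollable(const ByType& byType, Bitboard mask) {
    int result = NO_PIECE_TYPE;
    for (int pt = PAWN; pt <= QUEEN; ++pt)
        if (byType[pt] & mask)
            result = pt;  // later (higher) types overwrite earlier ones
    return result;
}
```

The second form does slightly more work in the worst case, but removing the data-dependent exit is what makes the bound a compile-time constant.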
My measurements (32 cycles each):
Orig:
Time (Mean: 2466.59375, Trimmed mean: 2464.25, Std: 12.6869487803348)
Nodes (Mean: 4294458, Trimmed mean: 4294458, Std: 0)
Speed (Mean: 1741.09247987678, Trimmed mean: 1742.72879715475, Std: 8.93612608292678)
Time (Mean: 2470.15625, Trimmed mean: 2468.75, Std: 12.7484581610433)
Nodes (Mean: 4294458, Trimmed mean: 4294458, Std: 0)
Speed (Mean: 1738.58176151341, Trimmed mean: 1739.54618465403, Std: 8.95585822316946)
Mod:
Time (Mean: 2449.90625, Trimmed mean: 2445.9375, Std: 12.1000116635508)
Nodes (Mean: 4294458, Trimmed mean: 4294458, Std: 0)
Speed (Mean: 1752.94829372932, Trimmed mean: 1755.75934908231, Std: 8.61478453124504)
Time (Mean: 2442.78125, Trimmed mean: 2441.1875, Std: 8.17839157228837)
Nodes (Mean: 4294458, Trimmed mean: 4294458, Std: 0)
Speed (Mean: 1758.03872783803, Trimmed mean: 1759.16825356261, Std: 5.81131316346191)
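Node counts are identical in every run, and Speed equals Nodes / Time (4294458 / 2466.59 is about 1741). To summarize the gain, one can average the two trimmed-mean Speed figures per build. That gives roughly 1741.1 for Orig and 1757.5 for Mod, a relative speedup of about 0.9%. The helper below is a hypothetical sketch of that arithmetic and is not part of the commit:

```cpp
#include <cassert>

// Relative change from `before` to `after`, in percent.
double relative_change_percent(double before, double after) {
    return (after - before) / before * 100.0;
}

// Trimmed-mean Speed values copied from the two runs of each build above.
const double kOrigSpeed = (1742.72879715475 + 1739.54618465403) / 2.0;
const double kModSpeed  = (1755.75934908231 + 1759.16825356261) / 2.0;
```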
No functional change