Split affine_transform on VNNI as well
```
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 184544 W: 47990 L: 47474 D: 89080
Ptnml(0-2): 551, 20238, 50206, 20698, 579
```
Torom measured:
```
sf_base = 2538696 +/- 2915 (95%)
sf_test = 2546510 +/- 3011 (95%)
diff = 7814 +/- 4146 (95%)
speedup = 0.30782% +/- 0.163% (95%)
```
I get something similar. I think the benefit would be larger if we ever
decide to further increase the L2 size.
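
For reference, the speedup line in the measurement appears to be the difference relative to the base mean. A minimal sketch of that arithmetic, assuming speedup = diff / sf_base (the variable names mirror the output above; the formula itself is an assumption, not taken from the benchmarking script):

```python
# Values copied from the measurement output above.
sf_base = 2538696   # mean speed of the base build
diff = 7814         # sf_test - sf_base
diff_err = 4146     # 95% half-width reported for diff

# Assumption: the relative speedup is diff divided by the base mean,
# and its error bar is the diff error bar scaled the same way.
speedup_pct = 100.0 * diff / sf_base
speedup_err_pct = 100.0 * diff_err / sf_base

print(f"speedup = {speedup_pct:.3f}% +/- {speedup_err_pct:.3f}% (95%)")
```

The result agrees with the reported figure to about three decimal places; the small remaining difference is consistent with the printed diff being rounded.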
closes https://github.com/official-stockfish/Stockfish/pull/6683
No functional change