Enable popcount and prefetch for ppc-64
PowerPC has had popcount instructions for a long time, at least as far
back as POWER5 (released 2004). Enable them via a gcc builtin.
Using a gcc builtin has the added bonus that if compiled for a processor
that lacks a hardware instruction, gcc will include a software popcount
implementation that does not use the instruction. It might be slower
than the table lookups (or it might be faster) but it will certainly work.
So this isn't going to break anything.
On my POWER8 VM, this leads to a ~4.27% speedup.
Fir prefetch, the gcc builtin generates a 'dcbt' instruction, which is
supported at least as far back as the G5 (2002) and POWER4 (2001).
This leads to a ~5% speedup on my POWER8 VM.
No functional change