macOS universal binary, take 2
- Add new target `macos-lipo`.
- It is built by compiling a universal x86 binary (no PGO) and a standard Apple silicon binary (with PGO), then combining them into a Mach-O fat binary.
- To keep only one copy of the net, we add custom loading logic in the x86 slice. The executable reads its own path and mmaps the net that's in the ARM slice.
- The offset and size of the mapping (relative to the executable base) are injected after compilation by `patch_x86_slice.sh`.
- AVX512 on macOS isn't advertised in the XCR0 register by default. The simple solution I came up with is to execute a dummy AVX512 instruction, which sets up the register, before calling `__builtin_cpu_init`.
Some housekeeping as well:
- Rename `armv8-universal` -> `arm64-universal`.
- Add standard copyright headers to the files we've added recently.
Potential follow-ups:
- Disservin's Makefile cleanup
- Alternative ideas for the net loading. In particular, this will error out if the user strips the binary (since that'll invalidate the offset).
closes https://github.com/official-stockfish/Stockfish/pull/6860
No functional change
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>