by Michael Larabel on (#6TC13)
NVIDIA compiler engineers have spent the past several months working on a proposed GCC option, -flto-partition=locality, that has the compiler optimize code layout for locality between callers and callees as part of the link-time optimization (LTO) process. For some workloads, NVIDIA is finding that -flto-partition=locality significantly improves CPU performance...