Tracking down a 25% Regression on LLVM RISC-V
This post meticulously tracks down a 25% performance regression in LLVM's RISC-V backend, revealing how a seemingly beneficial compiler optimization inadvertently broke a subsequent narrowing pass. The author's deep dive into assembly and LLVM IR, coupled with a successful patch, offers a masterclass in complex compiler debugging. This type of detailed technical exploration and problem-solving resonates strongly with the Hacker News community.
The Lowdown
A blog post details the intricate process of identifying and resolving a significant performance regression within the LLVM compiler for RISC-V targets. The issue led to LLVM generating less efficient code compared to GCC, costing nearly 25% in additional cycles for a specific benchmark.
- The problem was first identified when benchmarking LLVM against GCC on a SiFive P550 CPU, where LLVM showed an ~8% higher cycle count.
- Assembly analysis revealed that LLVM was using `fdiv.d` (double-precision floating-point division, 33-cycle latency) instead of `fdiv.s` (single precision, 19-cycle latency) in a critical loop, unlike GCC or older LLVM builds.
- Using `llvm-mca` and comparing LLVM IR at different optimization stages, the author pinpointed that the middle-end optimization pipeline was failing to narrow a double-precision calculation to single precision.
- The root cause was a recent LLVM commit (190235) that improved `isKnownExactCastIntToFP` to fold certain `fpext` operations. While an improvement in itself, this change removed an intermediate `fpext` instruction that the downstream `visitFPTrunc` pass relied upon to perform the float narrowing.
- The fix extended `getMinimumFPType` with range analysis and introduced `canBeCastedExactlyIntToFP`, allowing `visitFPTrunc` to recognize and perform the necessary narrowing even without the explicit `fpext` instruction.
- The successful patch (190550) restored the optimization, eliminating the `fdiv.d` instruction and yielding a 25% performance improvement for the benchmark.
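To make the interaction concrete, here is a heavily simplified Python model of the narrowing decision. It assumes the question reduces to "is each operand of the double-precision division exactly representable as a float?" Under that assumption, the old rule only trusted an explicit `fpext` from float. The range-aware rule also accepts an int-to-FP conversion whose known integer range fits in float's 24-bit significand. All names here (`Operand`, `min_fp_type_old`, `min_fp_type_with_range`, `can_narrow_div`, `to_f32`) are hypothetical illustrations. They are not LLVM's API: the real `getMinimumFPType` and `canBeCastedExactlyIntToFP` operate on LLVM IR values using LLVM's own range analysis.

```python
import struct
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class FPType(Enum):
    # Value = significand precision in bits, including the implicit bit.
    FLOAT = 24
    DOUBLE = 53


@dataclass
class Operand:
    """Hypothetical model of one operand of a double-precision fdiv."""
    kind: str     # "fpext_from_float", "int_to_fp", or "other"
    lo: int = 0   # known integer range; only used when kind == "int_to_fp"
    hi: int = 0


def int_range_fits(lo: int, hi: int, t: FPType) -> bool:
    """Sufficient (not necessary) test for an exact int-to-FP cast: every
    integer with magnitude <= 2**p is exactly representable with a p-bit
    significand."""
    limit = 2 ** t.value
    return -limit <= lo and hi <= limit


def min_fp_type_old(op: Operand) -> FPType:
    # Only an explicit fpext from float proves the value fits in a float.
    return FPType.FLOAT if op.kind == "fpext_from_float" else FPType.DOUBLE


def min_fp_type_with_range(op: Operand) -> FPType:
    # Also accept an int-to-FP cast whose known range converts exactly to float.
    if op.kind == "int_to_fp" and int_range_fits(op.lo, op.hi, FPType.FLOAT):
        return FPType.FLOAT
    return min_fp_type_old(op)


def can_narrow_div(a: Operand, b: Operand,
                   min_type: Callable[[Operand], FPType]) -> bool:
    # fptrunc(fdiv double a, b) to float can become a float fdiv only if
    # both operands are exactly representable as float.
    return min_type(a) is FPType.FLOAT and min_type(b) is FPType.FLOAT


def to_f32(x: float) -> float:
    # Round a Python float (binary64) to the nearest binary32 value.
    return struct.unpack("f", struct.pack("f", x))[0]


# Assumed operand shape after the fold: a direct int-to-double cast, no fpext.
after_fold = Operand("int_to_fp", lo=0, hi=1000)
from_float = Operand("fpext_from_float")
print(can_narrow_div(after_fold, from_float, min_fp_type_old))         # False
print(can_narrow_div(after_fold, from_float, min_fp_type_with_range))  # True
print(to_f32(float(2**24 + 1)) == 2**24 + 1)  # False: 2**24 + 1 needs 25 bits
```

Requiring exact operands is what makes the rewrite value-preserving. When both inputs are exact floats, dividing in double and rounding to float gives the same result as dividing directly in float. This holds because double's 53-bit significand is at least 2×24+2 bits, a known condition under which double rounding is innocuous for division.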
This detailed forensic investigation into compiler behavior highlights the delicate balance and complex interactions within optimization passes, where improvements in one area can inadvertently create regressions elsewhere, underscoring the continuous challenge of compiler development.