Tracking down a 25% Regression on LLVM RISC-V

This post tracks down a 25% performance regression in the code LLVM generates for RISC-V and shows how a seemingly beneficial compiler optimization inadvertently broke a later narrowing transform. The author works through assembly and LLVM IR to find the cause, then lands a patch that fixes it, which makes the post a thorough walkthrough of compiler debugging. Detailed technical investigations of this kind tend to do well with the Hacker News community.

Score: 19
Comments: 2
Highest Rank: #7
Time on Front Page: 14h
First Seen: Apr 13, 5:00 PM
Last Seen: Apr 14, 6:00 AM
Rank Over Time: 7, 7, 9, 10, 13, 18, 17, 18, 18, 21, 22, 26, 28, 29

The Lowdown

A blog post details the intricate process of identifying and resolving a significant performance regression within the LLVM compiler for RISC-V targets. The issue led to LLVM generating less efficient code compared to GCC, costing nearly 25% in additional cycles for a specific benchmark.

  • The problem was first identified when benchmarking LLVM against GCC on a SiFive P550 CPU, where LLVM showed an ~8% higher cycle count.
  • Assembly analysis revealed that LLVM was using fdiv.d (double-precision floating-point division, 33-cycle latency) instead of fdiv.s (single-precision, 19-cycle latency) in a critical loop, whereas GCC and older LLVM builds used fdiv.s.
  • Using llvm-mca and comparing LLVM IR at different optimization stages, the author pinpointed that the middle-end optimization pipeline was failing to narrow a double-precision calculation to single-precision.
  • The root cause was a recent LLVM commit (190235) that improved isKnownExactCastIntToFP so that certain fpext operations could be folded away. The change is an improvement on its own, but it removed an intermediate fpext instruction that the visitFPTrunc transform in InstCombine relied on to perform the float narrowing.
  • The fix involved extending getMinimumFPType with range analysis and introducing canBeCastedExactlyIntToFP to allow visitFPTrunc to recognize and perform the necessary narrowing optimization even without the explicit fpext instruction.
  • The successful patch (190550) restored the optimization, eliminating the fdiv.d instruction and resulting in a 25% performance improvement for the benchmark.
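
The narrowing described in the bullets above can be illustrated with a small C sketch. It is not LLVM's implementation: the function names are hypothetical, and the range check is a conservative stand-in for the kind of exactness reasoning the summary attributes to isKnownExactCastIntToFP and the extended getMinimumFPType. The idea is that dividing in double and truncating to float gives the same result as dividing in float, provided the integer operand converts to float without rounding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an LLVM API): true when every integer in
   [lo, hi] converts to float without rounding. A float has a 24-bit
   significand, so all integers with magnitude <= 2^24 are exact.
   This is a conservative check; some larger integers are exact too. */
bool range_is_exact_in_float(int32_t lo, int32_t hi) {
    const int64_t limit = (int64_t)1 << 24;
    return lo <= hi && (int64_t)lo >= -limit && (int64_t)hi <= limit;
}

/* Wide form: convert both operands to double, divide, then truncate
   the quotient to float. On RISC-V this shape maps to fdiv.d. */
float div_wide(float x, int32_t n) {
    return (float)((double)x / (double)n);
}

/* Narrow form: convert the integer to float and divide in single
   precision. On RISC-V this shape maps to fdiv.s. */
float div_narrow(float x, int32_t n) {
    return x / (float)n;
}
```

When n is known to lie in a range that passes the check, the wide and narrow forms agree, so a compiler may pick the cheaper one. The check matters: with x = 16777216.0f and n = 16777217 (2^24 + 1), the conversion of n to float rounds, and the two forms produce different results.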

This detailed forensic investigation into compiler behavior highlights the delicate balance and complex interactions within optimization passes, where improvements in one area can inadvertently create regressions elsewhere, underscoring the continuous challenge of compiler development.