io_uring, libaio performance across Linux kernels and an unexpected IOMMU trap

This benchmark compares io_uring and libaio across several Linux kernel versions and finds io_uring roughly 2x faster. The more striking result is a ~30% performance hit traced to the IOMMU being enabled by default in newer kernels, which prompted a detailed technical discussion. It shows how a subtle system-level default can have a large, unexpected effect on I/O throughput.

Score: 14
Comments: 6
Highest Rank: #14
Time on Front Page: 5h
First Seen: Mar 24, 3:00 PM
Last Seen: Mar 24, 7:00 PM
Rank Over Time: 14, 16, 19, 25, 26

The Lowdown

The article compares two asynchronous I/O interfaces, io_uring and libaio, across Linux kernel versions from 5.4 to 7.0-rc3. io_uring performed as expected, at roughly 2x the performance of libaio, but the most significant finding was an unexpected performance regression.

  • The study primarily focused on 4K random write operations, chosen as a representative workload for database patterns and for its effectiveness in measuring software latency on NVMe devices.
  • A substantial ~30% performance degradation was observed in newer kernels.
  • This regression was directly attributed to the IOMMU (Input/Output Memory Management Unit) being enabled by default in these kernel versions.
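The 4K random-write workload described above can be sketched as a pair of load-generator invocations, one per I/O engine. This is a minimal sketch that assumes fio as the benchmark tool, which the summary does not confirm; the queue depth, runtime, file size, and target path are hypothetical placeholders, not the author's settings.

```python
import shlex

# Engines compared in the article; fio accepts both names as --ioengine values.
ENGINES = ("io_uring", "libaio")


def fio_randwrite_cmd(engine, filename, iodepth=32, runtime_s=30, size="1g"):
    """Build an fio argv for a 4K random-write job on the given engine.

    iodepth, runtime_s and size are illustrative defaults (assumptions).
    """
    if engine not in ENGINES:
        raise ValueError("unsupported engine: " + engine)
    return [
        "fio",
        "--name=randwrite-" + engine,
        "--ioengine=" + engine,
        "--rw=randwrite",      # random writes
        "--bs=4k",             # 4K block size, as in the article
        "--direct=1",          # bypass the page cache
        "--iodepth=" + str(iodepth),
        "--size=" + size,
        "--runtime=" + str(runtime_s),
        "--time_based",
        "--filename=" + filename,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; writing to a real
    # device or file is destructive.
    for engine in ENGINES:
        print(shlex.join(fio_randwrite_cmd(engine, "/path/to/test/file")))
```

Keeping every option except the engine identical is what makes the two runs comparable; running the same pair on each kernel would then expose a regression like the one reported.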

The results trace how Linux I/O performance has evolved, showing both the benefit of modern interfaces like io_uring and the sometimes hidden cost of kernel defaults such as an enabled IOMMU.
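Before comparing numbers across kernels, it helps to know whether the IOMMU is active on the test machine. The sketch below is one hedged way to check: it looks for IOMMU-related parameters on the kernel command line and for populated groups under /sys/kernel/iommu_groups. The function names are hypothetical, and the check is an illustration, not the method used in the article.

```python
import os

# Kernel command-line parameters that influence IOMMU behaviour.
IOMMU_PARAMS = ("iommu", "iommu.passthrough", "iommu.strict",
                "intel_iommu", "amd_iommu")


def iommu_cmdline_params(cmdline):
    """Return {param: value} for IOMMU-related tokens in a kernel cmdline."""
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in IOMMU_PARAMS:
            found[key] = value
    return found


def iommu_groups_present(root="/sys/kernel/iommu_groups"):
    """True if the kernel has populated at least one IOMMU group."""
    try:
        return len(os.listdir(root)) > 0
    except OSError:
        # Directory missing or unreadable: treat as no IOMMU groups.
        return False


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("cmdline params:", iommu_cmdline_params(f.read()))
    print("IOMMU groups present:", iommu_groups_present())
```

An empty parameter dict together with populated groups suggests the kernel's built-in default is in effect, which is the situation the article describes for newer kernels.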

The Gossip

I/O Interrogations: Understanding IOMMU's Impact & Benchmark Choices

The Hacker News discussion immediately zeroes in on the technical specifics of the findings. Commenters question the choice of 4K random writes as the primary workload, and the author explains that it is a common DBMS pattern that simplifies measuring software latency. In a key thread, tanelpoder gives a detailed breakdown of how to diagnose IOMMU-related interrupt overhead, suggesting specific procfs and bcc-tools metrics along with perf record analysis. The author acknowledges these suggestions, expresses interest in adding such measurements to future experiments, and confirms that the IOMMU overhead is related to interrupt-based I/O completion. There is also speculation that IOPOLL mode, although it avoids interrupts, might still be affected by the IOMMU's DMA translation.
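As a rough illustration of the procfs side of that diagnosis, the sketch below sums per-CPU interrupt counts for NVMe queues from /proc/interrupts and reports how much they grow over an interval. The summary does not say which procfs files were suggested in the thread, so the use of /proc/interrupts, the "nvme" match string, and the function names are all assumptions.

```python
import time


def nvme_irq_counts(text, match="nvme"):
    """Sum per-CPU counts for each /proc/interrupts row mentioning `match`."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        per_cpu = fields[:ncpus]
        desc = " ".join(fields[ncpus:])    # controller, trigger, action name
        if match in desc and all(f.isdigit() for f in per_cpu):
            key = label.strip() + " " + fields[-1]
            counts[key] = sum(int(f) for f in per_cpu)
    return counts


def irq_deltas(before, after):
    """Interrupts raised between two snapshots, per IRQ."""
    return {k: after[k] - before.get(k, 0) for k in after}


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        first = nvme_irq_counts(f.read())
    time.sleep(1.0)                        # run the I/O workload meanwhile
    with open("/proc/interrupts") as f:
        second = nvme_irq_counts(f.read())
    for irq, delta in sorted(irq_deltas(first, second).items()):
        print(irq, delta, "interrupts/s")
```

Comparing interrupts per second against completed I/Os per second, with the IOMMU on and off, is one way to see whether completion interrupts are where the extra cost appears. A polled mode such as IOPOLL would show few or no such interrupts, which is why the thread treats it as a separate question.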