Epoll vs. io_uring in Linux

This post contrasts two of Linux's asynchronous I/O mechanisms, epoll and the newer io_uring, for high-performance applications. It details how io_uring reduces syscall overhead by shifting from a readiness model to a completion model. Hacker News readers tend to appreciate this kind of low-level systems programming content, and the discussion covers practical performance optimizations and potential adoption challenges.

Time on Front Page: 19h

First Seen: Jun 20, 11:00 PM

Last Seen: Jun 21, 5:00 PM

The Lowdown

The author shares their journey of optimizing a reverse proxy server, TinyGate, detailing the evolution from an initial simple design to one leveraging epoll, and finally to a full rewrite using io_uring. This process highlighted the inherent limitations of epoll and the substantial performance gains offered by its successor.

epoll's Overhead: epoll operates on a readiness model: it notifies the application when I/O is possible, not when it has been done. Each event therefore costs at least two syscalls, one epoll_wait() to learn that a descriptor is ready and one read() or write() to move the data, which adds up to significant context-switching overhead under heavy load.
io_uring's Efficiency: Introduced in Linux 5.1 (2019), io_uring uses a completion model, informing applications when I/O is done. It relies on ring buffers in memory shared between user and kernel space, so many I/O operations can be batched into a single io_uring_enter() syscall. With IORING_SETUP_SQPOLL the syscall count can approach zero, at the cost of a kernel thread that burns CPU polling the submission queue.
Architectural Shift: The transition from epoll's readiness-based polling to io_uring's completion-based notification represents a fundamental architectural change, moving more I/O management work into the kernel.
Practical Examples: The article provides clear C code examples demonstrating the implementation of both epoll and io_uring for a simple stdin event, illustrating their respective complexities and syscall counts.
Advanced io_uring Features: The article names zero-copy I/O with registered buffers and asynchronous error handling as key benefits, and notes SQPOLL's continuous CPU usage as the main drawback.
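
The readiness model from the first point can be sketched in C as follows. This is a minimal illustration and not the article's own code: `read_when_ready` is a hypothetical helper, and it accepts any pollable descriptor (pipe, socket, terminal) instead of hard-coding stdin.

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (not from the article): block until `fd` is
 * readable, then read from it. Returns bytes read, or -1 on error. */
ssize_t read_when_ready(int fd, char *buf, size_t len)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    /* Register interest in "readable" events on fd. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event ready;
    ssize_t n = -1;
    /* Syscall 1: the kernel only reports that a read would not block. */
    if (epoll_wait(epfd, &ready, 1, -1) == 1) {
        /* Syscall 2: the application still has to move the data itself. */
        n = read(ready.data.fd, buf, len);
    }

    close(epfd);
    return n;
}
```

A real server would create the epoll instance once and loop over epoll_wait(); the per-call setup here only keeps the sketch short. Note that epoll_ctl() rejects regular files with EPERM, so this helper is meant for pipes, sockets, and terminals.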

Ultimately, the author asserts that io_uring is the definitive modern standard for asynchronous I/O on Linux, recommending its use for new projects on contemporary kernel versions.

The Gossip

Pinning & Proxies: Advanced Performance Plays

Beyond `io_uring`, commenters offered additional, highly technical performance optimizations for proxy servers. Suggestions included CPU pinning for threads and listen sockets, managing source port selection to align with NIC hashing, and exploring libraries like `concurrencykit`, `mimalloc`, and `libxdp` for zero-copy and memory-aligned operations, all aimed at minimizing cross-CPU communication and maximizing efficiency.
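
Thread pinning, the simplest of these suggestions, might look like the following on Linux. This is a sketch only: `pin_current_thread_to_cpu` is a hypothetical name, and it covers threads but not the listen-socket or NIC-hashing parts of the advice.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno set by the kernel). */
int pin_current_thread_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A worker-per-core proxy would typically call this once at the start of each worker thread, choosing a CPU from the set the process is already allowed to run on.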

Security Scrutiny & Widespread Woes

A significant thread concerned `io_uring`'s security. Critics raised concerns about its direct memory sharing between kernel and userland and pointed to its history of exploits as a reason why projects such as Go are hesitant to integrate it fully. Others clarified that the ring buffers are shared memory, not kernel-private, and noted that major enterprise Linux distributions such as RHEL 9 and 10 now fully support `io_uring` by default, suggesting that runtime feature detection could mitigate some adoption risks.
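
One way to do the runtime feature detection mentioned above is to try creating a tiny ring at startup and fall back to epoll if that fails. The sketch below is an assumption about how such a probe could look, not something taken from the thread; `io_uring_available` is a hypothetical name.

```c
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if an io_uring instance can be created,
 * 0 otherwise (errno then holds the reason reported by the kernel). */
int io_uring_available(void)
{
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));

    /* glibc has no wrapper for io_uring_setup, so use the raw syscall. */
    long fd = syscall(__NR_io_uring_setup, 1, &params);
    if (fd < 0)
        return 0;

    close((int)fd);
    return 1;
}
```

The failure is typically ENOSYS on kernels without io_uring, or EPERM where a sysctl or seccomp policy blocks it. A program can check once at startup and select its epoll code path when the probe returns 0.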

Benchmark Baselessness & CPU Conundrums

Some commenters questioned the article's 'benchmark focus,' though the post itself contained no explicit benchmarks. A notable observation was a user reporting increased CPU utilization after switching to `io_uring` in a database server. This was clarified as a common and often beneficial outcome: rather than the CPU idling during I/O waits, it's now actively processing work, potentially leading to higher throughput. The consensus was that throughput, not merely CPU usage, is the ultimate measure of performance improvement.