Epoll vs. io_uring in Linux
This post dives deep into Linux's asynchronous I/O mechanisms, contrasting epoll with the modern io_uring for high-performance applications. It expertly details how io_uring significantly reduces syscall overhead by shifting from a readiness to a completion model. Hacker News readers appreciate this type of low-level system programming content, eagerly discussing practical performance optimizations and potential adoption challenges.
The Lowdown
The author shares their journey of optimizing a reverse proxy server, TinyGate, detailing the evolution from an initial simple design to one leveraging epoll, and finally to a full rewrite using io_uring. This process highlighted the inherent limitations of epoll and the substantial performance gains offered by its successor.
- epoll's Overhead: `epoll` operates on a readiness model, notifying applications when I/O is possible. This requires separate `read()`/`write()` syscalls, leading to two syscalls per I/O event and significant context-switching overhead under heavy load.
- io_uring's Efficiency: Introduced in 2019, `io_uring` uses a completion model, informing applications when I/O is done. It leverages shared memory ring buffers between user and kernel space, allowing for batching of I/O operations into a single `io_uring_enter()` syscall, or even near-zero syscalls with `IORING_SETUP_SQPOLL` (at the cost of CPU burn).
- Architectural Shift: The transition from `epoll`'s readiness-based polling to `io_uring`'s completion-based notification represents a fundamental architectural change, moving more I/O management work into the kernel.
- Practical Examples: The article provides clear C code examples demonstrating the implementation of both `epoll` and `io_uring` for a simple `stdin` event, illustrating their respective complexities and syscall counts.
- Advanced io_uring Features: Key benefits include true zero-copy I/O with registered buffers and asynchronous error handling, but `SQPOLL`'s continuous CPU usage is noted.
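To make the readiness model concrete, here is a minimal sketch in C. It is not the article's own code: the helper name `read_when_ready` is hypothetical, and it creates a throwaway epoll instance per call for brevity, whereas a real server would keep one instance alive across many events.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable, then read from it.
 * Returns the number of bytes read, or -1 on error.
 * Steady-state cost per event: epoll_wait() + read() = two syscalls. */
ssize_t read_when_ready(int fd, char *buf, size_t len)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    /* Register interest in "fd has data to read". */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event ready;
    ssize_t n = -1;

    /* Syscall 1: block until the kernel reports readiness. */
    if (epoll_wait(epfd, &ready, 1, -1) == 1 && (ready.events & EPOLLIN)) {
        /* Syscall 2: only now is the data actually copied out. */
        n = read(ready.data.fd, buf, len);
    }

    close(epfd);
    return n;
}
```

Calling `read_when_ready(STDIN_FILENO, buf, sizeof buf)` mirrors the article's stdin scenario. Note that `epoll_ctl()` fails with `EPERM` if stdin is redirected from a regular file, because epoll does not support regular files. Under a completion model the read itself would be queued as a submission entry and its result collected from the completion ring, so the separate `read()` call disappears.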
Ultimately, the author asserts that io_uring is the definitive modern standard for asynchronous I/O on Linux, recommending its use for new projects on contemporary kernel versions.
The Gossip
Pinning & Proxies: Advanced Performance Plays
Beyond `io_uring`, commenters offered additional, highly technical performance optimizations for proxy servers. Suggestions included CPU pinning for threads and listen sockets, managing source port selection to align with NIC hashing, and exploring libraries like `concurrencykit`, `mimalloc`, and `libxdp` for zero-copy and memory-aligned operations, all aimed at minimizing cross-CPU communication and maximizing efficiency.
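As an illustration of the thread-pinning suggestion, the sketch below pins the calling thread to a single CPU with `sched_setaffinity()`. The helper names `pin_current_thread` and `first_allowed_cpu` are hypothetical, and the commenters did not specify which API they had in mind; pinning listen sockets and aligning source ports with NIC hashing are separate steps not shown here.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: pin the calling thread to one CPU.
 * Returns 0 on success, -1 on error (errno is set). */
int pin_current_thread(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: lowest-numbered CPU the calling thread
 * is currently allowed to run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) < 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}
```

A worker-per-core proxy would typically call something like `pin_current_thread(worker_index)` at the start of each worker thread, after checking that the CPU is in the allowed set (containers and cgroups often restrict it).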
Security Scrutiny & Widespread Woes
A significant point of discussion revolved around `io_uring`'s security. Critics raised concerns about its direct memory sharing between kernel and userland, pointing to a history of exploits as a reason why projects like Go are hesitant to fully integrate it. However, others clarified that the ring buffers are shared memory, not kernel-private, and highlighted that major enterprise Linux distributions like RHEL 9 and 10 now fully support `io_uring` by default, suggesting that runtime feature detection could mitigate some adoption risks.
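One way the runtime feature detection mentioned above could look is sketched below. It simply tries to create a tiny ring with the raw `io_uring_setup` syscall and reports whether that worked. The function name `io_uring_available` is hypothetical, and this is an assumption about what such a probe might do, not something the commenters spelled out.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if an io_uring instance can be created
 * right now, 0 otherwise. On failure errno holds the reason, typically
 * ENOSYS (kernel too old or built without io_uring) or EPERM (blocked
 * by a sysctl or a seccomp filter, as is common in containers). */
int io_uring_available(void)
{
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));

    /* Ask for a one-entry ring; only success or failure matters. */
    long fd = syscall(__NR_io_uring_setup, 1, &params);
    if (fd < 0)
        return 0;

    close((int)fd);
    return 1;
}
```

A program could call this once at startup and fall back to an epoll-based event loop when it returns 0, which keeps a single binary usable on hosts where io_uring is unavailable or disabled.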
Benchmark Baselessness & CPU Conundrums
Some commenters questioned the article's 'benchmark focus,' though the post itself contained no explicit benchmarks. A notable observation was a user reporting increased CPU utilization after switching to `io_uring` in a database server. This was clarified as a common and often beneficial outcome: rather than the CPU idling during I/O waits, it's now actively processing work, potentially leading to higher throughput. The consensus was that throughput, not merely CPU usage, is the ultimate measure of performance improvement.