Linux latency measurements and compositor tuning
A Linux user measures click-to-photon latency in games with custom hardware, finds unexpected sources of delay such as an idle application and inefficiencies in the KWin compositor, and writes patches that recover a few milliseconds. The data-driven investigation is a step toward making Linux more competitive for low-latency gaming.
The Lowdown
The article details a comprehensive investigation into gaming latency on Linux, motivated by the author's observation of "floaty" mouse movements after switching from Windows. Utilizing a custom Open Source LDAT setup with a Teensy microcontroller and light sensor, the author performed extensive click-to-photon latency measurements across synthetic tests, various games, and even network streaming scenarios, on both desktop and laptop Linux (NixOS) and Windows 11 systems.
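To make "click-to-photon" concrete, here is a minimal sketch of what such a measurement computes: the time between an injected click and the first light-sensor reading that crosses a brightness threshold. The author's firmware and analysis scripts are not reproduced here, so the function name, the trace layout, and the threshold value are all assumptions made for illustration.

```python
from statistics import median


def click_to_photon_us(click_time_us, samples, threshold):
    """Return the delay from the click to the first bright sensor reading.

    samples is a time-ordered list of (timestamp_us, brightness) pairs,
    a hypothetical format for what a light sensor might log.
    Returns None if the screen never gets bright enough after the click.
    """
    for t_us, brightness in samples:
        if t_us >= click_time_us and brightness >= threshold:
            return t_us - click_time_us
    return None


# Made-up trace: the screen flashes roughly 2.5 ms after a click at t = 500 us.
trace = [(0, 10), (1000, 12), (2000, 11), (3000, 200), (4000, 220)]
latency_us = click_to_photon_us(500, trace, threshold=100)

# Repeated runs are usually summarized with a robust statistic such as the median.
runs_us = [latency_us, 2700, 2400]
print(f"single run: {latency_us} us, median of runs: {median(runs_us)} us")
```

Comparing configurations then amounts to repeating the measurement many times per configuration and comparing the resulting distributions, not single runs.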
- Initial Findings: Synthetic tests revealed an idle Zed editor application added 3ms of latency, highlighting unexpected system-wide impacts. Display settings like Black Frame Insertion and HDR had measurable but varying effects.
- Gaming Performance: In-game tests across Doom Eternal (Vulkan), Borderlands 3 (DX11/DX12), and Hades 2 (DX12) showed that Windows often delivered lower latency. Key Linux-side optimizations identified included prioritizing wine_wayland, using late FPS limiting, and applying VKD3D_SWAPCHAIN_LATENCY_FRAMES=1 for DX12 titles.
- Network Gaming: Experiments with USB/IP and Moonlight showed that both could deliver input with near-local latency over a 2.5GbE network, though in overall Moonlight streaming Windows was slightly more responsive.
- KWin Compositor Deep Dive: The author performed a detailed analysis of the KWin Wayland compositor, identifying several sources of avoidable latency: an overestimation of compositing time, a fixed 1ms "scheduler inaccuracy" term due to Qt's timer rounding to milliseconds, an overly cautious safety margin, and a hard 2ms floor for GPU compositing workload estimations.
- KWin Patches and Results: The author developed and applied patches to KWin, replacing Qt's timer with a nanosecond-precise timerfd, reducing the safety margin, and replacing the 2ms GPU floor with an adaptive p95 estimator. These changes successfully reduced input-to-present latency to the 3ms range in synthetic tests, restoring fairness between applications and narrowing the latency gap with Windows by approximately 1.1-1.2ms.
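The scheduling arithmetic behind the last two bullets can be sketched roughly as follows. A compositor has to start compositing some lead time before vblank, and every term added to that lead time makes the frame sample input earlier, so it shows up directly as latency. This is a simplified model, not KWin's actual code: the function names and the safety margin values are hypothetical, and only the 2ms floor and the 1ms timer term come from the article. The same p95 estimate is used on both sides to isolate the effect of the floor, the margin, and the timer term.

```python
NS_PER_MS = 1_000_000


def p95_ns(samples_ns):
    """Nearest-rank 95th percentile of a non-empty list of durations."""
    ordered = sorted(samples_ns)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def lead_time_ns(render_estimate_ns, safety_margin_ns, timer_slack_ns, floor_ns=0):
    """How long before vblank compositing must start in this simplified model."""
    return max(render_estimate_ns, floor_ns) + safety_margin_ns + timer_slack_ns


# Hypothetical recent GPU compositing times: mostly 0.4 ms, two slower frames.
recent_ns = [400_000] * 18 + [600_000, 1_500_000]
estimate_ns = p95_ns(recent_ns)

# Before: 2 ms floor and 1 ms timer term (from the article); the margin is made up.
before_ns = lead_time_ns(
    estimate_ns,
    safety_margin_ns=1_500_000,
    timer_slack_ns=1 * NS_PER_MS,
    floor_ns=2 * NS_PER_MS,
)

# After: no floor, no millisecond-rounding term, and a smaller (made-up) margin.
after_ns = lead_time_ns(estimate_ns, safety_margin_ns=500_000, timer_slack_ns=0)

print(f"p95 estimate:     {estimate_ns / NS_PER_MS:.2f} ms")
print(f"lead time before: {before_ns / NS_PER_MS:.2f} ms")
print(f"lead time after:  {after_ns / NS_PER_MS:.2f} ms")
```

A percentile is used instead of the maximum so that a single slow frame (the 1.5 ms outlier here) does not inflate the estimate for every later frame, while the estimate still tracks typical load instead of a fixed floor.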
The investigation and the resulting patches show that significant gains in Linux gaming latency are achievable, particularly within the compositor stack. The author plans to upstream these improvements and to continue exploring further optimizations.