thunderbolt-ibverbs: We have InfiniBand at home
A Linux kernel module exposes standard USB4/Thunderbolt ports as InfiniBand devices, enabling distributed AI workloads on consumer hardware. The project offers enthusiasts a low-cost alternative to enterprise networking for local AI model training and inference. Hacker News is captivated by the technical deep dive and by the prospect of building such setups from off-the-shelf machines.
The Lowdown
The project, dubbed 'thunderbolt-ibverbs', is a custom Linux kernel module that lets ordinary USB4/Thunderbolt ports on consumer AMD mini PCs emulate InfiniBand devices. The goal is to run distributed AI workloads, such as inference and training with vLLM and RCCL, across multiple machines at home without costly enterprise networking gear.
- The core achievement is the development of experimental RDMA-over-USB4, enabling high-speed, low-latency communication between two consumer systems.
- Reported benchmarks show approximately 95 Gb/s bidirectional raw RDMA throughput and ~7 µs one-way latency.
- These figures significantly outperform conventional 2.5 GbE (around 2.3 Gb/s and 28 µs) and even soft-RoCE over thunderbolt-net (about 9 Gb/s and 65 µs).
- Practical results include tensor-parallel inference on a MiniMax-M2.7 model that previously would not fit on a single machine. In addition, a Gemma 3 27B LoRA FSDP step dropped from 1359 seconds over Ethernet to 126 seconds on the 4-HCA USB4 RDMA setup.
- The author candidly notes that the code is research-grade, largely AI-generated, involves experimental kernel modules, and comes with no warranty or support.
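To put the quoted numbers side by side, here is a minimal Python sketch that computes the ratios implied by the bullets above. It uses only the figures from this summary; the dictionary layout and the `ratios` helper are illustrative names, not part of the project. The throughput ratios are rough, since the summary states that the 95 Gb/s figure is bidirectional but does not say how the baseline figures were measured.

```python
# Figures as quoted in the summary above; nothing here is independently measured.
# The 95 Gb/s figure is bidirectional; the measurement mode of the two baselines
# is not stated, so treat the throughput ratios as rough.
links = {
    "2.5 GbE": {"gbps": 2.3, "latency_us": 28.0},
    "soft-RoCE over thunderbolt-net": {"gbps": 9.0, "latency_us": 65.0},
    "USB4 RDMA": {"gbps": 95.0, "latency_us": 7.0},
}


def ratios(baseline, candidate):
    """Return (throughput gain, latency reduction) of candidate over baseline."""
    b, c = links[baseline], links[candidate]
    return c["gbps"] / b["gbps"], b["latency_us"] / c["latency_us"]


for name in ("2.5 GbE", "soft-RoCE over thunderbolt-net"):
    bw, lat = ratios(name, "USB4 RDMA")
    print(f"USB4 RDMA vs {name}: {bw:.1f}x throughput, {lat:.1f}x lower latency")

# Gemma 3 27B LoRA FSDP step: 1359 s over Ethernet vs 126 s over 4-HCA USB4 RDMA.
fsdp_speedup = 1359 / 126
print(f"FSDP step speedup: {fsdp_speedup:.1f}x")
```

By these numbers the training step is roughly an order of magnitude faster, which is smaller than the raw throughput ratio against 2.5 GbE; the summary does not break down where the remaining time goes.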
Ultimately, the project shows a technically effective way to bring high-bandwidth, low-latency networking, typically reserved for enterprise InfiniBand systems, to readily available consumer hardware via USB4/Thunderbolt. That makes distributed AI more accessible to home users, albeit in a research-oriented, unsupported capacity.