
thunderbolt-ibverbs: We have InfiniBand at home

A clever hacker built a Linux kernel module that turns standard USB4/Thunderbolt ports into InfiniBand-style RDMA devices, so distributed AI workloads can run across consumer machines. The project offers enthusiasts a low-cost alternative to expensive enterprise networking for local model training and inference, and the accompanying write-up is a detailed technical deep dive aimed at putting such setups within reach of home users.

Score: 9 · Comments: 1 · Highest rank: #14 · Time on front page: 9h
First seen: Jun 4, 9:00 AM · Last seen: Jun 4, 5:00 PM

The Lowdown

The project, named thunderbolt-ibverbs, describes a custom Linux kernel module that lets the ordinary USB4/Thunderbolt ports on consumer AMD mini PCs emulate InfiniBand devices. The goal is to run distributed AI workloads, such as inference and training built on vLLM and RCCL (the ROCm collective communication library), across several machines at home without buying costly enterprise networking gear.

  • The core achievement is experimental RDMA over USB4, which provides high-bandwidth, low-latency communication between two consumer systems.
  • Performance benchmarks demonstrate impressive results, including approximately 95 Gb/s bidirectional raw RDMA throughput and an ultra-low ~7 µs one-way latency.
  • These results are well ahead of conventional 2.5 GbE (around 2.3 Gb/s and 28 µs) and even of soft-RoCE running over thunderbolt-net (about 9 Gb/s and 65 µs).
  • In practice, the setup ran tensor-parallel inference for a MiniMax-M2.7 model that does not fit on a single machine. A Gemma 3 27B LoRA FSDP step also dropped from 1359 seconds over Ethernet to 126 seconds (roughly 10.8× faster) on the 4-HCA USB4 RDMA setup.
  • The author candidly notes that the code is research-grade, largely AI-generated, involves experimental kernel modules, and comes with no warranty or support.
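To put the quoted numbers side by side, here is a rough Python sketch. It assumes (this is not from the write-up) the simple alpha-beta model t(n) = latency + n / bandwidth for a single message, and it treats each throughput figure as usable for one transfer, even though the USB4 number is reported as bidirectional, so real per-direction results may be worse. The names `LINKS`, `transfer_time_us`, and `speedup` are hypothetical illustrations, not part of the project.

```python
"""Back-of-the-envelope comparison of the three links quoted above.

Assumptions (not from the write-up): one message's transfer time follows the
alpha-beta model t(n) = latency + n / bandwidth, and each quoted throughput
is fully usable for that message. Figures are the approximate reported ones.
"""

# link name: (throughput in Gb/s, one-way latency in microseconds)
LINKS = {
    "2.5 GbE": (2.3, 28.0),
    "soft-RoCE over thunderbolt-net": (9.0, 65.0),
    "RDMA over USB4": (95.0, 7.0),
}


def transfer_time_us(size_bytes: int, gbps: float, latency_us: float) -> float:
    """Alpha-beta estimate of one message's transfer time, in microseconds."""
    bits = size_bytes * 8
    # 1 Gb/s moves 1e3 bits per microsecond.
    return latency_us + bits / (gbps * 1e3)


def speedup(baseline: float, improved: float) -> float:
    """How many times shorter `improved` is than `baseline`."""
    return baseline / improved


if __name__ == "__main__":
    # Small, medium, and large messages: 4 KiB, 1 MiB, 64 MiB.
    for size in (4 * 1024, 1024 * 1024, 64 * 1024 * 1024):
        print(f"message of {size} bytes:")
        for name, (gbps, lat) in LINKS.items():
            print(f"  {name:32s} {transfer_time_us(size, gbps, lat):12.1f} us")
    # Wall-clock ratio for the reported Gemma 3 27B LoRA FSDP step.
    print(f"FSDP step speedup: {speedup(1359, 126):.1f}x")
```

Under this model a 4 KiB message is dominated by latency, so soft-RoCE's 65 µs makes it slower than 2.5 GbE for small transfers despite its higher bandwidth; bandwidth only takes over for larger messages. The USB4 link wins at every size because it is better on both terms.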

In short, the project brings the high-bandwidth, low-latency networking usually reserved for enterprise InfiniBand systems to readily available consumer hardware via USB4/Thunderbolt. That makes distributed AI more accessible to home users, though only as an unsupported, research-grade effort.