A distributed queue in a single JSON file on object storage

The article outlines how Turbopuffer replaced its internal indexing job queue, a critical component for scheduling asynchronous work, by ingeniously leveraging object storage. Facing issues with sharded queues causing bottlenecks, the team sought a solution that offered better performance and reliability with simpler operational overhead. The result is a highly efficient distributed queue built around a single JSON file, demonstrating the power of understanding and utilizing fundamental object storage characteristics.

The initial design, queue.json, involved clients (pushers and workers) directly interacting with a single JSON file on object storage, using compare-and-set (CAS) for atomic updates. This provided strong consistency but was limited by object storage write latency (e.g., ~1 request per second on GCS).
To improve throughput, "group commit" was introduced, buffering requests in memory and batching writes to the object storage. This decoupled the write rate from the request rate, shifting the bottleneck to network bandwidth but still facing contention from multiple writers.
The solution to contention was a "brokered group commit," where a single stateless broker process handled all interactions with object storage. This centralized write operations, running a single group commit loop on behalf of all clients, effectively scaling write operations for many clients.
For high-availability, the system was enhanced with mechanisms for broker and worker fault tolerance. Broker addresses are stored in queue.json, allowing clients to find new brokers if one fails, with CAS ensuring correctness during transitions. Worker failures are handled via heartbeats, enabling other workers to reclaim uncompleted jobs. By incrementally building upon object storage's simple yet powerful primitives, Turbopuffer developed a distributed queue that is reliable, scalable, and provides at-least-once delivery with significant performance improvements. This approach highlights how a deep understanding of underlying infrastructure can lead to elegant and highly performant solutions, even for complex distributed system challenges.

A distributed queue in a single JSON file on object storage

The Lowdown