Quack: The DuckDB Client-Server Protocol

DuckDB, known primarily as an in-process analytical database, has released Quack, a new client-server protocol. Quack lets multiple DuckDB instances communicate with one another, enabling concurrent writes from multiple processes and distributed use cases for the first time. It is built on HTTP, is optimized for both bulk data transfer and small transactions, and answers a long-standing user request to use DuckDB beyond a single process.

Score: 20
Comments: 0
Highest Rank: #6
On Front Page: 17h
First Seen: May 12, 7:00 PM
Last Seen: May 13, 11:00 AM
Rank Over Time: [chart omitted]

The Lowdown

DuckDB, a database best known for its in-process architecture and its use in interactive data science, has introduced the "Quack" client-server protocol. With Quack, DuckDB instances can interact remotely and several processes can write to the same database concurrently, which previously required custom workarounds.

  • Background and Motivation: Historically, DuckDB ran as an embedded database inside a single process, which suits analytics in settings such as Python notebooks. That architecture struggled when multiple processes needed to modify the same database concurrently, so users built custom RPC solutions or relied on extensions.
  • Introducing Quack: The new protocol, playfully named Quack, enables DuckDB instances to communicate, with each instance capable of acting as both client and server. It's designed for ease of setup, requiring just an extension installation and simple commands to serve or attach to a remote DuckDB instance.
  • Protocol Design Principles: Quack is built on HTTP, which brings a mature ecosystem for load balancing, authentication, and security. It uses a request-response pattern that minimizes round trips for latency-sensitive queries, and it serializes data with DuckDB's own internal serialization primitives, so the team keeps full control over how the format evolves.
  • Security Considerations: By default, Quack generates a random authentication token and binds to localhost. For exposure to the internet, the DuckDB team strongly recommends placing it behind a reverse proxy such as Nginx to handle SSL and other security measures.
  • Performance Benchmarks: In the reported benchmarks, Quack outperforms both Apache Arrow Flight SQL and PostgreSQL in bulk data transfer, moving 60 million rows in under 5 seconds (more than 12 million rows per second). For small transactional writes, Quack also outperforms PostgreSQL at up to 8 parallel threads.
  • Expanding Use Cases: Quack unlocks a "multiplayer" experience for DuckDB, supporting centralized state management and integrations like remote Catalog servers for DuckLake. This positions DuckDB as a more central component in modern data architectures.
  • Future Roadmap: Upcoming plans include tighter integration with DuckLake, a production release with DuckDB v2.0, improved syntax, enhanced transaction scaling, and potential replication capabilities.
  • Rationale for Custom Protocol: The DuckDB team built Quack instead of adopting an existing solution such as Arrow Flight SQL for two reasons: to keep full control over the internal serialization format, and to execute a query in a single round trip, something Arrow Flight SQL's design inherently limits.
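
The design points listed above (HTTP request-response, a random token generated at startup, binding to localhost, one round trip per query) can be illustrated with a small sketch. This is not Quack's wire format or API: the `/query` path, the JSON payload, the bearer-token header, and the names `QueryHandler` and `run_query` are all assumptions made up for illustration, and the "query execution" is a stub that returns a fixed row. It uses only the Python standard library.

```python
import json
import secrets
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical names throughout: this is NOT Quack's wire format or API.
TOKEN = secrets.token_urlsafe(32)  # random token generated at startup


class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the full request body first so the connection closes cleanly.
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        # Reject requests that do not carry the expected bearer token.
        supplied = self.headers.get("Authorization", "")
        if not secrets.compare_digest(supplied, f"Bearer {TOKEN}"):
            self.send_response(401)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        request = json.loads(raw)
        # Stand-in for query execution: echo the SQL and return one fixed row.
        body = json.dumps({"sql": request["sql"], "rows": [[42]]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def run_query(port, token, sql):
    """Send one query; the result comes back in the same HTTP response."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/query",
        data=json.dumps({"sql": sql}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, None


# Bind to the loopback interface only; port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), QueryHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

ok_status, ok_body = run_query(port, TOKEN, "SELECT 42")
bad_status, bad_body = run_query(port, "wrong-token", "SELECT 42")
print(ok_status, ok_body["rows"], bad_status)

server.shutdown()
server.server_close()
```

Because the server listens on 127.0.0.1 only, it is unreachable from other machines unless something like a reverse proxy is placed in front of it, which is also where TLS would be terminated in the setup the summary describes.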

Overall, Quack lets DuckDB move beyond its in-process origins to cover a broader range of distributed and concurrent use cases, while keeping the performance and developer-friendly approach it is known for.