Show HN: TurboQuant for vector search – 2-4 bit compression

TurboQuant is a new open-source library, written in Rust with Python bindings, for compressing and searching high-dimensional vectors. It is an unofficial implementation of the "TurboQuant" paper from Google Research, set to be published at ICLR 2026. Its key innovation is compressing vectors to 2-4 bits per coordinate with near-optimal distortion and, critically, without any training on the data.

Key features and benefits of TurboQuant include:

Exceptional Compression: Achieves 15.8x compression at 2 bits per coordinate and 8.0x at 4 bits, relative to FP32 vectors.
Data-Oblivious Design: Unlike traditional methods such as FAISS Product Quantization, TurboQuant requires no training step, simplifying deployment and enabling online updates without index rebuilds.
Search Performance: On ARM processors (such as the Apple Silicon M3 Max), TurboQuant matches or beats FAISS in search speed, often with higher recall at 4-bit. On x86, its search speed is within 18-25% of FAISS, again with higher recall at 4-bit.
Faster Indexing: Index building is 3-4x faster than FAISS due to the lack of a training phase.
Mathematical Foundation: The core mechanism involves normalizing vectors, applying a random rotation to make coordinate distributions predictable, and then using Lloyd-Max scalar quantization. Search is performed directly against codebook values using SIMD intrinsics (NEON for ARM, AVX2 for x86).
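The steps described under Mathematical Foundation can be sketched in a few dozen lines of plain Python. This is an illustrative sketch only, not the library's API: every function name here is hypothetical, the rotation is a dense orthogonal matrix built with Gram-Schmidt, and the codebook is fitted to samples from N(0, 1/d), which is assumed here as an approximation of how coordinates of a randomly rotated unit vector are distributed. There is no SIMD or bit-packing, and the codebook never sees the indexed data, which is the data-oblivious part.

```python
import math
import random
from bisect import bisect_right


def random_rotation(d, rng):
    """Random orthogonal d x d matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # remove components along earlier rows
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def midpoints(centroids):
    """Decision boundaries of a scalar quantizer: midpoints between centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def lloyd_max(samples, bits, iters=30):
    """1-D Lloyd-Max: alternate boundary update and conditional-mean update."""
    k = 1 << bits
    s = sorted(samples)
    # Start from evenly spaced quantiles of the samples.
    centroids = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        bounds = midpoints(centroids)
        sums, counts = [0.0] * k, [0] * k
        for x in s:
            j = bisect_right(bounds, x)  # index of the nearest centroid
            sums[j] += x
            counts[j] += 1
        centroids = [sums[j] / counts[j] if counts[j] else centroids[j]
                     for j in range(k)]
    return centroids


def encode(x, rotation, centroids):
    """Normalize, rotate, then quantize each coordinate to a codebook index."""
    norm = math.sqrt(sum(a * a for a in x))
    unit = [a / norm for a in x]
    rotated = [sum(r * u for r, u in zip(row, unit)) for row in rotation]
    bounds = midpoints(centroids)
    return norm, [bisect_right(bounds, v) for v in rotated]


def estimate_dot(query, rotation, centroids, norm, codes):
    """Approximate <query, x> by scoring the rotated query against codebook values."""
    rq = [sum(r * q for r, q in zip(row, query)) for row in rotation]
    return norm * sum(q * centroids[c] for q, c in zip(rq, codes))


rng = random.Random(0)
d, bits = 32, 4  # small sizes chosen only to keep the demo fast
rotation = random_rotation(d, rng)
# Codebook is fitted to the assumed coordinate distribution, not to any data.
codebook = lloyd_max([rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(4000)], bits)

x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
q = [rng.uniform(-1.0, 1.0) for _ in range(d)]
norm, codes = encode(x, rotation, codebook)
exact = sum(a * b for a, b in zip(q, x))
approx = estimate_dot(q, rotation, codebook, norm, codes)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Because the rotation is orthogonal, the inner product of the rotated query with the rotated vector equals the original inner product, so the only error comes from the scalar quantization step. Adding a new vector needs only `encode`, which is why no retraining or index rebuild is required in this scheme.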

In short, TurboQuant is a compelling alternative for vector search where training time, index updates, and memory footprint are the main constraints. Its data-oblivious design and strong benchmark results make it a valuable tool for embedding-heavy AI and machine learning applications.
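As a rough check on the memory-footprint claim, the compression figures quoted in the feature list can be reproduced with simple arithmetic. The sketch below assumes 1536-dimensional vectors and one FP32 norm (4 bytes) stored per compressed vector. Both values are assumptions chosen for illustration, not details confirmed by the post, and `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(dim, bits, overhead_bytes=4):
    """FP32 size divided by quantized size (packed codes plus per-vector overhead)."""
    fp32_bytes = dim * 4
    packed_bytes = dim * bits / 8 + overhead_bytes
    return fp32_bytes / packed_bytes


# Assumed: dim=1536 and a 4-byte norm per vector.
print(round(compression_ratio(1536, 2), 1))  # 15.8
print(round(compression_ratio(1536, 4), 1))  # 8.0
```

Under these assumptions the small per-vector overhead is what keeps the 2-bit ratio just below the ideal 16x.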

The Lowdown