Fc, a lossless compressor for floating-point streams

fc is a new research-grade lossless compressor for streams of IEEE-754 64-bit floating-point values. It splits the input into adaptively sized blocks and lets a set of specialized codecs compete on each block, keeping the smallest output. It trades encoding speed for compression ratio, especially on structured data, while its parallel decoder (about 1.28 GB/s) suits read-heavy use cases like time-series databases. HN finds this compelling for its intricate optimization details, benchmark-driven comparisons, and focus on a niche but critical data type.

Score: 13
Comments: 4
Highest Rank: #10
On Front Page: 11h
First Seen: May 13, 1:00 AM
Last Seen: May 13, 11:00 AM
Rank Over Time: 10, 11, 16, 17, 21, 21, 20, 21, 23, 24, 25

The Lowdown

fc is a research-grade lossless compressor built for streams of IEEE-754 64-bit doubles. Its central idea is adaptive block processing combined with a "mode competition", in which many specialized codecs each encode a block and the smallest result wins. The library is multi-threaded, with hand-vectorized x86-64 hot paths, and targets high compression ratios on structured or periodic floating-point data.

  • Specialized Compression: Targets IEEE-754 64-bit double-precision floating-point numbers, offering lossless compression.
  • Adaptive Block Processing: Divides input into adaptively sized blocks (quanta), ranging from 256 KiB to 1 MiB, based on data entropy.
  • Competitive Codec Selection: For each block, fc evaluates a diverse set of 50 specialized compression modes (including predictors, XOR/delta, Lempel-Ziv, and floating-point specific algorithms) and selects the one yielding the best compression.
  • Performance Profile: Reports an average compression ratio of 3.07 in its benchmarks, often significantly outperforming competitors like zstd and fpzip on structured or periodic floating-point data.
  • Asymmetric Throughput: Prioritizes compression ratio and decode speed over encode speed. It features very fast, parallel decoding at ~1.28 GB/s, making it suitable for write-once, read-many applications, though its encoding speed is slower than general-purpose compressors.
  • Hardware Requirements: Requires x86-64 CPUs with AVX2, SSE4.2, BMI, and LZCNT extensions due to its hand-vectorized hot paths; there is no portable fallback.
  • Research-Grade Status: Currently a single-file library with a versioned but unstable on-disk format, indicating its ongoing development and experimental nature.
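
The "mode competition" idea in the list above can be illustrated with a small Python sketch. This is not fc's code, format, or mode set: the three modes (raw, XOR with the previous value, integer delta), the fixed block size, the zlib back end, and the 5-byte block header are all assumptions chosen to keep the example short. It only shows the general pattern of encoding each block every way and keeping the smallest result.

```python
import struct
import zlib

BLOCK_DOUBLES = 4096   # hypothetical fixed block size; fc sizes its blocks adaptively
MASK = (1 << 64) - 1   # integer deltas wrap modulo 2**64


def _words(block: bytes) -> list[int]:
    """View a block of packed doubles as unsigned 64-bit integers."""
    return list(struct.unpack(f"<{len(block) // 8}Q", block))


def _pack(words: list[int]) -> bytes:
    return struct.pack(f"<{len(words)}Q", *words)


def _identity(block: bytes) -> bytes:
    return block


def _xor_encode(block: bytes) -> bytes:
    prev, out = 0, []
    for w in _words(block):
        out.append(w ^ prev)   # similar neighbours leave mostly zero bits
        prev = w
    return _pack(out)


def _xor_decode(block: bytes) -> bytes:
    prev, out = 0, []
    for x in _words(block):
        prev ^= x
        out.append(prev)
    return _pack(out)


def _delta_encode(block: bytes) -> bytes:
    prev, out = 0, []
    for w in _words(block):
        out.append((w - prev) & MASK)
        prev = w
    return _pack(out)


def _delta_decode(block: bytes) -> bytes:
    prev, out = 0, []
    for d in _words(block):
        prev = (prev + d) & MASK
        out.append(prev)
    return _pack(out)


# Hypothetical mode table: mode id -> (encode transform, decode transform).
MODES = {
    0: (_identity, _identity),
    1: (_xor_encode, _xor_decode),
    2: (_delta_encode, _delta_decode),
}


def compress(values: list[float]) -> bytes:
    raw = struct.pack(f"<{len(values)}d", *values)
    step = BLOCK_DOUBLES * 8
    out = bytearray()
    for start in range(0, len(raw), step):
        block = raw[start:start + step]
        # Competition: encode the block with every mode, keep the smallest payload.
        mode, payload = min(
            ((m, zlib.compress(enc(block), 9)) for m, (enc, _) in MODES.items()),
            key=lambda candidate: len(candidate[1]),
        )
        # Assumed block header: 1-byte mode id, 4-byte payload length.
        out += struct.pack("<BI", mode, len(payload)) + payload
    return bytes(out)


def decompress(blob: bytes) -> list[float]:
    raw = bytearray()
    pos = 0
    while pos < len(blob):
        mode, size = struct.unpack_from("<BI", blob, pos)
        pos += 5
        _, dec = MODES[mode]
        raw += dec(zlib.decompress(blob[pos:pos + size]))
        pos += size
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))
```

The sketch also shows why this design leans toward slow encoding and fast decoding: the encoder pays for every mode on every block, while the decoder reads the mode id and runs exactly one inverse transform. In a layout like this each block is self-contained, so blocks could be handed to separate threads for decoding.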

In essence, fc is aimed at developers who store and retrieve large volumes of floating-point data and care most about compression ratio and decode latency. Its niche focus and hardware dependencies mean it won't be a universal solution, but its targeted optimization offers significant advantages for specific scientific, financial, and time-series data workloads.