How we made geo joins 400× faster with H3 indexes
Geospatial joins are notorious performance killers in databases, often forcing quadratic complexity on expensive spatial predicates. This post from Floe details how they achieve a stunning 400x speedup by automatically rewriting these queries. They leverage Uber's H3 indexes to transform slow spatial predicates into efficient, distributable integer hash joins, drastically reducing the number of exact checks required.
The Lowdown
Geospatial joins, while seemingly simple, notoriously cripple database performance at scale due to the computational cost of spatial predicates and the inability to use efficient hash joins. Floe's approach, detailed in this post, offers a dramatic solution by automatically rewriting these queries to leverage hierarchical geospatial indexing.
- The Problem: Standard geo joins (e.g., using
ST_Intersects) are inefficient because spatial predicates lack a clean join key, forcing databases into slow loop joins with quadratic complexity and expensive individual predicate evaluations. - The Solution: H3 Indexes: Floe utilizes Uber's H3 system, which divides the Earth into a hierarchy of hexagonal cells, each identified by a compact
BIGINTkey. This allows geometries to be represented as sets of cell IDs. - Rewrite Strategy: The core idea is to transform the
ST_Intersectspredicate into an equi-join on H3 cells. If two shapes intersect, their H3 cell sets will overlap. - Multi-Stage Query: Floe rewrites the original query into a process that first generates H3 coverage for both tables, performs a fast integer equi-join on these H3 cells (acting as a pre-filter), deduplicates the resulting candidates, and then applies the exact spatial predicate only to this much smaller set of potential matches.
- Key Benefits: This rewrite allows the database to perform efficient, distributable hash joins on cheap integer keys. The expensive spatial predicate is relegated to a final, targeted cleanup step, rather than being the primary bottleneck.
- On-the-Fly Indexing: Floe computes H3 coverage at query time, avoiding the need for extra storage or index maintenance, and supporting complex query structures like views and CTEs.
- Performance Gains: Benchmarks on a country-city join showed a 400x speedup (from 459 seconds to 1.17 seconds) at H3 resolution 3. This improvement stems from reducing the number of
ST_Intersectscalls by 99.6%. - Approximation and Correctness: The H3 pre-filter is an approximation, designed to produce false positives (extra candidates) but no false negatives, ensuring correctness is maintained by the final, exact spatial predicate recheck. The choice of H3 resolution is a trade-off between reducing false positives and the cost of generating more cells.
By intelligently leveraging H3 indexes to rewrite spatial predicates into efficient set operations, Floe dramatically optimizes geospatial joins, demonstrating a powerful method to overcome a significant database performance bottleneck.