What Are Skiplists Good For?

This article examines skiplists, an often-overlooked randomized data structure, and describes how an adaptation called a "skiptree" proved indispensable at Antithesis. It presents a clever, if complex, way to query hierarchical data efficiently in analytical databases such as Google BigQuery. The story will appeal to engineers who enjoy seeing fundamental computer science applied to real-world, high-performance problems.

Score: 16
Comments: 2
Highest Rank: #4
Time on Front Page: 20h
First Seen: Apr 19, 5:00 AM
Last Seen: Apr 20, 12:00 AM
Rank Over Time: (chart not reproduced)

The Lowdown

The author recounts his journey from dismissing skiplists as a niche data structure to discovering their profound utility in a generalized form, the "skiptree," at his company, Antithesis. This technical deep dive explains both the problem faced and the innovative solution derived from adapting an obscure data structure.

  • Skiplists Explained: A skiplist is a probabilistic data structure that can serve as a drop-in replacement for a binary search tree, offering expected O(log n) search, insertion, and deletion. It works like a sorted linked list with multiple "express lanes" at progressively higher levels, each skipping over more elements, and it is known for having relatively simple concurrent implementations.
  • The Antithesis Problem: Antithesis needed to analyze the branching timelines generated by its fuzzer, which required frequent ancestor lookups in a tree-like data structure. With the tree stored in Google BigQuery, an analytical database optimized for scans, an ancestor query needed O(depth) point lookups, one for each step up the tree, and point lookups are exactly what such a database does poorly.
  • Shortcomings of Traditional Solutions: Keeping the tree structure in an OLTP database alongside BigQuery for the bulk data would have introduced two-phase-commit-style consistency problems between the two stores, which the author wanted to avoid. BigQuery's loose consistency complicated such an approach further.
  • Introducing Skiptrees: The solution was a novel data structure called a "skiptree," essentially a generalization of skiplists for trees. It involves a hierarchy of trees, where each path from root to leaf in the original tree forms a skiplist structure across these levels.
  • Implementation & Benefits: Skiptrees were stored across multiple SQL tables (one per level), using next_level_ancestor and ancestors_between columns. This allowed an ancestor lookup to be performed with a fixed number of JOIN operations rather than a chain of recursive point lookups. Although the resulting SQL queries were large, the approach suited BigQuery's pricing model (billed by data scanned, not compute), and it reduced costs and improved performance at Antithesis for six years.
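
To make the per-level layout above more concrete, here is a minimal in-memory sketch in Python. It is not the author's implementation: the dictionaries stand in for the per-level SQL tables, and the function names (build_skiptree, ancestors), the 50% promotion probability, the fixed number of levels, and the choice to store in ancestors_between every original-tree ancestor strictly between a node and its next_level_ancestor are all assumptions made for illustration. Only the two column names come from the summary.

```python
import random


def build_skiptree(parent, num_levels=4, p=0.5, seed=0):
    """Build one table (a dict) per level from a {node: parent} map.

    Hypothetical layout: the root maps to None, level 0 holds every node,
    and each node is promoted to the next level with probability p.
    """
    rng = random.Random(seed)
    tables = []
    members = set(parent)  # level 0 contains every node
    for level in range(num_levels):
        is_top = level == num_levels - 1
        # Nothing is promoted out of the top level, so its rows run to the root.
        promoted = set() if is_top else {
            n for n in sorted(members) if rng.random() < p
        }
        table = {}
        for node in members:
            between = []
            cur = parent[node]
            # Walk up the original tree until we reach a promoted ancestor
            # (or run past the root).
            while cur is not None and cur not in promoted:
                between.append(cur)
                cur = parent[cur]
            table[node] = {
                "next_level_ancestor": cur,    # None if no promoted ancestor
                "ancestors_between": between,  # nearest ancestor first
            }
        tables.append(table)
        members = promoted
    return tables


def ancestors(tables, node):
    """Return all ancestors of node, nearest first, with one lookup per level."""
    result = []
    cur = node
    for table in tables:  # at most num_levels lookups, never O(depth)
        row = table[cur]
        result.extend(row["ancestors_between"])
        cur = row["next_level_ancestor"]
        if cur is None:
            break
        result.append(cur)
    return result


# A small branching tree: 0 is the root, and 5 sits at depth 4.
example_parent = {0: None, 1: 0, 2: 1, 3: 1, 4: 3, 5: 4}
example_tables = build_skiptree(example_parent)
print(ancestors(example_tables, 5))  # [4, 3, 1, 0]
```

Each pass through the loop in ancestors plays the role of one JOIN against the next level's table, so the number of lookups is bounded by the number of levels rather than by the depth of the tree. The random promotion affects only how the work is split between levels, not the answer.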

Ultimately, the author acknowledges that his "skiptree" shared similarities with existing "skip graphs," reinforcing the idea that innovative solutions often echo prior work. The core message is the unexpected value of obscure data structures and how even a somewhat naive implementation of a skiplist concept can offer significant performance gains in challenging scenarios.