
The real cost of random I/O

This post challenges the long-held PostgreSQL random_page_cost default of 4.0, demonstrating through experiments that the actual cost on modern SSDs is often significantly higher (25-35). The author shows how this discrepancy leads to suboptimal query plans where the database chooses slower sequential scans over faster index scans. It's a deep dive into database internals, appealing to performance-conscious developers and database administrators who tune PostgreSQL for optimal performance.

Score: 13
Comments: 0
Highest Rank: #10
On Front Page: 9h
First Seen: Mar 1, 12:00 PM
Last Seen: Mar 1, 8:00 PM
Rank Over Time (chart)

The Lowdown

The article investigates PostgreSQL's random_page_cost parameter, a crucial factor in query planning that has been set to 4.0 by default for 25 years. Despite common recommendations to lower this value for modern flash storage, the author presents empirical evidence suggesting the actual cost of random I/O relative to sequential I/O is much higher than the default.
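The trade-off that random_page_cost controls can be illustrated with a toy cost comparison. This is a simplified sketch, not PostgreSQL's actual costing formulas, which also account for CPU costs, index correlation, and effective_cache_size; the table sizes below are made-up illustrative values:

```python
# Toy model of the planner's seq-scan vs index-scan trade-off.
# Simplified illustration only: real PostgreSQL costing includes
# CPU per-tuple costs, correlation, and caching assumptions.

def seq_scan_cost(table_pages, seq_page_cost=1.0):
    # A sequential scan reads every page of the table once.
    return table_pages * seq_page_cost

def index_scan_cost(matching_pages, random_page_cost):
    # An index scan touches only the pages holding matching rows,
    # but each fetch is costed as a random read.
    return matching_pages * random_page_cost

table_pages = 100_000
matching_pages = 10_000  # rows spread over 10% of the table's pages

# With the 25-year-old default of 4.0, the index scan looks cheaper:
print(index_scan_cost(matching_pages, 4.0) < seq_scan_cost(table_pages))   # True

# With a measured ratio of ~25, the same query flips to a sequential scan:
print(index_scan_cost(matching_pages, 25.0) < seq_scan_cost(table_pages))  # False
```

The point is that the plan choice for mid-selectivity queries hinges directly on this one constant, which is why a default that is off by 6x can flip plans.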

  • Experimental Setup: The author sets up an experiment to measure the actual cost ratio of random to sequential page reads. This involves generating a large table, performing both sequential and index scans (which represent random I/O), and carefully controlling caching effects using direct I/O and specific effective_cache_size settings.
  • Initial Findings: A direct calculation from the experiment's timings yields a random_page_cost of approximately 25.2, more than six times the default of 4.0.
  • Broader Experimentation: Repeating the experiment on various systems (local SSDs, remote Azure SSDs) consistently yields estimated random_page_cost values between 25 and 35 for local SSDs, and even higher for remote storage.
  • Impact on Query Planning: This misaligned cost parameter can cause the PostgreSQL query planner to make incorrect decisions, often selecting a sequential scan when an index scan would be considerably faster, leading to queries taking up to 10 times longer.
  • Bitmap Scans as Mitigation: The article notes that bitmap scans, which convert random I/O into more sequential patterns and support prefetching, often mitigate this issue by providing a more efficient access path for a wider range of selectivities.
  • The Role of Prefetching: Prefetching, while beneficial for performance, is not considered in the current cost model. Disabling prefetching in the experiments significantly lowers the calculated random_page_cost (from ~30 to ~10), because removing it slows down sequential scans, which benefit heavily from prefetching, while leaving index scans unaffected, since index scans do not yet support prefetching.
  • Justifications for Lowering random_page_cost: Despite the data, the author explores valid reasons why experienced professionals might still recommend lowering random_page_cost, such as scenarios with high cache hit ratios, the desire to reduce cache pressure, or as a blunt instrument to correct other estimation errors.
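The ratio derivation described above, and the effect prefetching has on it, comes down to back-of-envelope arithmetic: divide the per-page time of a random read by the per-page time of a sequential read. The timings below are illustrative placeholders chosen to land in the article's reported ranges, not the author's actual measurements:

```python
# Back-of-envelope estimate of random_page_cost from scan timings.
# All numbers are illustrative placeholders, not measured data.

def estimated_random_page_cost(seq_seconds, seq_pages, rand_seconds, rand_pages):
    # Cost ratio = time per randomly read page / time per sequentially read page.
    seq_per_page = seq_seconds / seq_pages
    rand_per_page = rand_seconds / rand_pages
    return rand_per_page / seq_per_page

# With prefetching, sequential reads are very cheap per page,
# so the ratio comes out high (the article reports 25-35 on local SSDs):
with_prefetch = estimated_random_page_cost(
    seq_seconds=10.0, seq_pages=1_000_000,   # 10 µs per sequential page
    rand_seconds=25.0, rand_pages=100_000,   # 250 µs per random page
)
print(round(with_prefetch, 1))  # 25.0

# Disabling prefetching mainly slows the sequential scan (index scans
# don't prefetch yet), so the same arithmetic yields a smaller ratio:
without_prefetch = estimated_random_page_cost(
    seq_seconds=25.0, seq_pages=1_000_000,   # sequential reads now slower
    rand_seconds=25.0, rand_pages=100_000,   # random reads unchanged
)
print(round(without_prefetch, 1))  # 10.0
```

This makes the article's prefetching observation concrete: prefetching only changes the denominator (sequential time per page), so turning it off shrinks the measured ratio without the random reads getting any faster.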

The author concludes that while random_page_cost tuning can be a powerful lever, it's crucial to understand the underlying mechanisms and monitor performance closely. The piece identifies several areas for future improvement in PostgreSQL's costing model, including separating non-I/O costs, improving statistics for cached data, and incorporating prefetching into cost calculations.