Designing DB partitions you don't have to babysit

This article dissects the common pitfalls of database partitioning, particularly when using created_at as the partition key, and proposes a more robust, 'babysit-free' approach. It highlights how naive partitioning choices can force storage decisions into application code, degrade query performance, and create operational headaches. The core recommendation is to partition by the primary key, leveraging its inherent presence in most queries for efficient pruning, and then automate the management of partition boundaries.

The Partition Key Problem: Partitioning by a non-primary-key column like created_at forces that column into the primary key (e.g., (id, created_at)), because the primary key must include the partition key. This weakens id's uniqueness guarantee, since only the (id, created_at) pair is now unique, and degrades id-based lookups from const to ref access. It also leaks the partition key into application code, which must include it in every WHERE clause for pruning to work.
Ineffective Pruning: Queries not filtering on the partition key scan all partitions, leading to silent performance degradation (no error, just slowness), often discovered only in slow-query logs.
Static Boundaries Don't Scale: Hardcoded partition boundaries quickly become outdated due to changing growth patterns, leading to imbalanced or overgrown partitions (like p_future) that require expensive, manual reorganization.
The Better Approach: Partition by the primary key (e.g., id). Since id is typically monotonically increasing, it's ideal for range partitioning. This ensures most queries (which naturally filter by id) get partition pruning for free, without application code changes or compromising id's unique property.
Automated Management: A small, scheduled background service can manage partition boundaries dynamically. This service periodically splits the MAXVALUE catch-all partition based on observed growth or by deriving time-aligned id boundaries, and automatically drops old partitions for retention.
Service Components: The automation involves partition inventory, size checks, boundary selection (e.g., SELECT MAX(id) FROM some_table WHERE created_at < 'time_boundary'), partition splitting, retention pruning, concurrency guards, and comprehensive metrics/alerting.
Extending to Other Strategies: The 'service watches and adjusts' pattern also applies to hash partitioning (detecting and alerting on skew) and list partitioning (promoting hot values from a DEFAULT partition).
Key Takeaway: The optimal partition key is the one already present in every critical query, usually the primary key. Forcing a partition key like created_at into queries where it doesn't naturally belong creates unnecessary complexity and performance overhead.
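
To make the split and retention steps above more concrete, here is a minimal sketch of the statements such a service might generate. It assumes MySQL-style range partitioning on id with a MAXVALUE catch-all. The Partition type, the function names, the p_<boundary> naming scheme, and the growth-based headroom rule are illustrative assumptions, not details from the article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Partition:
    # Hypothetical inventory record for one range partition.
    name: str
    upper_bound: Optional[int]  # exclusive upper id bound; None means MAXVALUE


def next_boundary(max_id: int, prev_bound: int, ids_per_day: float, days_ahead: int) -> int:
    """Pick an id boundary about `days_ahead` days of growth past the current data.

    Starting from max(max_id, prev_bound) keeps the new boundary above the
    last bounded partition even when the catch-all is still empty.
    """
    headroom = max(1, int(ids_per_day * days_ahead))
    return max(max_id, prev_bound) + headroom


def split_catch_all_sql(table: str, catch_all: Partition, boundary: int) -> str:
    """Build a statement that carves a bounded partition out of the catch-all."""
    if catch_all.upper_bound is not None:
        raise ValueError("expected the MAXVALUE catch-all partition")
    new_name = f"p_{boundary}"  # assumed naming scheme
    return (
        f"ALTER TABLE {table} REORGANIZE PARTITION {catch_all.name} INTO ("
        f"PARTITION {new_name} VALUES LESS THAN ({boundary}), "
        f"PARTITION {catch_all.name} VALUES LESS THAN MAXVALUE)"
    )


def retention_drop_sql(table: str, partitions: List[Partition], min_live_id: int) -> List[str]:
    """Drop partitions whose whole id range lies below the oldest id to keep."""
    return [
        f"ALTER TABLE {table} DROP PARTITION {p.name}"
        for p in partitions
        if p.upper_bound is not None and p.upper_bound <= min_live_id
    ]
```

In this sketch min_live_id would come from a query like the boundary-selection example above, run against the retention cutoff. The functions only build strings; executing them, and holding a lock so two runs never split at once, is left to the surrounding service.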

By leveraging the primary key for partitioning and implementing an automated boundary management service, developers can build highly performant and scalable database systems that adapt to evolving workloads without constant operational intervention or compromising application design principles.
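
The same watch-and-adjust idea for hash partitioning reduces to a skew check. The sketch below flags partitions whose row count is far above the mean. The function name and the 2x default threshold are assumptions chosen for illustration; in MySQL the per-partition counts could come from the TABLE_ROWS column of information_schema.PARTITIONS, which is only an estimate for InnoDB.

```python
from statistics import mean
from typing import Dict, List


def skewed_partitions(row_counts: Dict[str, int], max_ratio: float = 2.0) -> List[str]:
    """Return names of partitions holding more than max_ratio times the mean row count."""
    if not row_counts:
        return []
    avg = mean(row_counts.values())
    if avg == 0:
        return []  # all partitions empty, nothing to compare
    return sorted(name for name, rows in row_counts.items() if rows / avg > max_ratio)
```

A service would call this on each run and raise an alert, not rebalance automatically, when the returned list is non-empty.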


The Lowdown