Healthchecks.io now uses self-hosted object storage

Healthchecks.io, a popular monitoring service, details its move from managed S3 storage to a self-hosted setup built on Versity S3 Gateway and Btrfs. The post describes the performance and reliability problems it ran into with managed cloud providers and argues that, at its scale, taking back control of the infrastructure is worth it. The write-up covers infrastructure choices, cost-benefit analysis, and filesystem reliability, and it sparked a lively debate among Hacker News users about self-hosting trade-offs and S3 API compatibility.

Score: 88
Comments: 47
Highest Rank: #4
Time on Front Page: 12h
First Seen: Apr 17, 3:00 PM
Last Seen: Apr 18, 4:00 AM
[Chart: Rank Over Time]

The Lowdown

Healthchecks.io, a service that monitors cron jobs and scheduled tasks, recently moved its object storage in-house. After recurring reliability and performance issues with managed S3 providers such as OVHcloud and UpCloud, the team opted for a self-hosted approach to storing the HTTP POST request bodies sent to its ping endpoints.

  • The service stores ping request bodies up to 100KB, either in PostgreSQL for tiny payloads or S3-compatible object storage for larger ones.
  • Each managed S3 option had drawbacks: per-request pricing concerns with AWS, and performance and reliability that deteriorated over time with OVHcloud and UpCloud.
  • Object storage currently holds 14 million objects totaling 119GB (about 8KB each on average), with around 30 uploads per second and constant churn.
  • Self-hosted options like MinIO, SeaweedFS, and Garage were initially considered but rejected due to their operational complexity for a one-person team.
  • The chosen solution is Versity S3 Gateway, which converts a local filesystem into an S3 server, praised for its simplicity, direct file-based operations, and ease of upgrades.
  • The setup involves a dedicated server with NVMe drives in RAID 1, a Btrfs filesystem chosen to avoid inode limits, and a robust backup strategy.
  • Post-migration, Healthchecks.io observed significant improvements in S3 operation latency and a reduction in the queue of ping bodies waiting to be uploaded.
  • While the self-hosted solution increased costs due to an additional dedicated server, the author believes the improved performance and reliability justify the expenditure.
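The size-based split between PostgreSQL and object storage described in the list above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `choose_backend`, `Backend`, and both byte thresholds are hypothetical, since the summary only says "tiny" payloads and "up to 100KB" without exact values or a description of what happens to oversized bodies.

```python
from enum import Enum

# Hypothetical values: the summary says "tiny" payloads go to PostgreSQL
# and that bodies are capped at 100KB, but gives no exact byte thresholds.
INLINE_LIMIT_BYTES = 100
MAX_BODY_BYTES = 100_000


class Backend(Enum):
    POSTGRES = "postgres"
    OBJECT_STORAGE = "object_storage"


def choose_backend(body: bytes) -> Backend:
    """Pick a storage backend for a ping body based on its size."""
    if len(body) > MAX_BODY_BYTES:
        # Assumption: oversized bodies are rejected here; the real
        # service may handle them differently.
        raise ValueError("body exceeds the maximum stored size")
    if len(body) <= INLINE_LIMIT_BYTES:
        return Backend.POSTGRES
    return Backend.OBJECT_STORAGE
```

With an 8KB average object size, most bodies that reach object storage would sit far above any plausible inline limit, so the exact threshold mainly affects how many very small rows stay in the database.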

The author expresses cautious optimism about the new system, acknowledging the trade-offs in cost and potential single-point-of-failure risks, but highlighting the operational simplicity and performance gains as a clear improvement over previous managed solutions.

The Gossip

Gateway Guidance & S3 Savvy

Many commenters were previously unaware of Versity S3 Gateway and highlighted its utility for turning a local filesystem into S3-compatible storage. This led to a broader discussion of using an S3 API locally: some questioned its necessity when direct filesystem access is available, while others defended it for maintaining API compatibility, reducing refactoring, and enabling shared access across multiple application servers. The debate also touched on whether AWS's S3 API inherently creates vendor lock-in. Defenders of the API argued that its widespread adoption and open-source implementations negate this, while skeptics pointed to its custom nature compared to older standards like WebDAV.
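The "reducing refactoring" argument is easier to see with a small sketch. The code below assumes a hypothetical `BlobStore` interface with a filesystem-backed implementation; none of these names come from the post or from Versity S3 Gateway. Application code written against the interface would not need to change if the backend were swapped for an S3 client exposing the same two methods.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical minimal interface the application codes against."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class LocalDirStore:
    """Stores each object as a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Refuse keys that would escape the root directory.
        if self.root not in path.parents:
            raise ValueError(f"invalid key: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def save_ping_body(store: BlobStore, key: str, body: bytes) -> None:
    # Application code depends only on the interface, not the backend.
    store.put(key, body)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalDirStore(Path(tmp))
        save_ping_body(store, "checks/abc/1", b"hello")
        print(store.get("checks/abc/1"))
```

A direct-filesystem backend like this only works from the machine that holds the files, which is the point commenters raised about shared access from multiple application servers.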

Btrfs Blues & Filesystem Frights

A notable portion of the discussion revolved around the choice of Btrfs as the underlying filesystem. Several users described past negative experiences, some calling it 'Btrfs PTSD', citing historical corruption bugs and panics, and recounted incidents of filesystem-level corruption under heavy write loads. Beyond reliability, commenters raised potential performance concerns related to `fsync` operations and shared their own experiences with other filesystems and with distributed storage solutions like GlusterFS and Ceph.
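For readers unfamiliar with why `fsync` comes up in this context, here is a minimal sketch of a common durable-write pattern on POSIX systems: write to a temporary file, `fsync` it, rename it into place, then `fsync` the directory. It is a generic illustration only, with a hypothetical `durable_write` helper, and makes no claim about how Versity S3 Gateway or Healthchecks.io actually write objects.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, aiming for crash durability on POSIX systems."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force file contents to stable storage
        os.replace(tmp_path, path)  # atomic rename over the final name
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Each call issues two `fsync` operations, so a store that writes every object this way pays that cost per upload, and how expensive it is depends on the filesystem and the underlying drives.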

Self-Hosting Scale & Strategic Savings

The decision to self-host and its associated trade-offs sparked a debate. Some commenters questioned the complexity introduced by self-hosting an S3 gateway for a relatively small dataset (119GB) and moderate throughput, suggesting that a simpler, vertically scaled local filesystem might suffice. Others challenged the value proposition, particularly if costs increased and users weren't explicitly complaining about performance. Conversely, proponents argued that self-hosting offers greater control, improved reliability, and prevents burnout for small teams managing high-availability services, even if it entails higher direct costs than 'cheap' cloud storage.