Running Out of Disk Space in Production
A sysadmin's worst nightmare unfolds as a product launch immediately hits a critical disk space wall. This post meticulously details a frantic, multi-stage debugging process, revealing how seemingly innocent Nginx defaults can silently consume gigabytes. It's a relatable tale of infrastructure woes and the crucial lesson of reading documentation carefully, especially under pressure.
The Lowdown
The author recounts a stressful production incident where a newly launched server, meant to distribute digital files, quickly ran out of disk space, leading to customer complaints and service disruption. What began as a simple static file server on a small Hetzner machine escalated into an urgent debugging session to restore service.
- Initial Crisis: Minutes after launch, the 40GB disk on the NixOS server filled up, causing "Insufficient system storage" errors and service interruption for customers trying to download 2.2GB files.
- Panic Debugging (Initial Attempts): The author frantically tried `nix-collect-garbage -d` and `journalctl --vacuum-time=1s` to clear space, but these offered only temporary relief or failed due to lack of space.
- Temporary Solution: Unable to upgrade the server, the author moved the large `/nix/store` to a separate 12GB volume, following NixOS Wiki instructions. This stabilized the root partition, allowing the service to partially recover.
- Large File Download Issue: Even with more space, customers reported that large 2.2GB files were failing to download halfway through.
- Nginx Misconfiguration 1: Investigation revealed that the Nginx `proxy_max_temp_file_size` default of 1024m was too small for the 2.2GB files. Increasing it to 5000m resolved this download issue.
- Nginx Misconfiguration 2 (Root Cause): Disk space spikes reappeared. Using `lsof +L1`, the author discovered Nginx was holding 14.5GB of "deleted" temporary files. A closer look at the Nginx documentation revealed that `proxy_buffering` was enabled by default, causing Nginx to buffer entire responses to disk. Disabling `proxy_buffering` and setting `proxy_max_temp_file_size` to 0 finally stabilized disk usage at 20%.
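The final fix can be sketched as an Nginx config fragment. The directive values come from the post; the `location` block and upstream name are placeholders for illustration:

```nginx
# Illustrative proxy block; upstream name and path are placeholders.
location /downloads/ {
    proxy_pass http://backend;

    # Stream responses to the client instead of spooling them to disk
    # (proxy_buffering is "on" by default).
    proxy_buffering off;

    # Never write proxied responses to temp files. The default cap of
    # 1024m silently truncated the 2.2GB downloads.
    proxy_max_temp_file_size 0;
}
```

With `proxy_buffering off`, Nginx forwards data to the client as it arrives from the upstream, so disk-backed temp files are no longer created for these responses at all.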
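The "deleted but still held open" situation that `lsof +L1` surfaces is easy to reproduce on any Linux box. Here is a minimal sketch (the temp file and file descriptor are illustrative, not the author's actual Nginx temp files): as long as any process keeps a descriptor open, `rm` does not return the space to the filesystem, which is exactly why `df` showed a full disk while `du` could not account for it.

```shell
#!/bin/sh
# Reproduce a "deleted but open" file, the condition `lsof +L1`
# (link count < 1) reports. File and fd are illustrative.
tmpfile=$(mktemp)

exec 3<"$tmpfile"        # keep the file open on fd 3 (stands in for Nginx)
rm "$tmpfile"            # unlink it -- disk space is NOT freed yet

# On Linux, /proc shows the held-open deleted file:
ls -l "/proc/$$/fd/3"    # path is suffixed with "(deleted)"

exec 3<&-                # only closing the descriptor frees the space
```

Restarting or reloading Nginx closes its descriptors, which is why the space only reappeared once the underlying buffering behavior was fixed rather than the symptoms.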
This incident highlights the common pitfalls of rushing under pressure and the often-overlooked importance of thoroughly understanding the default behaviors and documentation of critical components like Nginx, especially when dealing with large files and resource constraints.