Launch HN: Relvy (YC F24) – On-call runbooks, automated
Relvy AI, a YC F24 startup, has launched a platform that automates on-call runbooks, aiming to reduce Mean Time To Resolution (MTTR) for software engineering teams. It addresses the difficulty of AI-driven root cause analysis by anchoring its AI agent to structured runbook execution. The approach resonates on Hacker News because it offers a practical, AI-powered answer to a common, high-stress operational problem for developers.
The Lowdown
Relvy AI introduces a platform that automates on-call runbooks, giving software engineering teams an AI agent that can debug and resolve production issues rapidly. Because current AI models are limited in autonomous root cause analysis by data volume, context dependency, and high stakes, Relvy focuses on specialized tools and structured execution.
- The Problem: Existing AI solutions struggle with autonomous root cause analysis, showing low accuracy (e.g., Claude Opus 4.6 at 36% on OpenRCA) due to telemetry data noise, enterprise context variability, and the time-critical nature of on-call incidents.
- Relvy's Solution: Relvy builds specialized tools for telemetry data analysis, enabling anomaly detection, log pattern searching, and span tree reasoning without overwhelming the AI's context.
- Runbook-Anchored AI: The system grounds its AI agent in deterministic runbook steps, mimicking experienced engineers' workflows. This minimizes exploratory AI behavior, leading to faster, more reliable analysis and reduced cognitive load.
- How it Works: Users install Relvy locally (Docker/Helm) or use Relvy's cloud service, connect their observability and code stacks, and define runbooks. Investigations are presented as interactive notebooks with data visualizations, and the system can automate responses to alerts.
- Automated Actions: Relvy automates common diagnostic steps, such as checking dashboards for error isolation, analyzing throughput surges, and reviewing recent code commits. It can also execute AWS CLI commands for mitigation, with human approval.
- Key Features: Offers centralized, executable Markdown runbooks, reliable AI on-call, extensive integrations, and collaborative debugging notebooks. It promises to resolve 70% of alerts in under 5 minutes and is enterprise-ready with SOC 2 Type II compliance and self-hosting options.
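
The runbook-anchored flow described in the bullets above can be sketched in a few lines. This is a minimal illustration only, not Relvy's implementation or API: the `Step` and `Investigation` types, the `run_runbook` function, the step names, and the hard-coded numbers are all hypothetical. It shows the general idea of running diagnostic steps in a fixed order while holding back any mitigation step until a human approves it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration; none of this is Relvy's actual API.
@dataclass
class Step:
    name: str
    action: Callable[[Dict[str, object]], object]  # receives findings so far
    requires_approval: bool = False                 # True for mitigations


@dataclass
class Investigation:
    findings: Dict[str, object] = field(default_factory=dict)
    skipped: List[str] = field(default_factory=list)


def run_runbook(steps: List[Step], approve: Callable[[Step], bool]) -> Investigation:
    """Run steps in their fixed order; gate approval-required steps on a human."""
    inv = Investigation()
    for step in steps:
        if step.requires_approval and not approve(step):
            inv.skipped.append(step.name)
            continue
        inv.findings[step.name] = step.action(inv.findings)
    return inv


def error_rate_spiked(findings: Dict[str, object]) -> bool:
    # Stand-in for a dashboard query; the numbers are made up for the sketch.
    baseline, current = 0.01, 0.08
    return current > 5 * baseline


def recent_commits(findings: Dict[str, object]) -> List[str]:
    # Stand-in for a code-host lookup; placeholder data only.
    return ["<commit-sha> <commit message>"]


def roll_back(findings: Dict[str, object]) -> str:
    # Stand-in for a mitigation command; nothing is actually executed here.
    return "rollback requested"


RUNBOOK = [
    Step("error_rate_spiked", error_rate_spiked),
    Step("recent_commits", recent_commits),
    Step("roll_back", roll_back, requires_approval=True),
]

if __name__ == "__main__":
    # Deny every approval request: diagnostics run, the mitigation is skipped.
    result = run_runbook(RUNBOOK, approve=lambda step: False)
    print(result.findings)
    print(result.skipped)
```

Keeping the step order fixed and passing earlier findings forward is one way to limit exploratory behavior: the agent's job narrows to interpreting each step's result rather than deciding what to look at next.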
Relvy AI positions itself as a critical tool for operational excellence: teams can standardize incident response and offload on-call burdens, freeing engineers to focus on development rather than constant firefighting.