HN Today

Testing distributed systems with AI agents

This GitHub project introduces AI agent skills for systematically testing distributed and stateful systems, going beyond traditional integration tests. It takes a claim-driven approach, generating detailed test plans and findings reports with structured verdicts and blame classifications. The goal is to make the notoriously hard problem of verifying complex distributed systems more tractable by having AI agents follow an opinionated, rigorous testing methodology.

Score: 12 | Comments: 0 | Highest rank: #9 | Time on front page: 8h
First seen: May 20, 3:00 PM | Last seen: May 20, 10:00 PM

The Lowdown

This GitHub repository by 'shenli' presents a novel approach to testing distributed and stateful systems with AI agents. It provides two skills: one designs claim-driven test plans as structured Markdown, and the other executes those plans and produces comprehensive findings reports.
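To make "claim-driven" concrete, the following minimal sketch models a plan as lists of claims, failure hypotheses, and scenarios, and derives a coverage matrix from them. It is an illustration under stated assumptions, not the repository's plan format: `Scenario`, `TestPlan`, and the example claims are hypothetical. Any (claim, hypothesis) pair that no scenario exercises is reported as a gap, which is exactly where an explicit coverage argument would be needed.

```python
from dataclasses import dataclass, field
from itertools import product


# Hypothetical structures for illustration; the project's real plans are Markdown documents.
@dataclass(frozen=True)
class Scenario:
    name: str
    claim: str       # the product claim this scenario tries to falsify
    hypothesis: str  # the failure hypothesis it provokes (crash, partition, ...)


@dataclass
class TestPlan:
    claims: list[str]
    hypotheses: list[str]
    scenarios: list[Scenario] = field(default_factory=list)

    def coverage_matrix(self) -> dict[tuple[str, str], list[str]]:
        """Map every (claim, hypothesis) pair to the names of scenarios exercising it."""
        matrix = {pair: [] for pair in product(self.claims, self.hypotheses)}
        for s in self.scenarios:
            matrix.setdefault((s.claim, s.hypothesis), []).append(s.name)
        return matrix

    def uncovered(self) -> list[tuple[str, str]]:
        """Pairs with no scenario: each needs a new scenario or a written reason to skip it."""
        return [pair for pair, names in self.coverage_matrix().items() if not names]


plan = TestPlan(
    claims=["acked writes survive", "reads are linearizable"],
    hypotheses=["leader crash", "network partition"],
    scenarios=[
        Scenario("kill-leader-mid-write", "acked writes survive", "leader crash"),
        Scenario("partition-during-read", "reads are linearizable", "network partition"),
    ],
)
for claim, hyp in plan.uncovered():
    print(f"GAP: {claim!r} under {hyp!r}")
# GAP: 'acked writes survive' under 'network partition'
# GAP: 'reads are linearizable' under 'leader crash'
```

Enumerating the full cross product, rather than only the pairs someone thought to test, is what turns coverage from an impression into something a reviewer can check.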

  • Two Core Skills: The system is composed of a 'designing' skill that generates test plans based on product claims and failure hypotheses, and an 'executing' skill that runs these plans, collecting evidence and assigning verdicts.
  • Opinionated Workflow: It enforces a workflow built on hard-won experience in the field: claim-driven (not merely test-driven) testing, explicit arguments for coverage adequacy, reuse of the existing tools of the system under test (SUT), and a 'model + history + checker' discipline for consistency-critical scenarios.
  • Detailed Outputs: The design skill produces a multi-section Markdown test plan, outlining architecture, claims, failure hypotheses, coverage matrices, and specific scenarios with detailed model/history/checker disciplines. The execution skill generates a session log, scenario logs, metrics, artifacts, and a findings report with a 9-state verdict taxonomy and SUT/harness/checker/environment blame classification.
  • Agent Compatibility: It's designed to work with various AI coding agents like Claude Code, Codex, Copilot CLI, Cursor, and Gemini, provided they can read Markdown and execute shell commands.
  • Technique Catalog: The project includes an extensive catalog of testing techniques distilled from academic literature, guiding the design skill in selecting appropriate methods for different failure modes.
  • Early but Exercised: Although still early-stage, the skills have been exercised against a real-world distributed agent runtime (AgentDB), where they surfaced several bugs, demonstrating their practical utility.
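The "model + history + checker" discipline can be illustrated with a single register: the model is a sequential specification, the history records when each client call was invoked and when it returned, and the checker searches for a total order that respects both real time and the model (a brute-force linearizability check in the spirit of Wing and Gong). This is a minimal sketch assuming one register and only completed operations; `Op`, `step`, and `linearizable` are hypothetical names, not the project's code, and production checkers such as Knossos or Porcupine prune this exponential search.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """One completed client call in the recorded history (hypothetical format)."""
    kind: str              # "write" or "read"
    value: Optional[int]   # value written, or value the read returned
    invoke: float          # time the client issued the call
    complete: float        # time the client received the response


def step(state: Optional[int], op: Op) -> tuple[bool, Optional[int]]:
    """Model: a sequential register. Returns (is the op legal?, next state)."""
    if op.kind == "write":
        return True, op.value
    return op.value == state, state


def linearizable(history: list[Op], initial: Optional[int] = None) -> bool:
    """Checker: search for an order consistent with real time and the model."""
    def search(remaining: frozenset[int], state: Optional[int]) -> bool:
        if not remaining:
            return True
        for i in remaining:
            op = history[i]
            # op may go next only if no other pending op completed before op was invoked
            if any(history[j].complete < op.invoke for j in remaining if j != i):
                continue
            ok, nxt = step(state, op)
            if ok and search(remaining - {i}, nxt):
                return True
        return False

    return search(frozenset(range(len(history))), initial)


# Stale read: the write was acknowledged before the read began, yet the read saw 0.
stale = [Op("write", 1, 0.0, 1.0), Op("read", 0, 2.0, 3.0)]
# Overlapping calls: the read may legally be ordered before the write.
overlap = [Op("write", 1, 0.0, 2.0), Op("read", 0, 1.0, 3.0)]
print(linearizable(stale, initial=0))    # False
print(linearizable(overlap, initial=0))  # True
```

The overlapping history passes because the read can be linearized before the write; the stale history fails because real-time order forces the write first, so a read of 0 contradicts the model.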

By providing a systematic, AI-augmented framework for designing and executing rigorous tests, this project offers a promising path toward more reliable and robust distributed systems, targeting failure modes that traditional testing often misses.