Testing distributed systems with AI agents

This GitHub repository by 'shenli' provides two skills that let AI agents test distributed and stateful systems: one designs claim-driven test plans, and the other executes them. Together they produce a structured Markdown test plan and a findings report.

Two Core Skills: The system is composed of a 'designing' skill that generates test plans based on product claims and failure hypotheses, and an 'executing' skill that runs these plans, collecting evidence and assigning verdicts.
Opinionated Workflow: It enforces a workflow based on hard-won knowledge in the field, emphasizing claim-driven testing (not just test-driven), explicit coverage adequacy arguments, reuse of existing SUT tools, and a robust 'model + history + checker' approach for consistency-critical scenarios.
Detailed Outputs: The design skill produces a multi-section Markdown test plan covering architecture, claims, failure hypotheses, coverage matrices, and specific scenarios, each with its model, history, and checker spelled out. The execution skill generates a session log, scenario logs, metrics, artifacts, and a findings report that uses a 9-state verdict taxonomy and classifies blame as SUT, harness, checker, or environment.
Agent Compatibility: It's designed to work with various AI coding agents like Claude Code, Codex, Copilot CLI, Cursor, and Gemini, provided they can read Markdown and execute shell commands.
Technique Catalog: The project includes an extensive catalog of testing techniques distilled from academic literature, guiding the design skill in selecting appropriate methods for different failure modes.
Early, Exercised Status: Although the project is at an early stage, the skills have been exercised against a real-world distributed agent runtime (AgentDB), where they surfaced several bugs, which demonstrates their practical utility.
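
To make the 'model + history + checker' idea mentioned above concrete, here is a minimal Python sketch. It is not taken from the repository: the names `Op`, `RegisterModel`, and `check_history` are hypothetical, and it assumes a single register and a strictly sequential history. A checker for concurrent histories would need a real linearizability check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    """One completed client operation (hypothetical record format)."""
    kind: str              # "write" or "read"
    value: Optional[int]   # value written, or value the read returned


class RegisterModel:
    """Reference model: a single register with sequential semantics."""

    def __init__(self) -> None:
        self.value: Optional[int] = None

    def apply(self, op: Op) -> bool:
        """Return True if the operation is consistent with the model."""
        if op.kind == "write":
            self.value = op.value
            return True
        if op.kind == "read":
            return op.value == self.value
        raise ValueError(f"unknown operation kind: {op.kind}")


def check_history(history: List[Op]) -> List[int]:
    """Replay a recorded history against the model.

    Returns the indices of operations the model cannot explain.
    """
    model = RegisterModel()
    violations: List[int] = []
    for index, op in enumerate(history):
        if not model.apply(op):
            violations.append(index)
    return violations


if __name__ == "__main__":
    # A read that returns an overwritten value is flagged as a violation.
    recorded = [Op("write", 1), Op("write", 2), Op("read", 1)]
    print(check_history(recorded))
```

The point of the separation is that the history is recorded from the system under test, the model states what correct behavior looks like, and the checker compares the two after the fact, so each piece can be inspected or blamed independently.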

By giving AI agents a systematic framework for designing and executing rigorous tests, this project offers a promising way to improve the reliability and robustness of distributed systems and to catch failure modes that traditional testing often misses.

The Lowdown