Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents
Cekura (YC F24) launches a platform for robustly testing and monitoring AI voice and chat agents, tackling the challenge of ensuring consistent agent behavior across diverse user interactions. It introduces synthetic user simulations, a mock tool platform, and full-session evaluations to catch regressions that single-turn monitoring misses. The launch speaks to HN's interest in practical AI development tooling and the difficulty of building reliable LLM-powered applications.
The Lowdown
Cekura (YC F24) addresses the challenge of ensuring reliability and preventing regressions in AI voice and chat agents. Manual QA is impractical given the myriad ways users interact with agents, and traditional turn-based monitoring falls short. Cekura instead lets developers simulate complex real-world conversations and evaluate agent performance across entire conversational arcs, rather than individual turns, to detect subtle but critical failures.
- Intelligent Scenario Generation: Cekura's agents bootstrap test suites from agent descriptions and ingest production conversations to automatically extract and evolve test cases, ensuring broad coverage.
- Mock Tool Platform: It provides a mock tool platform to simulate agent interactions with external APIs, allowing for robust testing of tool selection and decision-making without relying on live, potentially flaky production systems.
- Deterministic Testing: To combat the stochastic nature of LLMs, Cekura uses structured conditional action trees for evaluators. This ensures consistent synthetic user behavior across runs, meaning a test failure reliably indicates a regression, not just noise.
- Full-Session Monitoring: Unlike turn-based tracing platforms, Cekura monitors live agent traffic by evaluating entire conversational sessions. This allows it to detect failures where an agent incorrectly proceeds despite an earlier, crucial step (like verification) failing, a common failure mode in complex AI interactions.
- Preventing Regressions: By simulating and monitoring agents across their full conversational lifecycle, Cekura aims to help teams catch behavioral regressions early, significantly reducing the risk of deploying faulty agents and improving overall user experience.
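To make the mock tool idea concrete, here is a minimal sketch of a mock tool registry that records which tool an agent selected and with what arguments. All names (`MockToolPlatform`, `lookup_order`) are illustrative assumptions, not Cekura's actual API:

```python
# Hypothetical sketch: a mock tool registry for exercising an agent's tool
# selection without calling live backends. Not Cekura's real interface.

class MockToolPlatform:
    def __init__(self):
        self._tools = {}   # tool name -> canned handler
        self.calls = []    # record of (tool, kwargs) for assertions

    def register(self, name, handler):
        """Register a canned handler in place of a live API."""
        self._tools[name] = handler

    def invoke(self, name, **kwargs):
        """Called by the agent under test instead of the real API."""
        self.calls.append((name, kwargs))
        if name not in self._tools:
            raise KeyError(f"agent selected unknown tool: {name}")
        return self._tools[name](**kwargs)


# Usage: assert the agent picked the right tool with the right arguments.
mock = MockToolPlatform()
mock.register("lookup_order", lambda order_id: {"status": "shipped"})

result = mock.invoke("lookup_order", order_id="A123")  # the agent's call
assert result == {"status": "shipped"}
assert mock.calls == [("lookup_order", {"order_id": "A123"})]
```

Because the handlers are canned, a failed assertion here points at the agent's decision-making rather than at a flaky downstream service.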
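The "structured conditional action tree" idea for deterministic synthetic users can be sketched roughly as follows; the node names, branch format, and traversal scheme are assumptions for illustration, not Cekura's implementation:

```python
# Hypothetical sketch: a conditional action tree driving a synthetic user.
# Each state lists (condition on agent's message, user reply, next state);
# the first matching branch wins, so the same inputs always yield the same
# reply, and a failing test signals a regression rather than noise.

SYNTHETIC_USER_TREE = {
    "start": [
        (lambda msg: "name" in msg.lower(), "My name is Alex.", "gave_name"),
        (lambda msg: True, "Hi, I want to cancel my subscription.", "start"),
    ],
    "gave_name": [
        (lambda msg: "confirm" in msg.lower(), "Yes, please cancel it.", "done"),
        (lambda msg: True, "I already told you my name.", "gave_name"),
    ],
    "done": [],
}

def next_user_turn(state, agent_message):
    """Deterministically pick the synthetic user's next reply."""
    for condition, reply, next_state in SYNTHETIC_USER_TREE[state]:
        if condition(agent_message):
            return reply, next_state
    return None, state  # no branches left: conversation over

reply, state = next_user_turn("start", "Could I get your name?")
assert reply == "My name is Alex." and state == "gave_name"
```

The contrast is with sampling an LLM to play the user, where the same agent build can pass one run and fail the next.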
Cekura offers a 7-day free trial, allowing teams to explore its capabilities in managing the complexities of AI agent development.
The Gossip
Evaluating Complex Conversational Flows
Commenters highlighted the difficulty of properly evaluating AI agents, especially regarding complex flows where an agent might correctly refuse to proceed or escalate to a human. This led to a discussion on how to define "correct" outcomes for incomplete sessions and prevent "common sense" failures. The founder and other users pointed to Cekura's full-session evaluation as the core solution, using structured checkpoints and state machines to track conversational progress and flag failures when expected steps are skipped.
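The checkpoint/state-machine approach described above can be sketched as a post-hoc walk over a finished transcript; the checkpoint names and matching logic here are illustrative assumptions, not Cekura's evaluator:

```python
# Hypothetical sketch of full-session evaluation: check a session's emitted
# checkpoints against a required order and flag the session if a later step
# fired before its prerequisite (e.g. a change applied without verification).

REQUIRED_ORDER = ["identity_verified", "account_located", "change_applied"]

def evaluate_session(events):
    """events: checkpoint names emitted during the session, in order."""
    reached = []
    for event in events:
        expected = REQUIRED_ORDER[len(reached)]
        if event == expected:
            reached.append(event)
        elif event in REQUIRED_ORDER:
            # A later checkpoint fired before its prerequisite: a whole-
            # session failure that turn-by-turn checks would miss.
            return {"passed": False,
                    "reason": f"'{event}' occurred before '{expected}'"}
    return {"passed": reached == REQUIRED_ORDER,
            "reason": None if reached == REQUIRED_ORDER else "session incomplete"}

# The agent applied a change without ever verifying identity:
bad = evaluate_session(["account_located", "change_applied"])
assert bad["passed"] is False
```

An incomplete session (e.g. a correct escalation to a human) can then be scored by which checkpoints it legitimately reached, rather than treated as a blanket failure.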
Agent Training and "Common Sense"
The discussion touched on the fundamental challenge of instilling "common sense" in AI agents, and whether traditional training or fine-tuning is the right path to improvement. One commenter suggested an "episodic memory with feedback system," implying continuous learning. The founder leaned instead toward feedback loops, tools, and prompt optimization as more effective for current agent improvements, arguing that full training is overkill, while acknowledging the "fast brain, slow brain" architectural pattern.
Integration Versatility & Ecosystem Fit
Users inquired about Cekura's compatibility with various agent setups, specifically asking about supporting custom knowledge bases and testing chat agents that don't expose an API. The Cekura team confirmed existing support for knowledge base integrations (e.g., BigQuery, file uploads) and their ability to generate scenarios from these. They also indicated support for various chat agent providers and direct connections for SMS/WhatsApp agents, emphasizing flexibility in connecting to diverse agent architectures. One commenter also shared a link to their own open-source voice agent devkit, prompting an exchange with the founder about integrating existing frameworks.