HN
Today

Research-Driven Agents: What Happens When Your Agent Reads Before It Codes

This post introduces "Research-Driven Agents": AI coding agents that markedly improve code optimization by adding a research phase before coding. By studying papers and competing projects, one agent optimized llama.cpp, achieving a +15% speedup in Flash Attention text generation for about $29. It showcases a cost-effective paradigm for AI to tackle complex engineering problems that go beyond mere code-tweaking.

Score: 15 · Comments: 2 · Highest Rank: #3 · 2h on Front Page
First Seen: Apr 9, 6:00 PM · Last Seen: Apr 9, 7:00 PM

The Lowdown

Alex Kim's blog post introduces "Research-Driven Agents," an advanced iteration of AI coding agents that drastically improve code optimization by integrating a dedicated research phase into their workflow. This approach allows agents to gather external knowledge from academic papers and competing projects before attempting to modify code.

Key takeaways from the research:

  • Knowledge-Augmented Optimization: Unlike "code-only" agents that struggle with complex, memory-bound bottlenecks, research-driven agents read papers and study competing projects to identify non-obvious optimization opportunities.
  • Real-World Application: Pointed at llama.cpp's CPU inference path for TinyLlama 1.1B, an agent running on 4 AWS VMs sped up Flash Attention text generation by 15% on x86 and 5% on ARM in roughly 3 hours.
  • Cost-Efficient Exploration: The entire optimization process incurred a minimal cost of around $29 (primarily for CPU VMs and API calls), demonstrating remarkable efficiency.
  • Research Yields Deeper Insights: Initial code-only attempts produced negligible gains. After a research phase, the agent pivoted from micro-optimizations to memory-access pattern improvements and operator fusions, recognizing that text generation was memory-bandwidth bound.
  • Successful Fusions: The agent successfully implemented five key optimizations, including single-pass softmax and RMS norm fusions, adaptive from_float parallelization, graph-level RMS_NORM + MUL fusion (inspired by CUDA/Metal backends), and Flash Attention KQ fusion.
  • Practical Challenges: The process encountered issues like compiler auto-vectorization negating some "optimizations," a benchmark parsing bug leading to incorrect baselines, and variance due to noisy cloud VMs, all of which the agent or its operators learned from.
  • Broader Implications: This research extends the autoresearch paradigm to problems where crucial optimization insights reside outside the immediate codebase, suggesting AI agents can now address higher-level engineering challenges that traditionally require senior human expertise and domain knowledge.

This work marks a notable step for AI coding agents: equipping them with a "research" capability unlocks a new level of efficacy in complex performance optimization, moving them closer to autonomously tackling nuanced engineering problems that demand deep external context.