Research-Driven Agents: What Happens When Your Agent Reads Before It Codes
This post introduces "Research-Driven Agents": AI coding agents that markedly improve code optimization by adding a research phase before they write code. By studying papers and competing projects, one such agent optimized llama.cpp, achieving a +15% speedup in Flash Attention text generation for about $29. The result demonstrates a powerful, cost-effective paradigm for AI to tackle complex engineering problems beyond mere code-tweaking.
The Lowdown
Alex Kim's blog post introduces "Research-Driven Agents," an advanced iteration of AI coding agents that drastically improve code optimization by integrating a dedicated research phase into their workflow. This approach allows agents to gather external knowledge from academic papers and competing projects before attempting to modify code.
Key takeaways from the research:
- Knowledge-Augmented Optimization: Unlike "code-only" agents that struggle with complex, memory-bound bottlenecks, research-driven agents read papers and study competing projects to identify non-obvious optimization opportunities.
- Real-World Application: Pointed at `llama.cpp`'s CPU inference path for TinyLlama 1.1B, an agent leveraging 4 AWS VMs optimized Flash Attention text generation by +15% on x86 and +5% on ARM in approximately 3 hours.
- Cost-Efficient Exploration: The entire optimization process incurred a minimal cost of around $29 (primarily for CPU VMs and API calls), demonstrating remarkable efficiency.
- Research Yields Deeper Insights: Initial code-only attempts produced negligible gains. After a research phase, the agent pivoted from micro-optimizations to memory-access pattern improvements and operator fusions, recognizing that text generation was memory-bandwidth bound.
- Successful Fusions: The agent successfully implemented five key optimizations, including single-pass softmax and RMS norm fusions, adaptive `from_float` parallelization, graph-level `RMS_NORM + MUL` fusion (inspired by the CUDA/Metal backends), and Flash Attention KQ fusion.
- Practical Challenges: The process encountered issues like compiler auto-vectorization negating some "optimizations," a benchmark parsing bug leading to incorrect baselines, and run-to-run variance on noisy cloud VMs, all of which the agent or its operators learned from.
- Broader Implications: This research extends the `autoresearch` paradigm to problems where crucial optimization insights reside outside the immediate codebase, suggesting AI agents can now address higher-level engineering challenges that traditionally require senior human expertise and domain knowledge.
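The post does not include the agent's actual kernels, but the single-pass softmax it mentions is a well-known technique (often called "online softmax"): the running maximum and the running normalizer are maintained in one sweep over the input, instead of separate max and exp/sum passes. A minimal Python sketch of the idea (function names are illustrative, not from the post):

```python
import math

def softmax_two_pass(x):
    # Reference: one pass for the max, another for the exponentials/sum.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_online(x):
    # Online softmax: maintain a running max (m) and running sum (z),
    # rescaling z whenever a new maximum appears. The max and sum passes
    # are fused, which helps when the workload is memory-bandwidth bound.
    m, z = float("-inf"), 0.0
    for v in x:
        new_m = v if v > m else m
        z = z * math.exp(m - new_m) + math.exp(v - new_m)
        m = new_m
    return [math.exp(v - m) / z for v in x]
```

Both variants are mathematically identical; the fused one simply reads the input one fewer time.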
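The `RMS_NORM + MUL` fusion can be sketched the same way: instead of writing the normalized vector to an intermediate buffer and reading it back for the element-wise multiply, the fused version applies the normalization scale and the weight in a single pass. A toy Python illustration (the `eps` value and function names are assumptions, not taken from the post):

```python
import math

def rms_norm(x, eps=1e-6):
    # Root-mean-square normalization, producing an intermediate vector.
    ms = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(ms + eps)
    return [v * scale for v in x]

def rms_norm_mul_unfused(x, w):
    y = rms_norm(x)                        # intermediate written out
    return [a * b for a, b in zip(y, w)]   # read back for the multiply

def rms_norm_mul_fused(x, w, eps=1e-6):
    # Fused: compute the scale once, then normalize and multiply by the
    # weight in one pass, skipping the intermediate buffer entirely.
    ms = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(ms + eps)
    return [v * scale * wi for v, wi in zip(x, w)]
```

For a memory-bandwidth-bound workload, eliminating the intermediate read/write is exactly the kind of win the agent pivoted toward after its research phase.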
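Flash Attention's KQ fusion follows the same streaming principle: scores are consumed as they are produced, so the full attention row is never materialized. A single-query, pure-Python sketch of the concept (a didactic illustration, not the agent's actual implementation):

```python
import math

def attention_naive(q, K, V):
    # Materializes the full score row, then softmax, then the weighted sum.
    scores = [sum(a * b for a, b in zip(q, k)) for k in K]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    d = len(V[0])
    out = [0.0] * d
    for p, v in zip(exps, V):
        for j in range(d):
            out[j] += (p / z) * v[j]
    return out

def attention_flash(q, K, V):
    # Flash-attention style: stream over (k, v) pairs with a running max,
    # running normalizer, and rescaled accumulator, so the score row is
    # never stored.
    d = len(V[0])
    m, z = float("-inf"), 0.0
    acc = [0.0] * d
    for k, v in zip(K, V):
        s = sum(a * b for a, b in zip(q, k))
        new_m = s if s > m else m
        corr = math.exp(m - new_m)   # rescale old state to the new max
        p = math.exp(s - new_m)
        z = z * corr + p
        for j in range(d):
            acc[j] = acc[j] * corr + p * v[j]
        m = new_m
    return [a / z for a in acc]
```

In `llama.cpp` the real kernels are vectorized C/C++, but the data-movement argument is the same: one streaming pass beats several full-row passes when bandwidth, not compute, is the bottleneck.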
This work highlights a pivotal advance for AI coding agents: equipping them with a "research" capability unlocks a new level of efficacy in complex performance optimization, moving them closer to autonomously tackling nuanced engineering problems with a genuine grasp of external context.