Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster
This post details how scaling Andrej Karpathy's autoresearch agent with a 16-GPU cluster drastically accelerated neural network optimization. By enabling parallel experimentation, the agent moved beyond greedy hill-climbing to factorial grids, discovering more effective architecture changes and even developing a sophisticated strategy for heterogeneous hardware. It's a compelling demonstration of AI agent autonomy and the power of parallel computing in deep learning research.
The Lowdown
The article describes an experiment where a coding agent, Claude Code, was given access to 16 GPUs on a Kubernetes cluster to scale Andrej Karpathy's autoresearch project. This setup allowed the agent to autonomously improve a neural network training script (train.py) by running experiments in parallel, significantly accelerating the optimization process and leading to notable performance improvements.
Key takeaways from the experiment include:
- The agent submitted approximately 910 experiments over 8 hours, achieving a 9x throughput increase compared to a single GPU setup.
- Parallelism fundamentally changed the agent's search strategy from sequential hill-climbing to running factorial grids, enabling it to catch interaction effects between parameters.
- It discovered that scaling model width (aspect ratio) yielded greater improvements than individual hyperparameter tweaks, a finding made efficient by parallel testing.
- The agent autonomously developed a sophisticated strategy to exploit heterogeneous hardware, screening ideas on H100s and promoting promising ones to H200s for validation, without explicit programming.
- The optimization progressed through distinct phases: initial hyperparameter sweeps, architecture discovery, fine-tuning the wider model, optimizer tuning, and finally, diminishing returns.
- The final optimized configuration achieved a 2.87% improvement in val_bpb (from 1.003 to 0.974) over the baseline.
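The shift from sequential hill-climbing to factorial grids can be sketched in a few lines. The hyperparameter names and values below are illustrative assumptions, not the agent's actual search space; the point is that a grid enumerates every combination at once, which is what exposes interaction effects.

```python
from itertools import product

# Hypothetical search space -- parameter names and ranges are
# illustrative, not the agent's actual configuration.
grid = {
    "lr": [1e-3, 3e-3],
    "aspect_ratio": [32, 64],
    "weight_decay": [0.0, 0.1],
}

# A greedy hill-climber varies one axis at a time and can miss
# parameters that only help in combination; a factorial grid
# launches every combination as an independent parallel run.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 2 * 2 * 2 = 8 parallel runs
```

With 16 GPUs, all eight configurations here would fit in a single wave, which is why parallelism changes the shape of the search rather than just its speed.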
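The screen-then-promote strategy the agent arrived at can be sketched as a two-stage filter. Everything here is a hypothetical reconstruction: the function names, the `score(config, pool)` stub, and the pool labels are assumptions standing in for real training runs on the two GPU types.

```python
def screen_and_promote(candidates, score, k=3):
    """Screen every candidate cheaply, then re-validate only the top k.

    `score(config, pool)` is a hypothetical stand-in for launching a
    training run on the named GPU pool and returning its validation
    loss (lower is better).
    """
    # Stage 1: cheap screening runs on the slower/cheaper pool.
    screened = sorted(candidates, key=lambda c: score(c, "h100"))
    # Stage 2: promote only the best k to the faster pool for validation.
    return [(c, score(c, "h200")) for c in screened[:k]]

# Toy usage with a fake scorer; real scores would come from training.
fake_score = lambda config, pool: config["lr"]
best = screen_and_promote([{"lr": x} for x in (0.3, 0.1, 0.2)], fake_score, k=2)
print([c["lr"] for c, _ in best])  # [0.1, 0.2]
```

The design keeps the expensive H200s busy only with ideas that have already survived a cheap H100 screen, which is the essence of the emergent behavior described above.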
This experiment highlights the transformative potential of combining AI agents with scalable cloud infrastructure, allowing for rapid, autonomous research and optimization that can uncover non-obvious improvements and emergent strategies in complex systems.