HN
Today

Show HN: Auto-Architecture: Karpathy's Loop, Pointed at a CPU

This project applies Andrej Karpathy's autonomous research loop, typically used for software optimization, to the domain of CPU architecture design. In under ten hours, an AI agent improved a RISC-V core's performance by 92% while reducing logic use, surpassing a human-tuned baseline. The story's core insight, however, is not the agent itself but the critical importance of robust verification in making such autonomous design processes reliable and effective.

Score: 22 · Comments: 3 · Highest Rank: #5 · 9h on Front Page
First Seen: Apr 29, 3:00 AM · Last Seen: Apr 29, 11:00 AM
Rank Over Time: (chart not recoverable in text)

The Lowdown

The "Auto-Architecture" project explores the generalization of autonomous research loops, specifically inspired by Andrej Karpathy's work, by pointing one at CPU microarchitecture design. The goal was to see if an AI agent, previously successful in software optimization, could effectively optimize hardware.

  • Setup: The system used a 5-stage in-order RV32IM core in SystemVerilog. An orchestrator directs an LLM-based agent to propose microarchitectural hypotheses in YAML. An implementation agent then modifies the RTL files. These changes are evaluated through a gate that includes: formal verification (riscv-formal), Verilator cosimulation against a Python ISS, 3-seed nextpnr Place & Route on an FPGA for the fitness metric (Fmax × CoreMark iter/cycle), and independent CoreMark CRC validation.
  • Results: Over 9 hours and 51 minutes, the agent processed 73 hypotheses, accepting 10. The winning design achieved a 92% performance increase over the baseline and 56% over a human-tuned VexRiscv equivalent in CoreMark iter/sec, while using 40% fewer LUTs. The automated process surpassed human benchmarks at iteration 6.
  • The Verifier's Crucial Role: The author argues that the agent loop is becoming a commodity; the verifier is the non-commodity, essential component. Of the 73 hypotheses, 63 were wrong: they regressed performance, broke the ISA, or failed timing. The verification system caught these failures, preventing corrupted runs and keeping the agent from learning from incorrect feedback. Examples of its efficacy include catching sandbox violations, schema errors, and significant performance regressions, ensuring only genuinely beneficial changes were adopted.
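The post doesn't show the actual hypothesis schema, but a YAML proposal of the kind described might look like this. All field names here are illustrative assumptions, not the project's real format:

```yaml
# Illustrative only: the project's real hypothesis schema is not shown in the post.
hypothesis:
  id: h-042
  claim: "A forwarding path from MEM to EX removes most load-use stalls"
  change:
    files: [rtl/pipeline_ex.sv]        # hypothetical file name
    kind: microarchitectural
  expected_effect:
    iter_per_cycle: "+small gain"      # predicted CoreMark improvement
    fmax_risk: low                     # the extra mux may lengthen the critical path
```

Structuring proposals this way gives the orchestrator something machine-checkable, which is presumably how schema errors (mentioned below as a class of verifier catches) get rejected before any RTL is touched.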
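The accept/reject logic of such a gate can be sketched in a few lines. This is a minimal illustration assuming the checks and fitness metric named above; the class and function names are mine, not the project's API:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    formal_ok: bool          # riscv-formal passed
    cosim_ok: bool           # Verilator trace matched the Python ISS
    crc_ok: bool             # CoreMark CRC validated independently
    fmax_mhz: list[float]    # Fmax from each of the 3 nextpnr P&R seeds
    iter_per_cycle: float    # CoreMark iterations per cycle

def fitness(r: EvalResult) -> float:
    # Score with the worst seed's Fmax so one lucky placement
    # can't inflate a candidate's apparent performance.
    return min(r.fmax_mhz) * r.iter_per_cycle

def gate(candidate: EvalResult, best_fitness: float) -> bool:
    # Every correctness check must pass before performance is considered;
    # a fast core that breaks the ISA is never accepted.
    if not (candidate.formal_ok and candidate.cosim_ok and candidate.crc_ok):
        return False
    return fitness(candidate) > best_fitness
```

The ordering matters: correctness filters run before the fitness comparison, which is what makes it safe for the agent to propose aggressive changes — a 63/73 failure rate is fine as long as the gate is airtight.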

This project convincingly demonstrates that while autonomous agent loops can drive impressive optimization, their utility is entirely dependent on the quality and rigor of the verification system. The author concludes that the next wave of successful companies will be defined not by their agent loops, but by their ability to define and implement sharp, comprehensive verifiers that encode their business's definition of 'correctness'.