
AutoKernel: Autoresearch for GPU Kernels

AutoKernel is an autonomous AI agent that optimizes GPU kernels for PyTorch models, applying the 'autoresearch' philosophy to low-level performance tuning. The tool profiles bottlenecks, extracts Triton kernels, and iteratively optimizes them through an automated edit-benchmark-revert loop. It promises significant speedups for deep learning models by automating a complex, time-consuming process, letting developers improve performance while they sleep.

Score: 11 · Comments: 1 · Highest Rank: #7 · Time on Front Page: 3h
First Seen: Mar 11, 8:00 AM · Last Seen: Mar 11, 10:00 AM

The Lowdown

AutoKernel is an open-source project that pioneers the use of an autonomous AI agent to automatically optimize GPU kernels for PyTorch models. Drawing inspiration from Andrej Karpathy's 'autoresearch,' it aims to revolutionize the process of low-level performance tuning by allowing an agent to methodically explore a search space for optimal kernel configurations.

Here's how AutoKernel achieves its optimization goals:

  • Bottleneck Identification: It profiles any PyTorch model to pinpoint the GPU kernels that are causing performance bottlenecks.
  • Triton Kernel Extraction: Identified bottlenecks are extracted into standalone Triton kernels; Triton is a Python-based language and compiler for writing GPU code.
  • Autonomous Optimization Loop: An AI agent (e.g., Claude, Codex) continuously modifies kernel.py, benchmarks the changes using a fixed bench.py harness, and decides whether to keep or revert modifications based on performance and correctness.
  • Amdahl's Law Orchestration: A multi-kernel scheduler prioritizes optimizations based on Amdahl's law, ensuring that efforts are focused on kernels that yield the greatest end-to-end speedup.
  • Robust Correctness Checks: The benchmarking process includes five stages of correctness verification to prevent performance gains from sacrificing numerical stability or producing incorrect results.
  • Simplified Agent Interaction: The agent operates by reading comprehensive instructions from program.md and modifying only kernel.py, ensuring a manageable scope and clean reverts.
  • Wide Kernel Support: It supports 9 common deep learning kernel types, including matmul, softmax, layernorm, and flash_attention, along with provided PyTorch references and starter Triton implementations.
  • Model Compatibility: AutoKernel works with self-contained model definitions (like GPT-2, LLaMA, BERT) and can also integrate with HuggingFace models.
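The keep-or-revert decision at the heart of the optimization loop can be sketched as follows. This is a minimal illustration, not AutoKernel's actual code: the function names, the candidate-edit tuples, and the single timing metric are all assumptions standing in for the real agent loop over kernel.py and bench.py.

```python
def keep_or_revert(best_ms, candidate_ms, passes_correctness):
    """Keep an edit only if it is both correct and strictly faster
    than the best timing seen so far."""
    return passes_correctness and candidate_ms < best_ms

def optimize(initial_ms, candidates):
    """Walk through candidate edits, keeping improvements and
    (conceptually) reverting everything else.

    candidates: iterable of (name, measured_ms, passed_correctness).
    Returns the best timing and the names of the edits that were kept.
    """
    best = initial_ms
    kept = []
    for name, ms, ok in candidates:
        if keep_or_revert(best, ms, ok):
            best = ms          # accept the edit as the new baseline
            kept.append(name)
        # else: revert kernel.py to the last known-good version
    return best, kept
```

The important property this models is that a fast-but-incorrect edit (such as the hypothetical "fuse" candidate failing a correctness check) is always reverted, so performance gains can never come at the cost of wrong results.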
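The Amdahl's-law prioritization can likewise be sketched in a few lines. The standard formula says that if a kernel accounts for fraction f of total runtime and is made s times faster, end-to-end speedup is 1 / ((1 - f) + f/s). The helper names and the assumed uniform 2x per-kernel speedup below are illustrative, not AutoKernel's actual scheduler interface.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """End-to-end speedup when `fraction` of total runtime
    becomes `kernel_speedup` times faster (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)

def prioritize(kernels, assumed_speedup=2.0):
    """Rank kernels by projected end-to-end gain, assuming each
    could plausibly be made `assumed_speedup`x faster."""
    return sorted(
        kernels,
        key=lambda k: amdahl_speedup(k["fraction"], assumed_speedup),
        reverse=True,
    )
```

This captures why the scheduler focuses effort where it pays off: a matmul consuming 60% of runtime dominates the ranking over a softmax consuming 10%, since even a large speedup on the latter barely moves the end-to-end number.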

By leveraging autonomous agents and a rigorous optimization pipeline, AutoKernel offers a compelling solution for automatically enhancing the efficiency and performance of deep learning computations on GPUs, making advanced optimization techniques more accessible.