NanoGPT Slowrun: Language Modeling with Limited Data, Infinite Compute
Q Labs has introduced NanoGPT Slowrun, an open-source benchmark that inverts conventional AI scaling by prioritizing data efficiency over raw compute: models are trained on a fixed, small dataset with no limit on computational resources. The setup is intended to encourage novel algorithmic approaches, and it has drawn interest from researchers looking to overcome data bottlenecks in model development.
The Lowdown
Q Labs argues that compute is growing faster than data availability, making data the coming bottleneck for AI. The problem is sharpest beyond large language models: fields like robotics and biology lack the massive datasets that current training methods demand. Their proposed remedy is new learning algorithms optimized for the regime of limited data and practically infinite compute.
- Q Labs launched NanoGPT Slowrun, an open repository designed for developing data-efficient learning algorithms.
- The benchmark involves training on a fixed 100M tokens from FineWeb, with the objective of achieving the lowest validation loss, utilizing unlimited compute.
- This approach is a deliberate inverse of 'speedrun' benchmarks, which optimize for wall-clock time, thereby encouraging computationally intensive but data-efficient ideas like heavy regularization or alternative optimizers.
- Initial findings demonstrated that the Muon optimizer outperformed others, and multi-epoch training was critical. Aggressive regularization (e.g., 16x standard weight decay, dropout) allowed scaling to larger parameter counts with limited data.
- The baseline achieved 2.4x data efficiency against modded-nanogpt, rapidly increasing to 5.5x through community contributions.
- Key improvements included per-epoch shuffling, learned projections for value embeddings, SwiGLU activation, and ensembling multiple models.
- Q Labs anticipates reaching 10x data efficiency soon and potentially 100x by year-end through further algorithmic exploration.
- Future research directions include second-order optimizers, diffusion models, curriculum learning, evolutionary search, and optimizing for compression/model complexity.
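The "16x standard weight decay" mentioned above can be illustrated with a decoupled (AdamW-style) update, where the decay term acts directly on the weights rather than through the gradient. This is a minimal sketch, not Q Labs' implementation; the 0.01 baseline decay and learning rate are illustrative assumptions, and the momentum and second-moment terms of a full optimizer are omitted.

```python
import numpy as np

def decayed_step(w, grad, lr=3e-4, weight_decay=16 * 0.01):
    # Decoupled weight decay: shrink the weights directly each step,
    # here scaled 16x an assumed 0.01 baseline. With limited data,
    # this kind of aggressive shrinkage lets larger models train
    # for many epochs without memorizing the training set.
    return w - lr * grad - lr * weight_decay * w

w = np.ones(4)
w_next = decayed_step(w, grad=np.zeros(4))  # pure-decay step for illustration
```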
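Per-epoch shuffling, one of the improvements listed above, just means revisiting the same fixed token windows in a freshly randomized order each pass. A minimal sketch (the generator name and window representation are placeholders, not the repository's API):

```python
import numpy as np

def multi_epoch_order(windows, n_epochs, seed=0):
    # Yield the fixed set of training windows n_epochs times,
    # re-shuffling the visit order independently for each epoch
    # so repeated passes over the same data differ in ordering.
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        for i in rng.permutation(len(windows)):
            yield windows[i]

windows = list(range(8))  # stand-ins for token windows
seen = list(multi_epoch_order(windows, n_epochs=3))
```

Each epoch still covers every window exactly once; only the order changes.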
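The SwiGLU activation cited among the improvements replaces a transformer block's plain MLP with a gated feed-forward: one projection is passed through SiLU and used to gate a second projection. A NumPy sketch of the forward pass, with illustrative layer sizes (the weight names are not taken from the repository):

```python
import numpy as np

def silu(x):
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, W_gate, W_up, W_down):
    # Gated feed-forward: (SiLU(x W_gate) * (x W_up)) W_down
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16  # illustrative sizes
x = rng.standard_normal((4, d_model))
W_gate = rng.standard_normal((d_model, d_ff))
W_up = rng.standard_normal((d_model, d_ff))
W_down = rng.standard_normal((d_ff, d_model))
out = swiglu_ffn(x, W_gate, W_up, W_down)  # shape (4, 8)
```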
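Ensembling multiple models, the last improvement listed, can be sketched by averaging each model's predicted next-token distribution before scoring. Because negative log is convex, the averaged distribution's validation loss is guaranteed (by Jensen's inequality) to be at most the mean of the individual losses. The sketch below uses random logits purely for illustration:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the vocabulary axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(probs, targets):
    # Mean negative log-likelihood of the target tokens.
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

rng = np.random.default_rng(1)
vocab, n_tokens, n_models = 10, 6, 3  # illustrative sizes
logits = [rng.standard_normal((n_tokens, vocab)) for _ in range(n_models)]
targets = rng.integers(0, vocab, size=n_tokens)

# Ensemble: average the per-token probability distributions.
mean_probs = np.mean([softmax(l) for l in logits], axis=0)
single_losses = [nll(softmax(l), targets) for l in logits]
ensemble_loss = nll(mean_probs, targets)  # <= np.mean(single_losses)
```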
Slowrun is presented as a crucial step towards developing AI that can generalize effectively even when data is scarce, shifting the focus of innovation in model training.