Chess engines do weird stuff
Chess engines, particularly lc0, employ surprisingly unconventional machine learning techniques beyond standard reinforcement learning. The article delves into methods such as distillation from search, runtime adaptation, and Simultaneous Perturbation Stochastic Approximation (SPSA) for directly optimizing win rates and tuning arbitrary C++ parameters. These insights offer valuable lessons for the broader field of AI, particularly for LLM development, by showcasing effective, albeit 'insane' and expensive, optimization strategies.
The Lowdown
This article explores several peculiar but highly effective techniques used in modern chess engines, lc0 in particular, that diverge from conventional machine learning paradigms. It suggests that insights from these methods could be valuable for other AI domains, notably Large Language Models (LLMs).
- Efficient Training via Distillation: While AlphaZero popularized reinforcement learning (RL), later strong engines such as lc0's BT4 found that distilling knowledge from a powerful model augmented with search is often more efficient and effective than continued RL; reintroducing RL on top of distillation has sometimes even hurt performance (a minimal distillation sketch appears after this list).
- Runtime Model Adaptation: A novel technique has the engine dynamically adjusting its neural network's evaluation of positions during a game, nudging them toward deeper search results so the model effectively adapts to the board states it actually encounters (a toy illustration follows the list).
- Direct Win Optimization with SPSA: Rather than optimizing only positional evaluation, lc0 uses Simultaneous Perturbation Stochastic Approximation (SPSA) to optimize directly for winning games: randomly perturb the weights, play many games, and step in the direction that yields more wins. Despite its computational cost and complete absence of gradients, this 'insane' method delivers significant Elo improvements (see the SPSA sketch after this list).
- 'Gradient Descent' Through C++ Code: The same SPSA principle extends to any numerical parameter in the engine's C++ source. By perturbing these arbitrary hand-tuned constants and observing win rates, engineers can optimize heuristics that were never differentiable in the first place, achieving small but notable Elo gains (the SPSA sketch below covers this case as well).
- Unconventional Transformer Architecture: lc0's adoption of a transformer architecture significantly improved performance over its older convolutional models. It also uses a custom 'smolgen' scheme for generating attention biases, which provides an accuracy boost equivalent to a 2.5x increase in model size, despite a throughput hit (see the smolgen sketch below).
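
A minimal sketch of what distillation from search could look like, under assumptions of my own: the student policy is trained toward the visit-count distribution produced by a stronger teacher running search, rather than toward fresh self-play RL targets. The network shapes, the 768-feature board encoding, and the training harness are illustrative placeholders, not lc0's actual training code.

```python
import torch
import torch.nn.functional as F

# Hypothetical student policy network: encoded board -> move logits.
student = torch.nn.Sequential(
    torch.nn.Linear(768, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 1858),  # move-encoding size chosen for illustration
)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distillation_step(board_features, teacher_visit_counts):
    """One distillation update.

    board_features:       (batch, 768) encoded positions (placeholder encoding).
    teacher_visit_counts: (batch, 1858) visit counts from the teacher's search,
                          used as a soft policy target.
    """
    # Normalize visit counts into a probability distribution over moves.
    target = teacher_visit_counts / teacher_visit_counts.sum(dim=1, keepdim=True)
    log_probs = F.log_softmax(student(board_features), dim=1)
    loss = -(target * log_probs).sum(dim=1).mean()  # cross-entropy vs. search policy
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```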
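
The article does not spell out the runtime-adaptation mechanism. As a loose illustration of the idea only (a simple cached-correction scheme of my own, not lc0's implementation), one could keep a per-position correction that pulls the raw network evaluation toward what a deeper search concluded about the same position:

```python
from collections import defaultdict

class AdaptiveEvaluator:
    """Toy runtime adaptation: blend static net evals toward deep search results."""

    def __init__(self, net_eval, blend=0.5):
        self.net_eval = net_eval              # function: position -> value in [-1, 1]
        self.blend = blend                    # how strongly to trust the search result
        self.correction = defaultdict(float)  # keyed by a position hash

    def evaluate(self, pos_hash, position):
        # Raw network evaluation plus whatever we have learned in-game about it.
        return self.net_eval(position) + self.correction[pos_hash]

    def update_from_search(self, pos_hash, position, search_value):
        # After a deep search returns a value, move the cached correction
        # toward the gap between the search result and the raw net output.
        gap = search_value - self.net_eval(position)
        self.correction[pos_hash] += self.blend * (gap - self.correction[pos_hash])
```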
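
A minimal sketch of the SPSA idea, covering both of the preceding two bullets: the same loop works whether the parameter vector holds network weights or arbitrary hand-tuned constants lifted from the C++ source, because the only training signal is the match score. The play_match harness, step sizes, and iteration count are assumptions, not values from the article.

```python
import random

def spsa_tune(theta, play_match, iterations=1000, a=0.1, c=0.05):
    """Tune a parameter vector by directly maximizing match results (SPSA).

    theta:      list of floats, e.g. network weights or hand-tuned engine constants.
    play_match: function taking a parameter vector and returning a score in [0, 1]
                (win rate over a batch of games), assumed to be provided by the
                surrounding engine-testing harness.
    """
    theta = list(theta)
    for _ in range(iterations):
        # One random +/-1 perturbation direction shared by all parameters.
        delta = [random.choice((-1.0, 1.0)) for _ in theta]
        theta_plus = [t + c * d for t, d in zip(theta, delta)]
        theta_minus = [t - c * d for t, d in zip(theta, delta)]

        # Two matches give a noisy two-point estimate of the "gradient" of the
        # win rate, with no backpropagation and no differentiable objective.
        score_plus, score_minus = play_match(theta_plus), play_match(theta_minus)
        for i, d in enumerate(delta):
            grad_i = (score_plus - score_minus) / (2.0 * c * d)
            theta[i] += a * grad_i  # ascend toward the perturbation that won more
    return theta
```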
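
The article describes smolgen only at a high level. The sketch below shows the general idea as I understand it, namely input-dependent additive attention biases produced by a small dense branch, with every dimension chosen for illustration rather than taken from lc0:

```python
import torch
import torch.nn as nn

class SmolgenAttention(nn.Module):
    """Rough smolgen-style attention layer (all dimensions are assumptions).

    Standard scaled dot-product attention over the 64 squares, plus an additive
    64x64 bias per head that is generated from the position itself by a small
    dense network, instead of being a fixed learned bias.
    """
    def __init__(self, embed_dim=256, heads=8, compress=32, hidden=256):
        super().__init__()
        self.heads = heads
        self.head_dim = embed_dim // heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out = nn.Linear(embed_dim, embed_dim)
        # "smolgen" branch: compress each square, mix globally, emit per-head biases.
        self.compress = nn.Linear(embed_dim, compress)
        self.gen = nn.Sequential(
            nn.Linear(64 * compress, hidden), nn.ReLU(),
            nn.Linear(hidden, heads * 64 * 64),
        )

    def forward(self, x):  # x: (batch, 64, embed_dim)
        b = x.shape[0]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, 64, self.heads, self.head_dim).transpose(1, 2)
        k = k.view(b, 64, self.heads, self.head_dim).transpose(1, 2)
        v = v.view(b, 64, self.heads, self.head_dim).transpose(1, 2)

        # Position-dependent attention biases, one 64x64 matrix per head.
        bias = self.gen(self.compress(x).reshape(b, -1)).view(b, self.heads, 64, 64)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5 + bias
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, 64, -1)
        return self.out(out)
```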
The article concludes by emphasizing how these 'weird' and often resource-intensive methods, particularly SPSA's ability to optimize non-differentiable objectives and arbitrary code parameters, offer profound lessons for AI development, illustrating that sometimes the most unconventional approaches yield the most powerful results.