Arithmetic Without Numbers – How LLMs Do Math
This article examines how Large Language Models (LLMs) handle arithmetic internally, asking whether they truly 'compute' or merely 'pattern match' with their matrix-based architecture. It details the strict interpretability experiments of the 'Rune' project, which observe and manipulate these hidden arithmetic states instead of stopping at simple external tool calls. For anyone curious about what happens inside the LLM black box, it offers a nuanced look at the models' machine-native approach to numbers.
The Lowdown
The article "Arithmetic Without Numbers" by Alvaro Videla investigates the fundamental question of how Large Language Models (LLMs) perform arithmetic calculations, given that they lack the embodied experiences (like counting on fingers) that humans use. Instead, LLMs operate solely with matrices, tokens, vectors, and activations.
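To make "arithmetic without numbers" concrete, here is a minimal sketch of what a model actually receives when a prompt contains a number. It assumes a hypothetical tokenizer that splits digit strings into chunks of up to three digits and a random 8-dimensional embedding matrix. Real tokenizers, vocabularies and embedding sizes differ; `toy_tokenize`, `VOCAB`, `EMBED` and `embed` are illustrative names, not part of any real model.

```python
import numpy as np

# Hypothetical tokenizer: split a digit string left to right into chunks of
# up to `chunk` digits. Real LLM tokenizers use their own rules.
def toy_tokenize(number_text: str, chunk: int = 3) -> list[str]:
    return [number_text[i:i + chunk] for i in range(0, len(number_text), chunk)]

# Hypothetical vocabulary: every digit string of length 1 to 3 gets an integer id.
VOCAB: dict[str, int] = {}
for width in (1, 2, 3):
    for n in range(10 ** width):
        VOCAB[str(n).zfill(width)] = len(VOCAB)

# Random embedding matrix: one 8-dimensional row per token id (illustrative size).
rng = np.random.default_rng(0)
EMBED = rng.normal(size=(len(VOCAB), 8))

def embed(number_text: str):
    """Return the tokens, their ids, and the vectors the model would see."""
    tokens = toy_tokenize(number_text)
    ids = [VOCAB[t] for t in tokens]
    return tokens, ids, EMBED[ids]

tokens, ids, vectors = embed("12345")
print(tokens, ids, vectors.shape)
```

Under these assumptions, "12345" becomes the tokens "123" and "45", two arbitrary ids, and two rows of floating-point numbers. The quantity 12345, its place values and its digits are not stored anywhere explicitly, so any notion of magnitude or carrying has to be learned by the layers that process these vectors.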
The author explains that understanding LLM arithmetic requires looking inside the model to see if it's recalling patterns, running algorithms, or just producing plausible next tokens. Key insights from the article include:
- LLM Mechanisms: LLMs use a complex interplay of tokens, vectors, activations, residual streams, attention, and MLPs (feed-forward blocks) to process information, including numbers.
- The "No Fingers" Problem: Unlike humans, LLMs have no physical or symbolic aids for arithmetic, forcing them to invent a machine-native way of representing and manipulating numbers within their matrix-based structure.
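- Next-Token Constraint: The model generates answers left to right, so it must emit the leading digits first even though, in multi-digit arithmetic, those digits can depend on a carry propagated from the lowest digits. This makes carries especially hard, and in the experiments, deep-carry cases often caused failures.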
- The Rune Project's Goal: The project aimed to determine if the model's internal activations could reveal the operation and operands of an arithmetic problem (e.g., gcd(84, 36)), rather than relying on prompt parsing or external tool calls from text.
- Claim Ladder: The article outlines five ways an LLM might produce a correct answer, ranging from simple prompt parsing to the ambitious "Residual JIT replacement" (writing computed results back into the model's hidden state), with the latter proving difficult.
- Rendering vs. Computing: A crucial distinction is made between a model's ability to 'render' a known answer and actually 'compute' it. Strict controls were implemented to ensure experiments measured computation from internal states, not just rendering.
- Interpretability Toolbox: The project employed probes (to read facts from activations), sparse autoencoders (to decompose vectors into nameable features), activation patching (to test which activations matter), and steering (to push internal states in a chosen direction) to inspect the model's internal workings.
- Readable vs. Writable: A significant finding was that even if an internal variable is readable, it doesn't mean it's easily writable. Writing back into the residual stream often proved brittle and disrupted other model behaviors.
- Activation-Derived Tool Arguments: The most robust finding was the ability to derive arithmetic operation and operand arguments directly from the model's internal activations, allowing an external calculator to be called without parsing the original text prompt.
- Accuracy Lifts: This activation-derived route, tested on a frozen Llama model, showed significant accuracy improvements on tasks like GCD, LCM, and division with remainder, even demonstrating zero false positives on constructed 'hard-negative' prompts.
- Resolution Budget: Experiments showed that longer answers led to