
Learning Pseudorandom Numbers with Transformers

This academic paper dives into the surprising capability of Transformer models to learn and predict sequences generated by Permuted Congruential Generators (PCGs), a notoriously hard-to-predict family of pseudorandom number generators. The research shows that these models succeed where published classical attacks fall short, even when outputs are truncated, and can jointly learn the structures of multiple generators. It also uncovers fascinating interpretability insights, revealing that Transformers form internal representations that group integer inputs into bitwise rotation-invariant clusters.

Score: 3
Comments: 0
Highest Rank: #29
On Front Page: 1h
First Seen: May 3, 10:00 AM
Last Seen: May 3, 10:00 AM

The Lowdown

This paper investigates the remarkable capacity of Transformer models to decipher and predict sequences generated by Permuted Congruential Generators (PCGs), a sophisticated family of pseudorandom number generators. Unlike simpler Linear Congruential Generators (LCGs), PCGs incorporate complex bit-wise operations like shifts, XORs, rotations, and truncations, making them significantly harder to predict using traditional methods. The study demonstrates that Transformers can successfully perform in-context prediction on various PCG variants, even exceeding the capabilities of published classical attacks.
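To make the LCG/PCG contrast concrete, here is a minimal sketch: the hidden state evolves by a plain LCG, but each emitted number first passes through a state-dependent bitwise permutation (xorshift, truncation, and rotation). The constants below follow the reference pcg32 (XSH-RR) design; the paper's exact variants, moduli, and parameters may differ.

```python
M64 = (1 << 64) - 1
A, C = 6364136223846793005, 1442695040888963407  # reference pcg32 constants

def lcg_step(state):
    # LCG transition: a simple affine map mod 2^64.
    return (A * state + C) & M64

def xsh_rr_output(state):
    # XSH-RR output permutation: xorshift the state, truncate to 32 bits,
    # then rotate right by an amount taken from the top 5 state bits.
    xorshifted = (((state >> 18) ^ state) >> 27) & 0xFFFFFFFF
    rot = state >> 59
    return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF

def pcg32_stream(state, n):
    # Emit n 32-bit outputs; a predictor only sees these, never the state.
    out = []
    for _ in range(n):
        state = lcg_step(state)
        out.append(xsh_rr_output(state))
    return out
```

Predicting the stream requires effectively undoing the truncation, rotation, and xorshift to recover the linear state, which is precisely what makes classical attacks on PCGs difficult.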

  • Transformers proved capable of predicting PCG sequences, even when outputs were truncated to a single bit.
  • The models were scaled to handle moduli up to 2^22, utilizing up to 50 million parameters and datasets with 5 billion tokens.
  • When presented with multiple distinct PRNGs during training, the Transformer models successfully learned to identify and adapt to the different structures simultaneously.
  • A novel scaling law was observed: the number of in-context sequence elements needed for near-perfect prediction grows as the square root of the modulus, sqrt(m).
  • For larger moduli (m >= 2^20), curriculum learning—training with data from smaller moduli first—was found to be essential to overcome extended optimization stagnation phases.
  • Analysis of the embedding layers revealed a significant clustering phenomenon where top principal components spontaneously grouped integer inputs into bitwise rotationally-invariant clusters, providing insight into the models' internal representations and how they generalize across different moduli.
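The rotational invariance found in the embeddings has a simple combinatorial description: two b-bit integers belong to the same cluster when one is a bitwise rotation of the other. A minimal sketch of that equivalence, assigning each integer the minimum over its rotations as a canonical label (illustrative only; the paper infers these clusters from principal components of learned embeddings):

```python
from collections import defaultdict

def rot_left(x, r, bits):
    # Rotate the b-bit value x left by r positions.
    mask = (1 << bits) - 1
    return ((x << r) | (x >> (bits - r))) & mask

def rotation_class(x, bits):
    # Canonical representative: the smallest value among all rotations of x.
    return min(rot_left(x, r, bits) for r in range(bits))

def rotation_clusters(bits):
    # Partition all b-bit integers into rotation-invariant clusters.
    clusters = defaultdict(list)
    for x in range(1 << bits):
        clusters[rotation_class(x, bits)].append(x)
    return dict(clusters)
```

For 4-bit integers this yields six clusters (the binary necklaces of length 4: 0000, 0001, 0011, 0101, 0111, 1111), so an embedding that respects this invariance needs far fewer effective directions than one per integer.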

In essence, this research highlights the unexpected prowess of Transformer models in uncovering the underlying patterns of complex pseudorandom number generators, offering new avenues for understanding both PRNGs and the interpretability of advanced AI systems.