
Learning Pseudorandom Numbers with Transformers

This academic paper dives into the surprising capability of Transformer models to learn and predict sequences generated by Permuted Congruential Generators (PCGs), a notoriously hard-to-predict family of pseudorandom number generators. The research shows that these models succeed where published classical attacks fall short, even when outputs are truncated, and can jointly learn the structures of multiple generators. It also uncovers fascinating interpretability insights, revealing that Transformers form internal representations that group integer inputs into bitwise rotation-invariant clusters.

Score: 3
Comments: 0
Highest Rank: #29
On Front Page: 1h
First Seen: May 3, 10:00 AM
Last Seen: May 3, 10:00 AM

The Lowdown

This paper investigates the remarkable capacity of Transformer models to decipher and predict sequences generated by Permuted Congruential Generators (PCGs), a sophisticated family of pseudorandom number generators. Unlike simpler Linear Congruential Generators (LCGs), PCGs incorporate complex bit-wise operations like shifts, XORs, rotations, and truncations, making them significantly harder to predict using traditional methods. The study demonstrates that Transformers can successfully perform in-context prediction on various PCG variants, even exceeding the capabilities of published classical attacks.
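To make the LCG/PCG contrast concrete, here is a minimal sketch: the hidden state evolves by a plain LCG, but each emitted number first passes through a state-dependent bitwise permutation (xorshift, truncation, and rotation). The constants below follow the reference pcg32 (XSH-RR) design; the paper's exact variants, moduli, and parameters may differ.

```python
M64 = (1 << 64) - 1
A, C = 6364136223846793005, 1442695040888963407  # reference pcg32 constants

def lcg_step(state):
    # LCG transition: a simple affine map mod 2^64.
    return (A * state + C) & M64

def xsh_rr_output(state):
    # XSH-RR output permutation: xorshift the state, truncate to 32 bits,
    # then rotate right by an amount taken from the top 5 state bits.
    xorshifted = (((state >> 18) ^ state) >> 27) & 0xFFFFFFFF
    rot = state >> 59
    return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF

def pcg32_stream(state, n):
    # Emit n 32-bit outputs; a predictor only sees these, never the state.
    out = []
    for _ in range(n):
        state = lcg_step(state)
        out.append(xsh_rr_output(state))
    return out
```

Predicting the stream requires effectively undoing the truncation, rotation, and xorshift to recover the linear state, which is precisely what makes classical attacks on PCGs difficult.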

  • Transformers proved capable of predicting PCG sequences, even when outputs were truncated to a single bit.
  • The models were scaled to handle moduli up to 2^22, utilizing up to 50 million parameters and datasets with 5 billion tokens.
  • When presented with multiple distinct PRNGs during training, the Transformer models successfully learned to identify and adapt to the different structures simultaneously.
  • A novel scaling law was observed: the number of in-context sequence elements needed for near-perfect prediction grows as the square root of the modulus, sqrt(m).
  • For larger moduli (m >= 2^20), curriculum learning—training with data from smaller moduli first—was found to be essential to overcome extended optimization stagnation phases.
  • Analysis of the embedding layers revealed a significant clustering phenomenon where top principal components spontaneously grouped integer inputs into bitwise rotationally-invariant clusters, providing insight into the models' internal representations and how they generalize across different moduli.
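The rotational invariance found in the embeddings has a simple combinatorial description: two b-bit integers belong to the same cluster when one is a bitwise rotation of the other. A minimal sketch of that equivalence, assigning each integer the minimum over its rotations as a canonical label (illustrative only; the paper infers these clusters from principal components of learned embeddings):

```python
from collections import defaultdict

def rot_left(x, r, bits):
    # Rotate the b-bit value x left by r positions.
    mask = (1 << bits) - 1
    return ((x << r) | (x >> (bits - r))) & mask

def rotation_class(x, bits):
    # Canonical representative: the smallest value among all rotations of x.
    return min(rot_left(x, r, bits) for r in range(bits))

def rotation_clusters(bits):
    # Partition all b-bit integers into rotation-invariant clusters.
    clusters = defaultdict(list)
    for x in range(1 << bits):
        clusters[rotation_class(x, bits)].append(x)
    return dict(clusters)
```

For 4-bit integers this yields six clusters (the binary necklaces of length 4: 0000, 0001, 0011, 0101, 0111, 1111), so an embedding that respects this invariance needs far fewer effective directions than one per integer.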

In essence, this research highlights the unexpected prowess of Transformer models in uncovering the underlying patterns of complex pseudorandom number generators, offering new avenues for understanding both PRNGs and the interpretability of advanced AI systems.