GPT-5.5 Codex reasoning-token clustering may be leading to degraded performance

An investigation reports an anomaly in the behavior of OpenAI's GPT-5.5 Codex model: its reasoning_output_tokens counts frequently land on specific fixed values, notably 516, 1034, and 1552 tokens. The report argues that the pattern is statistically non-random and may point to an underlying issue that degrades performance on complex tasks.

GPT-5.5 Codex responses end with exactly 516 reasoning_output_tokens disproportionately often, with further spikes at 1034 and 1552.
This clustering is model-specific, with GPT-5.5 accounting for 82.0% of exact-516 events despite comprising only 19.3% of all responses analyzed.
The anomaly coincides with a decline in overall reasoning-token intensity for GPT-5.5, suggesting a potential link to reduced problem-solving depth.
Data from February to June 2026 show a sharp increase in exact-516 clustering for GPT-5.5, while its mean and P90 reasoning-token counts decreased over the same period.
The author postulates that this behavior might indicate a hidden reasoning-budget cap, truncation, routing, or scheduler mechanism within the model.
The report urges OpenAI's Codex team to investigate these thresholds and clarify whether this is expected behavior, a budget constraint, or a sign of degradation.
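
The share comparison cited above (82.0% of exact-516 events versus 19.3% of all responses) can be sketched as follows. This is a minimal illustration, not the report author's method: the record layout (a dict with "model" and "reasoning_output_tokens" keys), the model label strings, and all function names are assumptions made for the example. Only the field name reasoning_output_tokens and the three threshold values come from the report.

```python
from collections import Counter

# Threshold values named in the report.
THRESHOLDS = (516, 1034, 1552)


def exact_value_share(records, value, model):
    """Fraction of responses with exactly `value` reasoning tokens that came from `model`.

    `records` is a hypothetical list of dicts with keys
    "model" and "reasoning_output_tokens".
    """
    hits = [r for r in records if r["reasoning_output_tokens"] == value]
    if not hits:
        return 0.0
    return sum(1 for r in hits if r["model"] == model) / len(hits)


def response_share(records, model):
    """Fraction of all responses that came from `model`."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["model"] == model) / len(records)


def over_representation(event_share, base_share):
    """How many times more often the model appears among the events than overall."""
    return event_share / base_share


def threshold_counts(records, model):
    """Count of responses from `model` landing exactly on each threshold."""
    counts = Counter(
        r["reasoning_output_tokens"] for r in records if r["model"] == model
    )
    return {t: counts[t] for t in THRESHOLDS}


if __name__ == "__main__":
    # Plugging in the two percentages quoted in the report.
    ratio = over_representation(0.820, 0.193)
    print(f"over-representation: {ratio:.2f}x")
```

With the report's figures, the ratio works out to roughly 4.2, meaning GPT-5.5 appears among exact-516 events about four times as often as its share of total traffic would predict. A ratio near 1 would indicate no model-specific clustering.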

The analysis raises questions about the internal mechanisms and consistency of the model, and offers a data-driven look at how undisclosed design choices can surface as measurable performance quirks.

