VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

The paper introduces VibeThinker-3B, a compact 3-billion-parameter dense model built to explore the limits of verifiable reasoning in small language models. Using an optimized post-training pipeline, VibeThinker-3B reportedly matches or surpasses much larger flagship models such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro on demanding reasoning tasks.

Model Architecture: VibeThinker-3B is a 3B-parameter dense model focused on pushing verifiable reasoning capabilities within the small-model regime.
Training Methodology: It employs a novel Spectrum-to-Signal post-training paradigm, which includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation.
Key Performance Metrics: Reported scores include 94.3 on AIME26 (97.1 with claim-level test-time scaling), 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on recent unseen LeetCode contests, which the authors take as evidence of strong out-of-distribution generalization. The model also retains instruction controllability, scoring 93.4 on IFEval.
Core Hypothesis: The research introduces the "Parametric Compression-Coverage Hypothesis," which posits that verifiable reasoning can be compressed into compact "reasoning cores," while open-domain knowledge requires broader parameter coverage.
Significance: These findings challenge the assumption that extreme scale is always necessary for top-tier reasoning, highlighting the potential for efficient, high-performance compact models in specialized areas.
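
For readers unfamiliar with the Pass@1 figure quoted above, here is a minimal sketch of the widely used unbiased pass@k estimator: draw n samples per problem, count the c that pass the tests, and compute 1 - C(n-c, k) / C(n, k). The function names and the sample counts below are illustrative assumptions; the summary does not say how many samples per problem the paper used or whether it used this exact estimator.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: attempt budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as an (n, c) pair."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts for two problems, 10 samples each (not from the paper).
score = mean_pass_at_k([(10, 3), (10, 5)], k=1)
```

With k = 1 the estimator reduces to c / n, the fraction of sampled solutions that pass, so a benchmark-level Pass@1 is simply that fraction averaged over problems.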

The research argues that compact models are not merely deployment-efficient alternatives but a complementary route to frontier-level performance, particularly on parameter-dense reasoning tasks, reached by isolating and optimizing core reasoning capabilities.

The Lowdown

The Gossip

Reasoning vs. Rendering Reality