
VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

VibeThinker-3B, a compact 3-billion-parameter model, reaches frontier-level verifiable reasoning, matching or exceeding much larger flagship models. Its results come from a new post-training recipe rather than from scale, with strong scores on competition math and coding benchmarks. The Hacker News community is intrigued by what this means for efficient AI and for small, capable models focused on core reasoning.

Score: 19
Comments: 4
Highest Rank: #3
Time on Front Page: 15h
First Seen: Jun 23, 3:00 AM
Last Seen: Jun 23, 5:00 PM

The Lowdown

The paper introduces VibeThinker-3B, a compact 3-billion parameter dense model designed to explore the limits of verifiable reasoning in small language models. Leveraging an optimized post-training pipeline, VibeThinker-3B demonstrates performance comparable to, or even surpassing, much larger flagship models like DeepSeek V3.2, GLM-5, and Gemini 3 Pro on demanding reasoning tasks.

  • Model Architecture: VibeThinker-3B is a 3B parameter dense model, specifically focused on pushing verifiable reasoning capabilities within a small-model regime.
  • Training Methodology: It employs a novel Spectrum-to-Signal post-training paradigm, which includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation.
  • Key Performance Metrics: The model scores 94.3 on AIME26 (97.1 with claim-level test-time scaling), 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on recent unseen LeetCode contests, indicating strong out-of-distribution generalization. It also retains instruction-following ability, scoring 93.4 on IFEval.
  • Core Hypothesis: The research introduces the "Parametric Compression-Coverage Hypothesis," which posits that verifiable reasoning can be compressed into compact "reasoning cores," while open-domain knowledge requires broader parameter coverage.
  • Significance: These findings challenge the assumption that extreme scale is always necessary for top-tier reasoning, highlighting the potential for efficient, high-performance compact models in specialized areas.
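The summary does not spell out the RL objective, but the submission title names GRPO (Group Relative Policy Optimization). As a minimal sketch, assuming the standard GRPO formulation with a binary verifiable reward, the core idea is that each sampled response is scored relative to the other responses sampled for the same prompt, so no learned value network is needed. The helpers `verifiable_reward` and `group_relative_advantages` are hypothetical illustrations, not code from the paper.

```python
from statistics import mean, pstdev


def verifiable_reward(answer: str, reference: str) -> float:
    # Hypothetical binary reward: 1.0 if the final answer exactly matches
    # the reference, else 0.0. Real pipelines use task-specific checkers
    # (math answer normalization, unit tests for code, etc.).
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for G responses sampled for the SAME prompt.

    Each reward is normalized against its own group:
        A_i = (r_i - mean(r)) / (std(r) + eps)
    so the group itself acts as the baseline instead of a critic model.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumed choice)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four sampled answers to one math prompt whose reference answer is "42".
samples = ["42", "41", "42", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advs = group_relative_advantages(rewards)
print([round(a, 3) for a in advs])  # [1.0, -1.0, 1.0, -1.0]
```

Responses that beat their group's average get positive advantages and are reinforced, while below-average ones are pushed down. A group in which every sample earns the same reward yields all-zero advantages and contributes no learning signal.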

This research argues that compact models are not merely deployment-efficient alternatives but a complementary path to frontier-level performance, particularly on verifiable reasoning tasks, achieved by isolating and optimizing core reasoning capabilities.

The Gossip

Reasoning vs. Rendering Reality

Early comments reveal a brief misunderstanding about the model's purpose. One user tried VibeThinker-3B for SVG art generation and reported that it failed. Other users quickly pointed out that the model is explicitly built for verifiable reasoning tasks such as math and code, not for creative generation or visual output, a distinction that is easy to miss when a small model is billed as beating much larger ones.