HN
Today

Qwen3.7-Max: The Agent Frontier

Alibaba Cloud introduces Qwen3.7-Max, an AI model touting frontier agent capabilities across coding, office automation, and complex long-horizon tasks. The new model showcases remarkable benchmark performance, including a 10x kernel optimization on previously unseen hardware and robust generalization across various agent frameworks. Hacker News is buzzing about these advancements and eagerly anticipating potential open-weight versions, despite some skepticism regarding benchmark comparisons against older models.

31
Score
8
Comments
#2
Highest Rank
12h
on Front Page
First Seen
May 20, 12:00 PM
Last Seen
May 20, 11:00 PM
Rank Over Time
752332345688

The Lowdown

Qwen3.7-Max, developed by the Qwen Team, is presented as Alibaba Cloud's latest proprietary AI model, engineered for the burgeoning "agent era." It boasts extensive capabilities, positioning itself as a versatile foundation for complex tasks ranging from advanced coding to multi-day autonomous execution.

  • Versatile Agent Foundation: Designed for writing and debugging code, automating office workflows, and sustaining autonomous execution across hundreds or thousands of steps.
  • Exceptional Performance: Achieves leading scores across numerous benchmarks for coding agents (e.g., Terminal Bench 2.0, SWE-Pro, SciCode), general-purpose agents (e.g., MCP-Mark, MCP-Atlas, Skillsbench), and reasoning (e.g., GPQA Diamond, HMMT 2026 Feb).
  • Long-Horizon Autonomous Optimization: Demonstrated a 35-hour autonomous kernel optimization run, achieving a 10x speedup over a Triton reference on an unseen hardware platform (T-Head ZW-M890 PPUs) via 1,158 tool calls.
  • Environment Scaling & Cross-Harness Generalization: Its robust performance stems from diverse agentic training environments and a decoupled training infrastructure that forces generalized problem-solving strategies over harness-specific shortcuts.
  • Real-World Productivity & Strategic Planning: Functions as an advanced coworker for streamlining workflows and exhibited strong long-horizon planning and execution in simulated startup management (YC-Bench), achieving significantly higher revenue than predecessors.
  • Availability & Integration: Qwen3.7-Max will be available via Alibaba Cloud Model Studio API, supporting OpenAI and Anthropic compatible protocols, and integrates with popular agent frameworks like Claude Code, OpenClaw, and Qwen Code.

Qwen3.7-Max represents a significant leap in AI agent technology, offering a powerful, generalized, and long-horizon capable foundation poised to drive the next generation of AI-driven productivity and automation.

The Gossip

Benchmark Bewilderment

Commenters expressed confusion and slight frustration regarding Qwen's practice of comparing their new models against older versions of competitors (e.g., Opus-4.6) rather than the latest releases. The discussion speculates on reasons such as timing, a strategic desire to present a favorable comparison, or the lag in publishing new benchmarks for the very latest competitor models, with one user suggesting it helps set initial expectations.

Open-Weight Optimism

Several users voiced a strong desire for open-weight releases from the Qwen team, particularly for models in the 60-150B parameter range. This size is highlighted as a 'sweet spot' for current 'prosumer' hardware, indicating a community interest in accessible, powerful models capable of running on local systems.

Agentic Application Appetite

One comment directly inquired about user experiences and reports on the practical performance of Qwen's coding agents. This reflects a clear interest within the community for real-world application insights and effectiveness of the agent beyond the presented benchmarks.