LongCat-2.0, a large-scale MoE model with 1.6T total and 48B active parameters

Meituan, the Chinese food delivery behemoth, has unveiled LongCat-2.0, a 1.6 trillion-parameter Mixture-of-Experts (MoE) language model trained and deployed entirely on AI ASIC superpods rather than on NVIDIA hardware. The release, billed as open source, introduces LongCat Sparse Attention and N-gram Embedding, both aimed at long-context processing and agentic capabilities. The Hacker News crowd is abuzz, scrutinizing both the technical claims and the hardware independence.

Score: 89
Comments: 27
Highest Rank: #5
Time on Front Page: 17h
First Seen: Jun 30, 1:00 AM
Last Seen: Jun 30, 5:00 PM
Rank Over Time: [chart not reproduced]

The Lowdown

LongCat-2.0, developed by Meituan, is a significant new entrant in the large language model arena. This Mixture-of-Experts (MoE) model has 1.6 trillion total parameters, of which approximately 48 billion are activated per token. What truly sets it apart is that it was developed entirely on AI ASIC superpods, demonstrating frontier-scale training on an alternative hardware platform.
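To put the two headline numbers in proportion, here is a small arithmetic sketch. It uses only the 1.6T total and roughly 48B active figures quoted above; the function name is illustrative.

```python
# Figures taken from the summary above; "approximately 48 billion" is treated as exact here.
TOTAL_PARAMS = 1.6e12   # total parameters
ACTIVE_PARAMS = 48e9    # parameters activated per token


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that participate in processing one token."""
    return active / total


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"{frac:.1%} of weights active per token")
print(f"about {TOTAL_PARAMS / ACTIVE_PARAMS:.0f}x more parameters stored than used per token")
```

In other words, per-token compute scales with roughly 3% of the stored weights, which is the usual appeal of a sparse MoE design.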

  • Architectural Innovations: LongCat-2.0 introduces LongCat Sparse Attention (LSA), an evolution of DeepSeek's sparse attention, designed for efficient long-context processing through Streaming-aware Indexing (SI), Cross-Layer Indexing (CLI), and Hierarchical Indexing (HI). It also incorporates N-gram Embedding, expanding the embedding space dramatically to capture richer local context and improve parameter efficiency.
  • Hardware Independence: The model's entire training and deployment infrastructure is built on tens of thousands of AI ASIC superpods, signaling a move away from the dominant NVIDIA GPU ecosystem and showcasing a robust, scalable, and stable alternative.
  • Scalable Training & Inference: Significant engineering efforts went into optimizing training on these ASICs, including 6D parallelism (introducing EMBP for N-gram Embeddings), 'Superpods' for enhanced communication, and advanced memory management. Inference also saw deep optimization, tackling memory, I/O, and interconnect bandwidth constraints through model-specific, accelerator-oriented, and deployment-level strategies.
  • Post-training & Capabilities: A specialized expert-group design (Agent, Reasoning, and Interaction Experts) in the post-training pipeline, integrated via a MOPD architecture, aims to enhance overall performance in complex real-world tasks, from codebase migration to advanced reasoning.
  • Evaluation: LongCat-2.0 is benchmarked against leading proprietary models like Gemini and Claude Opus, showing competitive, albeit not always superior, performance across code, general agent, and foundational tasks.
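The summary does not say how N-gram Embedding is implemented, so the following is only a generic sketch of one way such a scheme could work: each token's ordinary embedding is augmented with a vector looked up from a hashed bigram table. The table sizes, hash function, and all names are hypothetical and are not taken from LongCat-2.0.

```python
import random

# All sizes below are hypothetical, chosen only to keep the example small.
VOCAB_SIZE = 1000      # token vocabulary size
NGRAM_BUCKETS = 4096   # number of hashed bigram slots
DIM = 8                # embedding width


def make_table(rows: int, dim: int, seed: int) -> list[list[float]]:
    """Build a small randomly initialised embedding table."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


token_table = make_table(VOCAB_SIZE, DIM, seed=0)
bigram_table = make_table(NGRAM_BUCKETS, DIM, seed=1)


def bigram_bucket(prev_id: int, cur_id: int) -> int:
    """Map an ordered token pair to a bucket with a simple deterministic hash."""
    return (prev_id * 1_000_003 + cur_id) % NGRAM_BUCKETS


def embed(token_ids: list[int]) -> list[list[float]]:
    """Token embedding plus, from the second position on, a hashed bigram embedding."""
    out = []
    for i, tok in enumerate(token_ids):
        vec = list(token_table[tok])
        if i > 0:
            extra = bigram_table[bigram_bucket(token_ids[i - 1], tok)]
            vec = [a + b for a, b in zip(vec, extra)]
        out.append(vec)
    return out


# The same token (7) gets a different vector depending on its predecessor.
a = embed([5, 7])
b = embed([6, 7])
print(a[1] != b[1])
```

The point of the sketch is that the bigram table adds parameters that capture local context while each token still costs only one extra lookup, which is one plausible reading of "expanding the embedding space" for parameter efficiency.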

In essence, LongCat-2.0 represents an ambitious stride in large language model development, not just for its scale and architectural novelties, but for its pioneering reliance on AI ASIC hardware, challenging the conventional infrastructure landscape.

The Gossip

Meituan's Machine & ASIC Ambitions

The discussion quickly turned to the unexpected origin of LongCat-2.0, with many commenters surprised that Meituan, a Chinese food delivery giant, was behind such an advanced AI model. This led to speculation that the model was trained on Huawei Ascend 910C ASICs, highlighting a significant achievement in operating at this scale without NVIDIA GPUs. The context expanded to discussions about how 'non-tech' companies like Uber or Amazon have historically built impressive underlying tech infrastructure, challenging preconceptions about who can drive innovation in this space.

Benchmarking & Bot Blunders

Hacker News users immediately put LongCat-2.0 to the test, with one notable example involving a nuanced nuclear physics question that the model answered incorrectly, contrasting it unfavorably with Gemini Flash and Qwen. Other users reported issues like Chinese responses despite English settings, suggesting potential biases or limitations. The conversation also included practical concerns about the sheer scale of the model, questioning the feasibility of running a 1.6T parameter model (even with sparse activation) on 'common hardware' and noting the initial unavailability of model weights.
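The "common hardware" concern can be made concrete with a back-of-the-envelope estimate of weight storage alone. The precisions below are assumptions for illustration (nothing in the discussion states which formats the weights will ship in), and the estimate ignores the KV cache, activations, and quantization overhead.

```python
TOTAL_PARAMS = 1.6e12   # total parameters, from the summary
ACTIVE_PARAMS = 48e9    # approx. parameters activated per token, from the summary

# Assumed storage formats; bytes per parameter for each.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_bytes(params: float, bytes_per_param: float) -> float:
    """Raw storage needed for the given number of parameters."""
    return params * bytes_per_param


for name, width in BYTES_PER_PARAM.items():
    total_tb = weight_bytes(TOTAL_PARAMS, width) / 1e12
    active_gb = weight_bytes(ACTIVE_PARAMS, width) / 1e9
    print(f"{name}: {total_tb:.1f} TB for all weights, {active_gb:.0f} GB touched per token")
```

Under these assumptions even a 4-bit copy needs about 0.8 TB just for weights. Sparse activation lowers compute per token, but because different tokens route to different experts, the full set of weights still has to be resident or quickly reachable.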

Deep Dives & Distribution Doubts

Early comments noted architectural similarities to DeepSeek's work, particularly regarding sparse attention, though one user later retracted this, acknowledging LongCat-2.0's distinct innovations like LongCat Sparse Attention (LSA) and N-gram Embedding. A recurrent theme was skepticism regarding the 'open-source' claims, given that the Hugging Face and GitHub links initially led to 404s, and model weights were merely promised as 'coming soon.' This sparked debate over how open the release really is and whether a detailed technical report is available.