GLM-5.2 is the new leading open weights model on Artificial Analysis

Z.ai's GLM-5.2 has emerged as the frontrunner among open-weights models, topping the Artificial Analysis Intelligence Index with particularly strong scores in scientific reasoning. It sits on the intelligence-versus-cost Pareto frontier, pairing high performance with a competitive price, and it has stirred significant discussion among developers about its practical utility versus its token-heavy output.

Score: 153. First seen: Jun 17, 10:00 AM. Last seen: Jun 17, 6:00 PM.

The Lowdown

GLM-5.2, developed by Z.ai, has claimed the top spot among open-weights models on the Artificial Analysis Intelligence Index v4.1 with a score of 51. Despite being the same size as its predecessor, GLM-5.1, it shows significant intelligence gains and sits on the Pareto frontier for intelligence versus cost per task: no other model at its intelligence level costs less per task.

Leading Performance: GLM-5.2 surpasses other open-weights models like MiniMax-M3 and DeepSeek V4 Pro on the Intelligence Index, showing particular strength in scientific reasoning with notable gains in CritPt and HLE evaluations.
Proprietary Competitor: It scores 1524 on GDPval-AA v2, putting it on par with high-end proprietary models like GPT-5.5 (xhigh reasoning), showcasing its real-world agentic capabilities.
Cost-Efficiency: Despite using more output tokens per task (43k) than many peers, GLM-5.2 has the lowest cost per task among models at its intelligence level, at approximately $0.46.
Model Specifications: Licensed under MIT, it features 744B total and 40B active parameters, a 1M token context window, and competitive API pricing, with availability across numerous third-party providers.
Token Verbosity: The trade-off for its intelligence is higher output token usage; the model tends to be verbose in its reasoning.

In essence, GLM-5.2 represents a significant leap forward for open-weights models, delivering frontier-level intelligence with an appealing cost structure, even if it tends to be quite talkative in its problem-solving approach.
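To make the Pareto-frontier claim concrete, here is a minimal sketch of how such a frontier can be computed. Only the GLM-5.2 point (score 51, about $0.46 per task) comes from the summary above; every other model name and number is hypothetical and exists purely for illustration. It is not Artificial Analysis's actual method or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPoint:
    name: str
    intelligence: float   # index score, higher is better
    cost_per_task: float  # USD, lower is better


def pareto_frontier(points):
    """Return the points that no other point beats on both axes."""
    frontier = []
    for p in points:
        dominated = any(
            q.intelligence >= p.intelligence
            and q.cost_per_task <= p.cost_per_task
            and (q.intelligence > p.intelligence or q.cost_per_task < p.cost_per_task)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    # Sort cheapest first so the frontier reads left to right on a cost axis.
    return sorted(frontier, key=lambda m: m.cost_per_task)


models = [
    ModelPoint("GLM-5.2", 51, 0.46),         # figures from the summary above
    ModelPoint("hypothetical-A", 51, 0.90),  # same score, pricier: dominated
    ModelPoint("hypothetical-B", 44, 0.20),  # cheaper but weaker: on the frontier
    ModelPoint("hypothetical-C", 40, 0.30),  # weaker and pricier than B: dominated
    ModelPoint("hypothetical-D", 57, 1.80),  # stronger but pricier: on the frontier
]

frontier_names = [m.name for m in pareto_frontier(models)]
print(frontier_names)
```

A model is on the frontier when no rival is at least as smart and at least as cheap while being strictly better on one of the two. Being on the frontier does not mean being the best on either axis alone.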

The Gossip

Token Talkativeness & Efficiency Trade-offs

Many commenters expressed both admiration for GLM-5.2's intelligence and concern over its verbosity and token usage. Users reported that it spends considerable time and tokens on reasoning, sometimes reconsidering a decision several times. While acknowledging its intelligence, some questioned whether this token inefficiency is a drawback, especially next to more concise models like GPT-5.5, even if GLM's overall cost per task is lower.
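The apparent paradox here, more tokens but a lower bill, comes down to simple arithmetic. The sketch below is illustrative only: the 43k output-token figure comes from the summary above, while the input-token counts, the per-million-token prices, and the "concise" model are all made-up assumptions, not real pricing for GLM-5.2 or any other model.

```python
def task_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one task in USD, given token counts and prices per million tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


# Verbose but cheaply priced model: 43k output tokens (figure from the summary),
# with hypothetical prices of $1.00 / $4.00 per million input / output tokens.
verbose = task_cost(20_000, 43_000, 1.00, 4.00)

# Concise but expensive model: all numbers hypothetical.
concise = task_cost(20_000, 15_000, 5.00, 30.00)

print(f"verbose model: ${verbose:.3f} per task")
print(f"concise model: ${concise:.3f} per task")
```

Under these assumed prices the verbose model still comes out cheaper per task, though it may take longer to answer, which is the latency complaint several commenters raised.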

Performance, Price & Proprietary Rivalry

The discussion heavily revolved around comparing GLM-5.2's performance and cost-effectiveness against both other open-weight models (like DeepSeek V4 Flash, Kimi) and proprietary giants (GPT, Opus, Fable). Many celebrated GLM-5.2 as a 'huge win' for open-weights, nearing 'Opus 4.7 quality' at 'stupid prices' and posing a significant challenge to established players. However, some shared personal experiences where GLM-5.2 didn't quite live up to frontier models, particularly in nuanced tasks or 'code taste'.

Local Lore & Deployment Difficulties

A recurring theme was the desire to run powerful models like GLM-5.2 locally, and the difficulty of doing so. Users wondered how close businesses are to buying their own hardware for on-premise AI, mainly for the privacy benefits. Some predicted that this is still 'years' away because of hardware shortages and setup friction; others pointed to the benefits of local execution and to community efforts to run models on consumer hardware, despite the upfront costs and learning curve.

Multimodal Musings & Benchmark Scrutiny

Commenters inquired about GLM-5.2's multimodal capabilities, specifically image input, noting its absence compared to some competitors. This led to discussions about combining different models for multimodal tasks or the potential benefits if GLM were trained with vision. Additionally, skepticism was voiced regarding the direct correlation between benchmarks and real-world utility, with some suggesting that official benchmarks might not always reflect practical performance.

API Arbitrage & Capacity Concerns

The conversation also touched on availability and pricing discrepancies for GLM-5.2's API. Users noted that third-party providers often charge significantly less than Z.ai's official API, sometimes because they serve quantized versions of the model, which prompted warnings about quality degradation from unofficial providers. Z.ai's own servers were also reported to be struggling with capacity, leading to timeouts and slow speeds for direct API access.