Arm's Cortex X925: Reaching Desktop Performance
A thorough dissection of Arm's Cortex X925 microarchitecture shows that its branch prediction and out-of-order engine deliver performance on par with AMD's Zen 5 and Intel's Lion Cove desktop CPUs. The deep dive highlights how Arm, traditionally focused on low power, has become a serious contender in the high-performance desktop market. The analysis sparked a lively debate about its real-world implications and about comparisons with other designs, particularly Apple Silicon.
The Lowdown
Arm, historically known for low-power and low-area designs, has made a significant leap into the high-performance computing segment with its Cortex X925 core. This detailed analysis reveals how the X925 achieves desktop-level performance, matching and in some cases exceeding the capabilities of AMD's Zen 5 and Intel's Lion Cove in their fastest desktop implementations.
- The Cortex X925 is a massive 10-wide core, engineered purely for maximum performance, eschewing the power and area compromises of previous Arm cores.
- It demonstrates state-of-the-art branch prediction, performing comparably to AMD Zen 5 and often surpassing Intel Lion Cove in SPEC CPU2017 tests.
- Its out-of-order execution engine is remarkably large, capable of keeping approximately 525 instructions in flight, positioning it competitively against Intel and ahead of AMD.
- The frontend can sustain 10 instructions per cycle, but its lower clock speed (4 GHz) means its actual throughput can be lower than that of higher-clocked x86 counterparts.
- The FPU has six pipes for vector floating-point operations, though its 128-bit vectors are narrower than the vector registers found in AMD and Intel designs.
- Memory subsystem features include a 64KB L1 data cache and configurable L2 cache options (2-3MB), with improved store forwarding capabilities.
- In SPEC CPU2017, the X925 shows excellent results in the integer suite, holding its own against x86. While slightly behind Zen 5 in floating-point due to higher instruction counts and narrower vectors, it maintains pace with Intel's Lion Cove.
Arm's achievement with the Cortex X925 signifies its successful entry into the desktop performance arena, proving that high IPC can offset clock speed deficits. However, challenges remain, particularly in areas like optimizing for gaming workloads, navigating the x86-dominated software ecosystem, and relying on partners for product integration.
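The trade-off between width and clock speed comes down to simple arithmetic: peak throughput is instructions per cycle multiplied by clock frequency. The sketch below works this out for the 10-wide, 4 GHz figures given above. The comparison core (8-wide at 5.7 GHz) uses illustrative numbers that are an assumption for this example, not figures from the article, and the function names are hypothetical.

```cpp
// Peak frontend throughput in billions of instructions per second:
// sustained instructions per cycle multiplied by clock frequency in GHz.
constexpr double peak_ginstr_per_sec(double instr_per_cycle, double clock_ghz) {
    return instr_per_cycle * clock_ghz;
}

// Clock (in GHz) a core of the given width needs to reach a target peak rate.
constexpr double break_even_clock_ghz(double target_ginstr_per_sec,
                                      double instr_per_cycle) {
    return target_ginstr_per_sec / instr_per_cycle;
}

// From the summary: 10 instructions per cycle at 4 GHz.
constexpr double x925_peak = peak_ginstr_per_sec(10.0, 4.0);

// Hypothetical comparison core (illustrative numbers, not from the article):
// narrower, but clocked higher.
constexpr double narrower_faster_peak = peak_ginstr_per_sec(8.0, 5.7);

static_assert(narrower_faster_peak > x925_peak,
              "the higher clock outweighs the narrower width in this example");
```

Under these assumptions, an 8-wide core overtakes the 10-wide, 4 GHz peak once it clocks above 5 GHz. These are upper bounds on frontend throughput only; sustained IPC on real workloads is far lower and depends on the rest of the pipeline.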
The Gossip
Apple Arm Ambivalence
Many readers were surprised that the article did not compare the X925 with Apple Silicon's M-series chips, which are often seen as the benchmark for high-performance ARM. This led to a debate about the relevance of Apple's closed ecosystem versus general-purpose ARM designs, and about whether Apple's proprietary nature makes direct comparisons less meaningful for the broader ARM market.
Memory Model Mayhem
The discussion highlighted a significant technical concern: the difference in memory ordering models between ARM (weak) and x86 (stronger, Total Store Order). Commenters worried that software developed primarily for x86 might harbor latent race conditions that only surface when it runs on ARM, leading to difficult-to-diagnose "Heisenbugs."
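The classic case behind this worry is message passing: one thread writes data and then sets a flag, and another thread waits for the flag and then reads the data. The sketch below is a minimal C++ illustration, not code from the article or the discussion; the function and variable names are hypothetical. It uses release/acquire ordering, which is what makes the pattern correct on a weakly ordered machine.

```cpp
#include <atomic>
#include <thread>

// Hypothetical message-passing example; all names are illustrative.
// Returns the value the consumer thread observed in `payload`.
int run_message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload becomes visible before the flag does.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: once the flag reads true, the payload write is visible too.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If the flag were a plain variable, or both accesses used `std::memory_order_relaxed`, the program would no longer be guaranteed to read 42. On x86 such code often appears to work anyway, because the hardware does not reorder a store with an earlier store or a load with an earlier load. A weakly ordered ARM core is allowed to, which is how a bug of this kind can stay hidden until the software is ported.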
Vector and Cache Vestiges
Technical users delved into the specifics of the Cortex X925's microarchitecture, particularly its vector processing capabilities and cache design. There was critical discussion of the X925's 128-bit vector width, which is narrower than x86's AVX-512 and could limit performance in highly optimized floating-point workloads. Commenters also raised how the common 4KB page size on x86 constrains L1 cache sizing, a limit that ARM might circumvent.
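The page-size point is usually explained through virtually indexed, physically tagged (VIPT) L1 caches, and the sketch below assumes that is what commenters had in mind. In such a cache the index bits must fit inside the page offset to avoid aliasing, so each way can hold at most one page, and the minimum associativity is the cache size divided by the page size. The function name is hypothetical, and the 16 KiB page size is an illustrative assumption rather than a figure from the article.

```cpp
#include <cstddef>

constexpr std::size_t KiB = 1024;

// Minimum associativity for a VIPT cache whose index bits all fall within
// the page offset: each way may cover at most one page, so
// ways >= cache_bytes / page_bytes (rounded up).
constexpr std::size_t min_ways_without_aliasing(std::size_t cache_bytes,
                                                std::size_t page_bytes) {
    return (cache_bytes + page_bytes - 1) / page_bytes;  // ceiling division
}

// A 64 KiB L1 data cache with 4 KiB pages needs 16 ways under this rule.
constexpr std::size_t ways_4k_pages = min_ways_without_aliasing(64 * KiB, 4 * KiB);

// With 16 KiB pages (an assumed, illustrative page size), 4 ways suffice.
constexpr std::size_t ways_16k_pages = min_ways_without_aliasing(64 * KiB, 16 * KiB);
```

Under these assumptions, growing an L1 cache with 4 KiB pages means adding ways, which costs power and latency. Larger pages or other anti-aliasing mechanisms relax that constraint; the sketch does not claim which approach the X925 actually uses.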
RISC-V Ruminations
A segment of the conversation drifted to RISC-V, pondering its future as an alternative to both ARM and x86. Motivations discussed included avoiding ARM's licensing fees and intellectual property restrictions, as well as the appeal of an open-source architecture for pedagogical and embedded uses. However, commenters acknowledged that RISC-V has a long way to go to reach performance parity in the high-end desktop market.