LFM2-24B-A2B: Scaling Up the LFM2 Architecture
Liquid AI introduces LFM2-24B-A2B, its latest open-weight Mixture-of-Experts model: 24 billion total parameters, of which only about 2 billion are active per token. The model demonstrates strong performance scaling and is designed for deployment on edge devices and consumer hardware, standing out for fast inference, a low memory footprint, and broad compatibility with popular inference engines.
The Lowdown
Liquid AI has unveiled LFM2-24B-A2B, their most substantial LFM2 model to date, demonstrating a successful scaling of their hybrid architecture. This new open-weight Mixture of Experts (MoE) model, with 24 billion total parameters but only 2 billion active per token, is engineered for efficient deployment across diverse environments, from cloud to consumer devices.
- Scalable Architecture: LFM2-24B-A2B is the largest in a family of models spanning nearly two orders of magnitude (350M to 24B), consistently showing quality gains on standard benchmarks.
- Hybrid MoE Design: It employs a unique hybrid architecture combining gated short convolution blocks with grouped query attention, optimized via hardware-in-the-loop search for fast prefill, decode, and low memory usage.
- Efficient Scaling Strategy: The model scales by increasing depth (24 to 40 layers) and expert count (32 to 64 per MoE block) while keeping the active parameter count lean (2.3B active out of 24B total), ensuring edge-friendly inference.
- Benchmark Performance: As an instruct model, LFM2-24B-A2B shows log-linear quality improvement with increased parameters across various academic and reasoning benchmarks.
- Broad Inference Support: It offers day-zero support for popular inference engines like llama.cpp, vLLM, and SGLang, compatible with both CPU and GPU, and provides multiple quantization options.
- Competitive Throughput: Benchmarks indicate superior prefill and decode throughput compared to similarly sized MoE models (Qwen3-30B-A3B, gpt-oss-20b) on both consumer hardware (AMD Ryzen AI Max+) and data center GPUs (H100 SXM5).
- Accessibility: The model is open-weight and available on Hugging Face, with documentation and a playground for users to fine-tune or test.
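The "few active parameters out of many" idea behind the MoE design above can be sketched in a few lines. This is a minimal, illustrative top-k router over toy ReLU experts; the dimensions, expert shapes, and routing details here are assumptions for illustration, not LFM2's actual hybrid blocks.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k of E experts.

    x: (d,) token activation; gate_w: (E, d) router weights;
    experts: list of (W1, W2) feed-forward weight pairs.
    Only the k selected experts run, so the active expert-parameter
    count per token is k/E of the stored expert total.
    """
    logits = gate_w @ x                   # (E,) router scores
    top = np.argsort(logits)[-k:]         # indices of the k highest-scoring experts
    weights = softmax(logits[top])        # renormalize over the selected experts
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w1, w2 = experts[idx]
        out += w * (w2 @ np.maximum(w1 @ x, 0.0))  # tiny ReLU MLP expert
    return out, top

# Toy configuration: 64 experts, 2 active per token (echoing the
# "many experts, few active" idea; all sizes are illustrative).
rng = np.random.default_rng(0)
d, hidden, E, k = 16, 32, 64, 2
gate_w = rng.normal(size=(E, d))
experts = [(rng.normal(size=(hidden, d)), rng.normal(size=(d, hidden)))
           for _ in range(E)]
x = rng.normal(size=d)

y, selected = moe_forward(x, gate_w, experts, k)
expert_params = E * (hidden * d + d * hidden)   # parameters stored
active_params = k * (hidden * d + d * hidden)   # parameters actually used
print(len(selected), active_params / expert_params)  # → 2 0.03125
```

In a real model the stored-vs-active gap is what makes a 24B-parameter checkpoint run with roughly 2B-parameter compute per token; the exact ratio differs from this toy because attention, convolution, and router weights are shared across all tokens.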
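The quantization options mentioned above matter because weight storage, not active compute, dominates the memory footprint of an MoE model: all 24B parameters must be resident even though few are used per token. A back-of-envelope estimate (the format names follow llama.cpp's common GGUF quantizations, and the bits-per-parameter figures ignore the small per-block scale overhead those formats add):

```python
def approx_weight_memory_gib(total_params: float, bits_per_param: float) -> float:
    """Rough weight-storage estimate: params * bits, ignoring activation
    memory, KV cache, and quantization-format overhead."""
    return total_params * bits_per_param / 8 / 2**30

# All experts are stored, even though only a few are active per token.
total = 24e9
for name, bits in [("FP16", 16), ("Q8_0 (~8-bit)", 8), ("Q4_0 (~4-bit)", 4)]:
    print(f"{name}: ~{approx_weight_memory_gib(total, bits):.1f} GiB")
```

At roughly 4 bits per parameter the weights fit in about 11 GiB, which is what makes a 24B-total-parameter model plausible on high-memory consumer hardware.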
With its effective scaling, optimized hybrid MoE architecture, and strong benchmark results, LFM2-24B-A2B marks a significant step toward running large language models efficiently on a wide range of hardware, including consumer-grade devices, as Liquid AI continues to push the boundaries of accessible AI.