Ternary Bonsai: Top Intelligence at 1.58 Bits
PrismML introduces Ternary Bonsai, a new family of 1.58-bit language models that pack strong capability into a dramatically smaller memory footprint. The models push the efficiency-performance Pareto frontier further, enabling high-performing AI on resource-constrained devices. Hacker News readers will appreciate both the model-compression technique itself and its practical implications for widespread on-device AI deployment.
The Lowdown
PrismML has unveiled "Ternary Bonsai," a new series of 1.58-bit language models designed to deliver a superior balance of memory efficiency and accuracy. Building on the company's previous 1-bit Bonsai models, this release targets a sweet spot: a modest size increase buys significant performance gains, a trade-off that matters for deploying advanced AI on memory-limited devices.
- True Ternary Architecture: Unlike some quantized models, Ternary Bonsai uses ternary weights constrained to {-1, 0, +1} across the entire network, including embeddings, attention layers, MLPs, and the LM head. A three-valued weight carries log2(3) ≈ 1.58 bits of information, hence the name "1.58-bit."
- Exceptional Memory Compression: The models achieve a memory footprint approximately 9 times smaller than standard 16-bit models.
- Benchmark Dominance: The 8B parameter version of Ternary Bonsai scores 75.5 on average across benchmarks, outperforming most peers in its class despite being 9-10 times smaller, and ranks just behind Qwen3 8B.
- Extended Pareto Frontier: Ternary Bonsai further shifts the performance-versus-size curve established by its 1-bit predecessors, offering developers more flexible trade-offs between memory, throughput, and model quality.
- High Throughput & Energy Efficiency: The models demonstrate strong throughput (e.g., 82 tokens/sec on an M4 Pro, 27 tokens/sec on an iPhone 17 Pro Max) and are 3-4 times more energy-efficient than their 16-bit counterparts.
- Accessibility: Ternary Bonsai models run natively on Apple devices via MLX, and their weights are publicly available today under the Apache 2.0 License.
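To make the "1.58-bit" and "~9x smaller" figures above concrete, here is a minimal sketch of ternary quantization and the memory arithmetic it implies. It assumes an absmean-style scheme (round each weight to {-1, 0, +1} against a per-tensor scale), as popularized by BitNet b1.58; whether Ternary Bonsai uses this exact recipe is not stated in the announcement, and the function name is illustrative.

```python
import numpy as np

def ternary_quantize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a weight tensor to {-1, 0, +1} with an absmean scale.

    Assumption: absmean scaling as in BitNet b1.58, not necessarily
    the exact recipe Ternary Bonsai uses.
    """
    scale = np.mean(np.abs(w)) + 1e-8        # per-tensor scale factor
    q = np.clip(np.round(w / scale), -1, 1)  # ternary codes in {-1, 0, +1}
    return q.astype(np.int8), float(scale)

# A three-valued weight carries log2(3) ~ 1.58 bits of information.
bits_per_weight = np.log2(3)

# Back-of-envelope weight memory for an 8B-parameter model.
params = 8e9
fp16_gb = params * 16 / 8 / 1e9
ternary_gb = params * bits_per_weight / 8 / 1e9
print(f"fp16: {fp16_gb:.1f} GB, ternary: {ternary_gb:.2f} GB "
      f"(~{fp16_gb / ternary_gb:.0f}x smaller)")
# -> fp16: 16.0 GB, ternary: 1.58 GB (~10x smaller)
```

The ideal ratio is 16 / 1.58 ≈ 10x; real deployments also store per-tensor scales, higher-precision activations, and the KV cache, which plausibly accounts for the ~9x figure quoted for the released models.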
This release represents a significant step towards democratizing powerful AI, enabling sophisticated language models to operate effectively and efficiently on a broader range of hardware, from mobile phones to embedded systems.