Ternary Bonsai: Top Intelligence at 1.58 Bits
PrismML introduces Ternary Bonsai, a new family of 1.58-bit language models that pack strong capability into a dramatically smaller memory footprint. The models push the efficiency-performance Pareto frontier further, enabling high-performing AI on resource-constrained devices. Hacker News readers will appreciate both the model-compression technique itself and its practical implications for widespread on-device AI deployment.
The Lowdown
PrismML has unveiled "Ternary Bonsai," a new series of 1.58-bit language models designed to deliver a superior balance of memory efficiency and accuracy. Building on the company's previous 1-bit Bonsai models, this release targets a sweet spot: a modest size increase buys significant performance gains, a trade-off that matters for deploying advanced AI on memory-limited devices.
- True Ternary Architecture: Unlike some quantized models, Ternary Bonsai uses ternary weights constrained to {-1, 0, +1} across the entire network, including embeddings, attention layers, MLPs, and the LM head. A three-valued weight carries log2(3) ≈ 1.58 bits of information, hence the name "1.58-bit."
- Exceptional Memory Compression: The models achieve a memory footprint approximately 9 times smaller than standard 16-bit models.
- Benchmark Dominance: The 8B parameter version of Ternary Bonsai scores 75.5 on average across benchmarks, outperforming most peers in its class despite being 9-10 times smaller, and ranks just behind Qwen3 8B.
- Extended Pareto Frontier: Ternary Bonsai further shifts the performance-versus-size curve established by its 1-bit predecessors, offering developers more flexible trade-offs between memory, throughput, and model quality.
- High Throughput & Energy Efficiency: The models demonstrate strong throughput (e.g., 82 tokens/sec on an M4 Pro, 27 tokens/sec on an iPhone 17 Pro Max) and are 3-4 times more energy-efficient than their 16-bit counterparts.
- Accessibility: Ternary Bonsai models run natively on Apple devices via MLX, and their weights are publicly available today under the Apache 2.0 License.
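To make the "1.58-bit" and "~9x smaller" figures above concrete, here is a minimal sketch of ternary quantization and the memory arithmetic it implies. It assumes an absmean-style scheme (round each weight to {-1, 0, +1} against a per-tensor scale), as popularized by BitNet b1.58; whether Ternary Bonsai uses this exact recipe is not stated in the announcement, and the function name is illustrative.

```python
import numpy as np

def ternary_quantize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a weight tensor to {-1, 0, +1} with an absmean scale.

    Assumption: absmean scaling as in BitNet b1.58, not necessarily
    the exact recipe Ternary Bonsai uses.
    """
    scale = np.mean(np.abs(w)) + 1e-8        # per-tensor scale factor
    q = np.clip(np.round(w / scale), -1, 1)  # ternary codes in {-1, 0, +1}
    return q.astype(np.int8), float(scale)

# A three-valued weight carries log2(3) ~ 1.58 bits of information.
bits_per_weight = np.log2(3)

# Back-of-envelope weight memory for an 8B-parameter model.
params = 8e9
fp16_gb = params * 16 / 8 / 1e9
ternary_gb = params * bits_per_weight / 8 / 1e9
print(f"fp16: {fp16_gb:.1f} GB, ternary: {ternary_gb:.2f} GB "
      f"(~{fp16_gb / ternary_gb:.0f}x smaller)")
# -> fp16: 16.0 GB, ternary: 1.58 GB (~10x smaller)
```

The ideal ratio is 16 / 1.58 ≈ 10x; real deployments also store per-tensor scales, higher-precision activations, and the KV cache, which plausibly accounts for the ~9x figure quoted for the released models.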
This release represents a significant step towards democratizing powerful AI, enabling sophisticated language models to operate effectively and efficiently on a broader range of hardware, from mobile phones to embedded systems.