
Ternary Bonsai: Top Intelligence at 1.58 Bits

PrismML introduces Ternary Bonsai, a new family of 1.58-bit language models that achieve impressive intelligence density while dramatically reducing memory footprint. These models push the efficiency-performance Pareto frontier further, enabling high-performing AI on resource-constrained devices. The technical interest lies in the model-compression approach and its practical implications for widespread on-device AI deployment.

Score: 11
Comments: 3
Highest Rank: #6
Time on Front Page: 11h
First Seen: Apr 21, 12:00 AM
Last Seen: Apr 21, 10:00 AM
Rank Over Time: 17, 7, 6, 6, 6, 10, 11, 14, 16, 14, 14

The Lowdown

PrismML has unveiled "Ternary Bonsai," a new series of 1.58-bit language models designed to deliver a superior balance of memory efficiency and high accuracy. Building on their previous 1-bit Bonsai models, this release targets a sweet spot, offering a modest size increase for significant performance gains, crucial for deploying advanced AI on devices with limited resources.

  • True Ternary Architecture: Unlike some quantized models, Ternary Bonsai employs 1.58-bit representation across its entire network, including embeddings, attention layers, MLPs, and the LM head, using weights constrained to {-1, 0, +1}.
  • Exceptional Memory Compression: These models achieve a memory footprint approximately 9 times smaller than comparable 16-bit models.
  • Benchmark Dominance: The 8B parameter version of Ternary Bonsai scores 75.5 on average across benchmarks, outperforming most peers in its class despite being 9-10 times smaller, and ranks just behind Qwen3 8B.
  • Extended Pareto Frontier: Ternary Bonsai further shifts the performance-versus-size curve established by its 1-bit predecessors, offering developers more flexible trade-offs between memory, throughput, and model quality.
  • High Throughput & Energy Efficiency: The models demonstrate strong throughput (e.g., 82 toks/sec on M4 Pro, 27 toks/sec on iPhone 17 Pro Max) and are 3-4 times more energy-efficient than their 16-bit counterparts.
  • Accessibility: Ternary Bonsai models run natively on Apple devices via MLX, and their weights are publicly available today under the Apache 2.0 License.
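The "1.58-bit" figure in the bullets above comes from information theory: a weight constrained to {-1, 0, +1} carries log2(3) ≈ 1.58 bits. The sketch below shows absmean ternary quantization in the style of BitNet b1.58 (an illustrative assumption, not PrismML's published training method) along with the compression arithmetic behind the ~9x memory claim:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Absmean ternary quantization (BitNet-b1.58-style sketch, not
    necessarily Ternary Bonsai's actual method): scale by the mean
    absolute weight, then round each entry to the nearest of {-1, 0, +1}."""
    scale = np.abs(w).mean() + 1e-8              # per-tensor scale factor
    q = np.clip(np.round(w / scale), -1, 1)      # ternary weights
    return q.astype(np.int8), scale

# Each ternary weight carries log2(3) ≈ 1.58 bits, so the ideal
# compression versus FP16 is 16 / log2(3) ≈ 10.1x; scale factors and
# packing overhead bring the practical figure near the ~9x quoted above.
bits_per_weight = np.log2(3)
ideal_ratio = 16 / bits_per_weight

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = ternary_quantize(w)
```

Note that true ternary models are trained with these constraints in place (typically via quantization-aware training), rather than quantized after the fact; post-hoc ternarization of a 16-bit model generally degrades accuracy far more.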

This release represents a significant step towards democratizing powerful AI, enabling sophisticated language models to operate effectively and efficiently on a broader range of hardware, from mobile phones to embedded systems.
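Part of the gap between the theoretical 16 / log2(3) ≈ 10.1x compression and the ~9x figure reported above is storage layout. One plausible packing scheme (a sketch; Ternary Bonsai's actual on-disk format is not described in the source) stores five trits per byte, since 3^5 = 243 fits in 256, giving 1.6 bits per weight:

```python
import numpy as np

def pack_trits(q: np.ndarray) -> bytes:
    """Pack ternary weights {-1, 0, +1} five to a byte using base-3
    encoding (3^5 = 243 <= 256), i.e. 1.6 bits per weight."""
    t = (q.ravel() + 1).astype(np.uint8)         # map {-1,0,1} -> {0,1,2}
    pad = (-len(t)) % 5                          # pad to a multiple of 5
    t = np.concatenate([t, np.zeros(pad, np.uint8)]).reshape(-1, 5)
    powers = 3 ** np.arange(5)                   # base-3 place values
    return (t @ powers).astype(np.uint8).tobytes()

def unpack_trits(buf: bytes, n: int) -> np.ndarray:
    """Recover the first n ternary weights from a packed buffer."""
    vals = np.frombuffer(buf, dtype=np.uint8).astype(np.int32)
    trits = np.stack([(vals // 3**i) % 3 for i in range(5)], axis=1)
    return trits.ravel()[:n].astype(np.int8) - 1  # back to {-1, 0, +1}

# Round-trip a small ternary weight vector through the packed format.
q = np.random.default_rng(1).integers(-1, 2, size=37).astype(np.int8)
packed = pack_trits(q)
restored = unpack_trits(packed, q.size)
```

At five trits per byte the cost is 1.6 bits per weight, within 1% of the 1.585-bit ideal; per-tensor scale factors and other metadata account for part of the remaining distance from FP16's 16 bits per weight.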