A Caltech spin-out called PrismML has released Bonsai 8B, a large language model built on a 1-bit architecture that fits in 1.15 GB of memory and is designed to run on Apple devices and Nvidia GPUs without a cloud connection.
The model targets edge and on-device deployment, where power constraints and limited bandwidth make conventional AI models impractical. Most large language models in production use 16-bit or 32-bit floating point representations for their weights, consuming substantially more memory. PrismML's approach strips that back to a single bit per weight, representing each value only as its sign, either minus one or plus one, with a shared scale factor applied across groups of weights.
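The sign-plus-scale scheme described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not PrismML's published method; the group size of 64 and the use of the mean absolute value as the shared scale are assumptions for the sketch.

```python
import numpy as np

def quantize_1bit(weights: np.ndarray, group_size: int = 64):
    """Keep only the sign of each weight (1 bit), plus one shared
    float scale per group (here: the group's mean absolute value).
    Illustrative sketch only; group_size=64 is an assumed choice."""
    w = weights.reshape(-1, group_size)
    signs = np.where(w >= 0, 1.0, -1.0)             # +1 or -1 per weight
    scales = np.abs(w).mean(axis=1, keepdims=True)  # one scale per group
    return signs, scales

def dequantize_1bit(signs: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights as sign * shared group scale."""
    return signs * scales

# Compare 16-bit storage with 1-bit-plus-scales storage for the same weights.
w = np.random.randn(4096 * 64).astype(np.float32)
signs, scales = quantize_1bit(w)
bits_fp16 = w.size * 16
bits_1bit = signs.size * 1 + scales.size * 16       # 1 bit/weight + fp16 scale/group
print(bits_fp16 / bits_1bit)                        # 12.8, i.e. ~13x smaller
```

At this group size the bookkeeping cost of the scales is small, which is why the compressed footprint lands near the raw 16x reduction that one bit per weight suggests.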
The company says this architecture avoids the failure modes that have historically made low-bit models unreliable, including poor instruction following and inconsistent tool use, problems that earlier quantization attempts struggled to resolve.
PrismML defines its own benchmark for evaluating the tradeoff between model size and capability, which it calls intelligence density. On that measure, Bonsai 8B scores 1.06 per gigabyte against 0.10 per gigabyte for Qwen3 8B, a full-precision model of equivalent parameter count, implying more than a tenfold improvement in what the company calls useful reasoning per unit of memory.
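The tenfold claim follows directly from the two reported densities:

```python
# Reported intelligence-density figures (score per gigabyte).
bonsai_density = 1.06  # Bonsai 8B
qwen3_density = 0.10   # Qwen3 8B, full precision

ratio = bonsai_density / qwen3_density
print(round(ratio, 1))  # 10.6, consistent with "more than tenfold"
```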
Babak Hassibi, founder and chief executive of PrismML, said the company spent years developing the mathematical theory needed to compress a neural network without degrading its reasoning capability. The white paper published alongside the release sets out the techniques and tradeoffs behind extreme quantization in detail.
Bonsai 8B runs natively on Apple hardware via MLX and on Nvidia GPUs via llama.cpp CUDA. Model weights are available now under the Apache 2.0 license. PrismML has also published smaller variants at 4 billion and 1.7 billion parameters for more constrained deployments.
The recap
- PrismML released the 1-bit Bonsai 8B large language model.
- Model fits in 1.15 GB, roughly 14x smaller than a 16-bit model of the same parameter count.
- Model weights are available today under the Apache 2.0 license.