
Caltech spin-out PrismML launches 1-bit AI model that fits into 1.15 GB and runs on Apple devices

Bonsai 8B is designed for edge and on-device deployment, where cloud-dependent AI cannot reach. The company claims a 10x improvement in intelligence per gigabyte over full-precision rivals.

by Defused News Writer

A Caltech spin-out called PrismML has released Bonsai 8B, a large language model built on 1-bit architecture that fits into 1.15 GB of memory and is designed to run on Apple devices and Nvidia GPUs without a cloud connection.

The model targets edge and on-device deployment, where power constraints and limited bandwidth make conventional AI models impractical. Most large language models in production use 16-bit or 32-bit floating point representations for their weights, consuming substantially more memory. PrismML's approach strips that back to a single bit per weight, representing each value only as its sign, either minus one or plus one, with a shared scale factor applied across groups of weights.
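The sign-plus-shared-scale scheme described above can be sketched in a few lines of NumPy. This is an illustrative toy, not PrismML's actual implementation: the group size of 32 and the use of the mean absolute value as the shared scale are assumptions made for the example.

```python
import numpy as np

def quantize_1bit(weights, group_size=32):
    """Toy 1-bit quantization: keep only each weight's sign (±1),
    plus one shared scale per group of weights."""
    w = weights.reshape(-1, group_size)
    signs = np.where(w >= 0, 1.0, -1.0)             # 1 bit of information per weight
    scales = np.abs(w).mean(axis=1, keepdims=True)  # shared scale per group (assumed: mean |w|)
    return signs, scales

def dequantize_1bit(signs, scales):
    """Reconstruct approximate weights as sign x group scale."""
    return (signs * scales).ravel()

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
signs, scales = quantize_1bit(w)
w_hat = dequantize_1bit(signs, scales)
```

Every sign is preserved exactly, but all magnitudes within a group collapse to that group's scale, which is why naive 1-bit quantization historically degraded model quality.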

The company says this architecture avoids the problems that have historically made low-bit models unreliable, including poor instruction following and inconsistent tool use, issues that earlier quantization attempts struggled to resolve.

PrismML defines its own benchmark for evaluating the tradeoff between model size and capability, which it calls intelligence density. On that measure, Bonsai 8B scores 1.06 per gigabyte against 0.10 per gigabyte for Qwen3 8B, a full-precision model of equivalent parameter count, implying more than a tenfold improvement in what the company is calling useful reasoning per unit of memory.
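The arithmetic behind the tenfold claim follows directly from the two figures quoted above:

```python
# Figures quoted in the article; "intelligence density" is PrismML's own metric.
bonsai_density = 1.06  # score per gigabyte, Bonsai 8B (1-bit)
qwen_density = 0.10    # score per gigabyte, Qwen3 8B (full precision)

ratio = bonsai_density / qwen_density
print(round(ratio, 1))  # → 10.6, i.e. more than tenfold
```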

Babak Hassibi, founder and chief executive of PrismML, said the company spent years developing the mathematical theory needed to compress a neural network without degrading its reasoning capability. The white paper published alongside the release sets out the techniques and tradeoffs behind extreme quantization in detail.

Bonsai 8B runs natively on Apple hardware via MLX and on Nvidia GPUs via llama.cpp CUDA. Model weights are available now under the Apache 2.0 licence. PrismML has also published smaller variants at 4 billion and 1.7 billion parameters for more constrained deployments.

The recap

  • PrismML released the 1-bit Bonsai 8B large language model.
  • The model fits in 1.15 GB, roughly 14x smaller than a full-precision 16-bit model of the same parameter count.
  • Model weights are available today under the Apache 2.0 licence.
