NVIDIA unveils AI inference memory platform powered by BlueField-4
The world's largest company by market capitalization has introduced a new storage platform designed to improve performance and efficiency for large-scale AI inference workloads.
NVIDIA said it has launched an Inference Context Memory Storage Platform powered by its BlueField-4 data processing unit, unveiling the technology at CES.
The company said artificial intelligence models generate large volumes of context data in the form of key-value caches, which are critical for accuracy, continuity and user experience. It added that storing this data long-term on graphics processors can create bottlenecks for real-time inference, particularly in multi-agent systems.
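To make the bottleneck concrete, the sketch below models a toy key-value cache that spills least-recently-used entries from a small, fast "GPU" tier into a larger external tier, which is the general pattern such a context-memory platform targets. This is an illustrative Python example with hypothetical names, not NVIDIA's implementation or API.

```python
# Toy illustration of KV-cache spill-over from a small "GPU" tier to a larger
# external tier. All names are hypothetical; this is not NVIDIA's API.
from collections import OrderedDict


class TieredKVCache:
    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity   # entries that fit in fast memory
        self.gpu_tier = OrderedDict()      # fast, size-limited (stands in for HBM)
        self.external_tier = {}            # large, slower (stands in for storage)

    def put(self, session_id: str, kv_block: bytes) -> None:
        self.gpu_tier[session_id] = kv_block
        self.gpu_tier.move_to_end(session_id)
        # Evict least-recently-used blocks to the external tier when over budget.
        while len(self.gpu_tier) > self.gpu_capacity:
            victim, block = self.gpu_tier.popitem(last=False)
            self.external_tier[victim] = block

    def get(self, session_id: str) -> bytes:
        if session_id in self.gpu_tier:
            self.gpu_tier.move_to_end(session_id)
            return self.gpu_tier[session_id]
        # Miss in fast memory: fetch from the external tier and promote it.
        block = self.external_tier.pop(session_id)
        self.put(session_id, block)
        return block


cache = TieredKVCache(gpu_capacity=2)
for session in ("chat-a", "chat-b", "chat-c"):
    cache.put(session, f"kv for {session}".encode())
print(cache.get("chat-a"))  # promoted back from the external tier on demand
```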
According to NVIDIA, the new platform extends effective GPU memory capacity and enables high-speed sharing of context data across rack-scale clusters. The company said this can increase tokens processed per second by up to five times while delivering up to five times greater power efficiency compared with conventional storage approaches.
The platform uses NVIDIA’s DOCA framework and integrates with the NVIDIA NIXL library and NVIDIA Dynamo software to accelerate key-value cache sharing, reduce time to first token and improve responsiveness across multiple interactions, the company said. It added that hardware-accelerated cache placement managed by BlueField-4 reduces data movement, removes metadata overhead and provides secure, isolated access for GPU nodes.
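The claimed reduction in time to first token comes from reusing previously computed context rather than recomputing it on every request. A minimal, hypothetical sketch of that idea, with a plain dictionary standing in for the shared context-memory tier (not the DOCA, NIXL or Dynamo APIs), might look like this:

```python
# Minimal, hypothetical sketch of shared prefix-cache reuse between requests.
# SHARED_STORE stands in for a rack-scale context-memory tier; names are illustrative.
import hashlib
import time

SHARED_STORE: dict[str, bytes] = {}


def prefill(prompt: str) -> bytes:
    """Pretend to compute a KV cache for the prompt (the slow step before the first token)."""
    time.sleep(0.2)  # simulated prefill cost
    return hashlib.sha256(prompt.encode()).digest()


def first_token(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    kv = SHARED_STORE.get(key)
    if kv is None:               # no shared cache hit: pay the full prefill cost
        kv = prefill(prompt)
        SHARED_STORE[key] = kv
    return "token-0"             # decoding can begin once the KV cache is available


prompt = "summarize the meeting notes"
t0 = time.perf_counter(); first_token(prompt); cold = time.perf_counter() - t0
t0 = time.perf_counter(); first_token(prompt); warm = time.perf_counter() - t0
print(f"cold start: {cold:.3f}s, warm (cache hit): {warm:.3f}s")
```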
Related reading
- NVIDIA rolls out RTX upgrades for local generative AI workflows
- NVIDIA releases open AI models and datasets to speed development
- NVIDIA unveils Rubin platform as blueprint for next-generation DGX SuperPOD systems
NVIDIA said storage suppliers including AIC, Cloudian, DDN, Dell Technologies, Hewlett Packard Enterprise, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, VAST Data and WEKA are among the first to build next-generation AI storage platforms using BlueField-4.
The company said BlueField-4 is expected to be available in the second half of 2026.
The Recap
- BlueField-4 powers a new Inference Context Memory Storage Platform.
- Platform can boost tokens per second by up to 5x.
- BlueField-4 will be available in the second half of 2026.