NVIDIA releases open AI models and datasets to speed development
The US chipmaker has published a suite of open models, datasets and tools aimed at accelerating artificial intelligence development across language, robotics, autonomous vehicles and healthcare.
NVIDIA said it has released a range of open resources designed to support faster development of AI systems spanning agentic AI, physical AI, robotics, autonomous driving and biomedical research.
The company said the releases cover the Nemotron family for agentic AI, the Cosmos platform for physical AI, the Alpamayo family for autonomous vehicle development, Isaac GR00T for robotics and Clara for biomedical applications.
NVIDIA said it is contributing open training frameworks alongside one of the largest collections of open multimodal data, including 10 trillion language training tokens, 500,000 robotics trajectories, 455,000 protein structures and 100 terabytes of vehicle sensor data. It added that technology groups including Bosch, CodeRabbit, CrowdStrike, Cohesity, Fortinet, Franka Robotics, Humanoid, Palantir, Salesforce, ServiceNow, Hitachi and Uber are adopting and building on the technologies.
The company said Nemotron now includes speech, multimodal retrieval-augmented generation and safety models, with Nemotron Speech offering real-time captioning that it said runs significantly faster than comparable systems. NVIDIA has also released datasets and training code for Nemotron models, along with updates to its large language model routing tools.
For physical AI, NVIDIA said it has released Cosmos Reason 2, Cosmos Transfer 2.5 and Cosmos Predict 2.5, alongside Isaac GR00T N1.6 for humanoid robots. It also introduced Alpamayo for reasoning-based autonomous vehicles, including simulation tools and more than 1,700 hours of driving data.
In healthcare, the company said it has released Clara models including La-Proteina, ReaSyn v2, KERMT and RNAPro, together with a dataset of synthetic protein structures.
NVIDIA said the open resources are available via GitHub, Hugging Face and its developer platforms, and can also be deployed as NVIDIA NIM microservices.
The Recap
- NVIDIA released new open models, data and developer tools.
- The releases include 10 trillion language training tokens, 500,000 robotics trajectories and 455,000 protein structures.
- Open models available on GitHub, Hugging Face and build.nvidia.com.