NVIDIA launched Nemotron 3 Super, a 120-billion-parameter open model with 12 billion active parameters designed to run complex agentic artificial intelligence systems at scale and improve throughput on the NVIDIA Blackwell platform.
Companies building multi-agent applications face two constraints, context explosion and a “thinking tax”: workflows can generate up to 15x more tokens than standard chat, and reasoning repeated at every step inflates cost and can cause goal drift.
Nemotron 3 Super uses a hybrid mixture-of-experts architecture combining Mamba layers and transformer layers, a Latent MoE that activates four experts for the cost of one, and multi-token prediction for faster inference; only 12 billion of 120 billion parameters are active at inference. The model supports a 1‑million‑token context window and delivers up to 5x higher throughput and up to 2x higher accuracy versus the prior Nemotron Super, the company said in an announcement.
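To illustrate how a mixture-of-experts model can keep most of its parameters idle, here is a minimal Python sketch of generic top-k expert routing. It is not NVIDIA's Latent MoE implementation, whose details the announcement does not give. The function names, the toy experts and the gate scores are all hypothetical; only the 12-billion and 120-billion parameter counts and the figure of four active experts come from the announcement.

```python
import math

def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Softmax over all gate scores (shifted by the max for numerical stability).
    peak = max(gate_scores)
    exps = [math.exp(s - peak) for s in gate_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts; the rest are never evaluated.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected weights sum to 1.
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, gate_scores, k):
    """Weighted sum of the outputs of the k selected experts for scalar input x."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))

# Hypothetical toy experts: eight scalar functions standing in for expert networks.
experts = [lambda x, m=m: m * x for m in range(1, 9)]
# Hypothetical gate scores for one token.
gate_scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, 3.0, -0.5]

routes = top_k_route(gate_scores, k=4)   # four active experts, as in the announcement
output = moe_forward(1.0, experts, gate_scores, k=4)

# Share of parameters active per token, using the announced counts.
active_fraction = 12e9 / 120e9
print(f"active experts: {sorted(i for i, _ in routes)}")
print(f"active parameter share: {active_fraction:.0%}")
```

In this sketch only four of the eight toy experts run for a given input, which is the general mechanism that lets a sparse model activate a fraction of its total parameters per token.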
The model runs in NVFP4 precision on Blackwell, which cuts memory requirements and yields up to 4x faster inference than FP8 on NVIDIA Hopper with no loss in accuracy, according to the company. NVIDIA is releasing open weights under a permissive license and publishing its methodology, which covers more than 10 trillion tokens of pre- and post-training data, 15 reinforcement learning training environments, and recipes for fine-tuning via the NeMo platform. NVIDIA also says Nemotron 3 Super powers its AI-Q research agent to the No. 1 position on both DeepResearch Bench and DeepResearch Bench II.
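The memory effect of lower precision can be approximated with simple arithmetic. The Python sketch below is a rough, weights-only estimate that assumes exactly 8 and 4 bits per parameter; it ignores quantization scale factors, the KV cache and activations, so real footprints will differ. The helper name `weight_memory_gb` is hypothetical, and the 120-billion parameter count is the only figure taken from the announcement.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 120e9  # total parameter count from the announcement

fp8_gb = weight_memory_gb(TOTAL_PARAMS, 8)  # 8-bit weights
fp4_gb = weight_memory_gb(TOTAL_PARAMS, 4)  # nominal 4-bit weights, scale factors ignored

print(f"8-bit: {fp8_gb:.0f} GB, 4-bit: {fp4_gb:.0f} GB")
# prints "8-bit: 120 GB, 4-bit: 60 GB"
```

Under these assumptions, moving from 8-bit to 4-bit weights halves the storage needed for the weights alone.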
Nemotron 3 Super is available now through build.nvidia.com, Perplexity, OpenRouter and Hugging Face, and is packaged as an NVIDIA NIM microservice for on-premises and cloud deployment. Dell and HPE are integrating the model for enterprise hubs; Google Cloud Vertex AI and Oracle Cloud Infrastructure support deployment, with Amazon Bedrock and Microsoft Azure coming soon.
The recap
- NVIDIA releases Nemotron 3 Super, a 120-billion-parameter model.
- Model offers 1‑million‑token context window to retain workflow state.
- Available now via build.nvidia.com, Perplexity and Hugging Face.