NVIDIA Nemotron 3 Super boosts agentic AI throughput 5x

NVIDIA launched Nemotron 3 Super, a 120-billion-parameter open model with 12 billion active parameters designed to run complex agentic artificial intelligence systems at scale and improve throughput on the NVIDIA Blackwell platform.

Companies building multi-agent applications face two constraints: context explosion and a “thinking tax,” where workflows can generate up to 15x more tokens and repeated reasoning inflates cost and causes goal drift.

Nemotron 3 Super uses a hybrid mixture-of-experts (MoE) architecture that combines Mamba layers with transformer layers. A Latent MoE design activates four experts for the cost of one, and multi-token prediction speeds up inference; only 12 billion of the 120 billion parameters are active at inference. The model supports a 1-million-token context window and delivers up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super, the company said in an announcement.
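To make the sparse-activation idea concrete, here is a minimal sketch of generic top-k expert routing, together with the active-parameter arithmetic from the figures above. It is not NVIDIA's Latent MoE implementation: the eight router scores, the function names and the softmax weighting are illustrative assumptions, and the announcement does not state how many experts the model has in total.

```python
import math

def top_k_route(scores, k):
    """Pick the k highest-scoring experts and softmax-normalise their scores.

    Generic top-k routing for illustration only; not NVIDIA's Latent MoE.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

def active_fraction(active_params, total_params):
    """Share of the model's parameters used for a single token."""
    return active_params / total_params

# Figures from the article: 12 billion active out of 120 billion parameters.
TOTAL_PARAMS = 120e9
ACTIVE_PARAMS = 12e9

# Hypothetical router scores for 8 experts; k=4 mirrors the "four experts" claim.
router_scores = [0.1, 2.0, -1.0, 0.7, 1.5, 0.0, -0.3, 0.9]
experts, weights = top_k_route(router_scores, k=4)
fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)

print("selected experts:", experts)
print("active share of parameters: {:.0%}".format(fraction))
```

Only the selected experts run for a given token, which is why a model of this size can be served with a fraction of its parameters active at each step.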

The model runs in NVFP4 precision on Blackwell, cutting memory needs and yielding up to 4x faster inference than FP8 on NVIDIA Hopper, with no loss in accuracy. NVIDIA is releasing open weights under a permissive license and publishing its methodology, which covers more than 10 trillion tokens of pre- and post-training data, 15 reinforcement learning training environments, and recipes for fine-tuning via the NeMo platform. NVIDIA also says Nemotron 3 Super powers its AI-Q research agent to the No. 1 position on both DeepResearch Bench and DeepResearch Bench II.
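As a rough illustration of why a 4-bit format reduces memory needs, the sketch below estimates weight storage for 120 billion parameters at 8 and 4 bits each. The announcement gives no memory figures, so these numbers are back-of-the-envelope assumptions: the calculation ignores the scale factors that block-scaled 4-bit formats store alongside the values, assumes every parameter is quantized, and leaves out activations and the KV cache.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (10**9 bytes).

    Ignores quantization scale factors and any layers kept at higher precision.
    """
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 120e9  # parameter count from the article

fp8_gb = weight_memory_gb(TOTAL_PARAMS, 8)  # 8 bits per weight
fp4_gb = weight_memory_gb(TOTAL_PARAMS, 4)  # 4 bits per weight

print("8-bit weights: ~{:.0f} GB".format(fp8_gb))
print("4-bit weights: ~{:.0f} GB".format(fp4_gb))
```

Under these assumptions, halving the bits per weight halves the weight footprint, from about 120 GB to about 60 GB.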

Nemotron 3 Super is available now through build.nvidia.com, Perplexity, OpenRouter and Hugging Face, and is packaged as an NVIDIA NIM microservice for on-premises and cloud deployment. Dell and HPE are integrating the model for enterprise hubs; Google Cloud Vertex AI and Oracle Cloud Infrastructure support deployment, with Amazon Bedrock and Microsoft Azure coming soon.

The recap

NVIDIA releases Nemotron 3 Super, a 120-billion-parameter model.
Model offers 1‑million‑token context window to retain workflow state.
Available now via build.nvidia.com, Perplexity and Hugging Face.
