Nvidia has released Nemotron 3 Nano Omni, an open-source multimodal AI model that combines video, audio, image and text understanding into a single system designed to serve as the perceptual engine inside autonomous AI agents.
The model addresses a growing problem in agent development: current systems typically rely on separate models for vision, speech and language, losing time and context when passing data between them.
By embedding vision and audio encoders within a single 30-billion-parameter hybrid mixture-of-experts (MoE) architecture that activates only three billion parameters per task, Nemotron 3 Nano Omni eliminates those handoffs.
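The efficiency claim rests on that mixture-of-experts design: a router picks a small subset of experts per token, so only a fraction of the total parameters does work on any one step. The following is a toy sketch of that routing idea only; the expert count, top-k value and layer shapes are illustrative assumptions, not Nvidia's actual Nemotron 3 Nano Omni implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 10   # hypothetical expert count
TOP_K = 1          # activate 1 of 10 experts, mirroring ~3B active of 30B total
DIM = 8            # toy hidden dimension

# Each expert is a simple linear layer; the router scores experts per token.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    scores = x @ router                   # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]     # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over the selected experts only
    out = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
    return out, top

x = rng.standard_normal(DIM)
out, active = moe_forward(x)
print(f"active experts: {sorted(active.tolist())} of {NUM_EXPERTS}")
```

Only the selected experts' weight matrices are multiplied, which is why per-task compute scales with the active parameters rather than the full model size.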
Nvidia said the model delivers up to nine times higher throughput than other open multimodal models with equivalent interactivity, and 2.9 times faster single-stream reasoning speed, translating to lower cost and better scalability without sacrificing responsiveness.
"To build useful agents, you can't wait seconds for a model to interpret a screen," said Gautier Cloix, chief executive of H Company, one of the early adopters.
"By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings, something that wasn't practical before."
The model is designed to function as the "eyes and ears" within a multi-agent system, working alongside larger reasoning models such as Nemotron 3 Super and Ultra, which handle planning and execution.
It supports context windows of up to 256,000 tokens, enough to sustain long-running agent loops, reason across video timelines and hold multi-document context without chunking.
The model tops six leaderboards covering complex document intelligence, video understanding and audio comprehension, including VoiceBench, where it leads in audio understanding.
Companies already adopting the model include Foxconn, Palantir, DocuSign and H Company, with Dell Technologies, Oracle, Infosys and Zefr among those evaluating it.
Enterprise use cases range from customer service applications such as video verification of deliveries, through document intelligence for contracts and financial filings, to GUI automation for browser-based agents.
Nemotron 3 Nano Omni is available immediately on Hugging Face, OpenRouter and Nvidia's Build platform as a NIM microservice, with fully open weights, datasets and training recipes.
It runs across Nvidia's Ampere, Hopper and Blackwell GPU architectures and supports FP8 and NVFP4 quantisation for deployment on hardware ranging from local workstations to data centre clusters.
The broader Nemotron 3 family has been downloaded more than 50 million times in the past year.
The recap
- Nvidia unveils the Nemotron 3 Nano Omni multimodal AI model.
- The company claims up to 9x higher throughput than comparable open multimodal models.
- Weights, datasets and training recipes are fully open.