Google launches Gemini 3 Pro vision AI model

Google has introduced Gemini 3 Pro, describing the latest LLM platform as its most capable multimodal model to date.

Gemini 3 Pro boasts state-of-the-art performance in document, spatial, screen and video understanding, Google said.

The model is designed for complex visual reasoning and processing across unstructured documents, physical environments, user interfaces and long-form video, and is available to developers through Google AI Studio.

In document understanding, Gemini 3 Pro advances optical character recognition and “derendering”, the conversion of visual documents back into structured formats such as HTML, LaTeX and Markdown. Google said the model exceeds a human baseline on the CharXiv Reasoning benchmark with a score of 80.5%.

Google added that the model also introduces spatial capabilities, including pixel-level pointing and open-vocabulary object references for robotics and AR/XR, along with screen understanding aimed at automating computer-use tasks such as QA testing, onboarding and UX analytics.
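
To make pixel-level pointing concrete, here is a minimal Python sketch of how a developer might map a returned point onto an image. It assumes points arrive as [y, x] pairs normalized to a 0 to 1000 range, a convention used in Google's earlier spatial-understanding examples; the article does not confirm the format for Gemini 3 Pro, and the function name `to_pixels` is hypothetical.

```python
def to_pixels(point_yx, width, height, scale=1000):
    """Convert a normalized [y, x] point to integer (x, y) pixel coordinates.

    Assumption: coordinates are normalized to the range 0..scale,
    with y listed before x. Verify against the model's actual output.
    """
    y_norm, x_norm = point_yx
    # Integer arithmetic avoids floating-point rounding surprises.
    x_px = x_norm * width // scale
    y_px = y_norm * height // scale
    # Clamp so a point on the far edge still lands inside the image.
    x_px = min(max(x_px, 0), width - 1)
    y_px = min(max(y_px, 0), height - 1)
    return x_px, y_px


# A point at the vertical midpoint, one quarter of the way across a 1920x1080 frame.
print(to_pixels((500, 250), 1920, 1080))
```

Under these assumptions the same helper works for any image size, since only the final scaling step depends on the frame dimensions.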

For video, Gemini 3 Pro is optimized for higher frame-rate analysis, processing clips at up to 10 frames per second to capture fast actions. An upgraded “thinking” mode is intended to support cause-and-effect reasoning and code generation from long videos.
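
As a rough illustration of what a higher frame rate means in practice, the Python sketch below counts how many frames a clip yields at a given sampling rate. It is plain arithmetic, not Google's pipeline: the function name `sample_timestamps` and the assumption of uniform sampling from time zero are illustrative only.

```python
def sample_timestamps(duration_s, fps):
    """Return the timestamps (in seconds) of frames sampled uniformly at `fps`.

    Assumption: sampling starts at t = 0 and is evenly spaced; how the
    model actually selects frames is not described in the article.
    """
    if duration_s < 0 or fps <= 0:
        raise ValueError("duration_s must be >= 0 and fps must be > 0")
    frame_count = int(duration_s * fps)
    return [i / fps for i in range(frame_count)]


# Compare a one-minute clip at 1 fps and at the 10 fps ceiling Google cites.
for rate in (1, 10):
    frames = sample_timestamps(60, rate)
    print(f"{rate} fps -> {len(frames)} frames")
```

At 10 frames per second, consecutive frames are 0.1 seconds apart, which is why short, fast actions are less likely to fall between samples. The tenfold increase in frames also suggests why per-frame cost matters for long videos.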

The model underpins applications in education, medical and biomedical imaging, and professional domains such as law and finance, the tech giant added. It also gives developers control over visual token usage through a new media resolution parameter that balances image fidelity against cost and latency.

Google said it is “excited to see what you build with these new capabilities” and directed developers to its Gemini 3.0 documentation guide and Google AI Studio to start working with Gemini 3 Pro.

The recap

Google unveils Gemini 3 Pro multimodal model for advanced vision tasks.
It targets documents, spatial reasoning, screen automation and high-frame-rate video.
Developers access Gemini 3 Pro via Google AI Studio and documentation.
