Google launches Gemini 3 Pro vision AI model
Gemini 3 Pro is Google’s latest multimodal model focused on advanced visual and spatial understanding for documents, screens, video and real-world applications.
Google has introduced Gemini 3 Pro, describing it as its most capable multimodal model to date.
Gemini 3 Pro boasts state-of-the-art performance in document, spatial, screen and video understanding, Google said.
The model is designed for complex visual reasoning and processing across unstructured documents, physical environments, user interfaces and long-form video, and is available to developers through Google AI Studio.
In document understanding, Gemini 3 Pro advances optical character recognition and “derendering” of visual documents into structured formats such as HTML, LaTeX and Markdown, and Google said it exceeds a human baseline on the CharXiv Reasoning benchmark with a score of 80.5%.
Google added that the model introduces spatial capabilities, including pixel-level pointing and open-vocabulary object references for robotics and AR/XR, along with screen understanding aimed at automating computer-use tasks such as QA testing, onboarding and UX analytics.
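To make pixel-level pointing concrete, the sketch below maps a model-returned point onto an image. It assumes points arrive as [y, x] pairs normalized to a 0-1000 range, a convention seen in Google's earlier Gemini spatial-understanding examples; the announcement does not specify the format, so it should be checked against the current documentation. The function name `to_pixels` and the sample values are hypothetical.

```python
def to_pixels(point_yx, width, height, scale=1000):
    """Convert a normalized [y, x] point to (x, y) pixel coordinates.

    Assumption: the model reports points as [y, x] in the range 0..scale.
    """
    y_norm, x_norm = point_yx
    if not (0 <= y_norm <= scale and 0 <= x_norm <= scale):
        raise ValueError("point lies outside the normalized range")
    # Integer arithmetic avoids floating-point rounding surprises;
    # clamp so a value of exactly `scale` still lands inside the image.
    x_px = min(width - 1, x_norm * width // scale)
    y_px = min(height - 1, y_norm * height // scale)
    return x_px, y_px


# A point halfway down and a quarter of the way across a 1920x1080 frame.
print(to_pixels([500, 250], width=1920, height=1080))
```

Returning (x, y) rather than (y, x) matches what most drawing and UI-automation libraries expect, which is why the order is swapped on the way out.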
For video, Gemini 3 Pro is optimized for higher frame-rate analysis, processing clips at up to 10 frames per second to capture fast actions. An upgraded “thinking” mode is intended to support cause-and-effect reasoning and code generation from long videos.
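A rough way to see why frame rate matters for fast actions is to count how many sampled frames fall inside a brief event. The sketch below assumes uniform sampling starting at time zero and compares 10 frames per second with a 1 frame-per-second rate chosen here purely for contrast; the helper names and the example clip are hypothetical and say nothing about how Gemini 3 Pro samples video internally.

```python
def sample_times(duration_s, fps):
    """Timestamps in seconds of frames sampled uniformly at `fps`."""
    return [i / fps for i in range(duration_s * fps)]


def frames_during(event_start, event_end, duration_s, fps):
    """Count sampled frames that land inside [event_start, event_end)."""
    return sum(
        1 for t in sample_times(duration_s, fps)
        if event_start <= t < event_end
    )


# A 0.3-second action (2.25 s to 2.55 s) inside a 5-second clip.
for fps in (1, 10):
    hits = frames_during(2.25, 2.55, duration_s=5, fps=fps)
    print(f"{fps} fps: {hits} frame(s) show the action")
```

In this example the 1 fps pass samples at 2 s and 3 s and misses the action entirely, while the 10 fps pass catches it in three frames, at the cost of ten times as many frames to process.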
The model underpins applications in education, medical and biomedical imaging, and professional domains such as law and finance, the tech giant added. It also gives developers control over visual token usage through a new media resolution parameter that balances image fidelity against cost and latency.
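The fidelity-versus-cost trade-off can be illustrated with simple arithmetic. In the sketch below, the level names and the tokens-per-image figures are placeholders invented for illustration, not values published by Google, and `pick_level` is a hypothetical helper rather than part of any SDK.

```python
# Hypothetical visual-token cost per image at each resolution level.
TOKENS_PER_IMAGE = {"low": 250, "medium": 500, "high": 1000}


def visual_tokens(num_images, level):
    """Total visual tokens for `num_images` at the given level."""
    return num_images * TOKENS_PER_IMAGE[level]


def pick_level(num_images, token_budget):
    """Return the highest-fidelity level that fits the budget, else None."""
    for level in ("high", "medium", "low"):
        if visual_tokens(num_images, level) <= token_budget:
            return level
    return None


# Forty scanned pages with room for 25,000 visual tokens.
print(pick_level(40, 25_000))
```

With these placeholder numbers, forty pages at the highest level would need 40,000 tokens, so the helper falls back to the middle level at 20,000. A real integration would substitute the per-level costs from the official documentation.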
Google said it is “excited to see what you build with these new capabilities” and directed developers to its Gemini 3.0 documentation guide and Google AI Studio to start working with Gemini 3 Pro.
The recap
- Google unveils Gemini 3 Pro multimodal model for advanced vision tasks.
- It targets documents, spatial reasoning, screen automation and high-frame-rate video.
- Developers can access Gemini 3 Pro through Google AI Studio, with documentation available to get started.