Google Launches Gemini 3, Claims Performance Gains Across Core AI Benchmarks

Google has introduced Gemini 3, the latest version of its large multimodal model, positioning it as a significant upgrade across reasoning, math, science and complex multimodal tasks. The company said the new model marks one of its strongest benchmark showings to date, claiming leadership over competing systems such as GPT-5.1, Claude Sonnet 4.5 and recent versions of Grok. While external evaluations are still emerging, Google highlighted a series of internal and third-party tests to support its performance claims.

According to Google, Gemini 3 delivers more consistent results in mathematical reasoning, scientific problem solving and code generation. These areas have become important indicators of foundation model capability, with companies increasingly using them to evaluate reliability, adaptability and depth of understanding. The company attributed the improvements to enhancements in architecture, greater training data diversity and more refined multimodal alignment, particularly across image, text and structured data processing.

The announcement follows a year of intensifying competition among leading AI developers. OpenAI introduced GPT-5.1 earlier this month with an emphasis on customization, agent-style workflows and deeper memory features. Anthropic has continued to expand its Claude Sonnet series, introducing updated versions designed to strengthen long-context performance and structured reasoning. Google's positioning of Gemini 3 as a benchmark leader signals the company's intent to reassert competitiveness in areas where rivals have made rapid advances.

In a statement shared during the launch, Google said Gemini 3 shows stronger results in standard math and coding benchmarks such as GSM8K, MATH and HumanEval. For scientific reasoning, the company referenced improvements in datasets that test conceptual understanding in physics, chemistry and biology. The company also said the model has developed more stable step-by-step reasoning, reducing the incidence of incorrect intermediate steps that can lead to significant final errors.

The model’s multimodal stack has also been updated. Google noted that Gemini 3 demonstrates improved ability to interpret images, charts, diagrams and mixed-format inputs. This includes better grounding in visual elements and an enhanced ability to produce text responses that accurately reference visual cues. Multimodal reliability has become a central point of comparison between major models, particularly as enterprises adopt AI for data extraction, compliance workflows and analytic support.

One specific area Google highlighted is the model's ability to handle complex math involving visual components such as geometric figures or plotted datasets. This capability is increasingly relevant as organizations look for AI tools that can support technical workflows requiring combined textual and graphical interpretation.

The launch of Gemini 3 comes amid a broader strategic shift within Google’s AI ecosystem. The company has consolidated its research, infrastructure and applied AI teams across Google Research, DeepMind and the cloud division. The objective is to streamline model development and accelerate deployment across consumer and enterprise products. With Gemini 3, the company said improvements in efficiency and inference cost also support new applications for partners using Google Cloud’s AI stack.

Industry analysts noted that Google’s emphasis on benchmark leadership reflects ongoing scrutiny of AI model evaluation. Benchmark tests remain widely used but vary in structure, dataset size and sensitivity to training methodologies. The company acknowledged that performance in real-world applications may differ depending on use cases, prompting continued investment in fine-tuning, safety reviews and partnered testing.

The introduction of Gemini 3 also aligns with Google’s goals to maintain competitiveness in the developer and enterprise markets, where customers increasingly evaluate models based on cost efficiency, compliance capabilities and reliability under high-volume workloads. By presenting gains in reasoning-heavy tasks, Google is positioning Gemini 3 as a model suited for enterprise applications such as analytics automation, decision-support systems and technical research assistance.

Early industry reaction has centered on how the model compares to GPT-5.1 and Claude Sonnet 4.5, two of its closest competitors. While benchmark claims suggest improvements, real-world testing is expected to provide clearer insight into how Gemini 3 performs in longer workflows, collaborative environments and multi-step reasoning. Developers also expressed interest in inference speed, which has become a key differentiator as large-scale AI adoption increases.

Google said Gemini 3 will be available through its cloud platform, API access and integration into consumer products. The company plans to release further technical details, including training methodology and safety evaluations, over the coming weeks.

The launch continues a competitive cycle among major AI firms, with each company seeking to demonstrate leadership in scientific reasoning, multimodal performance and model stability. As enterprises deepen adoption of AI across operations, reliability in these areas has become central to product selection. For Google, Gemini 3 represents an effort to reinforce its position in the global AI landscape by offering improved performance across widely scrutinized benchmark categories.