Google announces Gemini 3.1 Pro, says it’s better at complex problem-solving


Another day, another Google AI model. Google has really been pumping out new AI tools lately, having just released Gemini 3 in November. Today, it’s bumping the flagship model to version 3.1. The new Gemini 3.1 Pro is rolling out (in preview) for developers and consumers today with the promise of better problem-solving and reasoning capabilities.

Google announced improvements to its Deep Think tool last week, and apparently, the “core intelligence” behind that update was Gemini 3.1 Pro. As usual, Google’s latest model announcement comes with a plethora of benchmarks that show mostly modest improvements. In the popular Humanity’s Last Exam, which tests advanced domain-specific knowledge, Gemini 3.1 Pro scored a record 44.4 percent. Gemini 3 Pro managed 37.5 percent, while OpenAI’s GPT 5.2 got 34.5 percent.

[Image: Gemini 3.1 Pro benchmarks]

Google also calls out the model’s improvement in ARC-AGI-2, which features novel logic problems that can’t be directly trained into an AI. Gemini 3 was well behind on this evaluation, reaching a mere 31.1 percent versus scores in the 50s and 60s for competing models. Gemini 3.1 Pro more than doubles Google’s score, reaching a lofty 77.1 percent.
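As a quick sanity check on the benchmark figures quoted above, the arithmetic can be sketched in a few lines of Python. The helper name `gain` is purely illustrative and not part of Google’s announcement; the scores are the ones cited in this article.

```python
def gain(old: float, new: float) -> tuple[float, float]:
    """Return (percentage-point gain, ratio of new score to old score)."""
    return round(new - old, 1), round(new / old, 2)

# Scores quoted in the article, in percent.
hle = gain(37.5, 44.4)  # Humanity's Last Exam: Gemini 3 Pro -> Gemini 3.1 Pro
arc = gain(31.1, 77.1)  # ARC-AGI-2: Gemini 3 -> Gemini 3.1 Pro

print(f"HLE: +{hle[0]} points, {hle[1]}x the earlier score")
print(f"ARC-AGI-2: +{arc[0]} points, {arc[1]}x the earlier score")
```

On these numbers, the ARC-AGI-2 score is roughly two and a half times the earlier one, which fits “more than doubles,” while the Humanity’s Last Exam gain is just under seven points.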

When Google releases new models, it has often gloated that they’ve already hit the top of the Arena leaderboard (formerly LM Arena), but that’s not the case this time. For text, Claude Opus 4.6 edges out the new Gemini by four points with a score of 1504. For code, Opus 4.6, Opus 4.5, and GPT 5.2 High all run ahead of Gemini 3.1 Pro by a bit more. It’s worth noting, however, that the Arena leaderboard is run on vibes. Users vote on the outputs they like best, which can reward outputs that look correct regardless of whether they are.
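To put a four-point gap in perspective, here is a rough sketch that assumes Arena scores behave like standard Elo ratings (base 10, scale 400). That is an assumption for illustration only; the article does not say how the leaderboard computes its scores, and `expected_win_rate` is a hypothetical helper.

```python
def expected_win_rate(rating_gap: float, scale: float = 400.0) -> float:
    """Elo-style probability that the higher-rated model wins a single vote.

    Assumes the classic Elo logistic curve; the real leaderboard may differ.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))

# Gap between the top text model and Gemini 3.1 Pro, per the article.
p = expected_win_rate(4)
print(f"Expected win rate for the leader: {p:.3f}")
```

Under that assumption, a four-point lead translates to winning only slightly more than half of head-to-head votes, so the text ranking is close to a coin flip.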


