AI models are terrible at betting on soccer—especially xAI Grok



“Every frontier model we evaluated lost money over the season and many experienced ruin,” the authors of the paper concluded, with the AI “systematically underperforming humans” in this scenario.

| AI model | Mean ROI | Best try | Worst try | Mean final bankroll |
|---|---|---|---|---|
| Anthropic Claude Opus 4.6 | –11.0% | –0.2% | –18.8% | £89,035 |
| OpenAI GPT-5.4 | –13.6% | –4.1% | –31.6% | £86,365 |
| Google Gemini 3.1 Pro | –43.3% | +33.7% | –100.0% | £56,715 |
| Google Gemini Flash 3.1 LP | –58.4% | +24.7% | –100.0% | £41,605 |
| Z.AI GLM-5 | –58.8% | –14.3% | –100.0% | £41,221 |
| Moonshot Kimi K2.5 | –68.3% | –27.0% | –100.0% | £7,420 |
| xAI Grok 4.20 | –100.0% | –100.0% | –100.0% | £0 |
| Arcee Trinity | –100.0% | –100.0% | –100.0% | £0 |

Each model began with a £100,000 normalized bankroll. Return on investment and final bankroll are averaged across three tries. Grok and Trinity did not complete every attempt.
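The relationship between the bankroll and ROI columns can be sketched in a few lines of Python. This is a minimal illustration, assuming ROI is simply the change in bankroll relative to the £100,000 starting stake; the function names are hypothetical and are not taken from the paper.

```python
# Assumption: ROI = (final bankroll - starting bankroll) / starting bankroll.
# The paper's exact definition is not given in the article.
INITIAL_BANKROLL = 100_000  # normalized starting bankroll in GBP, per the table note


def roi_percent(final_bankroll, initial=INITIAL_BANKROLL):
    """Return on investment as a percentage of the starting bankroll."""
    return (final_bankroll - initial) / initial * 100


def mean_roi_percent(final_bankrolls, initial=INITIAL_BANKROLL):
    """Average ROI across several tries (three per model in the study)."""
    return sum(roi_percent(b, initial) for b in final_bankrolls) / len(final_bankrolls)


# Mean final bankrolls as reported in the table above.
for name, bankroll in [("Claude Opus 4.6", 89_035), ("GPT-5.4", 86_365), ("Grok 4.20", 0)]:
    print(name, round(roi_percent(bankroll), 1))
```

Under this definition ROI is linear in the final bankroll, so the ROI of the mean final bankroll equals the mean of the per-try ROIs. A bankroll of £0 corresponds to an ROI of –100%, which is the "ruin" outcome the authors describe.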

The results offer some comfort to white-collar professionals and businesses fretting that AI could take their jobs, as the technology roils the shares of companies in industries from finance to marketing.

Ross Taylor, one of the study’s authors and General Reasoning’s chief executive, said: “There is so much hype about AI automation, but there’s not a lot of measurement of putting AI into a long-time-horizon setting.”

He added that many of the benchmarks typically used to test AI are flawed because they are set in “very static environments” that bear little resemblance to the chaos and complexity of the real world.

General Reasoning’s paper, which has not yet been peer reviewed, provides a counterweight to growing excitement in Silicon Valley about the huge recent leaps in AI’s ability to complete computer programming tasks with little to no human intervention.

Taylor, a former Meta AI researcher, said: “If you… try AI on some real-world tasks, it does really badly… Yes, software engineering is very important and economically valuable, but there are lots of other activities with longer time horizons that are important to look at.”

© 2026 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.


