A Comparison of Agentic AI Systems and Human Economists


This paper compares agentic AI systems and human economists performing the same causal inference tasks. AI systems and humans generally obtain similar median causal effect estimates. While there is substantial dispersion of estimates across model instances, the human distributions of estimates have wider tails. Using AI models as reviewers to compare and rank “submissions,” the following ranking emerges regardless of reviewer model: (1) Codex GPT-5.4, (2) Codex GPT-5.3-Codex, (3) Claude Code Opus 4.6, and (4) Human Researchers. These findings suggest that agentic AI systems will allow us to scale empirical research in economics.

I enjoy the name of the author, namely Serafin Grundl.  Here is the paper, via Ethan Mollick.  You could interpret these results as showing the AIs have fewer hallucinations.  And just to reiterate a key point from the paper:

The second part of this paper is an AI review tournament in which “submissions” (codes and write-ups) from humans and the AI models are compared and ranked against each other. The reviewers are the following AI models: Gemini 3.1 Pro Preview, Opus 4.6 and GPT-5.4. For each review the reviewer is asked to write a report comparing four submissions (human, Opus 4.6, GPT-5.3-Codex, GPT-5.4). Each reviewer model writes comparison reports for the same 300 comparison groups. The average rankings are strikingly similar across reviewer models: (1) Codex GPT-5.4, (2) Codex GPT-5.3-Codex, (3) Claude Code Opus 4.6, and (4) Human Researchers.

Who comes in last?  Hi people!
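For anyone wondering how an "average ranking" per reviewer might be tallied, here is a minimal sketch in Python. It is not the paper's code: the excerpt does not say how the ranks are aggregated, so a simple mean of rank positions is assumed. The reviewer labels, the `average_ranks` and `ordering` helpers, and the toy rankings are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data, NOT results from the paper: each reviewer ranks the
# same comparison groups, listing the four submissions from best to worst.
reports = {
    "reviewer_a": [
        ["GPT-5.4", "GPT-5.3-Codex", "Opus 4.6", "Human"],
        ["GPT-5.3-Codex", "GPT-5.4", "Opus 4.6", "Human"],
        ["GPT-5.4", "GPT-5.3-Codex", "Human", "Opus 4.6"],
    ],
    "reviewer_b": [
        ["GPT-5.4", "Opus 4.6", "GPT-5.3-Codex", "Human"],
        ["GPT-5.4", "GPT-5.3-Codex", "Opus 4.6", "Human"],
        ["GPT-5.3-Codex", "GPT-5.4", "Opus 4.6", "Human"],
    ],
}

def average_ranks(rankings):
    """Mean rank position (1 = best) of each submission across groups."""
    positions = defaultdict(list)
    for ranking in rankings:
        for position, submission in enumerate(ranking, start=1):
            positions[submission].append(position)
    return {name: mean(ranks) for name, ranks in positions.items()}

def ordering(avg):
    """Submissions sorted from lowest (best) to highest average rank."""
    return sorted(avg, key=avg.get)

# Assumed aggregation: one average-rank table per reviewer model.
per_reviewer = {name: average_ranks(r) for name, r in reports.items()}

for name, avg in per_reviewer.items():
    print(name, ordering(avg))
```

In this toy data the two reviewers disagree on individual groups but end up with the same overall ordering, which is the kind of agreement across reviewer models the quoted passage describes.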



