Maybe AI agents can be lawyers after all


Last month, I wrote about Mercor’s new benchmark measuring AI agents’ capabilities on professional tasks in fields like law and corporate analysis. At the time, the scores were pretty dismal, with every major lab’s models scoring under 25%, so we concluded lawyers were safe from AI displacement, at least for now.

But AI capabilities can change a lot in a couple of weeks.

This week’s release of Anthropic’s Opus 4.6 shook up the leaderboards, with the new model scoring just shy of 30% in one-shot trials and averaging 45% when given a few more cracks at the problem. Notably, the release came with a bunch of new agentic features, including “agent swarms,” which may have helped with this kind of multistep problem-solving.

Regardless, the score is a huge jump from the previous state-of-the-art, and a sign that progress on foundation models isn’t slowing down. Mercor CEO Brendan Foody, who was particularly impressed, said, “jumping from 18.4% to 29.8% in a few months is insane.”

The APEX-Agents Leaderboard. Image Credits: Mercor (screenshot)

Thirty percent is still a long way from 100%, so it’s not like lawyers need to be worried about getting replaced by machines next week. But they should be a lot less confident than they were last month!


