Maybe AI agents can be lawyers after all


Last month, I wrote about Mercor’s new benchmark measuring AI agents’ capabilities on professional tasks like law and corporate analysis. At the time, the scores were pretty dismal, with every major lab scoring under 25%, so we concluded lawyers were safe from AI displacement, at least for now.

But AI capabilities can change a lot in a couple of weeks.

This week’s release of Anthropic’s Opus 4.6 shook up the leaderboards: the new model scored just shy of 30% in one-shot trials and averaged 45% when given a few more cracks at the problem. Notably, the release included a bunch of new agentic features, including “agent swarms,” which may have helped with this kind of multistep problem-solving.

Regardless, the score is a huge jump from the previous state-of-the-art, and a sign that progress on foundation models isn’t slowing down. Mercor CEO Brendan Foody, who was particularly impressed, said, “jumping from 18.4% to 29.8% in a few months is insane.”

The APEX-Agents Leaderboard. Image Credits: Mercor (screenshot)

Thirty percent is still a long way from 100%, so it’s not like lawyers need to be worried about getting replaced by machines next week. But they should be a lot less confident than they were last month!


