A new open-weights AI coding model is closing in on proprietary options



On Tuesday, French AI startup Mistral AI released Devstral 2, a 123 billion parameter open-weights coding model designed to work as part of an autonomous software engineering agent. The model achieves a 72.2 percent score on SWE-bench Verified, a benchmark that attempts to test whether AI systems can solve real GitHub issues, putting it among the top-performing open-weights models.

Perhaps more notably, Mistral didn’t just release an AI model; it also released a new development app called Mistral Vibe. It’s a command line interface (CLI) similar to Claude Code, OpenAI Codex, and Gemini CLI that lets developers interact with the Devstral models directly in their terminal. The tool can scan file structures and Git status to maintain context across an entire project, make changes across multiple files, and execute shell commands autonomously. Mistral released the CLI under the Apache 2.0 license.

It’s always wise to take AI benchmarks with a large grain of salt, but we’ve heard from employees of the big AI companies that they pay very close attention to how well models do on SWE-bench Verified, which presents AI models with 500 real software engineering problems pulled from GitHub issues in popular Python repositories. The AI must read the issue description, navigate the codebase, and generate a working patch that passes unit tests. While some AI researchers have noted that around 90 percent of the tasks in the benchmark test relatively simple bug fixes that experienced engineers could complete in under an hour, it’s one of the few standardized ways to compare coding models.

Alongside the larger coding model, Mistral released Devstral Small 2, a 24 billion parameter version that scores 68 percent on the same benchmark and can run locally on consumer hardware such as a laptop, with no Internet connection required. Both models support a 256,000 token context window, allowing them to process moderately large codebases (although whether that counts as large or small depends on a project’s overall complexity). The company released Devstral 2 under a modified MIT license and Devstral Small 2 under the more permissive Apache 2.0 license.


