Three reasons why DeepSeek’s new model matters


1. It competes with the leading closed-source models.
In terms of performance, V4 is, perhaps unsurprisingly, a huge jump from R1—and it seems to be a strong alternative to just about all the latest big AI models. On the major benchmarks, according to results shared by the company, DeepSeek V4-Pro competes with leading closed-source models, matching the performance of Anthropic’s Claude-Opus-4.6, OpenAI’s GPT-5.4, and Google’s Gemini-3.1. And compared to other open-source models, such as Alibaba’s Qwen-3.5 or Z.ai’s GLM-5.1, DeepSeek V4 exceeds them all on coding, math, and STEM problems, making it one of the strongest open-source models ever released. 

DeepSeek also says that V4-Pro now ranks among the strongest open-source models on benchmarks for agentic coding tasks, and that it performs well on other tests measuring the ability to work through multistep problems. Its writing ability and world knowledge also lead the field, according to benchmarking results shared by the company.

In a technical report released alongside the model, DeepSeek shared results from an internal survey of 85 experienced developers: More than 90% included V4-Pro among their top model choices for coding tasks.

DeepSeek says it has specifically optimized V4 for popular agent frameworks such as Claude Code, OpenClaw, and CodeBuddy.

2. It delivers on a new approach to memory efficiency.

One of the key innovations of V4 is its long context window—the amount of text the model can process at once. Both versions can handle 1 million tokens, which is large enough to fit all three volumes of The Lord of the Rings and The Hobbit combined. The company says this context window size is now the default across all DeepSeek services, and that it matches what cutting-edge versions of models like Gemini and Claude offer.

But it’s important to know not just that DeepSeek has made this leap, but how it did so. V4 makes significant architectural changes relative to the company’s previous models, especially in the attention mechanism, the component of AI models that weighs each part of a prompt against every other part. As the prompt gets longer, those pairwise comparisons become far more costly, making attention one of the main bottlenecks for long-context models.
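To see why attention becomes the bottleneck, here is a minimal, illustrative sketch of standard scaled dot-product attention in plain Python. This is not DeepSeek's actual implementation; it simply shows that the score computation compares every token with every other token, so the work grows with the square of the prompt length.

```python
import math

def attention(queries, keys, values):
    """queries/keys/values: lists of equal-length vectors (lists of floats)."""
    d = len(queries[0])
    output = []
    for q in queries:
        # One score per key for each query: n tokens -> n * n scores overall.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Softmax turns the scores into attention weights that sum to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # This token's output is a weighted mix of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, values))
                       for j in range(len(values[0]))])
    return output

# Doubling the prompt length quadruples the number of pairwise scores:
# 1,000 tokens -> 1,000,000 scores; a 1-million-token prompt -> 10^12.
```

The quadratic blowup in the commented line at the bottom is exactly what architectural changes to the attention mechanism aim to tame at million-token context lengths.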


