The emergence of the web data infrastructure layer for AI


The next frontier in AI may depend on a new web data infrastructure layer that can enable models to discover and map this ever-expanding digital realm. This layer must be able to navigate hundreds of millions of existing web domains and billions of new URLs created each week, delivering real-time information and overcoming technical barriers.

“The data suggests there’s far more data out there,” says Or Lenchner, CEO of Bright Data, a web data collection platform. “Think of the universe: It’s out there, but you don’t know what you don’t know.”

Enabling access to fresh, relevant, and trustworthy data

While early AI breakthroughs were driven by scaling training data and model size, organizations are now encountering a fundamental bottleneck: They need to keep pace with the dynamic, unstructured, and constantly evolving nature of web data in order to ground outputs in current and verifiable information. AI performance increasingly depends not just on model architecture but on a system’s compute, networking, retrieval, and data engineering capabilities—that is, the system’s ability to quickly and reliably retrieve data that is fresh, relevant, and trustworthy.

Traditional model training relies on snapshots of information collected at a particular point in time. Training AI on such static data is no longer sufficient. To track fluctuations such as competitor pricing, consumer sentiment, and market trends, companies need a constant feed of new information, pulling data in real time along with relevant context. Their infrastructure must therefore be able to handle millions of simultaneous interactions across websites that vary by geography, language, format, and access rules.

“If it can’t retrieve real-time information, it lacks context,” Lenchner says. “In a business setting, that’s not acceptable anymore. Stale answers lead to bad decisions and disappointed consumers.”

Speed is not merely a matter of convenience; it’s a matter of necessity. Today’s organizations operate in environments where prices, inventory, markets, security threats, and customer behavior change continuously. Delayed data retrieval can reduce the usefulness of an otherwise sophisticated model.

Using live, high-quality web data can also reduce AI hallucinations because the model has a more relevant knowledge base. This builds user trust. In fact, one survey found that 56% of AI practitioners said businesses need access to real-time web data to improve trust in AI outputs. To ensure the model runs efficiently and effectively, the information must also be pared down to the appropriate essentials. 

Despite the introduction of retrieval-augmented generation (RAG), where models pull in external data at the moment of a query, many AI systems still struggle to deliver outputs that are current, contextually relevant, and trustworthy in operational settings. According to Gartner, 60% of AI projects that are not supported by AI-ready data—accurate, structured, organized, and contextualized—will be abandoned by the end of the year. 


