The Free and Open Web Is Under Attack at the IETF

The ability to access publicly available information using automated tools is a central value and benefit of a free and open internet. Automated access, often called crawling or scraping, powers important, useful tools for locating, preserving, and analyzing online information. For example, crawling and scraping help journalists, researchers, and watchdog organizations report the news, find security flaws, and investigate discrimination. Crawling the web allows non-profits like the Internet Archive to preserve historical copies of websites. Tools for automated comparison shopping allow consumers to find the best deals on items they want to buy. And so on.

Yet open internet access is increasingly under threat from publishers and Big Tech companies alike. Fearing lost advertising and licensing revenues, website operators increasingly claim that they need to lock down their sites from bots that crawl public web content to train or operate AI models. Some companies are even trying to embed their business models into the internet itself by changing the Internet Engineering Task Force (IETF) technical standards that shape much of the internet.

Many of their economic anxieties are understandable. AI bots can strain websites’ infrastructure, in some cases degrading site performance or taking sites offline altogether. Upgrading systems costs money that some sites may not have. And AI is likely to disrupt the business models many publishers adopted in response to the rise of the internet if users rely on AI overviews instead of visiting source websites.

However reasonable these fears may be, the answer is not to change the IETF standards from neutral protocols that encourage openness to restrictive requirements designed to monetize internet access.

The worst of these proposed standards would give websites far greater ability to automatically block legitimate, lawful scraping and crawling. For example, the AI Preferences working group is developing proposals to give publishers a way to express “preference signals” against crawling web data for AI-related purposes, including to train models, generate outputs, and help users search the web. These preference signals would be expressed through robots.txt and could become legally binding in some jurisdictions.
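For readers unfamiliar with robots.txt, here is a minimal sketch of how a crawler reads the file as it works today, using Python's standard `urllib.robotparser`. The file contents, the bot names `ExampleAIBot` and `ArchiveBot`, and the `example.org` URLs are all hypothetical, and the sketch deliberately does not show the proposed AI preference syntax, which is not described here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts out one named crawler entirely and
# keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything from the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named bot is told to stay away from the whole site.
print(parser.can_fetch("ExampleAIBot", "https://example.org/news/story"))  # False
# Any other crawler may read public pages...
print(parser.can_fetch("ArchiveBot", "https://example.org/news/story"))    # True
# ...but is asked to skip the /private/ section.
print(parser.can_fetch("ArchiveBot", "https://example.org/private/page"))  # False
```

Note that the parser only reports what the site has asked for. Today a well-behaved crawler honors these answers voluntarily, which is part of why a change that makes such signals legally binding would be so significant.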

Another working group, called Web Bot Auth, is pursuing efforts to protect sites from overly aggressive bots that strain website resources, a positive goal that could meaningfully improve the internet in the AI era. But Web Bot Auth is simultaneously pursuing a much more dangerous path: standards changes that would enable sites to cryptographically identify bots so that they can more easily block anyone they wish. That includes not just “bad” actors, but competitors, dissidents, or anyone who hasn’t paid for the right to access sites using automated tools. If sites restrict crawling to a preapproved list of cryptographically authenticated bots, they could require licensing payments from those wishing to crawl their sites. This would close off the open web to researchers, archivists, and startups without the ability to pay for automated access.

Websites may have legitimate reasons to worry about AI’s impacts on their traffic and advertising revenue, but those reasons must be weighed against the benefits of the open web. These proposals would effectively give website operators veto power over a wide range of important uses, from the investigations and archival work described above to accessibility tools for people with disabilities and research efforts aimed at holding governments accountable.

That is why we are fighting back against these threats to open access. EFF and our allies in the open internet community have so far successfully resisted some of the most dangerous IETF proposals. We won’t stop working to protect the open web from efforts to manipulate internet standards in ways that undermine the right to freely access the internet in any legal way, including with automated tools.
