Microsoft deletes blog telling users to train AI on pirated Harry Potter books



“I think that the regurgitation and the creation of fan fiction, they both could flag copyright issues, in that fan fiction often has to take from the expressive elements, a copyrighted character, a character that’s famous enough to be protected by a copyright law or plot stories or sequences,” Smith said. “If these things are copied and reproduced, then that output could be potentially infringing.”

But it’s also still a gray area. Looking at the blog, Smith said, “I would be concerned,” but “I wouldn’t say it’s automatically infringement.”

Smith told Ars that, in pulling the blog, Microsoft "was probably smart." Courts have so far only found in general terms that training AI on copyrighted books is fair use, and they continue to probe questions about pirated AI training materials.

On the deleted Kaggle dataset page, Maindola previously explained that to source the data, he “downloaded the ebooks and then converted them to txt files.”

Microsoft may have infringed copyrights

If Microsoft ever faced questions as to whether the company knowingly used pirated books to train the example models, fair use “could be a difficult argument,” Smith said.

Hacker News commenters suggested the blog could be considered fair use, since the training guide was for “educational purposes,” and Smith said that Microsoft could raise some “good arguments” in its defense.

However, she also suggested that Microsoft could be deemed liable for contributing to infringement on some level after leaving the blog up for a year. Before it was removed, the Kaggle dataset was downloaded more than 10,000 times.

“The ultimate result is to create something infringing by saying, ‘Hey, here you go, go grab that infringing stuff and use that in our system,’” Smith said. “They could potentially have some sort of secondary contributory liability for copyright infringement, downloading it, as well as then using it to encourage others to use it for training purposes.”


