The day after The New York Times sued OpenAI for copyright infringement, the author and systems architect Daniel Jeffries wrote an essay-length tweet arguing that the Times “has a near zero probability of winning” its lawsuit. As we write this, it has been retweeted 288 times and received 885,000 views.
“Trying to get everyone to license training data is not going to work because that's not what copyright is about,” Jeffries wrote. “Copyright law is about preventing people from producing exact copies or near exact copies of content and posting it for commercial gain. Period. Anyone who tells you otherwise is lying or simply does not understand how copyright works.”
This article has two authors. One of us is a journalist who has been on the copyright beat for nearly 20 years; the other is a law professor who has taught dozens of courses on IP and Internet law. We’re pretty sure we understand how copyright works. And we’re here to warn the AI community that it needs to take these lawsuits seriously.
In its blog post responding to the Times lawsuit, OpenAI wrote that “training AI models using publicly available Internet materials is fair use, as supported by long-standing and widely accepted precedents.”
The most important of these precedents is a 2015 decision that allowed Google to scan millions of copyrighted books to create a search engine. We expect OpenAI to argue that the Google ruling allows it to use copyrighted documents to train its generative models. Stability AI and Anthropic will undoubtedly make similar arguments as they face copyright lawsuits of their own.
These defendants could win in court, but they could lose, too. As we’ll see, AI companies are on shakier legal ground than Google was in its book search case. And the courts don’t always side with technology companies that make copies to build their systems. The story of MP3.com illustrates the kind of legal peril AI companies could face in the coming years.