Anthropic Settles AI Training Case for $1.5 Billion



The Anthropic settlement shows just how costly copyright missteps can be in AI development. Anthropic has agreed to a $1.5B settlement after a court found that keeping a permanent library of pirated books was not fair use, even though training its AI model on those same works was.
 
On this episode of The Briefing, Weintraub attorneys Scott Hervey and Matt Sugarman discuss the ruling, the settlement, and what it means for future copyright claims against AI companies.

Show Notes: 

Scott: In a previous episode, we broke down a key ruling in the Anthropic AI training case. That one asked what happens when an AI company trains its model on millions of books, some purchased, some pirated. In that closely watched decision, a federal judge said the training itself was fair use, comparing it to how humans learn by reading. But keeping pirated copies of those books in a permanent digital library? That crossed the line. I’m Scott Hervey, a partner with the law firm of Weintraub Tobin. I’m joined today by my partner, Matt Sugarman. Today, we are going to talk about the one big question that ruling left open: what’s the price tag for that mistake? That answer just came in, and it’s a big one, on this installment of The Briefing. Matt, welcome back to The Briefing. It’s good to have you.

Matt: Thank you, Scott. It’s good to be here.

Scott: Great. Well, this one’s a good one. I know you and I both talk a lot about these AI training cases, and we covered the Meta case previously. But why don’t you give us a quick backstory on this case?

Matt: Okay, Scott, let’s rewind for a second. In 2021, Anthropic trained its Claude model on a massive data set of books, articles, websites, you name it. But instead of licensing the books, they grabbed millions of copyrighted works straight off pirate sites.

Scott: Right. They did license some, but for sure, they pirated millions of books. Like you said, we’re not talking about a few. We’re talking about more than seven million pirated books, and those works include books by some very notable authors. At the same time, they bought millions of print books, scanned them, and built this huge searchable digital library.

Matt: That’s correct, Scott. And that’s what set off the lawsuit. The authors alleged that Anthropic infringed their copyrights in three separate ways: downloading the pirated books, using them to train Claude, and keeping digital copies in a permanent internal library.

Scott: So when Anthropic moved for summary judgment on fair use, Judge William Alsup, of the Northern District of California, didn’t really give them a clean win. Instead, he carved up their conduct into three categories.

Matt: That’s right. Training AI on books, scanning and digitizing legally-purchased print books, and then the big problem, keeping pirated books in a permanent digital library.

Scott: And the judge treated each one differently.

Matt: Correct. First, training Claude with the books, the court said that was fair use. And not just fair use, he called it spectacularly transformative.

Scott: That’s right. He did call it spectacularly transformative. Even if Claude absorbed a lot of the underlying material, the judge pointed out that the model wasn’t spitting out verbatim chunks of the authors’ books.

Matt: Well, the second point was digitizing purchased print books. The authors argued that converting them into searchable digital copies was also infringement.

Scott: But the court pushed back. Because Anthropic lawfully bought the books and then destroyed the physical copies and only kept one digital version for internal use, that passed muster as fair use.

Matt: Scott, the judge even went out of his way to say that this use was more transformative than the uses in Texaco, Google Books, and Sony Betamax, and clearly different from the Napster case.

Scott: Right, clearly different from the Napster case. That brings us to the third use, which was pirating books and retaining those pirated books.

Matt: Correct, Scott. That’s where Anthropic went off the rails. They downloaded millions of books from pirate sites, and they stored them, even though a lot of them weren’t used for training at all.

Scott: The kicker: internal emails showed that the founder and other executives knew of the risk and were quite cavalier about it, but they decided that, essentially, piracy was easier than licensing.

Matt: Yep. And the court said no. This was not transformative. It undercut the market, and it was full verbatim copying. The bottom line: fair use didn’t apply.

Scott: That brings us to the fallout. Just last week, Anthropic agreed to settle the authors’ claims for $1.5 billion.

Matt: That sounds like a lot, but when you break it down across the roughly 500,000 works in the class, Scott, that’s only about $3,000 per copyrighted work.

Scott: True, but it doesn’t really stop at $1.5 billion. That $1.5 billion is only a floor. Once the lawyers finalize the class list, Anthropic may owe another $3,000 for every infringing work over the first 500,000. Plus, they have to destroy all of the pirated data sets.

Matt: That’s right. But the settlement still needs court approval. There are a lot of logistical pieces, class certification, claims processing, notification, but the number is already quite staggering.

Scott: I agree. That number is quite big. Here’s the bigger picture: this case doesn’t really line up with Kadrey versus Meta, which we covered previously. In Kadrey, the judge rejected the whole AI-learns-like-a-student analogy, saying the risk of competitive harm was way too high.

Matt: Right. And that shows how different courts are approaching this. Judge Alsup zeroed in on market harm and intent. In Kadrey, however, the plaintiffs just didn’t have enough facts. But future plaintiffs could succeed, especially if they can prove market harm, even when the works aren’t pirated but legally purchased.

Scott: And we’re already seeing this play out. Apple was sued on September fifth for copyright infringement over AI training data sets. The complaint alleges the use of unlicensed and pirated books, and it leans hard into the market-harm argument that Apple’s output could replace the very works authors are paid to write.

Matt: The takeaway, Scott: building data sets from pirated material is at least a billion-and-a-half-dollar mistake, if not more. This case gives authors and their lawyers a clear roadmap for future claims.

Scott: It certainly does, Matt. So, thanks again to my co-host, Matt Sugarman. Matt, always great to have your insights. And thank you to our listeners for joining us on The Briefing. If you found this episode helpful or interesting, please take a moment to subscribe, like, and share it with your network. We’d also love to hear from you. Leave us a comment or a review, and let us know what topics you’d like us to cover in future episodes. I’m Scott Hervey. See you next time on The Briefing.