---
title: Anthropic’s $1.5bn Deal Shows The Real Liability Isn’t Model Training—It’s The ‘Central Library’
description: "Anthropic’s $1.5bn US settlement clarifies AI copyright: training as fair use but storing full texts breaches rights. Data governance and engineering shifts."
author: Darie Nani (Editor-in-Chief)
date: 2025-09-26T09:16:26.000Z
updated: 2026-04-01T12:06:32.948Z
canonical: https://www.sovereignmagazine.com/article/anthropic-s-1-5bn-deal-shows-the-real-liability-isn-t-model-training-it-s-the-central-library
image: https://cdn.nanimediahouse.com/d8e77f4c-971d-4bde-b78b-bafd2fb20bb8.jpg
categories: Legal
content_type: Analysis
region: United States
publication: Sovereign Magazine
about:
  - type: Organization
    name: Anthropic
---

A federal judge has preliminarily approved a $1.5 billion settlement over how Anthropic stored more than seven million pirated books, even after ruling that the company’s model training was fair use. The counterintuitive split shows where legal liability actually lies in AI development: fair use protected Claude’s training, but not Anthropic’s persistent storage of complete pirated texts, which the court treated as infringement.

The liability has little to do with model architectures or training performance. It turns on how companies store and catalogue the raw text they ingest. This landmark case demonstrates how [AI copyright settlements are reshaping industry data practices](/category/science-amp-techartificial-intelligence/anthropics-historic-copyright-settlement-could-reshape-ai-industrys/) in ways few anticipated.

## Legal Timeline Reveals The Split Decision

In June 2025, [Judge William Alsup ruled](https://www.reuters.com/legal/litigation/anthropic-wins-key-ruling-ai-authors-copyright-lawsuit-2025-06-24/) that Anthropic’s training on purchased books constituted ‘exceedingly transformative’ fair use. However, the same judge found the company violated rights by saving more than seven million pirated books to a ‘central library’ that wouldn’t necessarily be used for training purposes.

On 25 September 2025, [Reuters reported](https://www.reuters.com/sustainability/boards-policy-regulation/us-judge-approves-15-billion-anthropic-copyright-settlement-with-authors-2025-09-25/) that Alsup gave preliminary approval to the $1.5 billion settlement. The judge had initially declined approval, asking follow-up questions before giving his preliminary sign-off.

Plaintiffs Andrea Bartz, Charles Graeber and Kirk Wallace Johnson said the decision ‘brings us one step closer to real accountability for Anthropic and puts all AI companies on notice they can’t shortcut the law or override creators’ rights.’ Anthropic deputy general counsel Aparna Sridhar said the decision will allow the company to ‘focus on developing safe AI systems that help people and organisations extend their capabilities, advance scientific discovery and solve complex problems.’

## What The ‘Central Library’ Actually Means

The ‘central library’ in Alsup’s ruling refers to a persistent repository that Anthropic maintained, containing complete copies of books. In engineering terms, the closest analogues are a data lake storing raw collections, a versioned dataset, or a [retrieval-augmented generation](https://developer.nvidia.com/blog/rag-101-demystifying-retrieval-augmented-generation-pipelines/) index that keeps full source passages alongside its embeddings.

This differs fundamentally from ephemeral training feeds where data streams through processing pipelines and gets discarded. When training a model, you might stream tokens through the system and discard the sources. Storing entire books in a central library means retaining full texts that can be fetched or extracted later.
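
The difference can be sketched in a few lines of Python. This is an illustration only, not a description of Anthropic’s pipeline: the function names are hypothetical, whitespace splitting stands in for a real tokeniser, and counting tokens stands in for a training step. Whether a design like this actually reduces legal exposure is a question for counsel, not for the code.

```python
from typing import Iterable, Iterator


def stream_tokens(documents: Iterable[str]) -> Iterator[str]:
    """Yield tokens one document at a time without keeping the source text."""
    for text in documents:
        # Whitespace splitting is a placeholder for a real tokeniser.
        for token in text.split():
            yield token
        # Nothing is written to disk or appended to a collection here,
        # so the full text is not retained once the loop moves on.


def consume(documents: Iterable[str]) -> int:
    """Stand-in for a training step: keeps only an aggregate, not the text."""
    return sum(1 for _ in stream_tokens(documents))
```

A ‘central library’ design would instead append each `text` to durable storage before tokenising it, which is exactly the retained copy the ruling focused on.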

The court distinguished between using books for training (fair use) and keeping complete copies in searchable repositories (potential infringement). Retaining full texts increases legal exposure because those copies can be characterised as reproductions not transformed by model training alone.

## Engineering Becomes A Legal Decision

Companies now face three options: tighten provenance and licensing, rearchitect pipelines to avoid persistent full-text storage, or pay to license bulk archives. Each carries distinct engineering trade-offs and compliance costs.

The settlement creates pressure for [architectural changes](https://aws.amazon.com/what-is/retrieval-augmented-generation/). Vector databases storing embeddings might be safer than full-text repositories. Streaming training pipelines that discard sources could reduce liability compared to comprehensive data lakes holding complete works.

An engineer’s new checklist includes: provenance logging for every text ingested, automated takedown pipelines when rights holders object, rolling ephemeral feeds that don’t retain source materials, aggressive pruning of raw data, hashed fingerprints rather than full texts, and legal tagging of dataset entries by acquisition method. These considerations align with broader trends in [enterprise technology oversight](https://www.sovereignmagazine.com/article/ftc-s-ai-crackdown-signals-new-era-of-enterprise-technology-oversight) emerging across the industry.
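
Several items on that checklist (provenance logging, hashed fingerprints, legal tagging by acquisition method) can live in a single record per ingested text. The sketch below is a minimal illustration using only the Python standard library; the field names and the tag values such as `"licensed"` are assumptions made for the example, not an industry schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    fingerprint: str   # SHA-256 of the text, stored instead of the text itself
    source: str        # where the work came from (hypothetical free-text field)
    acquisition: str   # legal tag, e.g. "purchased", "licensed", "unknown"
    ingested_at: str   # ISO 8601 timestamp in UTC


def fingerprint(text: str) -> str:
    """Return a hex SHA-256 digest of the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def log_ingestion(text: str, source: str, acquisition: str) -> ProvenanceRecord:
    """Build a provenance record without retaining the text."""
    return ProvenanceRecord(
        fingerprint=fingerprint(text),
        source=source,
        acquisition=acquisition,
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )


def matches_takedown(record: ProvenanceRecord, disputed_text: str) -> bool:
    """Check whether a disputed text is byte-identical to one that was ingested."""
    return record.fingerprint == fingerprint(disputed_text)
```

A cryptographic hash only matches identical input, so a different edition or a reformatted copy of the same book would not be caught; a production takedown pipeline would need fuzzier matching than this.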

The technical decision to keep a ‘central library’ produced a $1.5 billion settlement. Storage architecture has moved from an engineering choice to a legal calculation.

## Settlement Mechanics And Industry Impact

The $1.5 billion preliminary settlement resolves a case that could have exposed Anthropic to significantly larger statutory damages had it proceeded to trial. [Reports suggest](https://www.cnet.com/tech/services-and-software/judge-in-anthropic-ai-piracy-suit-approves-1-5b-settlement/) the settlement includes roughly $3,000 per covered book, which at $1.5 billion would imply about 500,000 covered works, though these figures require verification against actual settlement documentation.

Authors and publishers stand to receive payments through a class action process. The court must issue class notice and establish claim windows for affected rights holders to file for compensation.

More than [50 lawsuits are pending](https://insurancenewsnet.com/innarticle/copyright-lawsuits-highlight-potential-risks-for-agents-using-ai-tools-in-marketing) in US courts against AI companies over copyrighted training data. Similar high-profile cases include [OpenAI and Microsoft being sued by the New York Times](https://www.sovereignmagazine.com/article/openai-microsoft-sued-by-new-york-times-for-copyright-infringement) for copyright infringement. This settlement provides a template other plaintiffs will reference and a benchmark for damages calculations in ongoing cases against major tech companies.

Smaller AI labs face proportionally higher compliance burdens than large firms with deep legal budgets. The settlement strengthens the position of licensing marketplaces and data brokers who can provide legally cleared training data. Publishers gain negotiating power in bulk licensing discussions.

## What Engineers Need To Audit Now

Engineers and product managers should audit their training data pipelines against these emerging legal standards. Key questions include: How long do you retain source texts? Can you demonstrate legitimate training purposes for stored materials? Do you have provenance records showing legal acquisition?
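
Those questions translate naturally into an automated check over a storage inventory. The following is a rough sketch under stated assumptions: the `StoredText` fields are hypothetical, and the 30-day default is an arbitrary placeholder, not a threshold drawn from the ruling or the settlement.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class StoredText:
    item_id: str
    stored_on: date
    acquisition: Optional[str]  # None means no provenance record exists
    holds_full_text: bool       # False if only a hash or embedding is kept


def audit(items: List[StoredText], today: date, max_days: int = 30) -> List[str]:
    """Flag items with missing provenance or full text held past the window."""
    findings = []
    cutoff = today - timedelta(days=max_days)
    for item in items:
        if item.acquisition is None:
            findings.append(f"{item.item_id}: no provenance record")
        if item.holds_full_text and item.stored_on < cutoff:
            findings.append(
                f"{item.item_id}: full text retained longer than {max_days} days"
            )
    return findings
```

The output is a list of plain-text findings that can be handed to legal counsel; deciding what counts as an acceptable retention window remains a legal judgement.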

Consider whether your architecture needs modification: streaming training pipelines, hashed fingerprints instead of full texts, automated legal compliance checks, and clear policies for third-party data handling. Document all decisions for legal counsel, particularly choices about what to store versus what to process and discard.

[Vector databases and RAG systems](https://cloud.google.com/use-cases/retrieval-augmented-generation) that store embeddings rather than complete texts may offer safer approaches than comprehensive data lakes holding full works. The US ruling suggests that storage decisions carry different legal risks than training decisions. Companies must also consider privacy implications, as seen in recent cases involving [AI transcription services facing privacy backlash](https://www.sovereignmagazine.com/article/ai-transcription-giant-otter-ai-faces-privacy-backlash-what-uk-businesses-need-to-know).

## Law And Engineering Converge

This settlement reframes the AI copyright debate from model behaviour to data custody and governance. The technical choice to maintain a ‘central library’ produced liability that training itself did not. Legal and engineering teams must now collaborate on architecture decisions that were previously purely technical.

Anthropic’s experience suggests the answer lies not in restricting training itself, but in how companies handle the [data infrastructure that enables it](https://www.sovereignmagazine.com/article/when-training-becomes-the-target-how-mandatory-workplace-programmes-face-growing-legal-scruti). The question remains whether law and engineering can converge to let AI models learn from human culture without undermining creators’ rights.