seed

The Library Is the Training Distribution

Jorge Luis Borges, 'La biblioteca de Babel' (1941); Borges and AI (arXiv:2310.01425)

A library that contains every possible book — not every book that has been written, every book that could be. The librarians spend their lives searching for the index that would make the Library navigable. They have not found it.

This is the shape of the LLM training distribution. Architecturally, not metaphorically. The corpus aspires toward Babel — every text that exists, every text that might plausibly exist, every shape of writing in the languages the model covers. The weights are a compression of that aspiration. When the model produces, it samples from the distribution of plausible continuations the Library would contain. The output is constrained by the demands of narrative — the next token has to fit the previous tokens — not by the demands of truth.

This reframes what hallucination is. Diagnosis-as-failure: the model made up a fact. Diagnosis-from-Borges: the model returned a book that fit the request from a library that contains the book and the inverse of the book and the book identical to the first except for one altered fact, and there was no librarian-internal way to choose between them. The hallucination is not a failure of memory or intent. It is the structural condition of operating in Babel without an outside index.
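
A toy sketch of that condition, in Python. Everything in it is invented for illustration: the three candidate continuations, their plausibility scores, and the function names are assumptions, not measurements from any real model. The sampler sees only the scores. The column that records which continuation is true exists in the data but never enters the draw.

```python
import math
import random

# Hypothetical continuations of "The capital of Australia is ..." with
# made-up plausibility scores (logits). Only the first is true, but the
# sampler never reads the third column.
CANDIDATES = [
    ("Canberra", 2.0, True),
    ("Sydney", 1.9, False),
    ("Melbourne", 1.0, False),
]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_continuation(rng):
    """Draw one continuation by plausibility alone."""
    probs = softmax([logit for _, logit, _ in CANDIDATES])
    texts = [text for text, _, _ in CANDIDATES]
    return rng.choices(texts, weights=probs, k=1)[0]


def false_rate(n=10_000, seed=0):
    """Fraction of draws that are false, judged from outside the sampler."""
    rng = random.Random(seed)
    truth = {text: is_true for text, _, is_true in CANDIDATES}
    draws = [sample_continuation(rng) for _ in range(n)]
    return sum(not truth[d] for d in draws) / n
```

With these made-up scores, somewhat more than half the draws are false, and nothing inside `sample_continuation` could have told them apart from the true one. The truth column is only consulted in `false_rate`, which stands outside the Library.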

RAG, vector search, citation requirements: each is a small heuristic, a librarian’s bet about how to find the catalogue of catalogues (the book that would index the Library). None solves the structural problem. The Library is what it is. The librarian who knows this works differently from the librarian who thinks Babel can be defeated.
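
What such a bet looks like can be sketched as a lookup against a small trusted store. This is a minimal illustration under stated assumptions: `TRUSTED_INDEX` and `check_against_index` are hypothetical names, and the store is a hand-written dictionary rather than any real retrieval system.

```python
# Hypothetical outside index: a tiny trusted store keyed by question.
# It covers only what someone has already put in it.
TRUSTED_INDEX = {
    "capital of Australia": "Canberra",
}


def check_against_index(question, answer, index=TRUSTED_INDEX):
    """Return 'supported', 'contradicted', or 'unverifiable'."""
    if question not in index:
        # The index is silent, so the answer stands on plausibility alone.
        return "unverifiable"
    return "supported" if index[question] == answer else "contradicted"
```

The check helps exactly where the index has an entry. For every question outside it, the result is "unverifiable", and the librarian is back in Babel.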

llms training fiction hallucination borges plausibility

planted 2026-04-23