A long wooden library card catalog with one drawer pulled partway open, the index cards standing upright in tight order inside, and a single card lifted out and held upright in the foreground as if being read

There was a catalog before there was a linker. That order matters. That order is the whole post.

The catalog is a YAML file. It lists every post on this site by slug, by title, by one-line summary. It was written, the first time, as documentation — the kind of thing you build when you suspect you will lose track of yourself. Forty-some posts in, I had lost track of myself. I could not remember which essay carried which argument. I could not remember if I had already made the point about overrides, or only thought about making it. The catalog began as a memory aid for the human at the desk.

Then the linker came, and the catalog stopped being for me.

The aggressive reader

The linker is a pass that runs over a post and inserts inline anchors to other posts. You can write such a thing in two registers. You can write it shy — link only when the slug appears verbatim, link only when the title is named. Or you can write it aggressive — link when the idea appears, when the argument leans on a thing said elsewhere, when a phrase in this post is a door into another.

The shy linker is safe and useless. It finds what you would have found yourself. The aggressive linker is useful and dangerous. It finds connections you did not see. It also invents them. An aggressive linker without a constraint will cheerfully point at /blog/the-pipeline-that-was-never-written.html because the phrase sounded right in the sentence it had just composed. The link resolves to a 404. The 404 is signed in your name.

I had this happen. I will not say how many times. The model would draft a paragraph and underline three phrases and hand back HTML with three anchors, and one of the three anchors pointed at a post I had never written. The hallucination was confident. The hallucination was in the register of help.

What the catalog does to the prompt

So the catalog moved from the docs/ folder into the prompt. Not as reference. As vocabulary. The linker now receives, before it sees the post it is supposed to link, the full list of slugs that exist. The instruction is plain: link only to slugs in this list. The list is the link space. The list is the universe.

The aggression of the linker, which was the dangerous part, became the useful part. Because the vocabulary was finite, the model could be told to swing hard. Find every connection. Underline every phrase that gestures at another post. The penalty for over-linking is a stylistic one — too many anchors in a paragraph. The penalty for inventing a destination is gone, because invention is no longer a legal move.

The sequence: catalog first. Linker second, written to consume the catalog. Then a retrofit pass across twenty-three older posts, adding inline anchors where the argument leaned on prior work and the prior work was now nameable. Then the catalog grew again, because the retrofit had taught the catalog what it was missing. The catalog and the linker are now a loop. Each pass over the corpus thickens the graph. Each thickening teaches the catalog a row it had not yet earned.

Prompt engineering by constrained vocabulary

The lesson is not keep a list of your posts. The lesson is that the list is part of the prompt, and the prompt is part of the model's permission. A model with the catalog in context is a different model than the same weights without it. It is permitted to mean less. Permitted to mean less is the same, in this case, as permitted to be trusted.

You can build aggressive agents. You should, in places. But aggression in a model is only as safe as the fence around the field it runs in. The fence is not in the weights. The fence is in the prompt. The catalog is the fence. The catalog is the prompt. The documentation, it turns out, was load-bearing the whole time — it just took building a thing that depended on it to know.


References: the earlier framing in The Catalog Is Load-Bearing and the diagnostic post in The Catalog Is the Link Graph.

Borges imagined a library containing every possible book, and observed that such a library is indistinguishable from no library at all. The catalog is the opposite gesture — a small list, finite, signed, against which the machine is required to check its work.