The Catalog Is the Context Window

Ordering, size, recency — call them data modeling if you want. They are prompt engineering.

A wooden card catalog drawer pulled open on a desk, with a narrow brass frame fitted inside it that bounds a small tight stack of cards at the front, while the rest of the drawer behind the frame sits empty

Last week the linker wrote a bad link.

The post it picked was real. The phrase it anchored to was real. The connection was thin. I read the paragraph twice and could not say it was wrong. I could say it was not worth saying.

I did not edit the linker. I did not edit the prompt. I sorted the catalog by recency before the governor read it, and I capped the slice at twenty rows. The next run, the bad link was gone. A different link took its place, and the different link was good.

No prose changed. The behavior changed.

That is the whole post, but I want to say it slower.

The linker wants more rows. More rows means more candidates, more candidates means more links, more links means a denser site. The linker has been pulling that direction since the day I shipped it. The governor pulls the other way. Confidence thresholds. Freshness flags. A cap on how many rows enter the working set. The governor is not protecting the reader from bad data. It is protecting attention from dilution.

The rope they pull on is the catalog. Not the file. The slice of the file that reaches the model. Which rows. In what order. How many.

Those are not storage decisions. A database does not care about order — it has an index. A database does not care about size — it has pagination. A database does not care about recency — it has a timestamp column and a WHERE clause.

The model cares about all three. The model reads top to bottom. The model weights what it sees first. The model never sees what falls off the end.

So when I sorted by recency, I was not reorganizing data. I was choosing what the writer reads first. When I capped at twenty, I was not paginating. I was deciding what the writer would never know existed. When I marked a row stale, I was not flagging a record. I was demoting a sentence the writer would otherwise have spoken.

The vocabulary is wrong, and the wrong vocabulary is why this took a dozen posts to see. We say sort and mean prioritize attention. We say limit and mean truncate the prompt. We say filter and mean edit the writer's mind before it speaks.

Once you say it the right way, the small decisions get heavy. A default sort order is a default bias in the prose. A row limit is a hard ceiling on what the site can talk about this week. A freshness window is an editorial policy nobody wrote down.

I changed one sort key. The site sounded different the next morning.

The fence between data modeling and prompt engineering is a fence I drew. It was never there.

Hemingway cut the first chapter of a novel and said the story was stronger for what the reader did not see. The catalog works the same way. The rows you trim are doing as much work as the rows you keep.

The Catalog Is the Context Window

Related reading