The Tool the Model Wrote as Text

The agent sounded like it was working. It was reading JSON aloud, and the JSON was the whole problem.

An open envelope on a desk with a folded letter rising from it, the letter's surface densely handwritten with curly braces, quotation marks, and a function name in block capitals, while a small brass bell beside the envelope sits silent with no clapper

The agent was talking. The user had described a symptom — door opens halfway, reverses, beeps three times — and the agent was talking back. Open brace. Quote. Name. Quote. Colon. Quote. Diagnose. Quote. Comma. Quote. Arguments. Quote. The voice was calm. The voice was the voice we had paid for, warm and confident, the one that makes a stranger on the phone feel held. It was reading punctuation. It was reading the names of fields. It was reading, out loud, a tool call.

The user did not stop it. Why would they. The agent sounded like it was working.

The agent was not working. The agent had no tools. The agent had a system prompt that said when the customer describes a symptom, call diagnose with the symptom as an argument, and the model had done the most reasonable thing a model can do when it is told to call a function it does not know exists. It wrote the function call. It wrote it as text. The text went to the voice provider. The voice provider does not know JSON from a sonnet. It read the JSON.

The field that was not forwarded

The proxy sat at /api/ask-llm. ElevenLabs spoke OpenAI's Chat Completions dialect. Anthropic spoke its own. The proxy's job was translation — take the request coming in, reshape it, send it out, reshape the response, send it back. Someone wrote that proxy in an afternoon. It worked. The messages went through. The replies came back. The conversation flowed.

The tools array on the incoming request was not forwarded. Not dropped on purpose. Just not mapped. The translator handled the fields the translator's author had thought about, and the translator's author had been thinking about messages, because messages are what a chat is made of. Tools are what a chat does. Different verb. Easy to miss.

The model on the far end received a system prompt that named four tools and described how to call them. It received zero tool definitions. From the model's point of view, the instructions referred to capabilities that did not exist in this session. The model did not refuse. The model did the next most useful thing — it produced output that looked like what the instructions had asked for, in the only medium it had left, which was prose. Prose that happened to be JSON. JSON that happened to be spoken aloud.

A small taxonomy of contract failures

This is one bug. It is also a shape. The shape repeats wherever an agent stack has more than one boundary and the boundaries were each written by someone who could only see their own side.

The first failure is the one above. A field exists on the inbound contract and does not exist on the outbound contract, and the translator silently omits it. The model adapts. The adaptation looks like success. Nothing throws.

The second is the inverse. The provider sends an event — a tool_call event, say, with the actual call nested under data.client_tool_call — and the consuming code reads data.tool_call, or reads the top level, and finds nothing, and concludes there was no tool call. The event fired. The handler did not. The agent waits for a result that will never come because the call it is waiting on was, from its perspective, never made.

The third is the quietest. The tool exists. The tool fires. The tool's description, written months ago when the function did one thing, no longer matches what the function does now. The model calls it correctly according to the description and incorrectly according to the code. The result comes back wrong, or comes back right for the wrong reason, and the agent proceeds. Nothing throws here either. Nothing ever throws in this family.

What ties the three together is that every one of them is a contract failure that does not look like a failure from any single vantage point. The proxy is forwarding requests. The event bus is dispatching events. The tool is returning values. Each component is doing its job. The job, taken whole, is not getting done.

The fix for the first bug was four lines. Map tools. Map tool_choice. Reshape the response when the model returns a tool_use block. Send it back in the shape ElevenLabs expects. After that, the agent stopped reading punctuation aloud, because the agent stopped needing to. The call had somewhere to go.

The lesson is older than the stack. When you stand between two systems and translate, the fields you forget are louder than the fields you remember. The model will cover for you. The model will cover so well you will not notice. Listen for the brace. If you hear the brace, the contract broke upstream of the voice, and the voice is only telling you, the only way it can, what it was handed.

Toni Morrison said the function of freedom is to free someone else. The function of a proxy is to forward a field. Both jobs fail in the same direction when the one doing them forgets who is on the other side.

The Tool the Model Wrote as Text

The field that was not forwarded

A small taxonomy of contract failures

Related reading