The /build/ page had a row of starter buttons. Click "Add an avatar to my site," allow the microphone, and the avatar was supposed to appear and begin the walkthrough without making you repeat yourself. The button encoded the question. The avatar would open on it.
Instead, the avatar appeared, said the greeting, and went silent. Not an error. Not a crash. Just the avatar, waiting, as if it were your turn.
The code that looked correct
After the session connected, the code called session.message(prompt), where prompt was the text from whichever starter button the user had clicked. session.message appeared in the SDK's autocomplete. It accepted a string. It existed. A method named message on a conversational session object — the inference that it sent the string to the agent as a user turn was not unreasonable.
It made the avatar speak the text aloud.
session.message on @heygen/liveavatar-web-sdk@0.0.17 sends a command internally called AVATAR_SPEAK_RESPONSE. That command puts words in the avatar's mouth — not in the agent's input stream. The starter prompt was being read out in Seth's voice, into the void, as if the avatar had originated the thought. The agent received nothing.
What the SDK source said
The diagnosis took ten minutes once the source was open. LiveAvatarSession.js, the message() method: one function, one command emitted. CommandEventsEnum.AVATAR_SPEAK_RESPONSE. Not ambiguous.
Then the enum itself. Three values:
AVATAR_SPEAK_TEXT
AVATAR_SPEAK_RESPONSE
AVATAR_SPEAK_AUDIO
All three are built on the same verb, SPEAK, which describes what the avatar does with output. None mention INPUT. None mention USER. There is no USER_MESSAGE. There is no AGENT_INPUT. The three commands the client can issue are three ways to put a string or audio buffer into the avatar's mouth.
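A small model of that finding may make the one-way shape easier to see. This is a sketch, not the SDK's code: only the three enum member names come from the source described above. The string values, the FakeSession class, the agentInbox field, and the nonSpeakCommands helper are all hypothetical, introduced here for illustration.

```typescript
// Illustrative model only. The member names match the SDK enum described in
// the text; the string values are an assumption.
enum CommandEventsEnum {
  AVATAR_SPEAK_TEXT = "AVATAR_SPEAK_TEXT",
  AVATAR_SPEAK_RESPONSE = "AVATAR_SPEAK_RESPONSE",
  AVATAR_SPEAK_AUDIO = "AVATAR_SPEAK_AUDIO",
}

// Hypothetical helper: returns any command names that do something other
// than make the avatar speak. An empty result means there is no input path.
function nonSpeakCommands(commandNames: string[]): string[] {
  return commandNames.filter((name) => !name.startsWith("AVATAR_SPEAK_"));
}

// Hypothetical stand-in for the session object. message() emits exactly one
// output command; nothing in this model ever writes to agentInbox.
class FakeSession {
  readonly emitted: { command: CommandEventsEnum; text: string }[] = [];
  readonly agentInbox: string[] = [];

  message(text: string): void {
    this.emitted.push({ command: CommandEventsEnum.AVATAR_SPEAK_RESPONSE, text });
  }
}

const session = new FakeSession();
session.message("Add an avatar to my site");

console.log(session.emitted.map((e) => e.command)); // what the avatar was told to say
console.log(session.agentInbox.length);             // what the agent heard
console.log(nonSpeakCommands(Object.keys(CommandEventsEnum)));
```

Running the same filter over a real SDK's command names is a quick way to see whether any client-to-agent command exists before touching the wiring.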
In LITE mode, the user speaks through the microphone. The microphone feeds ElevenLabs's speech-to-text pipeline, which the agent is wired to. That is the only path. The client cannot inject text the agent receives as a conversational turn. The SDK does not expose that surface.
The shape of the possible
The three SPEAK commands and the absence of any USER or INPUT command are not a gap in the implementation. They are the implementation. The SDK's job is to control an avatar. The mouth. Not the ear.
Once that is clear, the constraint follows: in LITE mode, the agent's ear is the microphone. The pipeline from mic to STT to agent is managed by ElevenLabs, on their servers. If you want to prime the agent with something the client knows before the conversation begins, you have one window: session start, before the ear opens.
The fix, three layers down
The fix did not live on the page. The page was wired correctly to the API it had. The API it had was the wrong one for the job.
The seam where text can cross from client space to agent space is the session token. The Cloudflare Worker that mints the token accepts a request from the browser, holds the API key server-side, and calls the LiveAvatar token endpoint with a configuration object. That configuration object can include elevenlabs_agent_config. Inside that, dynamic_variables.
Three changes, in order of where they live in the stack.
On the page: when a starter button was clicked, instead of staging the prompt for a post-connect session.message call, the code appended ?starter=<prompt> to the worker URL at token-fetch time. Before the session opened.
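The page-side change could look roughly like the sketch below. The buildTokenUrl helper and the worker.example URL are hypothetical; only the starter query parameter name comes from the description above.

```typescript
// Hypothetical helper: builds the URL the page uses to fetch a session token.
// When a starter button was clicked, its prompt rides along as ?starter=...
function buildTokenUrl(workerUrl: string, starterPrompt?: string): string {
  const url = new URL(workerUrl);
  const trimmed = starterPrompt?.trim();
  if (trimmed) {
    // searchParams.set handles the percent-encoding of the prompt text.
    url.searchParams.set("starter", trimmed);
  }
  return url.toString();
}

// Placeholder Worker URL, not the real endpoint.
const tokenUrl = buildTokenUrl("https://worker.example/token", "Add an avatar to my site");
console.log(tokenUrl);
```

Leaving the parameter off entirely when no button was clicked keeps the plain greeting path unchanged.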
In the Worker: read the starter query parameter, validate it (length cap at 800 characters), and forward it as dynamic_variables: { starter_prompt: starterPrompt } in the elevenlabs_agent_config block.
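A minimal sketch of the Worker-side step, under stated assumptions: the function names are hypothetical, the 800-character cap is enforced here by dropping over-long values (truncating would be another reading of "length cap"), and sending an empty string when there is no starter is a choice of this sketch. The elevenlabs_agent_config, dynamic_variables, and starter_prompt keys are the ones named above; the rest of the token request body is not shown because the text does not describe it.

```typescript
const MAX_STARTER_LENGTH = 800;

// Reads and validates the starter query parameter.
// Returns null when it is absent, blank, or longer than the cap.
function readStarterPrompt(requestUrl: string): string | null {
  const raw = new URL(requestUrl).searchParams.get("starter");
  if (raw === null) return null;
  const trimmed = raw.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_STARTER_LENGTH) return null;
  return trimmed;
}

// Hypothetical shape for the fragment of the token configuration that
// carries the starter prompt to the agent.
interface AgentConfigFragment {
  elevenlabs_agent_config: {
    dynamic_variables: { starter_prompt: string };
  };
}

function buildAgentConfig(starterPrompt: string | null): AgentConfigFragment {
  return {
    elevenlabs_agent_config: {
      // Assumption: an empty string stands for "no starter selected".
      dynamic_variables: { starter_prompt: starterPrompt ?? "" },
    },
  };
}

const starter = readStarterPrompt("https://worker.example/token?starter=Add+an+avatar+to+my+site");
console.log(JSON.stringify(buildAgentConfig(starter), null, 2));
```

Keeping validation in the Worker rather than on the page means the cap holds even if someone calls the Worker URL directly.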
In the agent's system prompt: a paragraph conditioned on the starter_prompt dynamic variable. When the variable is present and non-empty, the agent opens the conversation by acknowledging the selection and beginning the walkthrough. When it is absent, the agent greets normally and waits.
The starter prompt reaches the agent before the conversation begins, as configuration. The agent acts on it the way it acts on any instruction in its system prompt — because that is exactly where it lives.
The hidden contract
"The avatar reads first" made the argument that the brain is the agent, the face is the face, and the seam between them is plumbing. True. But plumbing has a direction. The pipes that carry data between the client and the agent (in LITE mode, through a HeyGen session) are laid at the SDK level, and the SDK laid them one way. Outward. From agent to avatar.
The constraint was already there the day the SDK shipped. Every developer who installed @heygen/liveavatar-web-sdk@0.0.17 and typed session. got the same autocomplete. message appeared. The inference was available to anyone who did not read the source.
The enum names are where the contract is legible. AVATAR_SPEAK_TEXT. AVATAR_SPEAK_RESPONSE. AVATAR_SPEAK_AUDIO. Three names, one verb, all describing the avatar's output. The input side is not a locked door. It is a wall. The wall does not announce itself.
What sits in the autocomplete
The fix works. The starter prompts reach the agent. The walkthrough begins without the user repeating themselves. But the fix does not change the SDK. The three SPEAK commands are still the three SPEAK commands. The next build that wants to inject a user turn after session start will hit the same wall.
When debugging a LITE-mode avatar integration that fails to pass a message to the agent: check the SDK's command enum before you check the wiring. If the method is named message or say or speak, it almost certainly addresses the avatar's output. The fix lives at session-start, in dynamic variables, before the ear opens — because in LITE mode, the ear is the microphone, and the client does not own it.
The full record, with the source excerpt and the three-layer fix, is at brain/learnings/liteavatar-sdk-no-client-user-message.md.