When I push this post, a GitHub Actions workflow will notice, pull the URL, and add it to the ElevenLabs agent’s knowledge base. A few minutes later, the avatar at sethshoultes.com/talk/ will be able to answer questions about it. The post is the test of the loop the post describes.
What got built
Four pieces. Three seams.
A page at /talk/. Voice-driven. A HeyGen LiveAvatar that looks like me. The browser loads @heygen/liveavatar-web-sdk from esm.sh — vanilla JS, no framework, no build step. You click Start. The call begins. The avatar appears. It listens. No API key in the browser.
A Cloudflare Worker. The browser calls it once at the start of a session. It holds LIVEAVATAR_API_KEY server-side, calls the LiveAvatar token endpoint, and returns a short-lived sessionToken to the browser. CORS is restricted to sethshoultes.com. The key never reaches the browser.
A knowledge base. Seventeen URL documents attached to the ElevenLabs Conversational AI agent — every post on this site plus the home page — indexed via the RAG endpoint with enable_auto_sync: true. The agent’s system prompt was rewritten: first person, identify as a clone, cite posts by title, ground answers in the corpus.
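The shape of that registration call can be sketched in a few lines. This is a sketch, not the actual script: the endpoint path, header name, and payload schema are assumptions based on the post's description — check the ElevenLabs API reference before relying on any of them.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumed base URL


def build_url_doc_payload(url: str) -> dict:
    # A URL document with auto re-fetching enabled.
    # enable_auto_sync comes from the post; the exact schema is an assumption.
    return {"url": url, "enable_auto_sync": True}


def add_url_document(api_key: str, url: str) -> int:
    # Hypothetical endpoint path for adding a URL document to the
    # agent's knowledge base; consult the official docs for the real one.
    req = urllib.request.Request(
        f"{API_BASE}/v1/convai/knowledge-base/url",
        data=json.dumps(build_url_doc_payload(url)).encode(),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

One call per post, then a PATCH to attach the new documents to the agent.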
A GitHub Actions workflow. .github/workflows/sync-rag.yml. Watches _posts/** on pushes to main. When a post lands, scripts/add-post-to-rag.py diffs the agent’s current knowledge base against _posts/, POSTs any new URL documents, triggers indexing, and PATCHes the agent. The existing entries survive. The script is idempotent. I do not have to remember.
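The idempotent core of that script is a set difference. A minimal sketch, assuming Jekyll-style filenames in _posts/ (YYYY-MM-DD-slug.md) — the function names here are mine, not the script's:

```python
from pathlib import Path


def post_slugs(posts_dir: str) -> set[str]:
    # Derive slugs from Jekyll-style filenames: 2025-01-15-my-post.md -> my-post.
    # The naming convention is an assumption.
    slugs = set()
    for p in Path(posts_dir).glob("*.md"):
        parts = p.stem.split("-", 3)
        slugs.add(parts[3] if len(parts) == 4 else p.stem)
    return slugs


def missing_posts(local: set[str], indexed: set[str]) -> set[str]:
    # The idempotent diff: only what the agent does not already have.
    # Re-running with no new posts yields an empty set, and the script exits clean.
    return local - indexed
```

Everything downstream — the POSTs, the indexing trigger, the PATCH — operates only on that difference, which is why a no-op push is safe.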
The boundary
The face is HeyGen LiveAvatar. It renders the video stream — the likeness, the mouth, what the visitor sees. It does not know what the brain knows.
The voice is an ElevenLabs clone of my voice. It speaks the words the brain produces. It does not decide what to say.
The brain is the ElevenLabs Conversational AI agent. It receives the question. It retrieves from the knowledge base. It generates the answer. RAG is a brain concern. It belongs on that layer.
Three services. Three concerns. HeyGen ships official agent skills — liveavatar-agent-skills — that give a coding agent procedural knowledge for LiveAvatar integrations. Useful when the shape is still unknown. Once the boundary is clear, it is just plumbing.
The corpus
Seventeen URL documents. The agent retrieves before each turn.
The system prompt tells the agent to cite posts by title, to ground claims in what it has indexed, and to acknowledge being a clone when asked. The corpus is what keeps the answers honest. Without it, the agent speaks in generalities. With it, it speaks from what I have actually written.
Auto-sync is on. ElevenLabs re-fetches each URL when the content changes. The workflow handles new posts. Each one arrives as a new document. The corpus grows without my hand in it.
The loop
When I push this post, the workflow runs. In CI it reads ELEVENLABS_API_KEY from the workflow environment. Locally it sources ~/.config/dev-secrets/secrets.env — the canonical file described in One File for All My Keys. The same key. One source, two consumers.
The script will not add a post twice. If the workflow runs on a push with no new posts, it exits clean. Idempotent by design. The diff is between titles in the agent and titles in _posts/. What is already there stays. What is missing gets added.
That loop closes when this post is indexed.
The shape, one resolution out
A post published last week described a writer-persona that reads the project bible before drafting a sentence. A persona without its bible is a tic and a vocabulary. The read has to happen in fresh context — not filtered through the orchestrator’s summary, not compressed into a prior session’s working model. Clean window. Bible first. Then writing.
The avatar is the same shape at a different resolution. The agent retrieves before it speaks. Not after. Not if it remembers to. Before, every turn, because that is how it was built.
.great-authors/ reads first. The knowledge base reads first. The canonical secrets file reads first.
Same pattern. The consumer does not decide whether to read. The architecture decides for it.
The avatar at /talk/ is not useful because it looks like me. It is useful because the brain reads first.
Seth Shoultes builds things at garagedoorscience.com and writes about them occasionally.