Recipe · Voice UI

Add a Real-Time Avatar to a Static Site

A real-time speaking avatar embedded in a webpage, complete with voice clone and lip sync, wired to an ElevenLabs agent behind a Cloudflare Worker that mints temporary tokens.

Time: ~35 min · Difficulty: intermediate · Companion skill: add-avatar-to-site

Why an avatar

A chat widget is fine for text. But when the person on the other end is a voice clone of the site owner, reading answers aloud with lip sync, the interaction becomes something else. Visitors stop scrolling and start talking. The avatar becomes the interface.

This recipe embeds a HeyGen streaming avatar into a static site (Jekyll, Next.js, or plain HTML). The avatar's voice comes from an ElevenLabs Conversational AI agent. A Cloudflare Worker sits in between, handing out short-lived tokens so the long-lived API key never touches the browser. The frontend binds to a <video> element and handles start, stop, and mute.

What you'll build

Prerequisites


Step 1 — Gather credentials

Before writing code, you need four pieces of information. Write them down somewhere safe (not in the repo). They'll go into the Worker's secrets, not into your site's source.

Credential            Where to get it
HEYGEN_AVATAR_UUID    HeyGen dashboard → Avatars → your streaming avatar's ID
HEYGEN_SECRET_ID      HeyGen dashboard → API Keys → create a new secret, copy the ID
ELEVENLABS_AGENT_ID   ElevenLabs dashboard → Conversational AI → your agent's ID
LIVEAVATAR_API_KEY    HeyGen dashboard → API Keys → the key itself (long-lived)
Gotcha — secret vs. UUID. The HeyGen avatar UUID is not the same thing as the API secret. The UUID identifies the avatar; the secret authenticates you. The secret is what the Worker will hide.

Step 2 — Create the Cloudflare Worker

The Worker's job is simple: receive a request from the browser, authenticate itself to HeyGen using the long-lived secret, and return a short-lived session token. The browser never sees the secret.

mkdir -p ~/projects/avatar-token-worker
cd ~/projects/avatar-token-worker
npm create cloudflare@latest . -- --template hello-world
npm install wrangler --save-dev

Replace src/index.js with the token-minting handler:

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    if (url.pathname !== '/token') {
      return new Response('Not Found', { status: 404 });
    }

    const response = await fetch('https://api.heygen.com/v1/streaming.create_token', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-API-KEY': env.HEYGEN_SECRET_ID   // case matters: X-API-KEY, uppercase
      },
      body: JSON.stringify({
        avatar_id: env.HEYGEN_AVATAR_UUID,
        quality: 'high'
      })
    });

    if (!response.ok) {
      return new Response('Token creation failed', { status: 502 });
    }

    const data = await response.json();
    return new Response(JSON.stringify({ token: data.data.token }), {
      headers: {
        'Content-Type': 'application/json',
        'Access-Control-Allow-Origin': '*'   // lock this to your domain in production
      }
    });
  }
};
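The wildcard CORS header above is flagged for tightening in production. A minimal sketch of one way to do that is an origin allowlist, shown below. The `ALLOWED_ORIGINS` values and the `corsHeaders` helper are illustrative names, not part of the original Worker.

```javascript
// Hypothetical allowlist; replace with the origin(s) that host your avatar page.
const ALLOWED_ORIGINS = ['https://example.com', 'https://www.example.com'];

/**
 * Build response headers with CORS restricted to known origins.
 * The request's Origin is echoed back only when it is on the allowlist;
 * otherwise no Access-Control-Allow-Origin header is set, so the browser
 * refuses to expose the response to the calling page.
 */
function corsHeaders(requestOrigin, allowed = ALLOWED_ORIGINS) {
  const headers = { 'Content-Type': 'application/json', 'Vary': 'Origin' };
  if (requestOrigin && allowed.includes(requestOrigin)) {
    headers['Access-Control-Allow-Origin'] = requestOrigin;
  }
  return headers;
}

// Inside the Worker's fetch handler this might be used as:
//   const origin = request.headers.get('Origin');
//   return new Response(JSON.stringify({ token }), { headers: corsHeaders(origin) });
```

The `Vary: Origin` header tells caches that the response differs per origin, which matters once the header is no longer a constant `*`.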

Configure secrets via Wrangler (never commit these):

npx wrangler secret put HEYGEN_SECRET_ID
npx wrangler secret put HEYGEN_AVATAR_UUID
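
If a secret was never set, the corresponding `env` property is undefined and the upstream request fails in a way that is hard to diagnose from the browser. As a sketch, a small guard like the one below could surface that early. The `missingSecrets` helper is a hypothetical name; only the two secret names come from this recipe.

```javascript
// Secret names the Worker in this recipe reads from `env`.
const REQUIRED_SECRETS = ['HEYGEN_SECRET_ID', 'HEYGEN_AVATAR_UUID'];

/** Return the names of required secrets that are absent or blank in `env`. */
function missingSecrets(env, required = REQUIRED_SECRETS) {
  return required.filter(
    (name) => typeof env[name] !== 'string' || env[name].trim() === ''
  );
}

// Possible use at the top of the fetch handler:
//   const missing = missingSecrets(env);
//   if (missing.length) {
//     console.error('Missing secrets:', missing.join(', ')); // visible in Worker logs only
//     return new Response('Worker misconfigured', { status: 500 });
//   }
```

Logging the names while returning a generic message keeps configuration details out of the public response.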

Deploy:

npx wrangler deploy
Gotcha — header case. HeyGen's edge is case-sensitive on the X-API-KEY header. x-api-key (lowercase) will 401. The example above uses uppercase exactly.

Step 3 — Add the avatar page to your site

Create a new page that hosts the video element and the SDK wiring. This example is plain HTML; adapt to your framework.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Ask Me Anything — Avatar</title>
  <style>
    body { font-family: Inter, sans-serif; background: #fafafa; padding: 2rem; }
    .avatar-wrap { max-width: 480px; margin: 0 auto; }
    video {
      width: 100%;
      aspect-ratio: 16/9;
      background: #e5e7eb;
      border-radius: 12px;
      display: block;
    }
    .controls { display: flex; gap: 0.75rem; margin-top: 1rem; }
    button {
      flex: 1;
      padding: 0.75rem;
      border: none;
      border-radius: 8px;
      font-weight: 600;
      cursor: pointer;
    }
    #start { background: #0f766e; color: white; }
    #stop  { background: #fee2e2; color: #991b1b; }
    #mute  { background: #f3f4f6; color: #374151; }
  </style>
</head>
<body>

<div class="avatar-wrap">
  <video id="avatar" autoplay playsinline muted></video>
  <div class="controls">
    <button id="start">Start</button>
    <button id="stop">Stop</button>
    <button id="mute">Unmute</button>
  </div>
</div>

<script type="module">
  import { LiveAvatarSession } from 'https://esm.sh/@heygen/liveavatar-web-sdk@0.0.17';

  const WORKER_URL = 'https://your-worker.your-subdomain.workers.dev/token';
  const ELEVENLABS_AGENT_ID = 'your-elevenlabs-agent-id';

  const video = document.getElementById('avatar');
  const startBtn = document.getElementById('start');
  const stopBtn  = document.getElementById('stop');
  const muteBtn  = document.getElementById('mute');

  let session = null;

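  startBtn.addEventListener('click', async () => {
    if (session) return;   // ignore clicks while a session is already active

    // Ask the Worker for a short-lived token; the long-lived key stays server-side.
    const res = await fetch(WORKER_URL);
    if (!res.ok) {
      console.error(`Token request failed: ${res.status}`);
      return;
    }
    const { token } = await res.json();

    session = new LiveAvatarSession({
      token,
      videoElement: video,
      elevenlabs: {
        agentId: ELEVENLABS_AGENT_ID
      }
    });

    await session.start();
  });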

  stopBtn.addEventListener('click', () => {
    session?.stop();
    session = null;
  });

  muteBtn.addEventListener('click', () => {
    video.muted = !video.muted;
    muteBtn.textContent = video.muted ? 'Unmute' : 'Mute';
  });
</script>

</body>
</html>
Gotcha — aspect ratio on mobile. If the video container has aspect-ratio but the video element itself does not, the placeholder background will show before the stream starts and may not fill on mobile. Apply aspect-ratio directly to the video element, not just the wrapper.
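
The token request in the Start handler is the one network call the page makes before the SDK takes over, so it may be worth hardening. The sketch below wraps it with a timeout and payload check. `fetchToken`, `timeoutMs`, and `fetchImpl` are illustrative names, and the 8-second default is an arbitrary assumption rather than a recommended value.

```javascript
/**
 * Fetch a session token from the Worker, failing fast on HTTP errors,
 * malformed payloads, or a slow response.
 * `fetchImpl` is injectable so the function can be exercised without a network.
 */
async function fetchToken(workerUrl, { timeoutMs = 8000, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(workerUrl, { signal: controller.signal });
    if (!res.ok) {
      throw new Error(`Token request failed with status ${res.status}`);
    }
    const body = await res.json();
    if (typeof body.token !== 'string' || body.token.length === 0) {
      throw new Error('Token response did not include a token');
    }
    return body.token;
  } finally {
    clearTimeout(timer); // always clear the timer, on success or failure
  }
}

// In the Start handler this could replace the inline fetch:
//   const token = await fetchToken(WORKER_URL);
```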

Step 4 — Verify end-to-end

  1. Deploy the Worker and open its /token endpoint in a browser. You should get JSON with a token field.
  2. Open the avatar page on your site. Click Start. The video element should show the HeyGen avatar within a few seconds.
  3. Say something. The ElevenLabs agent should respond, and the avatar's lips should move in sync with the audio.
  4. Click Stop. The stream should end and the video should return to the placeholder state.
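
The first check above can also be scripted. As a rough sketch, the helper below inspects a /token response and lists anything that looks off. `tokenResponseProblems` is a hypothetical name, and the checks simply mirror what the Worker in this recipe returns.

```javascript
/**
 * Inspect a /token response and return a list of human-readable problems.
 * An empty list means the response looks like what the avatar page expects.
 */
function tokenResponseProblems({ status, contentType, allowOrigin, body }) {
  const problems = [];
  if (status !== 200) problems.push(`unexpected status ${status}`);
  if (!contentType || !contentType.includes('application/json')) {
    problems.push('Content-Type is not JSON');
  }
  if (!allowOrigin) problems.push('missing Access-Control-Allow-Origin header');
  if (!body || typeof body.token !== 'string' || body.token === '') {
    problems.push('no token field in body');
  }
  return problems;
}

// Possible use from Node 18+ or a browser console (the URL is a placeholder):
//   const res = await fetch('https://your-worker.your-subdomain.workers.dev/token');
//   console.log(tokenResponseProblems({
//     status: res.status,
//     contentType: res.headers.get('Content-Type'),
//     allowOrigin: res.headers.get('Access-Control-Allow-Origin'),
//     body: await res.json().catch(() => null),
//   }));
```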
Audio format. The stream expects 24 kHz PCM audio from the agent. If the agent's output format is set to anything else, the avatar may look correct but sound garbled. Check the agent's output format in the ElevenLabs dashboard.
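
To see why a sample-rate mismatch is audible, the arithmetic can be sketched as below. This assumes raw 16-bit mono PCM, which the recipe does not state, and the function names are illustrative. When samples produced at one rate are played back as if they were another, speed and pitch shift by the ratio of the two.

```javascript
const EXPECTED_SAMPLE_RATE = 24000; // Hz, per the note above

/** Bytes per second of raw PCM. Assumes 16-bit mono unless told otherwise. */
function pcmBytesPerSecond(sampleRate = EXPECTED_SAMPLE_RATE, bytesPerSample = 2, channels = 1) {
  return sampleRate * bytesPerSample * channels;
}

/**
 * Speed factor heard when audio produced at `actualRate` is played back
 * as if it were `expectedRate`. Values above 1 sound fast and high-pitched,
 * values below 1 sound slow and low-pitched.
 */
function playbackSpeedFactor(actualRate, expectedRate = EXPECTED_SAMPLE_RATE) {
  return expectedRate / actualRate;
}

console.log(pcmBytesPerSecond());        // 48000
console.log(playbackSpeedFactor(16000)); // 1.5
console.log(playbackSpeedFactor(48000)); // 0.5
```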

What this gets you

When not to use this

If you only need text-based interaction, a standard chat widget is simpler, cheaper, and more accessible. The avatar adds latency, bandwidth, and cost. Use it when the speaking, lip-synced presence is load-bearing to the experience — not when it's decoration.

What's next

Seth Shoultes builds at garagedoorscience.com and writes about it at sethshoultes.com/blog.