Integration Guide · Phase 8

Voice Lifecycle

Understanding the real-time states of a voice session to build a responsive UI.

Overview

A voice conversation is a state machine. Your UI should reflect what the agent is doing — listening, thinking, or speaking — to feel alive and responsive.

Listening: User speaking
Thinking: AI processing
Speaking: Agent responding

Voice Events

All events arrive on the Socket.IO event channel. Use these to drive your UI state.

Event | When | UI Action
voice-status: joined | Agent joined the room | Show "Ready" state, enable mic
transcript (isFinal: true) | User finished a sentence | Show "Thinking..." indicator
speak-status: started | Agent started speaking | Show "Speaking" animation
speak-status: finished | Agent finished speaking | Return to "Listening"
speak-status: interrupted | User interrupted the agent | Cut animation, show "Listening"
voice-status: left | Agent left the room | Reset UI to disconnected state
voice-status: error | Something went wrong | Show error, offer reconnect
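As a sketch, the rows above can be modeled as a TypeScript discriminated union. The field shapes are inferred from this table and may differ from the actual payloads:

```typescript
// Hypothetical event shapes inferred from the events table above.
type VoiceEvent =
  | { type: 'voice-status'; status: 'joined' | 'left' | 'error' }
  | { type: 'transcript'; text: string; isFinal: boolean }
  | { type: 'speak-status'; status: 'started' | 'finished' | 'interrupted' };

// Narrowing on `type` gives you the right fields in each branch.
function describe(event: VoiceEvent): string {
  switch (event.type) {
    case 'voice-status': return `voice-status: ${event.status}`;
    case 'transcript':   return `transcript (final: ${event.isFinal})`;
    case 'speak-status': return `speak-status: ${event.status}`;
  }
}
```

Typing the events this way lets the compiler verify that every handler branch only touches fields that exist on that event.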

State Machine Implementation

React Example
const [voiceState, setVoiceState] = useState<
  'disconnected' | 'connecting' | 'listening' | 'thinking' | 'speaking'
>('disconnected');

socket.on('event', (event) => {
  switch (event.type) {
    case 'voice-status':
      if (event.status === 'joined') setVoiceState('listening');
      if (event.status === 'left') setVoiceState('disconnected');
      if (event.status === 'error') setVoiceState('disconnected');
      break;

    case 'transcript':
      if (event.isFinal) setVoiceState('thinking');
      break;

    case 'speak-status':
      if (event.status === 'started') setVoiceState('speaking');
      if (event.status === 'finished') setVoiceState('listening');
      if (event.status === 'interrupted') setVoiceState('listening');
      break;
  }
});

// Render based on state (icon components are placeholders; use your own)
function VoiceIndicator() {
  switch (voiceState) {
    case 'listening':  return <MicIcon className="animate-pulse" />;
    case 'thinking':   return <Spinner />;
    case 'speaking':   return <WaveIcon className="animate-pulse" />;
    default:           return <MicOffIcon />;
  }
}

Handling Interruption

Real conversations involve interruptions. When a user speaks while the agent is talking, the agent stops immediately.

What happens automatically:

  1. User speaks: Speech is detected while the agent is playing audio.

  2. Agent stops: HUMA immediately stops the agent's speech and clears the audio buffer.

  3. Event sent: You receive speak-status: interrupted. Update your UI immediately.

  4. Agent listens: The user's speech is transcribed, and the agent responds to what the user said, not to what it was previously saying.

Snappy UI

When you receive speak-status: interrupted, immediately switch from "Speaking" to "Listening" animation. This makes the interruption feel natural and responsive.

Example Flow

Here's a full lifecycle trace for a single conversational turn.

0ms · User speaks
"What's the weather like today?"

~800ms · Event: transcript (isFinal: true)
Speech finalized. UI: Thinking

~2000ms · Event: speak-status: started
Agent starts speaking. UI: Speaking

~4000ms · Event: speak-status: finished
Agent done. UI: Listening
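The turn above can be replayed through a pure reducer, which keeps the event-to-state mapping testable outside React (state and event names follow the tables in this guide; payload shapes are assumptions):

```typescript
type VoiceState = 'disconnected' | 'listening' | 'thinking' | 'speaking';

type VoiceEvent =
  | { type: 'voice-status'; status: 'joined' | 'left' | 'error' }
  | { type: 'transcript'; isFinal: boolean }
  | { type: 'speak-status'; status: 'started' | 'finished' | 'interrupted' };

// Same transitions as the socket handler, as a pure function.
function reduce(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case 'voice-status':
      return event.status === 'joined' ? 'listening' : 'disconnected';
    case 'transcript':
      return event.isFinal ? 'thinking' : state;
    case 'speak-status':
      return event.status === 'started' ? 'speaking' : 'listening';
  }
}

// Replay the example turn above:
const trace: VoiceEvent[] = [
  { type: 'voice-status', status: 'joined' },
  { type: 'transcript', isFinal: true },
  { type: 'speak-status', status: 'started' },
  { type: 'speak-status', status: 'finished' },
];
const states = trace.reduce<VoiceState[]>(
  (acc, e) => [...acc, reduce(acc[acc.length - 1], e)],
  ['disconnected'],
);
// states: disconnected, listening, thinking, speaking, listening
```

In React, this reducer drops straight into useReducer, and the socket handler shrinks to a single dispatch call.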

Common Pitfalls

Missing "Thinking" state

There's ~1-2s of silence between the user finishing and the agent responding. Without a visual indicator, users think the app is broken. Always show "Thinking" after receiving a final transcript.
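One way to guard this gap is a watchdog timer: start it when you enter "Thinking", cancel it when speak-status: started arrives, and surface an error if it fires first. A sketch (the 5-second limit and the error handler are illustrative, not part of the HUMA API):

```typescript
// Watchdog sketch: if the agent doesn't start speaking within
// `limitMs` of a final transcript, assume the turn has stalled.
function startThinkingWatchdog(
  limitMs: number,
  onTimeout: () => void,
): { cancel: () => void } {
  const id = setTimeout(onTimeout, limitMs);
  return { cancel: () => clearTimeout(id) };
}

// On transcript (isFinal: true):
//   watchdog = startThinkingWatchdog(5000, () => showStalledError());
// On speak-status: started:
//   watchdog.cancel();
```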

Ignoring interruption events

If your UI keeps showing "Speaking" after an interruption, it feels laggy. Listen for speak-status: interrupted and immediately reset to "Listening".

Not handling voice-status: error

If the voice connection drops, you'll get a voice-status: error event. Show the user what happened and offer a reconnect button.
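Alongside a manual reconnect button, automatic retries with exponential backoff are a common pattern; a sketch (the base delay and cap are illustrative, not part of the HUMA API):

```typescript
// Delay before reconnect attempt `attempt` (0-indexed):
// 1s, 2s, 4s, 8s, ... capped at 30s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}
```

Reset the attempt counter to zero once you receive voice-status: joined again.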

Congratulations!

Integration Guide Complete

You've learned how to build voice-enabled agents with real-time speech, turn-taking, and interruption handling. Check the full Voice Implementation guide for a complete reference.