Integration Guide · Phase 8

Voice Lifecycle

Understanding the real-time states of a voice session to build a responsive UI.

Overview

A voice conversation is a state machine. Your UI should reflect what the agent is doing — listening, thinking, or speaking — to feel alive and responsive.

Listening: User speaking
Thinking: AI processing
Speaking: Agent responding

Voice Events

All events arrive on the Socket.IO event channel. Use these to drive your UI state.

Event | When | UI Action
voice-status: joined | Agent joined the room | Show "Ready" state, enable mic
transcript (isFinal: true) | User finished a sentence | Show "Thinking..." indicator
speak-status: started | Agent started speaking | Show "Speaking" animation
speak-status: finished | Agent finished speaking | Return to "Listening"
speak-status: interrupted | User interrupted the agent | Cut animation, show "Listening"
voice-status: left | Agent left the room | Reset UI to disconnected state
voice-status: error | Something went wrong | Show error, offer reconnect
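As a sketch, the rows above can be modeled as a TypeScript discriminated union. The field shapes are inferred from this table and may differ from the actual payloads:

```typescript
// Hypothetical event shapes inferred from the events table above.
type VoiceEvent =
  | { type: 'voice-status'; status: 'joined' | 'left' | 'error' }
  | { type: 'transcript'; text: string; isFinal: boolean }
  | { type: 'speak-status'; status: 'started' | 'finished' | 'interrupted' };

// Narrowing on `type` gives you the right fields in each branch.
function describe(event: VoiceEvent): string {
  switch (event.type) {
    case 'voice-status': return `voice-status: ${event.status}`;
    case 'transcript':   return `transcript (final: ${event.isFinal})`;
    case 'speak-status': return `speak-status: ${event.status}`;
  }
}
```

Typing the events this way lets the compiler verify that every handler branch only touches fields that exist on that event.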

State Machine Implementation

React Example
const [voiceState, setVoiceState] = useState<
  'disconnected' | 'connecting' | 'listening' | 'thinking' | 'speaking'
>('disconnected');

socket.on('event', (event) => {
  switch (event.type) {
    case 'voice-status':
      if (event.status === 'joined') setVoiceState('listening');
      if (event.status === 'left') setVoiceState('disconnected');
      if (event.status === 'error') setVoiceState('disconnected');
      break;

    case 'transcript':
      if (event.isFinal) setVoiceState('thinking');
      break;

    case 'speak-status':
      if (event.status === 'started') setVoiceState('speaking');
      if (event.status === 'finished') setVoiceState('listening');
      if (event.status === 'interrupted') setVoiceState('listening');
      break;
  }
});

// Render based on state (icon components are placeholders; use your own)
function VoiceIndicator() {
  switch (voiceState) {
    case 'listening':  return <MicIcon className="animate-pulse" />;
    case 'thinking':   return <Spinner />;
    case 'speaking':   return <WaveIcon className="animate-pulse" />;
    default:           return <MicOffIcon />;
  }
}

Handling Interruption

Real conversations involve interruptions. When a user speaks while the agent is talking, the agent stops immediately.

What happens automatically:

  1. User speaks: Speech is detected while the agent is playing audio.

  2. Agent stops: HUMA immediately stops the agent's speech and clears the audio buffer.

  3. Event sent: You receive speak-status: interrupted. Update your UI immediately.

  4. Agent listens: The user's speech is transcribed, and the agent responds to what the user said, not to what it was previously saying.

Snappy UI

When you receive speak-status: interrupted, immediately switch from "Speaking" to "Listening" animation. This makes the interruption feel natural and responsive.

Example Flow

Here's a full lifecycle trace for a single conversational turn.

0ms · User speaks
"What's the weather like today?"

~800ms · Event: transcript (isFinal: true)
Speech finalized. UI: Thinking

~2000ms · Event: speak-status: started
Agent starts speaking. UI: Speaking

~4000ms · Event: speak-status: finished
Agent done. UI: Listening
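The turn above can be replayed through a pure reducer, which keeps the event-to-state mapping testable outside React (state and event names follow the tables in this guide; payload shapes are assumptions):

```typescript
type VoiceState = 'disconnected' | 'listening' | 'thinking' | 'speaking';

type VoiceEvent =
  | { type: 'voice-status'; status: 'joined' | 'left' | 'error' }
  | { type: 'transcript'; isFinal: boolean }
  | { type: 'speak-status'; status: 'started' | 'finished' | 'interrupted' };

// Same transitions as the socket handler, as a pure function.
function reduce(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case 'voice-status':
      return event.status === 'joined' ? 'listening' : 'disconnected';
    case 'transcript':
      return event.isFinal ? 'thinking' : state;
    case 'speak-status':
      return event.status === 'started' ? 'speaking' : 'listening';
  }
}

// Replay the example turn above:
const trace: VoiceEvent[] = [
  { type: 'voice-status', status: 'joined' },
  { type: 'transcript', isFinal: true },
  { type: 'speak-status', status: 'started' },
  { type: 'speak-status', status: 'finished' },
];
const states = trace.reduce<VoiceState[]>(
  (acc, e) => [...acc, reduce(acc[acc.length - 1], e)],
  ['disconnected'],
);
// states: disconnected, listening, thinking, speaking, listening
```

In React, this reducer drops straight into useReducer, and the socket handler shrinks to a single dispatch call.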

Common Pitfalls

Missing "Thinking" state

There's ~1-2s of silence between the user finishing and the agent responding. Without a visual indicator, users think the app is broken. Always show "Thinking" after receiving a final transcript.
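One way to guard this gap is a watchdog timer: start it when you enter "Thinking", cancel it when speak-status: started arrives, and surface an error if it fires first. A sketch (the 5-second limit and the error handler are illustrative, not part of the HUMA API):

```typescript
// Watchdog sketch: if the agent doesn't start speaking within
// `limitMs` of a final transcript, assume the turn has stalled.
function startThinkingWatchdog(
  limitMs: number,
  onTimeout: () => void,
): { cancel: () => void } {
  const id = setTimeout(onTimeout, limitMs);
  return { cancel: () => clearTimeout(id) };
}

// On transcript (isFinal: true):
//   watchdog = startThinkingWatchdog(5000, () => showStalledError());
// On speak-status: started:
//   watchdog.cancel();
```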

Ignoring interruption events

If your UI keeps showing "Speaking" after an interruption, it feels laggy. Listen for speak-status: interrupted and immediately reset to "Listening".

Not handling voice-status: error

If the voice connection drops, you'll get a voice-status: error event. Show the user what happened and offer a reconnect button.
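Alongside a manual reconnect button, automatic retries with exponential backoff are a common pattern; a sketch (the base delay and cap are illustrative, not part of the HUMA API):

```typescript
// Delay before reconnect attempt `attempt` (0-indexed):
// 1s, 2s, 4s, 8s, ... capped at 30s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}
```

Reset the attempt counter to zero once you receive voice-status: joined again.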

Congratulations!

Integration Guide Complete

You've learned how to build voice-enabled agents with real-time speech, turn-taking, and interruption handling. Check the full Voice Implementation guide for a complete reference.