# Voice Lifecycle

Understanding the real-time states of a voice session to build a responsive UI.
## Overview
A voice conversation is a state machine. Your UI should reflect what the agent is doing — listening, thinking, or speaking — to feel alive and responsive.
## Voice Events
All events arrive on the Socket.IO event channel. Use these to drive your UI state.
| Event | When | UI Action |
|---|---|---|
| `voice-status: joined` | Agent joined the room | Show "Ready" state, enable mic |
| `transcript` (`isFinal: true`) | User finished a sentence | Show "Thinking..." indicator |
| `speak-status: started` | Agent started speaking | Show "Speaking" animation |
| `speak-status: finished` | Agent finished speaking | Return to "Listening" |
| `speak-status: interrupted` | User interrupted the agent | Cut animation, show "Listening" |
| `voice-status: left` | Agent left the room | Reset UI to disconnected state |
| `voice-status: error` | Something went wrong | Show error, offer reconnect |
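The event-to-state mapping in the table above can be captured as a small pure function, which is easy to unit-test independently of any UI framework. This is a minimal sketch: the field names (`type`, `status`, `isFinal`) follow this guide, but the exact payload shapes are assumptions, not a complete schema.

```typescript
// Hypothetical event shapes, inferred from the events table above.
type VoiceEvent =
  | { type: 'voice-status'; status: 'joined' | 'left' | 'error' }
  | { type: 'transcript'; text: string; isFinal: boolean }
  | { type: 'speak-status'; status: 'started' | 'finished' | 'interrupted' };

type VoiceState =
  | 'disconnected' | 'connecting' | 'listening' | 'thinking' | 'speaking';

// Pure transition function: given the current UI state and an incoming
// event, return the next UI state. Unknown events leave the state as-is.
function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case 'voice-status':
      // 'joined' means the agent is ready; 'left' and 'error' both
      // drop back to disconnected.
      return event.status === 'joined' ? 'listening' : 'disconnected';
    case 'transcript':
      // Only a finalized transcript moves the UI to "Thinking".
      return event.isFinal ? 'thinking' : state;
    case 'speak-status':
      // 'finished' and 'interrupted' both return to listening.
      return event.status === 'started' ? 'speaking' : 'listening';
  }
}
```

Because the function is pure, the same logic can back a React reducer, a Vue store, or a plain event handler.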
## State Machine Implementation
```tsx
const [voiceState, setVoiceState] = useState<
  'disconnected' | 'connecting' | 'listening' | 'thinking' | 'speaking'
>('disconnected');

useEffect(() => {
  const handler = (event) => {
    switch (event.type) {
      case 'voice-status':
        if (event.status === 'joined') setVoiceState('listening');
        if (event.status === 'left') setVoiceState('disconnected');
        if (event.status === 'error') setVoiceState('disconnected');
        break;
      case 'transcript':
        if (event.isFinal) setVoiceState('thinking');
        break;
      case 'speak-status':
        if (event.status === 'started') setVoiceState('speaking');
        if (event.status === 'finished') setVoiceState('listening');
        if (event.status === 'interrupted') setVoiceState('listening');
        break;
    }
  };
  socket.on('event', handler);
  // Remove the listener on unmount to avoid duplicate handlers.
  return () => { socket.off('event', handler); };
}, []);
```
```tsx
// Render based on state. (Indicator component names here are
// illustrative placeholders — substitute your own components.)
function VoiceIndicator() {
  switch (voiceState) {
    case 'listening': return <MicIcon className="animate-pulse" />;
    case 'thinking': return <ThinkingSpinner />;
    case 'speaking': return <WaveformIcon className="animate-pulse" />;
    default: return <MicOffIcon />;
  }
}
```

## Handling Interruption
Real conversations involve interruptions. When a user speaks while the agent is talking, the agent stops immediately.
**What happens automatically:**

1. **User speaks:** Speech is detected while the agent is playing audio.
2. **Agent stops:** HUMA immediately stops the agent's speech and clears the audio buffer.
3. **Event sent:** You receive `speak-status: interrupted`. Update your UI immediately.
4. **Agent listens:** The user's speech is transcribed, and the agent responds to what they said, not to what it was previously saying.
### Snappy UI

When you receive `speak-status: interrupted`, immediately switch from "Speaking" to "Listening" animation. This makes the interruption feel natural and responsive.
## Example Flow

Here's a full lifecycle trace for a single conversational turn.

1. User says: "What's the weather like today?"
2. Speech finalized (`transcript`, `isFinal: true`). UI: **Thinking**
3. Agent starts speaking (`speak-status: started`). UI: **Speaking**
4. Agent done (`speak-status: finished`). UI: **Listening**
## Common Pitfalls
### Missing "Thinking" state

There's ~1-2s of silence between the user finishing and the agent responding. Without a visual indicator, users think the app is broken. Always show "Thinking" after receiving a final transcript.
### Ignoring interruption events

If your UI keeps showing "Speaking" after an interruption, it feels laggy. Listen for `speak-status: interrupted` and immediately reset to "Listening".
### Not handling `voice-status: error`

If the voice connection drops, you'll get a `voice-status: error` event. Show the user what happened and offer a reconnect button.
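If you want the reconnect button to retry automatically, a capped exponential backoff keeps retries from hammering the server. This is a sketch of the delay schedule only; the base delay and cap are illustrative defaults, not values prescribed by HUMA.

```typescript
// Capped exponential backoff: 500ms, 1s, 2s, 4s, ... up to 10s.
// Values are illustrative — tune them for your app.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 10_000,
): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

On each `voice-status: error`, schedule the next reconnect attempt with `setTimeout(reconnect, reconnectDelayMs(attempt))` and reset `attempt` to zero once you receive `voice-status: joined` again.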
## Congratulations!

**Integration Guide Complete.** You've learned how to build voice-enabled agents with real-time speech, turn-taking, and interruption handling. Check the full Voice Implementation guide for a complete reference.