Voice Implementation
HUMA-0.1 Only
Overview
HUMA enables natural multi-party voice conversations, which suit use cases such as online debates, NPCs in video games, interview practice, and collaborative brainstorming.
Multiple participants can speak simultaneously in the same room
Low-latency speech recognition and text-to-speech
All voice events flow through the same WebSocket connection
Without voice.enabled: true in your metadata, the join-daily-room command will fail silently. See the "Enable Voice in Metadata" section below.
Implementation Plan
Set Up Daily.co Voice Rooms
Implement Daily.co voice room services in your app or game. You'll need to create rooms, manage participants, and handle audio streams.
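As a rough sketch of the room setup step, the helper below creates a private room and a meeting token through Daily's REST API (POST /v1/rooms and POST /v1/meeting-tokens with a Bearer API key). The function names, the `fetchImpl` parameter, and the choice of a private room are assumptions made for illustration. The returned shape matches the createDailyRoom() call used in the example integration later on this page.

```javascript
// Sketch only: wraps the two Daily REST calls needed for a private voice room.
const DAILY_API = 'https://api.daily.co/v1';

// Build a POST request description for a Daily REST endpoint.
function dailyRequest(apiKey, path, body) {
  return {
    url: `${DAILY_API}${path}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify(body)
    }
  };
}

// Hypothetical helper: creates a room, then a token scoped to that room.
// fetchImpl defaults to the global fetch (Node 18+ or a browser).
async function createDailyRoom(apiKey, fetchImpl = fetch) {
  const roomReq = dailyRequest(apiKey, '/rooms', { privacy: 'private' });
  const roomRes = await fetchImpl(roomReq.url, roomReq.options);
  if (!roomRes.ok) throw new Error(`Room creation failed: ${roomRes.status}`);
  const room = await roomRes.json();

  const tokenReq = dailyRequest(apiKey, '/meeting-tokens', {
    properties: { room_name: room.name }
  });
  const tokenRes = await fetchImpl(tokenReq.url, tokenReq.options);
  if (!tokenRes.ok) throw new Error(`Token creation failed: ${tokenRes.status}`);
  const { token } = await tokenRes.json();

  // url and token map to roomUrl and roomToken in the join-daily-room event.
  return { url: room.url, token };
}
```

Keep the Daily API key on your server. A browser client should request the room URL and token from your backend rather than call Daily directly.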
Standard HUMA Integration
HUMA in voice mode is an extension of standard HUMA. Start by following the general implementation guide.
Read the Integration Guide
Orchestrate Agent Join/Leave
Control when agents join and leave voice chats based on your application logic. This could be triggered by user actions, game events, or automated rules.
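One possible automated rule, shown here purely as an illustration, is to bring the agent in when the first human arrives and take it out when the last human leaves. The function names and the `humanCount` / `agentInRoom` inputs are hypothetical. Only the join-daily-room and leave-daily-room message shapes come from this page.

```javascript
// Hypothetical rule: join when at least one human is present,
// leave when the room has no humans left.
function decideAgentAction({ humanCount, agentInRoom }) {
  if (!agentInRoom && humanCount > 0) return 'join';
  if (agentInRoom && humanCount === 0) return 'leave';
  return 'none';
}

// Translate the decision into the message to emit on the WebSocket,
// or null when nothing should be sent.
function toVoiceMessage(action, room) {
  if (action === 'join') {
    return {
      type: 'join-daily-room',
      content: { roomUrl: room.url, roomToken: room.token }
    };
  }
  if (action === 'leave') return { type: 'leave-daily-room' };
  return null;
}

// Usage, assuming an already connected socket:
// const msg = toVoiceMessage(decideAgentAction(status), dailyRoom);
// if (msg) socket.emit('message', msg);
```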
Enable Voice in Metadata
Without voice.enabled: true in the agent metadata, join-daily-room will fail silently and the agent will not join the voice call.
When creating an agent that will use voice features, you must include the voice property in the agent metadata:
const agent = await fetch(`${API}/api/agents`, {
method: 'POST',
headers: { 'Content-Type': 'application/json', 'X-API-Key': API_KEY },
body: JSON.stringify({
name: 'Voice Agent',
agentType: 'HUMA-0.1',
metadata: {
className: 'Assistant',
personality: 'Friendly voice assistant.',
instructions: 'Engage in natural conversation.',
tools: [...],
// ⚠️ REQUIRED FOR VOICE - Without this, join-daily-room will fail!
voice: {
enabled: true, // Must be true
voiceId: 'EXAVITQu4vr4xnSDxMaL' // ElevenLabs voice ID (optional)
}
}
})
}).then(r => r.json());
enabled: Required. Must be set to true to enable voice features. If missing or false, the agent cannot join voice rooms.
voiceId: Optional. ElevenLabs voice ID for text-to-speech. If not provided, a default voice will be used.
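Because a missing voice block only shows up later as a silent join failure, it can help to check the metadata on the client before creating the agent. The following is a minimal sketch. `checkVoiceMetadata` is a hypothetical helper, not part of the HUMA API, and it checks only the two fields described above.

```javascript
// Returns a list of problems found in the voice block of agent metadata.
// An empty list means the block satisfies the requirements described above.
function checkVoiceMetadata(metadata) {
  const problems = [];
  const voice = metadata && metadata.voice;
  if (!voice) {
    problems.push('metadata.voice is missing');
    return problems;
  }
  if (voice.enabled !== true) {
    problems.push('metadata.voice.enabled must be true');
  }
  if (voice.voiceId !== undefined && typeof voice.voiceId !== 'string') {
    problems.push('metadata.voice.voiceId must be a string when provided');
  }
  return problems;
}

// Usage: fail loudly before the request is sent.
// const problems = checkVoiceMetadata(metadata);
// if (problems.length > 0) throw new Error(problems.join('; '));
```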
Voice Lifecycle
Voice is a sub-phase within the Active state. The agent must be connected via WebSocket before joining a call.
To join a call, send a join-daily-room event with:
- roomUrl - Daily.co room URL
- roomToken - Authentication token
To leave a call, send a leave-daily-room event.
Always leave the room before disconnecting the agent to ensure clean cleanup.
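The ordering rules above (connect first, leave before disconnecting) can be enforced with a thin wrapper around the socket. This is a sketch under assumptions: `createVoiceSession` is a hypothetical helper, and it relies on the `connected`, `emit`, and `disconnect` members of a socket.io-client socket. It marks the agent as in the room as soon as the join message is sent, without waiting for a room_joined confirmation.

```javascript
// Hypothetical wrapper that enforces the voice lifecycle ordering.
function createVoiceSession(socket) {
  let inRoom = false; // set optimistically when the join message is sent

  return {
    join(roomUrl, roomToken) {
      if (!socket.connected) {
        throw new Error('Connect the WebSocket before joining a call');
      }
      socket.emit('message', {
        type: 'join-daily-room',
        content: { roomUrl, roomToken }
      });
      inRoom = true;
    },
    leave() {
      if (!inRoom) return; // nothing to leave
      socket.emit('message', { type: 'leave-daily-room' });
      inRoom = false;
    },
    // Leaves the room first (if needed), then disconnects the agent.
    close() {
      this.leave();
      socket.disconnect();
    }
  };
}
```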
Voice Events
Client → Server Events
Join Voice Room
socket.emit('message', {
type: 'join-daily-room',
content: {
roomUrl: 'https://your-domain.daily.co/room-name',
roomToken: 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...'
}
});
Leave Voice Room
socket.emit('message', {
type: 'leave-daily-room'
});
Server → Client Events
All server events are received on the event channel:
| Event Type | Description | Key Fields |
|---|---|---|
| room_joined | Agent successfully joined the call | roomUrl |
| room_left | Agent left the call | roomUrl |
| participant_joined | Someone joined the call | participantId, userName |
| participant_left | Someone left the call | participantId |
| transcript | Speech-to-text from a participant | participantId, text, isFinal |
| speak | Agent is speaking | text |
| vad_start | Voice activity started (someone is speaking) | participantId |
| vad_stop | Voice activity stopped | participantId |
| transcriber_fatal_error | Speech recognition failed permanently | error |
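To show how these events might drive UI state (participant list, speaking indicators, transcript log), here is a small reducer sketch. The state shape, the function names, and the 20-entry transcript cap are assumptions. It also assumes that isFinal: false marks an interim transcript that a final one will later replace.

```javascript
const MAX_TRANSCRIPTS = 20; // arbitrary cap for this sketch

function createCallState() {
  return { inRoom: false, participants: {}, speaking: [], transcripts: [] };
}

// Pure reducer: returns the next state for a server event, never mutates.
function applyVoiceEvent(state, event) {
  switch (event.type) {
    case 'room_joined':
      return { ...state, inRoom: true };
    case 'room_left':
      return createCallState();
    case 'participant_joined':
      return {
        ...state,
        participants: { ...state.participants, [event.participantId]: event.userName }
      };
    case 'participant_left': {
      const { [event.participantId]: _gone, ...rest } = state.participants;
      return {
        ...state,
        participants: rest,
        speaking: state.speaking.filter((id) => id !== event.participantId)
      };
    }
    case 'vad_start':
      return state.speaking.includes(event.participantId)
        ? state
        : { ...state, speaking: [...state.speaking, event.participantId] };
    case 'vad_stop':
      return {
        ...state,
        speaking: state.speaking.filter((id) => id !== event.participantId)
      };
    case 'transcript': {
      if (!event.isFinal) return state; // skip interim results (assumed semantics)
      const entry = { participantId: event.participantId, text: event.text };
      return { ...state, transcripts: [...state.transcripts, entry].slice(-MAX_TRANSCRIPTS) };
    }
    default:
      return state;
  }
}

// Usage: state = applyVoiceEvent(state, data) inside socket.on('event', ...).
```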
When you receive transcriber_fatal_error, speech recognition has failed permanently. The agent will disconnect from the call. You should notify the user and optionally reconnect.
Example Integration
import { io } from 'socket.io-client';
const API = 'https://api.humalike.tech';
const API_KEY = 'ak_your_api_key';
// 1. Create agent with voice ENABLED in metadata
const agent = await fetch(`${API}/api/agents`, {
method: 'POST',
headers: { 'Content-Type': 'application/json', 'X-API-Key': API_KEY },
body: JSON.stringify({
name: 'Voice Assistant',
agentType: 'HUMA-0.1',
metadata: {
className: 'Assistant',
personality: 'Friendly and helpful voice assistant.',
instructions: 'Engage in natural conversation. Use the speak tool to respond.',
tools: [
{
name: 'speak',
description: 'Say something out loud in the voice call',
parameters: [
{ name: 'text', type: 'string', description: 'What to say', required: true }
]
}
],
// ⚠️ REQUIRED: Enable voice in metadata!
voice: {
enabled: true, // Must be true for voice to work
voiceId: 'EXAVITQu4vr4xnSDxMaL' // Optional: ElevenLabs voice ID
}
}
})
}).then(r => r.json());
// 2. Connect WebSocket
const socket = io(API, {
query: { agentId: agent.id, apiKey: API_KEY },
transports: ['websocket']
});
// 3. Handle voice events
socket.on('event', (data) => {
switch (data.type) {
case 'room_joined':
console.log('Agent joined voice room');
break;
case 'room_left':
console.log('Agent left voice room');
break;
case 'participant_joined':
console.log(`${data.userName} joined the call`);
break;
case 'transcript':
console.log(`${data.participantId}: ${data.text}`);
// Send as context update to trigger agent response
socket.emit('message', {
type: 'huma-0.1-event',
content: {
name: 'voice-transcript',
context: {
recentTranscripts: [...],
participants: [...]
},
description: `User said: "${data.text}"`
}
});
break;
case 'speak':
console.log(`Agent speaking: ${data.text}`);
break;
case 'transcriber_fatal_error':
console.error('Voice recognition failed:', data.error);
// Handle reconnection or notify user
break;
}
});
// 4. Join voice room (after WebSocket connected)
socket.on('connect', async () => {
// Get room credentials from Daily.co
const dailyRoom = await createDailyRoom();
socket.emit('message', {
type: 'join-daily-room',
content: {
roomUrl: dailyRoom.url,
roomToken: dailyRoom.token
}
});
});
// 5. Clean up on disconnect
function cleanup() {
socket.emit('message', { type: 'leave-daily-room' });
socket.disconnect();
}
Best Practices
Audio Quality
- Ensure a stable internet connection for low latency
- Test with different devices and browsers
- Handle audio permission prompts gracefully
User Experience
- Show visual indicators when the agent is speaking
- Display participant list and status
- Provide mute/unmute controls
Room Management
- Always send leave-daily-room before disconnecting
- Handle room expiration and cleanup
- Consider room capacity limits
Error Handling
- Handle transcriber errors by notifying users
- Implement reconnection logic for dropped calls
- Log events for debugging
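For the reconnection point, one common approach is retrying with exponential backoff. The sketch below assumes a hypothetical `joinOnce` function that you supply. It should send join-daily-room and return a promise that resolves on room_joined and rejects on timeout, since a failed join may produce no error event. The delays and attempt limit are arbitrary defaults.

```javascript
// Delay before retry number `attempt` (0-based): doubles each time, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Calls joinOnce until it succeeds or maxAttempts is reached.
// `sleep` is injectable so the retry loop can be tested without real timers.
async function rejoinWithBackoff(joinOnce, options = {}) {
  const {
    maxAttempts = 5,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
  } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await joinOnce();
    } catch (err) {
      console.warn(`Join attempt ${attempt + 1} failed:`, err.message);
      if (attempt === maxAttempts - 1) throw err; // out of retries
      await sleep(backoffDelay(attempt));
    }
  }
}
```

With the defaults, the waits between attempts are 1 s, 2 s, 4 s, and 8 s before the final error is rethrown.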