How to get Separate Audio per Participant (Realtime)
Receive audio data for each participant in realtime over websocket
You can use this sample app to see how to receive and download separate audio per participant from a bot in real-time.
This guide is for you if:
- You want to process audio data for each participant in realtime
- You want to diarize/analyze each participant in the call individually in realtime
Real-time screenshare audio isn't captured at this time. This means the separate audio streams will not contain the screenshare audio. The screenshare audio will still be available in the final recording.
Audio data streaming is currently supported in raw PCM format. The audio is mono, 16-bit signed little-endian PCM at 16 kHz.
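As a quick sanity check on the format above, the sample rate and sample width determine how many bytes of audio you should expect per second, and therefore how long any given chunk is. A minimal sketch (the helper name `chunk_duration_seconds` is illustrative, not part of any SDK):

```python
# Size/duration math for the raw PCM stream described above:
# 16 kHz sample rate, mono, 16-bit (2 bytes) per sample.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # S16LE
CHANNELS = 1

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS

def chunk_duration_seconds(num_bytes: int) -> float:
    """Duration in seconds of a raw PCM chunk of `num_bytes` bytes."""
    return num_bytes / bytes_per_second

print(bytes_per_second)               # 32000 bytes per second of audio
print(chunk_duration_seconds(64000))  # 2.0
```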
Platform Support
| Platform | Supported | Number of concurrent streams |
|---|---|---|
| Zoom | ✅ | 16 |
| Microsoft Teams | ✅ | 9 |
| Google Meet | ✅ | 16 |
| Webex | ❌ | |
| Slack Huddles (Beta) | ❌ | |
| Go-To Meeting (Beta) | ❌ | |
This is a compute-heavy feature, and we recommend using 4-core bots to ensure the bot has enough resources to process the separate streams.
Implementation
Step 1: Create a bot
To get separate audio per participant, you must set `recording_config.audio_separate_raw = {}`. Below is an example of what this would look like in your request:
Make sure the `url` in `realtime_endpoints` uses the WebSocket protocol (`ws://` or `wss://`) instead of HTTP (e.g. `http://` or `https://`).
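As a sketch, you can catch a wrong scheme on your side before creating the bot. The helper `is_websocket_url` below is illustrative, not part of the Recall API:

```python
# Client-side check that a realtime endpoint URL uses a WebSocket scheme.
from urllib.parse import urlparse

def is_websocket_url(url: str) -> bool:
    """Return True if the URL uses the ws:// or wss:// scheme."""
    return urlparse(url).scheme in ("ws", "wss")

print(is_websocket_url("wss://example.com/audio"))    # True
print(is_websocket_url("https://example.com/audio"))  # False
```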
curl --request POST \
--url https://us-west-2.recall.ai/api/v1/bot \
--header 'accept: application/json' \
--header 'content-type: application/json' \
--header 'authorization: YOUR_RECALL_API_KEY' \
--data '
{
"meeting_url": "YOUR_MEETING_URL",
"recording_config": {
"audio_separate_raw": {}, // Add this to your request body
"realtime_endpoints": [
{
"type": "websocket", // only websocket is supported for realtime audio data
"url": "YOUR_WEBSOCKET_RECEIVER_URL", // Update this
"events": ["audio_separate_raw.data"]
}
]
}
}
'
const response = await fetch("https://us-west-2.recall.ai/api/v1/bot", {
method: "POST",
headers: {
"accept": "application/json",
"content-type": "application/json",
"authorization": "YOUR_RECALL_API_KEY" // Update this
},
body: JSON.stringify({
meeting_url: "YOUR_MEETING_URL", // Update this
recording_config: {
audio_separate_raw: {}, // Add this to your request body
realtime_endpoints: [
{
type: "websocket", // only websocket is supported for realtime audio data
url: "YOUR_WEBSOCKET_RECEIVER_URL", // Update this
events: ["audio_separate_raw.data"]
}
]
}
}
})
});
if (!response.ok) {
throw new Error(`Error: ${response.status} ${response.statusText}`);
}
const data = await response.json();
import requests
response = requests.post(
"https://us-west-2.recall.ai/api/v1/bot",
json={
"meeting_url": "YOUR_MEETING_URL", # Update this
"recording_config": {
"audio_separate_raw": {},  # Add this to your request body
"realtime_endpoints": [
{
"type": "websocket",  # only websocket is supported for realtime audio data
"url": "YOUR_WEBSOCKET_RECEIVER_URL",  # Update this
"events": ["audio_separate_raw.data"]
}
]
}
},
headers={
"accept": "application/json",
"content-type": "application/json",
"authorization": "YOUR_RECALL_API_KEY" # Update this
}
)
if not response.ok:
errorMessage = f"Error: {response.status_code} - {response.text}"
raise requests.RequestException(errorMessage)
result = response.json()
Step 2: Receive websocket messages with audio data
Set up a websocket server and ensure it is publicly accessible. You will receive messages in the following payload format:
{
"event": "audio_separate_raw.data",
"data": {
"data": {
"buffer": string, // base64-encoded raw audio: 16 kHz mono, S16LE (16-bit signed PCM, little-endian)
"timestamp": {
"relative": float,
"absolute": string
},
"participant": {
"id": number,
"name": string | null,
"is_host": boolean,
"platform": string | null,
"extra_data": object,
"email": string | null
}
},
"realtime_endpoint": {
"id": string,
"metadata": object,
},
"audio_separate": {
"id": string,
"metadata": object
},
"recording": {
"id": string,
"metadata": object
},
"bot": {
"id": string,
"metadata": object
},
}
}
Connection Behaviors
Each real-time endpoint you configure in the Create Bot config establishes its own WebSocket connection. That connection remains open until your server explicitly closes it or the call ends and the bot disconnects.
Muting or unmuting does not close the connection. When muted, the bot simply pauses binary media streaming for that endpoint, and unmuting resumes the stream on the same socket.
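Putting the payload format above together, a handler on your server might parse each message and decode the base64 PCM buffer like this. This is a minimal sketch assuming the payload shape shown above; `handle_message` is a hypothetical helper, not part of any SDK:

```python
# Parse an "audio_separate_raw.data" websocket message and decode its
# base64-encoded S16LE PCM buffer.
import base64
import json

def handle_message(raw: str) -> tuple[str, bytes]:
    """Return (participant name, raw PCM bytes) for an audio event."""
    msg = json.loads(raw)
    if msg["event"] != "audio_separate_raw.data":
        raise ValueError(f"unexpected event: {msg['event']}")
    data = msg["data"]["data"]
    pcm = base64.b64decode(data["buffer"])
    name = data["participant"]["name"] or "unknown"
    return name, pcm

# Example with a tiny synthetic payload (two 16-bit samples):
sample = {
    "event": "audio_separate_raw.data",
    "data": {"data": {
        "buffer": base64.b64encode(b"\x00\x00\x01\x00").decode(),
        "participant": {"name": "Alice"},
    }},
}
name, pcm = handle_message(json.dumps(sample))
print(name, len(pcm))  # Alice 4
```

From here you could buffer each participant's PCM separately (keyed on `participant.id`) and write it to per-participant WAV files or feed it to a diarization/transcription pipeline.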
FAQ
Do muted participants produce audio?
No, muted participants do not produce any audio.
Why am I receiving empty audio packets?
If a participant is unmuted but silent, you will receive empty audio packets.
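One way to skip these is to check whether every 16-bit sample in a decoded buffer is zero. A minimal sketch (the helper name `is_silent` is illustrative):

```python
# Detect an "empty" (silent) packet: all S16LE samples are zero.
import struct

def is_silent(pcm: bytes) -> bool:
    """Return True if every 16-bit little-endian sample in `pcm` is zero."""
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    return all(s == 0 for s in samples)

print(is_silent(b"\x00\x00\x00\x00"))  # True  (all-zero samples)
print(is_silent(b"\x00\x00\x01\x00"))  # False (non-zero sample present)
```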
Will bots receive audio from other bots?
Since bots are participants, if there are other bots in a call, your bot will receive their audio like any other participant's.
However, because bots are muted by default, your bot will not receive audio packets from another bot unless that bot is actively outputting audio.
What is the retry behavior?
See the retry behaviors in Real-Time Websocket Endpoints for details.
Will the bot's audio/video/transcript be included in the final recording?
You can configure the bot's audio to be included in the final recording by setting `recording_config.include_bot_in_recording.audio` to `true` in the Create Bot request.
You cannot include the bot's video or transcript in the final recording at this time. If you want to work around this, we recommend:
- If you only need the transcript:
- Real-time - you will need to fetch it from your TTS provider (if applicable). You can then merge it by aligning the timestamps from the TTS provider's transcript with the generated transcript of the other participants
- Async - you will need to:
- Include the bot's audio in the recording and enable Perfect Diarization in the Create Bot request config:
{ "recording_config": { "include_bot_in_recording": { "audio": true }, "transcript": { "diarization": { "use_separate_streams_when_available": true } } } }
- Use Async Transcription to transcribe the call. Note that the bot's name will return as `null`, but you can fetch the bot's name by querying the Retrieve Bot API instead
- If you need the video: you will need to send another bot to the call to record the meeting. This bot will capture all participants, including the bot agent that is outputting video. This method can also transcribe as the bot agent is registered as its own participant