How to get Separate Audio per Participant (Realtime)
Receive audio data for each participant in realtime over websocket
Audio data streaming is currently supported in raw PCM format only. The audio format is mono, 16-bit signed little-endian PCM at 16 kHz.
This guide is for you if:
- You want to process audio data for each participant in realtime
- You want to diarize/analyze each participant in the call individually in realtime
Real-time screenshare audio isn't captured at this time. This means the separate audio streams will not contain the screenshare audio. The screenshare audio will still be available in the final recording.
Platform Support
| Platform | Supported |
|---|---|
| Zoom | ✅ |
| Microsoft Teams | ✅ |
| Google Meet | ✅ |
| Webex | ❌ |
| Slack Huddles (Beta) | ❌ |
| Go-To Meeting (Beta) | ❌ |
This is a compute-heavy feature, and we recommend using 4-core bots to ensure the bot has enough resources to process the separate streams.
Implementation
Step 1: Create a bot
To get separate audio per participant, set `recording_config.audio_separate_raw = {}` and configure a real-time endpoint subscribed to `audio_separate_raw.data` events. Below is an example request:
```shell
curl --request POST \
  --url https://us-west-2.recall.ai/api/v1/bot \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --header 'authorization: YOUR_RECALL_API_KEY' \
  --data '
{
  "meeting_url": "YOUR_MEETING_URL",
  "recording_config": {
    "audio_separate_raw": {},
    "realtime_endpoints": [
      {
        "type": "websocket",
        "url": "YOUR_WEBSOCKET_RECEIVER_URL",
        "events": ["audio_separate_raw.data"]
      }
    ]
  }
}
'
```

```javascript
const response = await fetch("https://us-west-2.recall.ai/api/v1/bot", {
  method: "POST",
  headers: {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "YOUR_RECALL_API_KEY" // Update this
  },
  body: JSON.stringify({
    meeting_url: "YOUR_MEETING_URL", // Update this
    recording_config: {
      audio_separate_raw: {}, // Add this to your request body
      realtime_endpoints: [
        {
          type: "websocket", // Only websocket is supported for realtime audio data
          url: "YOUR_WEBSOCKET_RECEIVER_URL", // Update this
          events: ["audio_separate_raw.data"]
        }
      ]
    }
  })
});

if (!response.ok) {
  throw new Error(`Error: ${response.status} ${response.statusText}`);
}

const data = await response.json();
```

```python
import requests

response = requests.post(
    "https://us-west-2.recall.ai/api/v1/bot",
    json={
        "meeting_url": "YOUR_MEETING_URL",  # Update this
        "recording_config": {
            "audio_separate_raw": {},  # Add this to your request body
            "realtime_endpoints": [
                {
                    "type": "websocket",  # Only websocket is supported for realtime audio data
                    "url": "YOUR_WEBSOCKET_RECEIVER_URL",  # Update this
                    "events": ["audio_separate_raw.data"]
                }
            ]
        }
    },
    headers={
        "accept": "application/json",
        "content-type": "application/json",
        "authorization": "YOUR_RECALL_API_KEY"  # Update this
    }
)

if not response.ok:
    raise requests.RequestException(
        f"Error: {response.status_code} - {response.text}"
    )

result = response.json()
```

Step 2: Receive websocket messages with audio data
Set up a websocket server and ensure it is publicly accessible. You will receive messages in the following payload format:
```json
{
  "event": "audio_separate_raw.data",
  "data": {
    "data": {
      "buffer": string, // base64-encoded raw audio: 16 kHz mono, S16LE (16-bit signed little-endian PCM)
      "timestamp": {
        "relative": float,
        "absolute": string
      },
      "participant": {
        "id": number,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object,
        "email": string | null
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "audio_separate": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
```

Connection Behaviors
Each real-time endpoint you configure in the Create Bot config establishes its own WebSocket connection. That connection remains open until your server explicitly closes it or the call ends and the bot disconnects.
Muting or unmuting does not close the connection. When muted, the bot simply pauses binary media streaming for that endpoint, and unmuting resumes the stream on the same socket.
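To illustrate what handling these messages looks like, here is a minimal Python sketch that uses only the standard library. It assumes your websocket server library hands each incoming text frame to a function (called `handle_message` here); that function name, the in-memory `pcm_by_participant` store, and the WAV helper are illustrative, not part of the Recall API. It decodes the base64 PCM per participant and shows how to derive duration from the fixed format (16 kHz × 2 bytes per sample = 32,000 bytes per second):

```python
import base64
import io
import json
import wave
from collections import defaultdict

SAMPLE_RATE = 16_000  # 16 kHz, per the audio_separate_raw format
SAMPLE_WIDTH = 2      # bytes per sample (S16LE, mono)

# Hypothetical in-memory store: raw PCM accumulated per participant id.
pcm_by_participant: dict[int, bytearray] = defaultdict(bytearray)

def handle_message(raw: str) -> None:
    """Parse one websocket text frame and buffer its audio by participant."""
    msg = json.loads(raw)
    if msg.get("event") != "audio_separate_raw.data":
        return  # ignore other event types on this socket
    inner = msg["data"]["data"]
    pcm = base64.b64decode(inner["buffer"])
    pcm_by_participant[inner["participant"]["id"]].extend(pcm)

def duration_seconds(participant_id: int) -> float:
    """Seconds of audio buffered so far: bytes / (rate * sample width)."""
    return len(pcm_by_participant[participant_id]) / (SAMPLE_RATE * SAMPLE_WIDTH)

def to_wav_bytes(participant_id: int) -> bytes:
    """Wrap a participant's raw PCM in a playable WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(bytes(pcm_by_participant[participant_id]))
    return buf.getvalue()
```

In a real receiver you would call `handle_message` from your server's on-message callback and periodically flush or stream each participant's buffer to your diarization or analysis pipeline.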