How to get Separate Audio per Participant (Realtime)

Receive audio data for each participant in real time over a WebSocket connection

📘

Audio data streaming is currently supported in raw PCM format only.

The audio format is mono, 16-bit signed little-endian PCM at 16 kHz.
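Because the format is fixed, the playback duration of any chunk can be computed directly from its byte length: one second of audio is 16,000 samples × 2 bytes = 32,000 bytes. A minimal sketch (the function name is illustrative, not part of the API):

```python
import base64

SAMPLE_RATE_HZ = 16_000   # fixed: 16 kHz
BYTES_PER_SAMPLE = 2      # S16LE: 2 bytes per sample, mono

def chunk_duration_seconds(buffer_b64: str) -> float:
    """Return the playback duration of a base64-encoded PCM chunk."""
    raw = base64.b64decode(buffer_b64)
    return len(raw) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
```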

This guide is for you if:

  • You want to process audio data for each participant in realtime
  • You want to diarize/analyze each participant in the call individually in realtime
❗️

Real-time screenshare audio isn't captured at this time. This means the separate audio streams will not contain the screenshare audio. The screenshare audio will still be available in the final recording.

Supported Platforms

  • Zoom
  • Microsoft Teams
  • Google Meet
  • Webex
  • Slack Huddles (Beta)
  • Go-To Meeting (Beta)
🚧

This is a compute-heavy feature, and we recommend using 4-core bots to ensure the bot has enough resources to process the separate streams.

Implementation

Step 1: Create a bot

To get separate audio per participant, set recording_config.audio_separate_raw to {} and add a realtime endpoint that subscribes to audio_separate_raw.data events. Only websocket endpoints are supported for realtime audio data. Below is an example of what this looks like in your request:

curl --request POST \
     --url https://us-west-2.recall.ai/api/v1/bot \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'authorization: YOUR_RECALL_API_KEY' \
     --data '
{
  "meeting_url": "YOUR_MEETING_URL",
  "recording_config": {
    "audio_separate_raw": {},
    "realtime_endpoints": [
      {
        "type": "websocket",
        "url": "YOUR_WEBSOCKET_RECEIVER_URL",
        "events": ["audio_separate_raw.data"]
      }
    ]
  }
}
'
const response = await fetch("https://us-west-2.recall.ai/api/v1/bot", {
  method: "POST",
  headers: {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "YOUR_RECALL_API_KEY" // Update this
  },
  body: JSON.stringify({
    meeting_url: "YOUR_MEETING_URL", // Update this
    recording_config: {
      audio_separate_raw: {}, // Add this to your request body
      realtime_endpoints: [
        {
          type: "websocket", // only websocket is supported for realtime audio data
          url: "YOUR_WEBSOCKET_RECEIVER_URL", // Update this
          events: ["audio_separate_raw.data"]
        }
      ]
    }
  })
});

if (!response.ok) {
  throw new Error(`Error: ${response.status} ${response.statusText}`);
}

const data = await response.json();
import requests

response = requests.post(
    "https://us-west-2.recall.ai/api/v1/bot",
    json={
      "meeting_url": "YOUR_MEETING_URL", # Update this
      "recording_config": {
        "audio_separate_raw": {},  # Add this to your request body
        "realtime_endpoints": [
          {
            "type": "websocket",  # only websocket is supported for realtime audio data
            "url": "YOUR_WEBSOCKET_RECEIVER_URL",  # Update this
            "events": ["audio_separate_raw.data"]
          }
        ]
      }
    },
    headers={
      "accept": "application/json",
      "content-type": "application/json",
      "authorization": "YOUR_RECALL_API_KEY"  # Update this
    }
)

if not response.ok:
    error_message = f"Error: {response.status_code} - {response.text}"
    raise requests.RequestException(error_message)
  
result = response.json()

Step 2: Receive websocket messages with audio data

Set up a WebSocket server and ensure it is publicly accessible. You will receive messages in the following payload format:

{
  "event": "audio_separate_raw.data",
  "data": {
    "data": {
      "buffer": string, // base64-encoded raw audio: 16 kHz mono, S16LE (16-bit PCM, little-endian)
      "timestamp": {
        "relative": float,
        "absolute": string
      },
      "participant": {
        "id": number,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object,
        "email": string | null
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "audio_separate": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
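A minimal handler for these messages might look like the following sketch: it parses the JSON payload, base64-decodes the PCM buffer, and unpacks it into 16-bit samples keyed by participant. The function name is illustrative; plug it into whichever WebSocket server library you use.

```python
import base64
import json
import struct

def handle_audio_message(message: str) -> tuple[int, list[int]]:
    """Decode one audio_separate_raw.data message into (participant_id, samples)."""
    payload = json.loads(message)
    if payload["event"] != "audio_separate_raw.data":
        raise ValueError(f"unexpected event: {payload['event']}")

    inner = payload["data"]["data"]
    raw = base64.b64decode(inner["buffer"])  # raw S16LE PCM bytes
    # "<" = little-endian, "h" = signed 16-bit integer; one value per sample
    samples = list(struct.unpack(f"<{len(raw) // 2}h", raw))
    return inner["participant"]["id"], samples
```

Because each message carries its own participant object, you can route chunks to per-participant buffers (e.g. a dict keyed by participant id) for individual diarization or analysis.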

Connection Behaviors

Each real-time endpoint you configure in the Create Bot config establishes its own WebSocket connection. That connection remains open until your server explicitly closes it or the call ends and the bot disconnects.

Muting or unmuting does not close the connection. When muted, the bot simply pauses binary media streaming for that endpoint, and unmuting resumes the stream on the same socket.