How to Get Separate Audio per Participant (Realtime)

Receive audio data for each participant in realtime over a websocket.

📘

Audio data streaming is currently supported in raw PCM format only.

The audio format is mono, 16-bit signed little-endian PCM at 16 kHz.
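
As a rough illustration of what this format implies, the sketch below computes the duration of a raw chunk (16,000 samples per second × 2 bytes per sample = 32,000 bytes per second) and unpacks the bytes into signed 16-bit samples. The helper names `pcm_duration_seconds` and `pcm_to_samples` are hypothetical and not part of the Recall API.

```python
import array
import sys

# Values taken from the format description above.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1          # mono


def pcm_duration_seconds(pcm: bytes) -> float:
    """Return the duration of a raw PCM chunk in seconds."""
    return len(pcm) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def pcm_to_samples(pcm: bytes) -> array.array:
    """Interpret raw bytes as signed 16-bit little-endian samples."""
    samples = array.array("h")  # "h" = signed short (2 bytes on common platforms)
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        # The stream is little-endian; swap on big-endian hosts.
        samples.byteswap()
    return samples
```

For example, a 32,000-byte chunk corresponds to one second of audio for a single participant.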

This guide is for you if:

  • You want to process audio data for each participant in realtime
  • You want to diarize/analyze each participant in the call individually in realtime

Platform Support

Supported platforms:

  • Zoom (native bot only)
  • Microsoft Teams
  • Google Meet
  • Webex
  • Slack Huddles (Beta)
  • GoTo Meeting (Beta)

Implementation

Step 1: Create a bot

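To get separate audio per participant, you must set recording_config.audio_separate_raw = {}. To receive that audio in realtime, also add a websocket entry to recording_config.realtime_endpoints that subscribes to the audio_separate_raw.data event. Below is an example of what this looks like in your request: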

# Note: "audio_separate_raw" must be added to the request body, and only
# "websocket" endpoints are supported for realtime audio data.
curl --request POST \
     --url https://us-east-1.recall.ai/api/v1/bot \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'authorization: YOUR_RECALL_API_KEY' \
     --data '
{
  "meeting_url": "YOUR_MEETING_URL",
  "recording_config": {
    "audio_separate_raw": {},
    "realtime_endpoints": [
      {
        "type": "websocket",
        "url": "YOUR_WEBSOCKET_RECEIVER_URL",
        "events": ["audio_separate_raw.data"]
      }
    ]
  }
}
'

const response = await fetch("https://us-east-1.recall.ai/api/v1/bot", {
  method: "POST",
  headers: {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "YOUR_RECALL_API_KEY" // Update this
  },
  body: JSON.stringify({
    meeting_url: "YOUR_MEETING_URL", // Update this
    recording_config: {
      audio_separate_raw: {}, // Add this to your request body
      realtime_endpoints: [
        {
          type: "websocket", // only websocket is supported for realtime audio data
          url: "YOUR_WEBSOCKET_RECEIVER_URL", // Update this
          events: ["audio_separate_raw.data"]
        }
      ]
    }
  })
});

if (!response.ok) {
  throw new Error(`Error: ${response.status} ${response.statusText}`);
}

const data = await response.json();

import requests

response = requests.post(
    "https://us-east-1.recall.ai/api/v1/bot",
    json={
        "meeting_url": "YOUR_MEETING_URL",  # Update this
        "recording_config": {
            "audio_separate_raw": {},  # Add this to your request body
            "realtime_endpoints": [
                {
                    "type": "websocket",  # only websocket is supported for realtime audio data
                    "url": "YOUR_WEBSOCKET_RECEIVER_URL",  # Update this
                    "events": ["audio_separate_raw.data"],
                }
            ],
        },
    },
    headers={
        "accept": "application/json",
        "content-type": "application/json",
        "authorization": "YOUR_RECALL_API_KEY",  # Update this
    },
)

if not response.ok:
    error_message = f"Error: {response.status_code} - {response.text}"
    raise requests.RequestException(error_message)

result = response.json()

Step 2: Receive websocket messages with audio data

Set up a websocket server and ensure it is publicly accessible. You will receive messages in the following payload format:

{
  "event": "audio_separate_raw.data",
  "data": {
    "data": {
      "buffer": string, // base64-encoded raw audio: 16 kHz mono, S16LE (16-bit signed little-endian PCM)
      "timestamp": {
        "relative": float,
        "absolute": string
      },
      "participant": {
        "id": number,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "audio_separate": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
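
To make the payload shape concrete, here is a minimal Python sketch that parses one message and base64-decodes its audio buffer. It assumes each websocket message arrives as JSON text in the format shown above; `AudioChunk` and `parse_audio_message` are hypothetical names for illustration, not part of the Recall API.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioChunk:
    participant_id: int
    participant_name: Optional[str]
    relative_ts: float  # value of timestamp.relative from the payload
    pcm: bytes          # decoded raw 16 kHz mono S16LE audio


def parse_audio_message(raw: str) -> Optional[AudioChunk]:
    """Parse one websocket message; return None for any other event type."""
    message = json.loads(raw)
    if message.get("event") != "audio_separate_raw.data":
        return None
    inner = message["data"]["data"]
    participant = inner["participant"]
    return AudioChunk(
        participant_id=participant["id"],
        participant_name=participant.get("name"),
        relative_ts=inner["timestamp"]["relative"],
        pcm=base64.b64decode(inner["buffer"]),
    )
```

You would call a function like this from the message handler of whichever websocket server library you use, then route the decoded bytes by `participant_id`.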

📘

Important: Connection behavior

In unmixed audio streaming, participants' audio streams connect to and disconnect from your websocket endpoint according to their mute state.

For instance, a participant who joins the call muted will only attempt to establish a websocket connection to your endpoint upon unmuting. When they mute again, their corresponding websocket connection will be closed.
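
Because connections open and close with mute state, one option is to key any per-speaker state by participant id rather than by connection, so that audio received after a reconnect is appended to the same buffer. The sketch below shows that idea using only the standard library; `ParticipantAudioStore` is a hypothetical helper, and it does not insert silence for muted periods (the relative timestamps in the payload could be used for that).

```python
import io
import wave
from collections import defaultdict


class ParticipantAudioStore:
    """Accumulates raw PCM per participant across websocket reconnects."""

    def __init__(self) -> None:
        self._buffers = defaultdict(bytearray)

    def append(self, participant_id: int, pcm: bytes) -> None:
        # Keyed by participant id, so it does not matter which
        # connection delivered the chunk.
        self._buffers[participant_id].extend(pcm)

    def to_wav_bytes(self, participant_id: int) -> bytes:
        """Wrap the accumulated PCM in a WAV container (16 kHz, mono, 16-bit)."""
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(16_000)
            wav.writeframes(bytes(self._buffers.get(participant_id, b"")))
        return out.getvalue()
```

A handler could call `append` for every decoded chunk and `to_wav_bytes` once the call ends, producing one audio file per participant.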