How to get Separate Audio per Participant (Realtime)

Receive audio data for each participant in realtime over websocket

📘

You can use this sample app to see how to receive and download separate audio per participant from a bot in real-time.

This guide is for you if:

  • You want to process audio data for each participant in realtime
  • You want to diarize/analyze each participant in the call individually in realtime
❗️

Real-time screenshare audio isn't captured at this time. This means the separate audio streams will not contain the screenshare audio. The screenshare audio will still be available in the final recording.

📘

Audio data streaming is currently supported in raw PCM format only.

The audio format is mono, 16-bit signed little-endian PCM at 16 kHz.
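Given this format, each second of audio per participant occupies 32,000 bytes (16,000 samples × 2 bytes). A minimal sketch of a duration helper (the function name is ours, not part of the API):

```python
SAMPLE_RATE = 16_000      # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit signed samples
CHANNELS = 1              # mono

def pcm_duration_seconds(buffer: bytes) -> float:
    """Return the playback duration of a raw S16LE PCM buffer."""
    return len(buffer) / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)

# One second of audio at this format is 32,000 bytes:
print(pcm_duration_seconds(b"\x00" * 32_000))  # 1.0
```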

Platform Support

Platform                 Number of concurrent streams
Zoom                     16
Microsoft Teams          9
Google Meet              16
Webex
Slack Huddles (Beta)
Go-To Meeting (Beta)
🚧

This is a compute-heavy feature, and we recommend using 4-core bots to ensure the bot has enough resources to process the separate streams.

Implementation

Step 1: Create a bot

To get separate audio per participant, you must set recording_config.audio_separate_raw = {}. Below is an example of what this looks like in your request:

📘

Make sure the url in realtime_endpoints uses the WebSocket protocol (ws:// or wss://) instead of HTTP (http:// or https://).
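As a quick sanity check before creating the bot, you can validate the endpoint URL's scheme. This is a sketch of our own, not part of the Recall API:

```python
from urllib.parse import urlparse

def is_websocket_url(url: str) -> bool:
    """Return True only for ws:// or wss:// URLs."""
    return urlparse(url).scheme in ("ws", "wss")

print(is_websocket_url("wss://example.com/audio"))   # True
print(is_websocket_url("https://example.com/audio")) # False
```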

curl --request POST \
     --url https://us-west-2.recall.ai/api/v1/bot \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'authorization: YOUR_RECALL_API_KEY' \
     --data '
{
  "meeting_url": "YOUR_MEETING_URL",
  "recording_config": {
    "audio_separate_raw": {}, // Add this to your request body
    "realtime_endpoints": [
      {
      	type: "websocket", // only websocket is supported for realtime audio data
        url: YOUR_WEBSOCKET_RECEIVER_URL,
        events: ["audio_separate_raw.data"]
      }
    ]
  }
}
'
const response = await fetch("https://us-west-2.recall.ai/api/v1/bot", {
  method: "POST",
  headers: {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "YOUR_RECALL_API_KEY" // Update this
  },
  body: JSON.stringify({
    meeting_url: "YOUR_MEETING_URL", // Update this
    recording_config: {
      audio_separate_raw: {}, // Add this to your request body
      realtime_endpoints: [
        {
          type: "websocket", // only websocket is supported for realtime audio data
          url: "YOUR_WEBSOCKET_RECEIVER_URL", // Update this
          events: ["audio_separate_raw.data"]
        }
      ]
    }
  })
});

if (!response.ok) {
  throw new Error(`Error: ${response.status} ${response.statusText}`);
}

const data = await response.json();
import requests

response = requests.post(
    "https://us-west-2.recall.ai/api/v1/bot",
    json={
      "meeting_url": "YOUR_MEETING_URL", # Update this
      "recording_config": {
	      "video_mixed_layout": "gallery_view_v2", # Add this to your request body
		    "video_separate_mp4": {} # Add this to your request body
      }
    },
    headers={
      "accept": "application/json",
      "content-type": "application/json",
    	"authorization": "YOUR_RECALL_API_KEY" # Update this
    }
)

if not response.ok:
    error_message = f"Error: {response.status_code} - {response.text}"
    raise requests.RequestException(error_message)

result = response.json()

Step 2: Receive websocket messages with audio data

Set up a websocket server and ensure it is publicly accessible. You will receive messages in the following payload format:

{
  "event": "audio_separate_raw.data",
  "data": {
    "data": {
      "buffer": string, // base64-encoded raw audio: 16 kHz mono, S16LE (16-bit PCM, little-endian)
      "timestamp": {
        "relative": float,
        "absolute": string
      },
      "participant": {
        "id": number,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object,
        "email": string | null
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "audio_separate": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
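Once messages arrive, your server can decode each packet and group audio by participant. A minimal sketch, assuming the payload shape above (handle_message and buffers are our own names, not part of the API):

```python
import base64
import json
from collections import defaultdict

# Raw PCM accumulated per participant id (our own in-memory store).
buffers = defaultdict(bytearray)

def handle_message(raw: str) -> None:
    """Decode one websocket message and append its audio to the sender's buffer."""
    msg = json.loads(raw)
    if msg.get("event") != "audio_separate_raw.data":
        return  # ignore other event types
    payload = msg["data"]["data"]
    pcm = base64.b64decode(payload["buffer"])
    buffers[payload["participant"]["id"]].extend(pcm)
```

Each participant's bytearray can then be written out, for example as a .wav file with a 16 kHz mono S16LE header.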

Connection Behaviors

Each real-time endpoint you configure in the Create Bot config establishes its own WebSocket connection. That connection remains open until your server explicitly closes it or the call ends and the bot disconnects.

Muting or unmuting does not close the connection. When muted, the bot simply pauses binary media streaming for that endpoint, and unmuting resumes the stream on the same socket.

FAQ

Do muted participants produce audio?

No, muted participants do not produce any audio.

Why am I receiving empty audio packets?

If a participant is unmuted but silent, you will receive empty audio packets.
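You can detect these silent packets by inspecting the decoded samples. A sketch assuming a little-endian host (which matches the S16LE wire format); is_silent is our own helper:

```python
import array

def is_silent(pcm: bytes, threshold: int = 0) -> bool:
    """True if every 16-bit sample's magnitude is within threshold."""
    samples = array.array("h", pcm)  # "h" = signed 16-bit, native byte order
    return all(abs(s) <= threshold for s in samples)

print(is_silent(b"\x00\x00" * 160))  # True: 10 ms of pure silence
```

Using a small nonzero threshold instead of 0 also catches packets that contain only low-level noise.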

Will bots receive audio from other bots?

Since bots are participants, your bot will receive audio from any other bots in the call just like from any other participant.

However, because bots are muted by default, your bot will not receive audio packets from another bot unless that bot is actively outputting audio.

What is the retry behavior?

See the retry behaviors in Real-Time Websocket Endpoints for details.

Will the bot's audio/video/transcript be included in the final recording?

You can include the bot's audio in the final recording by setting recording_config.include_bot_in_recording.audio to true in the Create Bot request.

You cannot include the bot's video or transcript in the final recording at this time. If you want to work around this, we recommend the following:

  • If you only need the transcript:
    • Real-time - you will need to fetch it from your TTS provider (if applicable). You can then merge it by aligning the timestamps from the TTS provider's transcript with the generated transcript of the other participants
    • Async - you will need to:
      • Include the bot's audio in the recording and enable Perfect Diarization in the Create Bot request config
        {
          "recording_config": {
            "include_bot_in_recording": {
              "audio": true
            },
            "transcript": {
              "diarization": {
                "use_separate_streams_when_available": true
              }
            }
          }
        }
      • Use Async Transcription to transcribe the call. Note that the bot's name will be returned as null, but you can fetch the bot's name by querying the Retrieve Bot API instead
  • If you need the video: you will need to send another bot to the call to record the meeting. This bot will capture all participants, including the bot agent that is outputting video. This method can also produce a transcript, since the bot agent is registered as its own participant