Receive Real-Time Audio

Receive real-time audio from a bot.

To start receiving real-time audio streams, set your websocket URL in create_bot.real_time_media.websocket_audio_destination_url.

This URL must start with ws:// or wss://, depending on your server's requirements. We strongly recommend wss:// (the websocket protocol over SSL/TLS), since the connection is encrypted.
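As a rough illustration of where the URL goes, the sketch below builds only the real_time_media fragment of a create-bot request body and rejects URLs that lack a ws:// or wss:// scheme. The helper name build_real_time_media and the example.com host are hypothetical; the rest of the request (meeting URL, authentication, endpoint) is left out because it is not covered here.

```python
from urllib.parse import urlparse


def build_real_time_media(websocket_url: str) -> dict:
    """Return the real_time_media fragment of a create-bot request body.

    Hypothetical helper: only the field names come from the text above.
    """
    parsed = urlparse(websocket_url)
    if parsed.scheme not in ("ws", "wss"):
        raise ValueError(f"expected a ws:// or wss:// URL, got {websocket_url!r}")
    if not parsed.netloc:
        raise ValueError(f"URL has no host: {websocket_url!r}")
    return {
        "real_time_media": {
            "websocket_audio_destination_url": websocket_url,
        }
    }


if __name__ == "__main__":
    # example.com is a placeholder host, not a real endpoint.
    print(build_real_time_media("wss://example.com/audio"))
```

Validating the scheme before sending the request is a design choice in this sketch, intended to surface a mistyped URL early.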

Real Time Audio Protocol (Combined Streams)

Combined audio streams are available on the Zoom Web Bot, Microsoft Teams Web Bot, Google Meet Bot, and Webex Bot.

The first message on the websocket connection will be:

{
  protocol_version: 1,
  bot_id: '...',
  recording_id: '...',
  separate_streams: false,
  offset: 0.0 
}

The offset field is the time (in seconds) relative to the in_call_recording event on the bot.
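The initial message can be turned into a typed object before any audio is handled. The following is a minimal sketch, assuming the text frame is strict JSON with the field names shown above (the listing above uses unquoted keys, so the exact wire encoding is an assumption). The Handshake class and parse_handshake function are hypothetical names, and recording_id is treated as optional in case it is absent.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Handshake:
    protocol_version: int
    bot_id: str
    separate_streams: bool
    offset: float  # seconds relative to the in_call_recording event
    recording_id: Optional[str] = None


def parse_handshake(text: str) -> Handshake:
    """Parse the first (text) websocket message into a Handshake.

    Assumption: the payload is strict JSON using the field names shown above.
    """
    data = json.loads(text)
    return Handshake(
        protocol_version=int(data["protocol_version"]),
        bot_id=str(data["bot_id"]),
        separate_streams=bool(data["separate_streams"]),
        offset=float(data["offset"]),
        recording_id=data.get("recording_id"),
    )
```

A handler could call parse_handshake on the first string message it receives and keep the result alongside the connection, so later binary frames can be associated with the right bot.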

All subsequent websocket messages are binary, formatted as follows:

  • All data in the websocket packet is S16LE (signed 16-bit little-endian) format audio, sampled at 16000Hz, mono.

The following is sample code to receive these messages:

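import asyncio
import os

import websockets

OUTPUT_DIR = "output"


async def handle_audio(websocket):
    async for message in websocket:
        if isinstance(message, str):
            # Text frame: the initial message describing the stream.
            print(message)
        else:
            # Binary frame: raw S16LE, 16000Hz, mono audio. Append it to one file.
            with open(os.path.join(OUTPUT_DIR, "output.raw"), "ab") as f:
                f.write(message)
            print("wrote message")


async def main():
    # Create the output directory up front so the first write does not fail.
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    async with websockets.serve(handle_audio, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever


asyncio.run(main())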
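
The file written by the sample server above is headerless PCM, so most audio players will not open it directly. The following sketch, which uses only Python's standard wave module, wraps such a file in a WAV container using the format stated above (S16LE, 16000Hz, mono) and returns its duration. The function names and file paths are illustrative, and the code assumes it runs on a little-endian machine, which is the common case.

```python
import wave

SAMPLE_RATE_HZ = 16000   # from the protocol description
SAMPLE_WIDTH_BYTES = 2   # S16LE: 16-bit samples
CHANNELS = 1             # mono


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw S16LE, 16000Hz, mono buffer in seconds."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)


def raw_to_wav(raw_path: str, wav_path: str) -> float:
    """Copy raw PCM into a WAV file and return the audio duration in seconds."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH_BYTES)
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(pcm)
    return pcm_duration_seconds(pcm)
```

For example, raw_to_wav("output/output.raw", "output/output.wav") would produce a file that standard players can open. One second of audio in this format is 32000 bytes (16000 samples of 2 bytes each).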

Real Time Audio Protocol (Separate Streams)

Separate audio streams per participant are only available on the Zoom Native Bot and the Teams Web Bot under a feature flag. Reach out to the Recall team over Slack if you'd like this enabled for your workspace.

When using separate-stream audio, each participant's audio stream is delivered over its own websocket connection.

The first message on each connection will be:

{
  protocol_version: 1,
  bot_id: '...',
  separate_streams: true,
  offset: 0.0 // Offset (in seconds) relative to the `in_call_recording` event on the bot
}

All subsequent websocket messages are binary, formatted as follows:

  • The first 32 bits are a little-endian unsigned integer representing the participant_id.
  • The remaining data in the websocket packet is S16LE (signed 16-bit little-endian) format audio, sampled at 16000Hz, mono.

The following is sample code to decode these messages:

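import asyncio
import os

import websockets

OUTPUT_DIR = "output"


async def handle_audio(websocket):
    async for message in websocket:
        if isinstance(message, str):
            # Text frame: the initial message describing the stream.
            print(message)
        else:
            # First 4 bytes: little-endian unsigned participant_id.
            participant_id = int.from_bytes(message[0:4], byteorder="little")
            # Remaining bytes: raw S16LE, 16000Hz, mono audio. One file per participant.
            path = os.path.join(OUTPUT_DIR, f"{participant_id}-output.raw")
            with open(path, "ab") as f:
                f.write(message[4:])
            print("wrote message")


async def main():
    # Create the output directory up front so the first write does not fail.
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    async with websockets.serve(handle_audio, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever


asyncio.run(main())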
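
The frame layout described above can also be expressed as a pure function, which is easier to unit test than a server handler. The sketch below is one possible way to do it with the standard struct and array modules; decode_frame is a hypothetical name, and the length checks are a defensive addition rather than something the protocol description requires.

```python
import struct
import sys
from array import array
from typing import Tuple


def decode_frame(message: bytes) -> Tuple[int, array]:
    """Split a separate-streams binary frame into (participant_id, samples)."""
    if len(message) < 4:
        raise ValueError("frame is shorter than the 4-byte participant_id header")
    # "<I" = little-endian unsigned 32-bit integer.
    (participant_id,) = struct.unpack_from("<I", message, 0)
    payload = message[4:]
    if len(payload) % 2:
        raise ValueError("audio payload is not a whole number of 16-bit samples")
    samples = array("h")  # signed 16-bit integers in native byte order
    samples.frombytes(payload)
    if sys.byteorder == "big":
        samples.byteswap()  # payload is little-endian; fix up on big-endian hosts
    return participant_id, samples
```

The returned array holds one signed integer per audio sample, so len(samples) / 16000 gives the frame's duration in seconds.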

When a participant mutes, their corresponding websocket connection disconnects; when they unmute, it reconnects.

FAQ


Do muted participants produce audio?

No, muted participants do not produce any audio.

If a participant is unmuted but silent, you will receive empty audio packets.
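
The text above does not say whether an "empty" packet is zero-length or a buffer of zero-valued samples, so the following sketch treats both as silence. The is_silent name is hypothetical, and the check is applied to the audio payload only (for separate streams, after the 4-byte participant_id has been stripped).

```python
def is_silent(pcm: bytes) -> bool:
    """Return True if the payload is zero-length or contains only zero bytes.

    Assumption: "empty" audio means either no bytes or all-zero S16LE samples.
    """
    return not any(pcm)


if __name__ == "__main__":
    print(is_silent(b""))                   # zero-length payload
    print(is_silent(b"\x00\x00\x00\x00"))   # two zero-valued samples
    print(is_silent(b"\x01\x00"))           # one non-zero sample
```

A receiver could use a check like this to skip writing or transcribing packets that carry no signal.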

Will bots receive audio from other bots?

Bots are participants, so if another bot in the call outputs audio, your bot receives it like any other participant's audio.

Because bots are muted by default, your bot will not receive audio packets from another bot unless that bot is outputting audio.

What is the retry behavior?

If we are unable to connect to your endpoint, or the connection is dropped, we retry the connection every 3 seconds for as long as the bot is alive.