Receive Real-Time Video: Websockets

Receive real-time video streams from a bot via websockets.

📘

Video websockets are optimized for real-time AI video analysis and provide 720p PNG image frames at 2 fps.

If you need real-time video for human viewing instead, use RTMP by specifying create_bot.real_time_media.rtmp_destination_url, which delivers standard 30 fps video.

Quickstart


Setup

To receive real-time video from a bot, include your websocket URL in the Create Bot request by setting real_time_media.websocket_video_destination_url:

curl --request POST \
     --url https://us-east-1.recall.ai/api/v1/bot/ \
     --header "Authorization: $RECALL_API_KEY" \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "meeting_url": "https://meet.google.com/kbw-rphc-zsc",
  "real_time_media": {
    "websocket_video_destination_url": "wss://expert-puma-further.ngrok-free.app"
  }
}
'
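
For callers who prefer Python to curl, the same request can be sketched with the standard library. This is a minimal sketch, not an official client: the helper name build_create_bot_request is hypothetical, the endpoint and field names are copied from the curl example above, and the request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
CREATE_BOT_URL = "https://us-east-1.recall.ai/api/v1/bot/"


def build_create_bot_request(api_key, meeting_url, websocket_url):
    """Build (but do not send) a Create Bot request with a video websocket."""
    payload = {
        "meeting_url": meeting_url,
        "real_time_media": {
            "websocket_video_destination_url": websocket_url,
        },
    }
    return urllib.request.Request(
        CREATE_BOT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "accept": "application/json",
            "content-type": "application/json",
        },
        method="POST",
    )


request = build_create_bot_request(
    api_key="<RECALL_API_KEY>",  # placeholder, same value as $RECALL_API_KEY
    meeting_url="https://meet.google.com/kbw-rphc-zsc",
    websocket_url="wss://expert-puma-further.ngrok-free.app",
)
# To actually create the bot, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```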

📘

ws:// vs wss://

While websocket connections can be made over either HTTP (ws://) or HTTPS (wss://), we strongly recommend wss://, because the connection is then SSL/TLS encrypted.
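
One way to catch an unencrypted URL before it reaches the Create Bot request is a small scheme check. The sketch below uses only the standard library; the function name check_websocket_url and its return values are illustrative, not part of the API.

```python
from urllib.parse import urlparse


def check_websocket_url(url):
    """Return "secure" for wss://, "insecure" for ws://, else raise ValueError."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "wss":
        return "secure"
    if scheme == "ws":
        return "insecure"
    raise ValueError(f"not a websocket URL: {url!r}")


print(check_websocket_url("wss://expert-puma-further.ngrok-free.app"))  # prints: secure
```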

Message Format

Each video stream will connect to your websocket server as its own connection.

The first message on each websocket connection is a JSON text message containing the bot ID:
{"protocol_version": 1, "bot_id": "<BOT_ID>"}
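
A minimal sketch of reading this first message follows. The function name parse_handshake is hypothetical, and rejecting any protocol_version other than 1 is a design choice made here (1 is the only version this page shows), not documented behavior.

```python
import json


def parse_handshake(message):
    """Return the bot ID from the first (text) message of a connection."""
    if not isinstance(message, str):
        raise ValueError("expected the first message to be JSON text")
    data = json.loads(message)
    # Assumption: only protocol_version 1, the version shown above, is handled.
    if data.get("protocol_version") != 1:
        raise ValueError(f"unexpected protocol_version: {data.get('protocol_version')!r}")
    return data["bot_id"]


print(parse_handshake('{"protocol_version": 1, "bot_id": "<BOT_ID>"}'))  # prints: <BOT_ID>
```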

The following websocket messages will be in binary format as follows:

  • First 32 bits are a little-endian unsigned integer representing the "participant_id". This participant ID is the same as the ID on the corresponding participant in the Bot's meeting_participants list.
  • Second 32 bits are a little-endian unsigned integer representing the millisecond timestamp of this frame. The timestamp is relative to the start of the video (not a unix timestamp).
  • The remaining data in the websocket packet is the PNG encoded frame. See below for dimensions.

The following is sample code to decode these messages:

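import asyncio
import os

import websockets

OUTPUT_DIR = 'output'


async def handle_video(websocket):
    async for message in websocket:
        if isinstance(message, str):
            # First message on each connection: JSON text containing the bot ID.
            print(message)
        else:
            # Bytes 0-3: participant ID, bytes 4-7: timestamp in milliseconds
            # (both little-endian unsigned integers). The rest is the PNG frame.
            stream_id = int.from_bytes(message[0:4], byteorder='little')
            timestamp = int.from_bytes(message[4:8], byteorder='little')
            path = os.path.join(OUTPUT_DIR, f'{stream_id}-{timestamp}.png')
            with open(path, 'wb') as f:
                f.write(message[8:])
            print(f'wrote {path}')


async def main():
    # Create the output directory up front so open() does not fail.
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    # max_size=None lifts the library's default 1 MiB message limit,
    # which a large PNG frame could exceed.
    async with websockets.serve(handle_video, '0.0.0.0', 8765, max_size=None):
        await asyncio.Future()  # run until cancelled

asyncio.run(main())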
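
If frames are passed on to an analysis step rather than written to disk, the same binary layout can be decoded in a pure function, which is also easier to unit test than a websocket handler. The sketch below assumes nothing beyond the layout described above; VideoFrame and decode_frame are illustrative names, and the PNG signature check is an extra sanity check rather than part of the protocol.

```python
import struct
from typing import NamedTuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
HEADER = struct.Struct("<II")  # two little-endian unsigned 32-bit integers


class VideoFrame(NamedTuple):
    participant_id: int
    timestamp_ms: int  # relative to the start of the video
    png: bytes


def decode_frame(message):
    """Split a binary websocket message into header fields and PNG data."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the 8-byte header")
    participant_id, timestamp_ms = HEADER.unpack_from(message)
    png = bytes(message[HEADER.size:])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not start with a PNG signature")
    return VideoFrame(participant_id, timestamp_ms, png)


# Synthetic message: participant 7, 1500 ms, followed by a stub PNG payload.
sample = HEADER.pack(7, 1500) + PNG_SIGNATURE + b"stub"
frame = decode_frame(sample)
print(frame.participant_id, frame.timestamp_ms)  # prints: 7 1500
```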

Image Frame Dimensions

The dimensions for the PNG images are the same for all meeting platforms.

Video stream                        Image dimensions
Participant - Default               1280x720
Participant - While screensharing   256x144
Screenshare                         1024x576
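
The dimensions of a received frame can be checked without an image library, because a PNG stores its width and height in the IHDR chunk immediately after the 8-byte signature. The sketch below reads them with struct. Mapping dimensions back to a stream type is an inference from the table above rather than documented behavior, and the names png_dimensions and classify_frame are illustrative.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Dimensions from the table above, mapped back to the stream type.
STREAM_KINDS = {
    (1280, 720): "participant (default)",
    (256, 144): "participant (while screensharing)",
    (1024, 576): "screenshare",
}


def png_dimensions(png):
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    if len(png) < 24 or png[:8] != PNG_SIGNATURE or png[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # IHDR stores width and height as big-endian unsigned 32-bit integers.
    return struct.unpack(">II", png[16:24])


def classify_frame(png):
    """Guess the stream type from the frame dimensions, or return "unknown"."""
    return STREAM_KINDS.get(png_dimensions(png), "unknown")
```

Here png is the frame payload, that is, everything after the 8-byte header of a binary message.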

Known Issues

  • If your bot is configured with recording_mode: speaker_view, the participant ID (stream_id in the sample code above) will always be 0, and you will receive only a single video stream, corresponding to the active speaker. For this feature to work properly, set recording_mode to gallery_view or gallery_view_v2.
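
A Create Bot request body that avoids this issue might look like the sketch below. It assumes recording_mode is a top-level field of the request body, which this page does not confirm; check the Create Bot reference for its exact placement.

```python
import json

payload = {
    "meeting_url": "https://meet.google.com/kbw-rphc-zsc",
    # Assumption: recording_mode sits at the top level of the request body.
    "recording_mode": "gallery_view_v2",
    "real_time_media": {
        "websocket_video_destination_url": "wss://expert-puma-further.ngrok-free.app",
    },
}
print(json.dumps(payload, indent=2))
```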

FAQ

What is the retry behavior?

If we are unable to connect to your endpoint, or the connection is dropped, we will retry the connection every 3 seconds for as long as the bot is alive.
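
To exercise a server against this behavior in tests, the retry loop can be modelled as below. This is an illustrative model of the description above, not the bot's actual implementation: connect_with_retry, connect, and bot_is_alive are hypothetical names, and the sleep function is injectable so that tests do not have to wait.

```python
import time

RETRY_INTERVAL_SECONDS = 3  # interval stated in the FAQ above


def connect_with_retry(connect, bot_is_alive, sleep=time.sleep):
    """Call connect() until it succeeds or the bot is gone.

    Returns the number of attempts made, or None if the bot stopped first.
    """
    attempts = 0
    while bot_is_alive():
        attempts += 1
        try:
            connect()
            return attempts
        except ConnectionError:
            sleep(RETRY_INTERVAL_SECONDS)
    return None


failures = iter([True, True, False])  # fail twice, then succeed


def flaky_connect():
    if next(failures):
        raise ConnectionRefusedError("endpoint not reachable yet")


waits = []
attempts = connect_with_retry(flaky_connect, lambda: True, sleep=waits.append)
print(attempts, waits)  # prints: 3 [3, 3]
```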