Receive Real-Time Video: Websockets

Receive real-time video streams from a bot via websockets.

📘

Video websockets are optimized for real-time AI video analysis, providing either 360p PNG image frames at 2fps or H264 bitstreams at a resolution and framerate set by the meeting platform.

If you're looking to receive real-time mixed video at 30fps instead, refer to the Real-Time Video: RTMP guide.

📘

Separate participant live video in H264 is currently limited to bots using the web_4_core variant.

Quickstart


To receive encoded video in real-time from a bot, you can leverage Real-Time Websocket Endpoints.

Set up a websocket endpoint

For demonstration purposes, we've set up a simple websocket receiver to receive and write video to a file:

import WebSocket from 'ws';
import fs from 'fs';

const wss = new WebSocket.Server({ port: 3456 });
let frameIdx = 0;

wss.on('connection', (ws) => {
  ws.on('message', (message: WebSocket.Data) => {
    try {
      const wsMessage = JSON.parse(message.toString());

      if (wsMessage.event === 'video_separate_png.data') {
        // Use the recording and participant IDs for the file name
        const recordingId = wsMessage.data.recording.id;
        const participantId = wsMessage.data.data.participant.id;
        const filePath = `${recordingId}.${participantId}.${frameIdx}.png`;
        frameIdx += 1;

        // The buffer is base64-encoded PNG data; a single decode yields the image bytes
        const decodedBuffer = Buffer.from(wsMessage.data.data.buffer, 'base64');
        fs.writeFileSync(filePath, decodedBuffer);
      } else if (wsMessage.event === 'video_separate_h264.data') {
        // Append each participant's bitstream to its own file so streams don't mix
        const recordingId = wsMessage.data.recording.id;
        const participantId = wsMessage.data.data.participant.id;
        const filePath = `${recordingId}.${participantId}.h264`;

        const decodedBuffer = Buffer.from(wsMessage.data.data.buffer, 'base64');
        fs.appendFileSync(filePath, decodedBuffer);
        // .h264 files can be played with ffplay: `ffplay my-recording.h264`
      } else {
        console.log('Unhandled message:', wsMessage.event);
      }
    } catch (e) {
      console.error('Error parsing JSON:', e);
    }
  });

  ws.on('error', (error) => {
    console.error('WebSocket Error:', error);
  });

  ws.on('close', () => {
    console.log('WebSocket Closed');
  });
});

console.log('WebSocket server started on port 3456');

For details on how to verify connections, see Verifying Real-Time Websocket Endpoints.

Once you have a basic server running locally, you'll want to expose it publicly through a tunneling tool such as ngrok. For a full setup guide, see Local Webhook Development.
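For example, with ngrok installed, the sample server above can be exposed like this (the port matches the sample server; the public URL ngrok assigns you will differ):

```shell
# Expose the local websocket server on port 3456 through an ngrok tunnel.
# ngrok prints a public https:// forwarding URL; use its wss:// form
# as the realtime endpoint URL when configuring the bot.
ngrok http 3456
```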

Start a meeting

Now that we have our websocket server running locally and exposed through our ngrok tunnel, it's time to start a meeting and send a bot to it.

For simplicity, go to meet.new in a new tab to start an instant Google Meet call. Save this URL for the next step.

Configure the bot

Now it's time to send a bot to a meeting while configuring a real-time websocket endpoint.

To do this, call the Create Bot endpoint while providing a real-time endpoint object where:

  • type: websocket
  • url: Your publicly exposed ngrok tunnel URL
  • events: An array including video_separate_png.data and/or video_separate_h264.data events

To get separate video per participant, set recording_config.video_mixed_layout to gallery_view_v2. And of course, don't forget to set meeting_url to your newly created Google Meet call.

Example curl:

PNG Request

curl --request POST \
     --url https://us-east-1.recall.ai/api/v1/bot/ \
     --header "Authorization: $RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "meeting_url": "https://meet.google.com/xxx-xxxx-xxx",
  "recording_config": {
    "video_separate_png": {},
    "video_mixed_layout": "gallery_view_v2",
    "realtime_endpoints": [
      {
        "type": "websocket",
        "url": "wss://my-tunnel-domain.ngrok-free.app",
        "events": ["video_separate_png.data"]
      }
    ]
  }
}
'

H264 Request

curl --request POST \
     --url https://us-east-1.recall.ai/api/v1/bot/ \
     --header "Authorization: $RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "meeting_url": "https://meet.google.com/xxx-xxxx-xxx",
  "recording_config": {
    "video_separate_h264": {},
    "video_mixed_layout": "gallery_view_v2",
    "realtime_endpoints": [
      {
        "type": "websocket",
        "url": "wss://my-tunnel-domain.ngrok-free.app",
        "events": ["video_separate_h264.data"]
      }
    ],
    "variant": {
    	"zoom": "web_4_core",
    	"google_meet": "web_4_core",
    	"microsoft_teams": "web_4_core"
  	}
  }
}
'

📘

Make sure to set the url to a ws:// or wss:// endpoint.


Receive video frames

Once the bot is on the call and connected, it will begin producing video_separate_png.data and/or video_separate_h264.data events containing base64-encoded frames.

These events have the following shapes:

PNG Format

{
  "event": "video_separate_png.data",
  "data": {
    "data": {
      "buffer": string, // base64 encoded png at 2fps
      "timestamp": {
        "relative": float,
        "absolute": string
      },
      "type": "webcam" | "screenshare",
      "participant": {
        "id": number,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "video_separate_png": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
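As a sketch, a received message of the shape above can be decoded into raw PNG bytes like this. The interfaces and the `decodePngFrame` helper are illustrative, typing only the fields used here:

```typescript
// Minimal sketch: decode one video_separate_png.data message into PNG bytes.
// Only the fields used below are typed; the full event carries more metadata.
interface PngDataEvent {
  event: 'video_separate_png.data';
  data: {
    data: {
      buffer: string; // base64-encoded PNG frame
      type: 'webcam' | 'screenshare';
      participant: { id: number; name: string | null };
    };
    recording: { id: string };
  };
}

function decodePngFrame(raw: string): { bytes: Buffer; fileName: string } | null {
  const msg = JSON.parse(raw);
  if (msg.event !== 'video_separate_png.data') return null;
  const evt = msg as PngDataEvent;
  const { buffer, participant } = evt.data.data;
  return {
    // A single base64 decode yields the PNG bytes, ready to write to disk
    bytes: Buffer.from(buffer, 'base64'),
    // Name files per recording and participant so separate streams stay separate
    fileName: `${evt.data.recording.id}.${participant.id}.png`,
  };
}
```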

H264 Format

{
  "event": "video_separate_h264.data",
  "data": {
    "data": {
      "buffer": string, // base64 h264 at a resolution and framerate set by the platform
      "timestamp": {
        "relative": float,
        "absolute": string
      },
      "type": "webcam" | "screenshare",
      "participant": {
        "id": number,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "video_separate_h264": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}

PNG Frame Dimensions

data.buffer is the base64-encoded PNG frame. The dimensions of the PNG images are the same across all meeting platforms.

| Video Type | PNG Dimensions |
|---|---|
| Participant | 480x360 |
| Screenshare | 1280x720 |

H264 Frame Dimensions

For H264, the resolution and framerate are set by the meeting platform. Resolution can vary during the meeting based on participant count and connection quality.

| Video Type | H264 Frame Dimensions |
|---|---|
| Participant | 240px-1280px typical |
| Screenshare | 200px-1000px typical |
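Beyond ffplay, a raw .h264 elementary stream can be remuxed into an MP4 container for playback in standard players. A minimal sketch using ffmpeg, where the input filename is a placeholder for a file written by your websocket server:

```shell
# Remux a raw H264 elementary stream into an MP4 container without re-encoding.
# "my-recording.h264" stands in for a file your websocket server wrote.
ffmpeg -i my-recording.h264 -c copy my-recording.mp4
```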

🎉

And that's it! You're now streaming video in real-time to a websocket server.


FAQ


What is the retry behavior?

If we are unable to connect to your endpoint, or the connection is dropped, we will retry the connection every 3 seconds for as long as the bot is alive.