Receive Real-Time Video: Websockets
Receive real-time video streams from a bot via websockets.
Video websockets are optimized for those doing real-time AI video analysis, providing either 360p PNG image frames at 2fps or H264 bitstreams at a resolution and framerate set by the meeting platform.
If you're looking to receive real-time mixed video at 30fps instead, refer to the Real-Time Video: RTMP guide.
Separate participant live video in H264 is currently limited to bots using the web_4_core variant.
Quickstart
To receive encoded video in real-time from a bot, you can leverage Real-Time Websocket Endpoints.
Set up a websocket endpoint
For demonstration purposes, we've set up a simple websocket receiver that receives video and writes it to disk:
import WebSocket from 'ws';
import fs from 'fs';

const wss = new WebSocket.Server({ port: 3456 });

let frameIdx = 0;

wss.on('connection', (ws) => {
  ws.on('message', (message: WebSocket.Data) => {
    try {
      const wsMessage = JSON.parse(message.toString());

      if (wsMessage.event === 'video_separate_png.data') {
        console.log(wsMessage);

        // Use the recording ID and a frame counter for the file name
        const recordingId = wsMessage.data.recording.id;
        const filePath = `${recordingId}.${frameIdx}.png`;
        frameIdx += 1;

        // The frame arrives as a base64-encoded PNG; decode it back to binary
        const decodedBuffer = Buffer.from(wsMessage.data.data.buffer, 'base64');
        fs.writeFileSync(filePath, decodedBuffer);
      } else if (wsMessage.event === 'video_separate_h264.data') {
        console.log(wsMessage);

        // Use the recording ID for the file name
        const recordingId = wsMessage.data.recording.id;
        const filePath = `${recordingId}.h264`;

        // Decode the base64-encoded H264 chunk and append it to the bitstream file
        const decodedBuffer = Buffer.from(wsMessage.data.data.buffer, 'base64');
        fs.appendFileSync(filePath, decodedBuffer);
        // .h264 files can be played with ffplay: `ffplay my-recording.h264`
      } else {
        console.log('unhandled message', wsMessage.event);
      }
    } catch (e) {
      console.error('Error parsing JSON:', e);
    }
  });

  ws.on('error', (error) => {
    console.error('WebSocket Error:', error);
  });

  ws.on('close', () => {
    console.log('WebSocket Closed');
  });
});

console.log('WebSocket server started on port 3456');
For details on how to verify connections, see Verifying Real-Time Websocket Endpoints.
Once you have a basic server running locally, you'll want to expose it publicly through a tunneling tool such as ngrok. For a full setup guide, see Local Webhook Development.
Start a meeting
Now that we have our websocket server running locally and exposed through our ngrok tunnel, it's time to start a meeting and send a bot to it.
For simplicity, go to meet.new in a new tab to start an instant Google Meet call. Save this URL for the next step.
Configure the bot
Now it's time to send a bot to a meeting while configuring a real-time websocket endpoint.
To do this, call the Create Bot endpoint while providing a real-time endpoint object where:
- type: websocket
- url: Your publicly exposed ngrok tunnel URL
- events: An array including video_separate_png.data and/or video_separate_h264.data events

To get separate video per participant, set recording_config.video_mixed_layout to gallery_view_v2. And of course, don't forget to set meeting_url to your newly-created Google Meet call.
Example curl:
PNG Request
curl --request POST \
--url https://us-east-1.recall.ai/api/v1/bot/ \
--header "Authorization: $RECALLAI_API_KEY" \
--header "accept: application/json" \
--header "content-type: application/json" \
--data '
{
  "meeting_url": "https://meet.google.com/xxx-xxxx-xxx",
  "recording_config": {
    "video_separate_png": {},
    "video_mixed_layout": "gallery_view_v2",
    "realtime_endpoints": [
      {
        "type": "websocket",
        "url": "wss://my-tunnel-domain.ngrok-free.app",
        "events": ["video_separate_png.data"]
      }
    ]
  }
}
'
H264 Request
curl --request POST \
--url https://us-east-1.recall.ai/api/v1/bot/ \
--header "Authorization: $RECALLAI_API_KEY" \
--header "accept: application/json" \
--header "content-type: application/json" \
--data '
{
  "meeting_url": "https://meet.google.com/xxx-xxxx-xxx",
  "recording_config": {
    "video_separate_h264": {},
    "video_mixed_layout": "gallery_view_v2",
    "realtime_endpoints": [
      {
        "type": "websocket",
        "url": "wss://my-tunnel-domain.ngrok-free.app",
        "events": ["video_separate_h264.data"]
      }
    ]
  },
  "variant": {
    "zoom": "web_4_core",
    "google_meet": "web_4_core",
    "microsoft_teams": "web_4_core"
  }
}
'
Make sure to set the url as a ws or wss endpoint.
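If you're calling the API from Node rather than curl, the same PNG request can be made with fetch. The following is a minimal sketch, assuming Node 18+ (built-in fetch) and that your API key is available as the RECALLAI_API_KEY environment variable; the meeting URL and ngrok URL are the same placeholders used in the curl example:

// Sketch: create a bot that streams separate PNG frames to your websocket endpoint.
// Assumes Node 18+ and RECALLAI_API_KEY set in your environment; the meeting URL
// and websocket URL below are placeholders carried over from the curl example.
async function createPngBot() {
  const response = await fetch('https://us-east-1.recall.ai/api/v1/bot/', {
    method: 'POST',
    headers: {
      Authorization: process.env.RECALLAI_API_KEY ?? '',
      accept: 'application/json',
      'content-type': 'application/json',
    },
    body: JSON.stringify({
      meeting_url: 'https://meet.google.com/xxx-xxxx-xxx',
      recording_config: {
        video_separate_png: {},
        video_mixed_layout: 'gallery_view_v2',
        realtime_endpoints: [
          {
            type: 'websocket',
            url: 'wss://my-tunnel-domain.ngrok-free.app',
            events: ['video_separate_png.data'],
          },
        ],
      },
    }),
  });

  // Log the full Create Bot response so you can grab the bot's ID
  console.log(await response.json());
}

createPngBot();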
Receive video frames
Once the bot is on the call and connected, it will begin producing video_separate_png.data and/or video_separate_h264.data events containing base64-encoded frames.
These events have the following shapes:
PNG Format
{
  "event": "video_separate_png.data",
  "data": {
    "data": {
      "buffer": string, // base64 encoded png at 2fps (dimensions listed under PNG Frame Dimensions below)
      "timestamp": {
        "relative": float,
        "absolute": string
      },
      "type": "webcam" | "screenshare",
      "participant": {
        "id": number,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "video_separate_png": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
H264 Format
{
  "event": "video_separate_h264.data",
  "data": {
    "data": {
      "buffer": string, // base64 h264 at a resolution and framerate set by the platform
      "timestamp": {
        "relative": float,
        "absolute": string
      },
      "type": "webcam" | "screenshare",
      "participant": {
        "id": number,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "video_separate_h264": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
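If you're handling these messages in TypeScript, it can help to mirror the payloads with a type. The following is a sketch derived from the shapes above, not an official SDK type; the format-specific keys are marked optional since only one appears per event:

// Sketch of the video frame event payloads, derived from the shapes above.
type VideoFrameEvent = {
  event: 'video_separate_png.data' | 'video_separate_h264.data';
  data: {
    data: {
      buffer: string; // base64-encoded PNG frame or H264 chunk
      timestamp: { relative: number; absolute: string };
      type: 'webcam' | 'screenshare';
      participant: {
        id: number;
        name: string | null;
        is_host: boolean;
        platform: string | null;
        extra_data: object;
      };
    };
    realtime_endpoint: { id: string; metadata: object };
    video_separate_png?: { id: string; metadata: object };
    video_separate_h264?: { id: string; metadata: object };
    recording: { id: string; metadata: object };
    bot: { id: string; metadata: object };
  };
};

You can then narrow on the event field inside your message handler to get typed access to the frame buffer and participant details.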
PNG Frame Dimensions
data.buffer is the base64-encoded PNG frame. The dimensions of the PNG images are the same across all meeting platforms.

| Video Type | PNG Dimensions |
| --- | --- |
| Participant | 480x360 |
| Screenshare | 1280x720 |
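If you want to sanity-check the dimensions of the frames you're receiving, you can read the width and height directly from the decoded PNG. This is a standalone sketch using only Node's Buffer; it assumes a standard PNG where the IHDR chunk immediately follows the 8-byte signature:

// Read width/height from a decoded PNG buffer.
// In a standard PNG, the 8-byte signature is followed by the IHDR chunk,
// whose width and height are big-endian 32-bit integers at offsets 16 and 20.
function pngDimensions(png: Buffer): { width: number; height: number } {
  const width = png.readUInt32BE(16);
  const height = png.readUInt32BE(20);
  return { width, height };
}

// Example: inside the websocket message handler from the quickstart
// const decodedBuffer = Buffer.from(wsMessage.data.data.buffer, 'base64');
// console.log(pngDimensions(decodedBuffer)); // e.g. { width: 480, height: 360 }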
H264 Frame Dimensions
For H264, the resolution and framerate are set by the meeting platform. Resolution can vary during the meeting based on participant count and connection quality.
| Video Type | H264 Frame Dimensions |
| --- | --- |
| Participant | 240px-1280px typical |
| Screenshare | 200px-1000px typical |
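The .h264 file written by the quickstart server is a raw elementary stream with no container, so many players and tools won't open it directly. One option, assuming you have ffmpeg installed and on your PATH, is to remux the stream into an MP4 once the recording ends, for example by shelling out from Node (a sketch, not part of the Recall API):

import { execFileSync } from 'child_process';

// Sketch: wrap a raw H264 elementary stream in an MP4 container without re-encoding.
// Assumes ffmpeg is installed; since a raw stream carries no container-level
// framerate, ffmpeg falls back to a default unless you specify one.
function remuxToMp4(h264Path: string, mp4Path: string): void {
  execFileSync('ffmpeg', ['-y', '-i', h264Path, '-c', 'copy', mp4Path]);
}

// remuxToMp4('my-recording.h264', 'my-recording.mp4');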
And that's it! You're now streaming video in real-time to a websocket server.
FAQ
What is the retry behavior?
If we are unable to connect to your endpoint, or get disconnected, we will retry the connection every 3 seconds for as long as the bot is alive.