Meeting Bot Real-time Transcription
Use real-time transcription to receive live utterances, partial results, and speaker data from the bot while a call is in progress.
Overview
Real-time transcription is used when your application needs transcript data throughout the meeting. Some common reasons to choose real-time transcription are:
- You are displaying live captions during the meeting.
- You are showing a live transcript or other real-time UI updates.
- You need to trigger alerts, moderation, or automations while participants are speaking.
You should not use real-time transcription if:
- Your use case can wait until after the meeting ends. Instead, you should use Async Transcription.
- You are building real-time conversational agents that users can speak to and hear responses from during the meeting. Instead, you should use output media with a voice-to-voice model such as OpenAI's Realtime API rather than relying on partial transcript results.
You can refer to this sample app for an end-to-end example of sending a bot to a meeting and receiving transcript data in real time.
Supported transcription providers for real-time transcription (bots)
| Transcription Provider | Realtime Transcription (Bots) | Provider Field Name in Create Bot |
|---|---|---|
| Recall.ai Transcription | ✅ Yes | recallai_streaming |
| Eleven Labs | ✅ Yes | elevenlabs_streaming |
| Assembly AI | ✅ Yes | assembly_ai_v3_streaming |
| Deepgram | ✅ Yes | deepgram_streaming |
| AWS Transcribe | ✅ Yes | aws_transcribe_streaming |
| Rev | ✅ Yes | rev_streaming |
| Speechmatics | ✅ Yes | speechmatics_streaming |
| Gladia | ✅ Yes | gladia_v2_streaming |
| Google Cloud STT | ❌ No | - |
Important: Concurrency considerations
When going to production, make sure that your account with your third-party transcription provider is configured with a high enough concurrency limit to support your anticipated load.
Certain transcription providers require that you reach out to increase your concurrency limit, and we highly recommend checking this before running production workloads.
Prerequisites
Before implementing real-time transcription, first ensure the required prerequisite setup is complete. You should have:
- A stable public URL for your application. In development, this is usually a static ngrok URL or something similar.
- Your Recall API key and workspace verification secret.
- A webhook endpoint configured in the Recall dashboard and subscribed to the following events:
transcript.done, transcript.failed
A human must complete quick one-time setup tasks in the Recall dashboard. If an agent is guiding setup, it should treat this section as human-owned setup and confirm with the human that each item is complete before continuing.
Ensure the backend has a stable public URL
Ensure the application has a stable public URL that Recall can reach for webhooks, callbacks, websockets, and other real-time endpoints.
For local development, this should be a static ngrok URL rather than a temporary URL that changes between sessions. See the Local Webhook Development Guide for how to set this up.
Create the Recall API key and workspace verification secrets
Ensure the required Recall API credentials and verification secrets have been created in the Recall dashboard for the selected region.
The Recall API key and workspace verification secrets are required to interact with the Recall API and to secure your application.
Configure a webhook endpoint to receive artifact status change events
Ensure that you have configured a Recall webhook endpoint in the webhooks dashboard that points to either:
- a static ngrok URL for local development, or
- a public server that is ready to receive and process webhook events
Also ensure that this endpoint is subscribed to the required webhook events for this feature.
Implementation Guide
To implement real-time transcription, configure transcription when you create the bot, then consume transcript events as they are delivered during the meeting. At a high level, the flow is:
- Create a bot with a real-time transcription provider and an endpoint to receive the real-time transcription events.
- Process the transcript utterances delivered while the meeting is still in progress.
Real-time transcription is configured up front in the Create Bot request and transcription utterances are delivered continuously during the call.
Important: Requirements for a reliable integration
Your application must:
- Secure your Recall endpoints - Do not trust incoming Recall.ai requests by default. Your application must verify every webhook, websocket, and callback request before accepting or processing it. See How to verify webhooks, websockets and callback requests from Recall.ai.
- Schedule bots in advance whenever possible - Creating bots at the last minute increases the chance of 507 errors. See the Creating and scheduling bots guide for more details.
- Retry Create Bot requests that return 507 status codes - Retry any 507 responses returned by the Create Bot request every 30 seconds, for up to 10 attempts. Otherwise, a bot will not be created.
- Process webhook work asynchronously - Acknowledge Recall webhook requests quickly, then handle downstream work asynchronously. Otherwise, the request may time out and Recall may retry it.
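The 507 retry guidance above can be sketched as a small helper. This is a minimal illustration, not an official SDK pattern; send_create_bot stands in for whatever function issues your Create Bot request and returns an HTTP status code and response body.

```python
import time

def create_bot_with_retry(send_create_bot, max_attempts=10, delay_s=30):
    """Retry Create Bot on 507 (out of capacity), per the guidance above.

    send_create_bot: callable returning (status_code, body).
    Retries every `delay_s` seconds, for up to `max_attempts` attempts.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send_create_bot()
        if status != 507:
            # Success or a non-retryable error: hand it back to the caller.
            return status, body
        if attempt < max_attempts:
            time.sleep(delay_s)
    # Every attempt returned 507; no bot was created.
    raise RuntimeError(f"Create Bot still returning 507 after {max_attempts} attempts")
```

In production you would call this from a scheduler so the retries do not block request handling.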
Step 1: Create a bot with real-time transcription enabled
To receive real-time transcript events, you must configure both of the following in the Create Bot request:
- A transcription provider: Specifies which provider the bot should use for transcription and any required provider-specific settings. Use the recording_config.transcript.provider field to set the transcription provider and include any provider-specific options (e.g. languages or keyterms).
- A real-time endpoint: Specifies which webhook or websocket endpoint should receive real-time transcript events and which events to send. Use the recording_config.realtime_endpoints field to set the destination URL and the events to listen for (transcript.data is a required event for real-time transcription).
If either recording_config.transcript.provider or recording_config.realtime_endpoints is missing, your application will not receive real-time transcription utterances. In this case you may also not receive a transcript.failed transcript artifact status change webhook event explaining why the transcript events aren't being sent.
Example create bot request with real-time transcription via webhooks
curl --request POST \
--url https://RECALL_REGION.recall.ai/api/v1/bot/ \
--header "Authorization: RECALLAI_API_KEY" \
--header "accept: application/json" \
--header "content-type: application/json" \
--data '
{
"meeting_url": MEETING_URL,
"recording_config": {
"transcript": {
"provider": {
"recallai_streaming": {
"mode": "prioritize_low_latency",
"language": "en"
}
},
"diarization": {
"use_separate_streams_when_available": true
}
},
"realtime_endpoints": [
{
"type": "webhook",
"url": "https://STABLE_PUBLIC_URL/WEBHOOK_ENDPOINT",
"events": ["transcript.data"]
}
]
}
}
'
Example create bot request with real-time transcription via websocket
curl --request POST \
--url https://RECALL_REGION.recall.ai/api/v1/bot/ \
--header "Authorization: RECALLAI_API_KEY" \
--header "accept: application/json" \
--header "content-type: application/json" \
--data '
{
"meeting_url": MEETING_URL,
"recording_config": {
"transcript": {
"provider": {
"recallai_streaming": {
"mode": "prioritize_low_latency",
"language": "en"
}
},
"diarization": {
"use_separate_streams_when_available": true
}
},
"realtime_endpoints": [
{
"type": "websocket",
"url": "wss://STABLE_PUBLIC_URL/WEBSOCKET_ENDPOINT",
"events": ["transcript.data"]
}
]
}
}
'
Step 2: Receive and process transcript.data events
Once the bot has joined the meeting and real-time transcription is active, Recall will send transcript.data events to the webhook or websocket endpoint configured in recording_config.realtime_endpoints.
Each transcript.data event contains a finalized transcript utterance generated during the meeting. Your application should consume these events and use them for your real-time use case or to trigger downstream logic.
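A minimal sketch of consuming these events, using the field names from the payload schema below; flatten_utterance is a hypothetical helper, not part of any Recall SDK.

```python
def flatten_utterance(event):
    """Extract speaker and text from a transcript.data event payload."""
    data = event["data"]["data"]
    participant = data["participant"]
    words = data["words"]
    return {
        # Participant name can be null, so fall back to the numeric id.
        "speaker": participant.get("name") or f"Participant {participant['id']}",
        # Join word-level results into the full utterance text.
        "text": " ".join(word["text"] for word in words),
        # Relative start time (seconds into the recording) of the utterance.
        "start": words[0]["start_timestamp"]["relative"] if words else None,
    }
```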
Finalized transcript real-time event payload
{
"event": "transcript.data",
"data": {
"data": {
"words": {
"text": string,
"start_timestamp": {
"relative": float
},
"end_timestamp": {
"relative": float
} | null
}[],
"language_code": string,
"participant": {
"id": int,
"name": string | null,
"is_host": boolean,
"platform": string | null,
"extra_data": object,
"email": string | null
}
},
"realtime_endpoint": {
"id": string,
"metadata": object,
},
"transcript": {
"id": string,
"metadata": object
},
"recording": {
"id": string,
"metadata": object
},
"bot": {
"id": string,
"metadata": object
}
}
}
How to handle failures with real-time transcription
If the transcription job fails, for example because the bot could not connect to the transcription provider, Recall sends a transcript.failed transcript artifact status change webhook event to the webhook endpoint configured in your Recall dashboard.
This event is sent through the dashboard webhook configuration, not through recording_config.realtime_endpoints in the Create Bot request. These are separate Recall configurations with different, mutually exclusive event sets, even if they both point to the same application URL. As a result, you do not configure transcript.failed in recording_config.realtime_endpoints.
You can get the machine-readable failure code from the data.data.sub_code field of the transcript.failed transcript artifact status change webhook event.
Transcript artifact status change webhook schema
{
"event": string,
"data": {
"data": {
"code": string,
"sub_code": string | null,
"updated_at": string // ISO86001
},
"transcript": {
"id": string,
"metadata": object,
},
"recording": {
"id": string,
"metadata": object
},
"bot": {
"id": string,
"metadata": object
}
}
}
You must explicitly subscribe to each event you want to receive from the webhooks dashboard, as defined in the table below.
Transcript artifact webhook events and codes
| Event | Code | Description |
|---|---|---|
| transcript.processing | processing | The media object has started capturing. |
| transcript.done | done | The media object has completed successfully. All data for media objects on the recording is now available. |
| transcript.failed | failed | The media object failed to be captured. The sub_code field will contain a machine-readable code for the failure. See below for the list of sub codes. |
| transcript.deleted | deleted | The media object has been deleted from Recall systems. |
Transcript artifact webhook sub codes
Below is a list of the sub_code values that can appear on a transcript.failed webhook event.
| Sub Code | Reason |
|---|---|
| provider_connection_failed | Recall is not able to connect to the third-party transcription provider. Common reasons include: insufficient funds in the transcription provider account for the provided API key, using paid features on a free account, or temporary service unavailability from the transcription provider. |
| zoom_global_captions_disabled | Meeting captions are disabled by the Zoom account. |
| zoom_host_disabled_meeting_captions | The host of the Zoom meeting has disabled meeting captions. |
| zoom_captions_failure | There was an error enabling meeting captions for the Zoom call. |
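The sub codes above can be turned into operator-facing diagnostics with a small lookup. This is an illustrative sketch; the function name is hypothetical and the messages are paraphrased from the table, not taken from the API.

```python
# Paraphrased descriptions of the documented transcript.failed sub codes.
FAILURE_REASONS = {
    "provider_connection_failed": (
        "Could not connect to the third-party transcription provider "
        "(check account funds, plan features, and provider availability)."
    ),
    "zoom_global_captions_disabled": "Meeting captions are disabled by the Zoom account.",
    "zoom_host_disabled_meeting_captions": "The Zoom meeting host has disabled meeting captions.",
    "zoom_captions_failure": "Enabling meeting captions for the Zoom call failed.",
}

def describe_transcript_failure(event):
    """Return a readable reason for a transcript.failed webhook event."""
    sub_code = event["data"]["data"].get("sub_code")
    return FAILURE_REASONS.get(sub_code, f"Unknown failure (sub_code={sub_code!r})")
```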
Using the dashboard to see why real-time transcription failed
You can also see the failure reason from the bot logs in the dashboard.
Switching to a fallback transcription provider
If real-time transcription fails, you can either:
- Re-transcribe the recording after the meeting ends using Async Transcription
- Start a new recording with a different real-time transcription provider
To switch to a fallback real-time transcription provider in real-time, listen for the transcript.failed webhook event and call the Start Recording endpoint with a new recording_config that uses a different real-time transcription provider.
Important notes about calling the Start Recording endpoint:
- Calling the Start Recording endpoint creates a new recording artifact. As a result, a bot can have multiple recording artifacts associated with it, and the Retrieve Bot response may include multiple entries in the recordings array.
- When you call the Start Recording endpoint, the recording_config for the new recording does not inherit missing fields from the previous recording. The configuration is replaced in full, so the request must include all required and desired settings, not only the transcription provider.
- The recording_config returned on the original bot artifact (e.g. the response from the Retrieve Bot request) and in the dashboard reflects the initial Create Bot request, not the updated recording configuration used when the new recording was started.
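Because the recording_config is replaced in full, a fallback request should rebuild the entire configuration rather than swap only the provider. A hedged sketch: the function is hypothetical, deepgram_streaming is one example provider field from the table above, and the field shapes follow the Create Bot examples earlier in this guide.

```python
def build_fallback_recording_config(provider_field, provider_settings, realtime_url):
    """Build a complete recording_config for the Start Recording endpoint.

    provider_field: e.g. "deepgram_streaming" (see the provider table above).
    provider_settings: provider-specific options for that provider.
    realtime_url: webhook URL that should receive transcript.data events.
    """
    return {
        "transcript": {
            # The full provider block must be restated, not merged.
            "provider": {provider_field: provider_settings},
            "diarization": {"use_separate_streams_when_available": True},
        },
        # Real-time endpoints must also be restated in full.
        "realtime_endpoints": [
            {"type": "webhook", "url": realtime_url, "events": ["transcript.data"]}
        ],
    }
```

You would send this dict as the recording_config body of the Start Recording request after receiving transcript.failed.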
Additional transcript configurations
Diarization
For the best diarization with speaker names, you should use Perfect Diarization by setting diarization.use_separate_streams_when_available: true in the Create Bot request.
When multiple participants are speaking from the same device, you can choose to use Machine Diarization instead of perfect diarization.
To see all diarization configurations, see the diarization guide.
Perfect diarization transcription cost
Using diarization.use_separate_streams_when_available: true for real-time transcription can increase transcription cost by approximately 1.8x in cases where participants speak concurrently or when background conversation is present.
Language detection for real-time transcription
If you don’t know ahead of time which language the conversation will be in, you can set up automatic language detection. Automatically detecting languages is broken up into two types:
- Language Detection - Detecting the primary spoken language within a recording, without needing to explicitly set it
- Code switching - Alternating between two or more languages or language varieties within a single conversation or speech
Most of the third-party transcription providers that we integrate with support language detection.
Automatic language detection is not available when using meeting captions.
Each supported provider documents its supported languages and the corresponding language detection and code switching parameters for the Create Bot provider configuration; refer to each provider's documentation for the exact parameter names.
Accessing raw data from the transcription provider
Raw transcription provider data is not exposed for Recall.ai transcription or when diarization.use_separate_streams_when_available: true is set.
If you need access to provider-specific fields or features that are not exposed in Recall’s normalized real-time transcript events, you can subscribe to the raw transcription output from the underlying provider.
To receive the raw transcription events from the underlying provider, add transcript.provider_data to the events list in recording_config.realtime_endpoints. When enabled, Recall will deliver the raw transcription data returned by the provider, allowing your application to read provider-specific fields that may not be included in Recall’s standard transcript events.
The payload of transcript.provider_data events varies by provider.
Convert the transcript to a sentence-by-sentence transcript
You can convert the transcript JSON into a more human-readable transcript by persisting the transcription events in your app, then formatting the stored utterances into a sentence-by-sentence layout.
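A minimal sketch of such a formatter, assuming you have stored the data.data objects from transcript.data events (speaker names and word lists) in arrival order; the function name is illustrative.

```python
def format_transcript(utterances):
    """Render stored transcript.data payloads as "Speaker: text" lines.

    utterances: list of the `data.data` objects from transcript.data events,
    assumed to be stored in arrival (chronological) order.
    """
    lines = []
    for utterance in utterances:
        # Participant name can be null in the payload schema.
        speaker = utterance["participant"].get("name") or "Unknown speaker"
        text = " ".join(word["text"] for word in utterance["words"])
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)
```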
Lowest-latency partial transcription results
Partial results are useful when your application needs a more responsive real-time transcription experience.
Instead of waiting for a full utterance to be finalized (i.e. the transcription provider has decided that the utterance is complete), Recall can deliver low-latency intermediate transcript results as speech is still being processed. Your application can use these intermediate results for live captions, real-time UI updates, or other in-meeting experiences, then replace them with the finalized transcript once it arrives. This is especially helpful for longer utterances, where waiting for the final result may introduce noticeable delay.
Some common use cases for using partial transcription results are:
- Live captions or subtitles that should appear before the final utterance is ready.
- Real-time UI updates, where your app shows in-progress speech and then replaces it with the finalized transcript.data result.
- Streaming transcript previews in dashboards or operator tools, where seeing the conversation evolve live is more useful than waiting for completed utterances.
To receive partial transcription results, add transcript.partial_data to the events list in recording_config.realtime_endpoints. When enabled, your application may receive multiple partial updates for the same utterance before the finalized version is delivered. For example:
- Partial words: fur → further → furthermore
- Partial sentences: hel → hello → hello how → hello how are → hello how are you
After the utterance is finalized, your application will receive the complete final transcript utterance as a transcript.data event.
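One way to apply this replace-on-finalize pattern is to key in-progress captions by participant, overwrite them on each transcript.partial_data event, and commit on transcript.data. A sketch, assuming the payload shapes shown in this guide; the class and its names are illustrative, not part of the Recall API.

```python
class LiveCaptions:
    """Track in-progress captions per participant, finalizing on transcript.data."""

    def __init__(self):
        self.pending = {}    # participant id -> latest partial text
        self.finalized = []  # completed utterances, in arrival order

    def handle_event(self, event):
        data = event["data"]["data"]
        participant_id = data["participant"]["id"]
        text = " ".join(word["text"] for word in data["words"])
        if event["event"] == "transcript.partial_data":
            # Overwrite: each partial supersedes the previous one for this speaker.
            self.pending[participant_id] = text
        elif event["event"] == "transcript.data":
            # Finalize: replace the partial with the completed utterance.
            self.pending.pop(participant_id, None)
            self.finalized.append((participant_id, text))
```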
Partial transcript real-time event payload
{
"event": "transcript.partial_data",
"data": {
"data": {
"words": [{
"text": string,
"start_timestamp": {
"relative": float
},
"end_timestamp": {
"relative": float
} | null
}],
"participant": {
"id": int,
"name": string | null,
"is_host": boolean,
"platform": string | null,
"extra_data": object,
"email": string | null
}
},
"realtime_endpoint": {
"id": string,
"metadata": object,
},
"transcript": {
"id": string,
"metadata": object
},
"recording": {
"id": string,
"metadata": object
},
"bot": {
"id": string,
"metadata": object
}
}
}
Tracking active speakers throughout the meeting
To detect when a participant starts or stops speaking, listen to the participant_events.speech_on and participant_events.speech_off events.
These events act as speaker-turn signals. By correlating them with transcript utterance timestamps, your application can determine which participant was speaking during each portion of the meeting.
To receive these events, include participant_events.speech_on and participant_events.speech_off in the events list in recording_config.realtime_endpoints. Recall will then send an event whenever a participant begins speaking and whenever they stop speaking.
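A sketch of the correlation described above: record (start, stop) intervals per participant from speech_on/speech_off events, then look up who was speaking at a given relative timestamp. The class and method names are illustrative; the payload shape follows the schema below.

```python
class SpeakerTimeline:
    """Build speaking intervals from participant_events.speech_on/speech_off."""

    def __init__(self):
        self.intervals = []  # (participant_id, start, stop) tuples
        self._open = {}      # participant_id -> start time of the current turn

    def handle_event(self, event):
        data = event["data"]["data"]
        participant_id = data["participant"]["id"]
        at = data["timestamp"]["relative"]
        if event["event"] == "participant_events.speech_on":
            self._open[participant_id] = at
        elif event["event"] == "participant_events.speech_off":
            start = self._open.pop(participant_id, None)
            if start is not None:
                self.intervals.append((participant_id, start, at))

    def speakers_at(self, t):
        """Participant ids speaking at relative time t (closed intervals)."""
        return [pid for pid, start, stop in self.intervals if start <= t <= stop]
```

Matching an utterance's start_timestamp.relative against speakers_at gives the likely speaker for that portion of the meeting.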
Participant events real-time event payload
{
"event": string,
"data": {
"data": {
"participant": {
"id": int,
"name": string | null,
"is_host": boolean,
"platform": string | null,
"extra_data": object,
"email": string | null
},
"timestamp": {
"absolute": string, // ISO86001 string
"relative": float
},
"data": null // Will always return null
},
"realtime_endpoint": { // Real-time endpoint artifact
"id": string,
"metadata": object,
},
"participant_events": { // Participant events artifact
"id": string,
"metadata": object
},
"recording": { // Recording artifact
"id": string,
"metadata": object
},
"bot": { // Bot artifact
"id": string,
"metadata": object
}
}
}
Accessing the full transcript after the bot has left the meeting
If a bot is configured to use real-time transcription, the transcript will also be available on the bot's recording after the meeting has ended.
When the meeting has ended and the transcript is available to query, Recall will send you a transcript.done transcript artifact status change webhook event. This event provides the bot id, recording id, and transcript id. You can then query the transcript from the Recall API via the following methods.
Querying the transcript's data with a transcript id
Once you have the transcript id, call the Retrieve Transcript endpoint like so:
curl --request GET \
--url 'https://RECALL_REGION.recall.ai/api/v1/transcript/{TRANSCRIPT_ID}/' \
--header "Authorization: RECALLAI_API_KEY" \
--header "accept: application/json"The response schema can be found in the Retrieve Transcript``` API reference. The response payload will contain the data.download_url field which you can use to query the transcript data that was created for this recording.
Querying the transcript data with a recording id
Once you have the recording id, call the List Transcript endpoint like so:
curl --request GET \
--url 'https://RECALL_REGION.recall.ai/api/v1/transcript/?recording_id=RECORDING_ID&status_code=done' \
--header 'Authorization: RECALLAI_API_KEY' \
--header 'accept: application/json'
The response schema can be found in the List Transcript API reference. The response payload will contain a list of transcript artifacts, each of which contains a data.download_url field that you can use to query the transcript data that was created for this recording.
Querying the transcript data with a bot id
Once you have the bot id, call the Retrieve Bot endpoint like so:
curl --request GET \
--url https://RECALL_REGION.recall.ai/api/v1/bot/BOT_ID/ \
--header 'Authorization: RECALLAI_API_KEY' \
--header 'accept: application/json'
The response schema can be found in the Retrieve Bot API reference. The response payload will contain a recordings array, where you can find recordings[i].media_shortcuts.transcript.data.download_url, which you can use to query the transcript data for each recording.
Transcript download url data schema
The resulting data from querying the data.download_url will be returned as follows:
[
{
"participant": {
"id": number, // Id of the participant in the meeting. This id is not unique across meetings.
"name": string | null, // Display name of the participant.
"is_host": boolean | null, // Whether the participant is the host of the meeting.
"platform": string | null, // Meeting platform constant
"extra_data": json | null, // Extra data about the participant from the meeting platform.
"email": string | null, // Email, if participant identification is turned on
},
"language_code": str, // The language code from the transcription provider, normalized to BCP-47.
// The simple code is .split('-')[0], and beware that some languages require
// 3-character codes (e.g. yue and haw)
"words": [
{
"text": string, // The text of the word.
"start_timestamp": {
"absolute": string, // ISO 8601, will return null for async transcription
"relative": number // seconds
},
"end_timestamp": {
"absolute": string, // ISO 8601, will return null for async transcription
"relative": number // seconds
}
}
]
}
]
FAQs
Why are transcription webhooks delayed?
Recall will POST any results from the configured transcription provider as they're received. When using partial results, the frequency is typically in the hundreds of ms to low seconds range but varies slightly by provider. We recommend testing each 3rd party provider to see which best fits your use case.
If you're seeing large delays in results, on the order of seconds or even minutes, especially delays that increase over the duration of the call, this is likely due to the serial nature of how webhooks are sent. Since transcription utterances are sequential and must arrive in a particular order, blocking a webhook request delays all subsequent requests.
For this reason, if you're running in a single-threaded environment, you should make sure that any processing of the transcription webhook happens asynchronously to prevent delaying future webhooks.
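A minimal sketch of this quick-acknowledge pattern: the webhook handler only enqueues the payload and returns, while a worker thread does the slow processing off the request path. This is framework-agnostic and uses only the standard library; the function names are illustrative.

```python
import queue
import threading

# FIFO queue preserves utterance order while keeping the handler fast.
events = queue.Queue()

def handle_webhook(payload):
    """Called by your HTTP framework: enqueue and return immediately."""
    events.put(payload)
    return 200  # acknowledge right away so later webhooks are not delayed

def worker(process):
    """Drain the queue in order, doing the slow work off the request path."""
    while True:
        payload = events.get()
        if payload is None:  # sentinel used to stop the worker
            break
        process(payload)
        events.task_done()
```

The same shape applies to websocket consumers: read messages quickly and hand heavy work to a queue.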
How do I get the transcript up to a certain point while the bot is still in the meeting?
This is not currently possible. Instead, if you need transcript data during the meeting up to a certain point, subscribe to real-time transcription events and store the transcript utterances in your application as they arrive.