Meeting Bot Real-time Transcription

Use real-time transcription to get live utterances, partial results, and speaker data during a call from the bot.

Overview

Real-time transcription is used when your application needs transcript data throughout the meeting. Some common reasons to choose real-time transcription are:

  • You are displaying live captions during the meeting.
  • You are showing a live transcript or other real-time UI updates.
  • You need to trigger alerts, moderation, or automations while participants are speaking.

You should not use real-time transcription if:

  • Your use case can wait until after the meeting ends. Instead, you should use Async Transcription.
  • You are building real-time conversational agents that users can speak to and hear responses from during the meeting. Instead, you should use output media with a voice-to-voice model such as OpenAI's Realtime API rather than relying on partial transcript results.
📘

You can refer to this sample app for an end-to-end example of sending a bot to a meeting and receiving transcript data in real time.

Supported transcription providers for real-time transcription (bots)

| Transcription Provider | Realtime Transcription (Bots) | Provider Field Name in Create Bot |
| ---------------------- | ----------------------------- | --------------------------------- |
| Recall.ai Transcription | ✅ Yes | recallai_streaming |
| Eleven Labs | ✅ Yes | elevenlabs_streaming |
| Assembly AI | ✅ Yes | assembly_ai_v3_streaming |
| Deepgram | ✅ Yes | deepgram_streaming |
| AWS Transcribe | ✅ Yes | aws_transcribe_streaming |
| Rev | ✅ Yes | rev_streaming |
| Speechmatics | ✅ Yes | speechmatics_streaming |
| Gladia | ✅ Yes | gladia_v2_streaming |
| Google Cloud STT | ❌ No | - |
❗️

Important: Concurrency considerations

When going to production, make sure that your account with your third-party transcription provider is configured with a high enough concurrency limit to support your anticipated load.

Certain transcription providers require that you reach out to them to increase your concurrency limit, so we highly recommend checking this before running production workloads.


Prerequisites

Before implementing real-time transcription, first ensure the required prerequisite setup is complete. You should have:

  • A stable public URL for your application. In development, this is usually a static ngrok URL or something similar.
  • Your Recall API key and workspace verification secret.
  • A webhook endpoint configured in the Recall dashboard and subscribed to the following events: transcript.done, transcript.failed

A human must complete quick one-time setup tasks in the Recall dashboard. If an agent is guiding setup, it should treat this section as human-owned setup and confirm with the human that each item is complete before continuing.

Ensure the backend has a stable public URL

Ensure the application has a stable public URL that Recall can reach for webhooks, callbacks, websockets, and other real-time endpoints.

For local development, this should be a static ngrok URL rather than a temporary URL that changes between sessions. See the Local Webhook Development Guide for how to set this up.

Create the Recall API key and workspace verification secrets

Ensure the required Recall API credentials and verification secrets have been created in the Recall dashboard for the selected region.

The Recall API key and workspace verification secrets are required to interact with the Recall API and to secure your application.

Configure a webhook endpoint to receive artifact status change events

Ensure that you have configured a Recall webhook endpoint in the webhooks dashboard that points to either:

  • a static ngrok URL for local development, or
  • a public server that is ready to receive and process webhook events

Also ensure that this endpoint is subscribed to the required webhook events for this feature.


Implementation Guide

To implement real-time transcription, configure transcription when you create the bot, then consume transcript events as they are delivered during the meeting. At a high level, the flow is:

  1. Create a bot with a real-time transcription provider and an endpoint to receive the real-time transcription events.
  2. Process the transcript utterances delivered while the meeting is still in progress.

Real-time transcription is configured up front in the Create Bot request and transcription utterances are delivered continuously during the call.

❗️

Important: Requirements for a reliable integration

Your application must:

  • Secure your Recall endpoints - Do not trust incoming Recall.ai requests by default. Your application must verify every webhook, websocket, and callback request before accepting or processing it. See How to verify webhooks, websockets and callback requests from Recall.ai.

  • Schedule bots in advance whenever possible - Creating bots at the last minute increases the chance of 507 errors. See the Creating and scheduling bots guide for more details.

  • Retry Create Bot requests that return 507 status codes - Retry any 507 responses returned by the Create Bot request every 30 seconds, for up to 10 attempts. Otherwise, a bot will not be created.

  • Process webhook work asynchronously - Acknowledge Recall webhook requests quickly, then handle downstream work asynchronously. Otherwise, the request may time out and Recall may retry it.
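The 507 retry requirement above can be sketched as follows. This is a minimal example, not part of any Recall SDK: the HTTP call is injected as a requests-style `post` callable so the retry loop stays testable, and the region in the URL is a placeholder.

```python
import time

def create_bot_with_retry(payload, api_key, post, *,
                          max_attempts=10, wait_seconds=30, sleep=time.sleep):
    """Create a bot via `post` (e.g. requests.post), retrying 507 responses.

    `post(url, json=..., headers=...)` must return a requests-style object
    with .status_code, .raise_for_status(), and .json().
    """
    url = "https://us-west-2.recall.ai/api/v1/bot/"  # use your Recall region
    headers = {
        "Authorization": api_key,
        "accept": "application/json",
        "content-type": "application/json",
    }
    for attempt in range(1, max_attempts + 1):
        resp = post(url, json=payload, headers=headers)
        if resp.status_code != 507:
            resp.raise_for_status()  # surface non-507 failures immediately
            return resp.json()
        if attempt < max_attempts:
            sleep(wait_seconds)      # 507 = no capacity; wait 30s and retry
    raise RuntimeError(f"Create Bot still returning 507 after {max_attempts} attempts")
```

In production you would pass `requests.post` (or your HTTP client of choice) as the `post` argument.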

Step 1: Create a bot with real-time transcription enabled

To receive real-time transcript events, you must configure both of the following in the Create Bot request:

  • A transcription provider: Specifies which provider the bot should use for transcription and any required provider-specific settings. Use the recording_config.transcript.provider field to set the transcription provider and include any provider-specific options (e.g. languages or keyterms).
  • A real-time endpoint: Specifies which webhook or websocket endpoint should receive real-time transcript events and which events to send. Use the recording_config.realtime_endpoints field to set the destination URL and the events to listen for (transcript.data is a required event for real-time transcription).

If either recording_config.transcript.provider or recording_config.realtime_endpoints is missing, your application will not receive real-time transcription utterances. In this case, you may also not receive a transcript.failed transcript artifact status change webhook event explaining why transcript events aren't being sent.

Example create bot request with real-time transcription via webhook

curl --request POST \
     --url https://RECALL_REGION.recall.ai/api/v1/bot/ \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "meeting_url": MEETING_URL,
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_low_latency", 
          "language": "en"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    },
    "realtime_endpoints": [
      {
        "type": "webhook",
        "url": "https://STABLE_PUBLIC_URL/WEBHOOK_ENDPOINT",
        "events": ["transcript.data"]
      }
    ]
  }
}
'

Example create bot request with real-time transcription via websocket

curl --request POST \
     --url https://RECALL_REGION.recall.ai/api/v1/bot/ \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "meeting_url": MEETING_URL,
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_low_latency", 
          "language": "en"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    },
    "realtime_endpoints": [
      {
        "type": "websocket",
        "url": "wss://STABLE_PUBLIC_URL/WEBSOCKET_ENDPOINT",
        "events": ["transcript.data"]
      }
    ]
  }
}
'

Step 2: Receive and process transcript.data events

Once the bot has joined the meeting and real-time transcription is active, Recall will send transcript.data events to the webhook or websocket endpoint configured in recording_config.realtime_endpoints.

Each transcript.data event contains a finalized transcript utterance generated during the meeting. Your application should consume these events and use them for your real-time use case or triggering downstream logic.

Finalized transcript real-time event payload

{
  "event": "transcript.data",
  "data": {
    "data": {
      "words": [{
        "text": string,
        "start_timestamp": {
          "relative": float
        },
        "end_timestamp": {
          "relative": float
        } | null
      }],
      "language_code": string,
      "participant": {
        "id": int,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object,
        "email": string | null
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "transcript": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
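As a sketch of consuming these events, the handler below extracts the speaker and utterance text from a transcript.data payload. The function name and returned record shape are illustrative, not part of the Recall API, and request verification (per the verification guide) is assumed to happen before this point.

```python
def handle_transcript_data(event):
    """Extract a small speaker/text record from a transcript.data event.

    Returns a dict your app can persist, or None for other event types.
    """
    if event.get("event") != "transcript.data":
        return None
    payload = event["data"]["data"]
    words = payload.get("words", [])
    participant = payload.get("participant", {})
    return {
        "speaker": participant.get("name") or f"Participant {participant.get('id')}",
        "text": " ".join(w["text"] for w in words),
        # Relative timestamps are seconds from the start of the recording.
        "start": words[0]["start_timestamp"]["relative"] if words else None,
    }
```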

How to handle failures with real-time transcription

If the transcription job fails, for example because the bot could not connect to the transcription provider, Recall sends a transcript.failed transcript artifact status change webhook event to the webhook endpoint configured in your Recall dashboard.

This event is sent through the dashboard webhook configuration, not through recording_config.realtime_endpoints in the Create Bot request. These are separate Recall configurations with different, mutually exclusive event sets, even if they both point to the same application URL. As a result, you do not configure transcript.failed in recording_config.realtime_endpoints.

You can get the machine-readable code from the data.status.sub_code field of the transcript.failed transcript artifact status change webhook event.

Transcript artifact status change webhook schema

{
  "event": string,
  "data": {
    "data": {
      "code": string,
      "sub_code": string | null,
      "updated_at": string // ISO 8601
    },
    "transcript": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}

You must explicitly subscribe to each event you want to receive from the webhooks dashboard, as defined in the table below.

Transcript artifact webhook events and codes

| Event | Code | Description |
| ----- | ---- | ----------- |
| transcript.processing | processing | The media object has started capturing. |
| transcript.done | done | The media object has successfully completed. All data for media objects on the recording is now available. |
| transcript.failed | failed | The media object failed to be captured. The data.sub_code field will contain a machine-readable code for the failure. See below for the list of sub codes. |
| transcript.deleted | deleted | The media object has been deleted from Recall systems. |

Transcript artifact webhook sub codes

Below is a list of the sub_code values that can appear on a transcript.failed webhook event.

| Sub Code | Reason |
| -------- | ------ |
| provider_connection_failed | Recall is not able to connect to the third-party transcription provider. Common reasons include: insufficient funds in the transcription provider account for the provided API key; using paid features on a free account; temporary service unavailability from the transcription provider. |
| zoom_global_captions_disabled | Meeting captions are disabled by the Zoom account. |
| zoom_host_disabled_meeting_captions | The host of the Zoom meeting has disabled meeting captions. |
| zoom_captions_failure | There was an error enabling meeting captions for the Zoom call. |

Using the dashboard to see why real-time transcription failed

You can also see the failure reason from the bot logs in the dashboard.

Switching to a fallback transcription provider

If real-time transcription fails, you can either:

  • Re-transcribe the recording after the meeting ends using Async Transcription
  • Start a new recording with a different real-time transcription provider

To switch to a fallback real-time transcription provider in real-time, listen for the transcript.failed webhook event and call the Start Recording endpoint with a new recording_config that uses a different real-time transcription provider.

❗️

Important notes about calling the Start Recording endpoint:

  • Calling the Start Recording endpoint creates a new recording artifact. As a result, a bot can have multiple recording artifacts associated with it, and the Retrieve Bot response may include multiple entries in the recordings array.
  • When you call the Start Recording endpoint, the recording_config for the new recording does not inherit missing fields from the previous recording. The configuration is replaced in full, so the request must include all required and desired settings, not only the transcription provider.
  • The recording_config returned on the original bot artifact (e.g. response from the Retrieve Bot request) and in the dashboard reflects the initial Create Bot request, not the updated recording configuration used when the new recording was started.
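The fallback flow can be sketched as follows. The Start Recording API call is abstracted behind a `start_recording` callable (see the Start Recording endpoint reference for the actual request), the fallback config shown is only an example, and deepgram_streaming is an arbitrary choice of fallback provider.

```python
# A full replacement recording_config: Start Recording does not inherit any
# fields from the previous recording, so include every setting you need.
# deepgram_streaming here is just an example fallback provider.
FALLBACK_RECORDING_CONFIG = {
    "transcript": {"provider": {"deepgram_streaming": {}}},
    "realtime_endpoints": [
        {
            "type": "webhook",
            "url": "https://STABLE_PUBLIC_URL/WEBHOOK_ENDPOINT",
            "events": ["transcript.data"],
        }
    ],
}

def on_transcript_status_event(event, start_recording):
    """On transcript.failed, restart the recording with the fallback config.

    `start_recording(bot_id, recording_config)` should wrap the Start
    Recording API call. Returns the failure sub_code when a restart was
    triggered, or None for all other status events.
    """
    if event.get("event") != "transcript.failed":
        return None
    bot_id = event["data"]["bot"]["id"]
    sub_code = event["data"]["data"].get("sub_code")  # machine-readable reason
    start_recording(bot_id, FALLBACK_RECORDING_CONFIG)
    return sub_code
```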

Additional transcript configurations

Diarization

For the best diarization with speaker names, you should use Perfect Diarization by setting diarization.use_separate_streams_when_available: true in the Create Bot request.

When multiple participants are speaking from the same device, you can choose to use Machine Diarization instead of perfect diarization.

To see all diarization configurations, see the diarization guide.

📘

Perfect diarization transcription cost

Using diarization.use_separate_streams_when_available: true for real-time transcription can increase transcription cost by approximately 1.8x in cases where participants speak concurrently or when background conversation is present.

Language detection for real-time transcription

If you don’t know ahead of time which language the conversation will be in, you can set up automatic language detection. Automatically detecting languages is broken up into two types:

  • Language Detection - Detecting the primary spoken language within a recording, without needing to explicitly set it
  • Code switching - Alternating between two or more languages or language varieties within a single conversation or speech

Most of the third-party transcription providers that we integrate with support language detection.

❗️

Automatic language detection is not available when using meeting captions.

The table below covers each of these, and their corresponding parameters in the Create Bot provider configuration.

| Provider | Supported Languages | Language Detection | Code Switching |
| -------- | ------------------- | ------------------ | -------------- |
| recallai_streaming | Docs | mode: "prioritize_accuracy" and language_code: "auto" * | mode: "prioritize_accuracy" and language_code: "auto" * |
| assembly_ai_async_chunked | Docs | language_detection: true (docs) | language_detection_options.code_switching: true (docs) |
| aws_transcribe_streaming | Docs | language_identification: true and specify a list of language_options (docs) | Same as language detection (docs) |
| deepgram_streaming | Docs | model: "nova-2" or "nova-3" and language: "multi" (docs) | Same as language detection (docs) |

* mode: "prioritize_accuracy" sends transcript events every 3-10 minutes for transcript.data and transcript.partial_data. mode: "prioritize_low_latency" only supports English at this time.

Accessing raw data from the transcription provider

❗️

Raw transcription provider data is not exposed for Recall.ai transcription or when diarization.use_separate_streams_when_available: true

If you need access to provider-specific fields or features that are not exposed in Recall’s normalized real-time transcript events, you can subscribe to the raw transcription output from the underlying provider.

To receive the raw transcription events from the underlying provider, add transcript.provider_data to the events list in recording_config.realtime_endpoints. When enabled, Recall will deliver the raw transcription data returned by the provider, allowing your application to read provider-specific fields that may not be included in Recall’s standard transcript events.

The payload of transcript.provider_data events varies by provider.

Convert the transcript to a sentence-by-sentence transcript

You can convert the transcript JSON to a more human-readable transcript by persisting the transcription events in your app, then formatting the stored utterances into a speaker-labeled, sentence-by-sentence transcript.
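For example, a minimal formatter, assuming you have stored each transcript.data event's inner data.data object in arrival order, might look like:

```python
def to_readable_transcript(utterances):
    """Format stored transcript.data payloads as speaker-labeled lines.

    `utterances` is the list of persisted data.data objects, in arrival order.
    """
    lines = []
    for u in utterances:
        name = u["participant"].get("name") or f"Participant {u['participant']['id']}"
        words = u["words"]
        # Relative timestamp of the first word = seconds from recording start.
        start = words[0]["start_timestamp"]["relative"] if words else 0.0
        text = " ".join(w["text"] for w in words)
        lines.append(f"[{start:7.1f}s] {name}: {text}")
    return "\n".join(lines)
```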

Lowest-latency partial transcription results

Partial results are useful when your application needs a more responsive real-time transcription experience.

Instead of waiting for a full utterance to be finalized (i.e. the transcription provider has decided that the utterance is complete), Recall can deliver low-latency intermediate transcript results as speech is still being processed. Your application can use these intermediate results for live captions, real-time UI updates, or other in-meeting experiences, then replace them with the finalized transcript once it arrives. This is especially helpful for longer utterances, where waiting for the final result may introduce noticeable delay.

Some common use cases for using partial transcription results are:

  • Live captions or subtitles that should appear before the final utterance is ready.
  • Real-time UI updates, where your app shows in-progress speech and then replaces it with the finalized transcript.data result.
  • Streaming transcript previews in dashboards or operator tools, where seeing the conversation evolve live is more useful than waiting for completed utterances.

To receive partial transcription results, add transcript.partial_data to the events list in recording_config.realtime_endpoints. When enabled, your application may receive multiple partial updates for the same utterance before the finalized version is delivered. For example:

  • Partial words: fur → further → furthermore
  • Partial sentences: hel → hello → hello how → hello how are → hello how are you

After the utterance is finalized, your application will receive the complete final transcript utterance as a transcript.data event.

Partial transcript real-time event payload

{
  "event": "transcript.partial_data",
  "data": {
    "data": {
      "words": [{
        "text": string,
        "start_timestamp": {
          "relative": float 
        },
        "end_timestamp": {
          "relative": float 
        } | null
      }],
      "participant": {
        "id": int,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object,
        "email": string | null
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "transcript": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
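One way to sketch the replace-partials-with-finals pattern: keep the latest partial per participant, and promote the utterance to the finished transcript when the transcript.data event arrives. Class and field names here are illustrative, not part of any Recall SDK.

```python
class LiveCaptions:
    """Track in-progress partials and replace them once finalized.

    Keyed by participant id: transcript.partial_data overwrites the current
    partial line; transcript.data commits the utterance as final.
    """
    def __init__(self):
        self.partials = {}   # participant id -> current in-progress text
        self.finalized = []  # (speaker name, text) in arrival order

    def handle(self, event):
        payload = event["data"]["data"]
        pid = payload["participant"]["id"]
        text = " ".join(w["text"] for w in payload["words"])
        if event["event"] == "transcript.partial_data":
            self.partials[pid] = text     # overwrite the previous partial
        elif event["event"] == "transcript.data":
            self.partials.pop(pid, None)  # drop the partial...
            self.finalized.append((payload["participant"].get("name"), text))
```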

Tracking active speakers throughout the meeting

To detect when a participant starts or stops speaking, listen to the participant_events.speech_on and participant_events.speech_off events.

These events act as speaker-turn signals. By correlating them with transcript utterance timestamps, your application can determine which participant was speaking during each portion of the meeting.

To receive these events, include participant_events.speech_on and participant_events.speech_off in the events list in recording_config.realtime_endpoints. Recall will then send an event whenever a participant begins speaking and whenever they stop speaking.

Participant events real-time event payload

{
  "event": string,
  "data": {
    "data": {
      "participant": {
        "id": int,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object,
        "email": string | null
      },
      "timestamp": {
        "absolute": string, // ISO 8601 string
        "relative": float
      },
      "data": null // Will always return null
    },
    "realtime_endpoint": { // Real-time endpoint artifact
      "id": string,
      "metadata": object
    },
    "participant_events": { // Participant events artifact
      "id": string,
      "metadata": object
    },
    "recording": { // Recording artifact
      "id": string,
      "metadata": object
    },
    "bot": { // Bot artifact
      "id": string,
      "metadata": object
    }
  }
}
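As an illustration of correlating these events, the helper below (a sketch, not part of the Recall SDK) pairs speech_on/speech_off events into speaker turns using their relative timestamps:

```python
def build_speaker_turns(events):
    """Pair participant_events.speech_on/off events into speaker turns.

    Returns dicts with the participant name plus start/end relative
    timestamps; an unmatched speech_on (still speaking) yields end=None.
    """
    open_turns = {}  # participant id -> (name, start timestamp)
    turns = []
    for ev in events:
        payload = ev["data"]["data"]
        pid = payload["participant"]["id"]
        ts = payload["timestamp"]["relative"]
        if ev["event"] == "participant_events.speech_on":
            open_turns[pid] = (payload["participant"].get("name"), ts)
        elif ev["event"] == "participant_events.speech_off" and pid in open_turns:
            name, start = open_turns.pop(pid)
            turns.append({"participant": name, "start": start, "end": ts})
    for name, start in open_turns.values():
        turns.append({"participant": name, "start": start, "end": None})
    return turns
```

Transcript utterances whose relative timestamps fall inside a turn can then be attributed to that participant.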

Accessing the full transcript after the bot has left the meeting

If a bot is configured to use real-time transcription, the transcript will also be available on the bot's recording after the meeting has ended.

When the meeting has ended and the transcript is available to query, Recall will send you a transcript.done transcript artifact status change webhook event. This event provides the bot id, recording id, and transcript id. You can then query the transcript from the Recall API via the following methods.

Querying the transcript's data with a transcript id

Once you have the transcript id, call the Retrieve Transcript endpoint like so:

curl --request GET \
     --url 'https://RECALL_REGION.recall.ai/api/v1/transcript/{TRANSCRIPT_ID}/' \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json"

The response schema can be found in the Retrieve Transcript API reference. The response payload will contain the data.download_url field, which you can use to query the transcript data that was created for this recording.

Querying the transcript data with a recording id

Once you have the recording id, call the List Transcript endpoint like so:

curl --request GET \
     --url 'https://RECALL_REGION.recall.ai/api/v1/transcript/?recording_id=RECORDING_ID&status_code=done' \
     --header 'Authorization: RECALLAI_API_KEY' \
     --header 'accept: application/json'

The response schema can be found in the List Transcript API reference. The response payload will contain a list of transcript artifacts, each of which contains a data.download_url field you can use to query the transcript data that was created for this recording.

Querying the transcript data with a bot id

Once you have the bot id, call the Retrieve Bot endpoint like so:

curl --request GET \
     --url https://RECALL_REGION.recall.ai/api/v1/bot/BOT_ID/ \
     --header 'Authorization: RECALLAI_API_KEY' \
     --header 'accept: application/json'

The response schema can be found in the Retrieve Bot API reference. The response payload will contain a recordings array where you can find recordings[i].media_shortcuts.transcript.data.download_url, which you can use to query the transcript data that was created for each recording.

Transcript download url data schema

The resulting data from querying the data.download_url will be returned as follows:

[
  {
    "participant": {
      "id": number, // Id of the participant in the meeting. This id is not unique across meetings.
      "name": string | null, // Display name of the participant.
      "is_host": boolean | null, // Whether the participant is the host of the meeting.
      "platform": string | null, // Meeting platform constant
      "extra_data": json | null, // Extra data about the participant from the meeting platform.
      "email": string | null, // Email, if participant identification is turned on
    },
    "language_code": string, // The language code from the transcription provider, normalized to BCP-47.
                             // The simple code is .split('-')[0]; beware that some languages require
                             // 3-character codes (e.g. yue and haw)
    "words": [
      {
        "text": string, // The text of the word.
        "start_timestamp": {
          "absolute": string, // ISO 8601, will return null for async transcription
          "relative": number // seconds
        },
        "end_timestamp": {
          "absolute": string, // ISO 8601, will return null for async transcription
          "relative": number // seconds
        }
      }
    ]
  }
]

FAQs

Why are transcription webhooks delayed?

Recall will POST results from the configured transcription provider as they're received. When using partial results, the frequency is typically in the hundreds of milliseconds to low seconds, but varies slightly by provider. We recommend testing each third-party provider to see which best fits your use case.

If you're seeing large delays in results, on the order of seconds or even minutes, especially delays that increase over the duration of the call, this is likely due to the serial nature of how webhooks are sent. Since transcript utterances are sequential and must arrive in order, blocking one webhook request delays all subsequent requests.

For this reason, if you're running in a single-threaded environment, you should make sure that any processing of the transcription webhook happens asynchronously to prevent delaying future webhooks.
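A minimal single-process sketch of this pattern, using a background worker thread and an in-memory queue (a real deployment might use a task queue like Celery or SQS instead; names here are illustrative):

```python
import queue
import threading

def start_async_processor(process):
    """Return a fast webhook handler that defers `process(event)` to a worker.

    Acknowledging immediately keeps Recall's sequential webhook delivery
    from backing up behind slow downstream work.
    """
    work_queue = queue.Queue()

    def handler(event):
        work_queue.put(event)  # O(1): enqueue, then return a 2xx right away
        return 200

    def worker():
        while True:
            event = work_queue.get()
            try:
                process(event)  # slow downstream logic (DB writes, alerts, ...)
            finally:
                work_queue.task_done()

    threading.Thread(target=worker, daemon=True).start()
    return handler, work_queue
```

Your HTTP framework's route function would call `handler` and return its status code, keeping the request itself fast.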

How do I get the transcript up to a certain point while the bot is still in the meeting?

This is not currently possible. Instead, if you need transcript data during the meeting up to a certain point, subscribe to real-time transcription events and store the transcript utterances in your application as they arrive.