Real-time transcription is used when your application needs transcript data throughout the meeting. Common reasons to choose real-time transcription include displaying live captions during the meeting, showing a live transcript or other real-time UI updates, and triggering alerts, moderation, or automations while participants are speaking.

You should not use real-time transcription if:

Your use case can wait until after the meeting ends. Instead, you should use Async Transcription.
You are building real-time conversational agents that users can speak to and hear responses from during the meeting. Instead, you should use output media with a voice-to-voice model such as OpenAI's Realtime API rather than relying on partial transcript data.

You can refer to this sample app for an end-to-end example of sending a bot to a meeting and receiving transcript data in real time.

Quickstart - Creating a meeting bot with real-time transcription enabled

You can create a meeting bot with real-time transcription configured through the following request:

curl --request POST \
     --url https://RECALL_REGION.recall.ai/api/v1/bot/ \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "meeting_url": "MEETING_URL",
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_low_latency", 
          "language_code": "en"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    },
    "realtime_endpoints": [
      {
        "type": "webhook",
        "url": "https://STABLE_PUBLIC_URL/WEBHOOK_ENDPOINT",
        "events": ["transcript.data"]
      }
    ]
  }
}
'

See the prerequisites and implementation guide sections for more info on setting up real-time transcription.

Prerequisites

Before implementing real-time transcription, ensure the prerequisite setup is complete. You should have:

A stable public URL for your application. In development, this is usually a static ngrok URL or something similar.
Your Recall API key and workspace verification secret.
A webhook endpoint configured in the Recall dashboard and subscribed to the following events: transcript.done, transcript.failed

A human must complete quick one-time setup tasks in the Recall dashboard. If an agent is guiding setup, it should treat this section as human-owned setup and confirm with the human that each item is complete before continuing.

Ensure the backend has a stable public URL

Ensure the application has a stable public URL that Recall can reach for webhooks, callbacks, websockets, and other real-time endpoints.

For local development, this should be a static ngrok URL rather than a temporary URL that changes between sessions. See the Local Webhook Development Guide for how to set this up.

Create the Recall API key and workspace verification secrets

Ensure the required Recall API key and verification secrets have been created in the Recall dashboard for the selected region.

The Recall API key is required to interact with the Recall API, and the workspace verification secret is required to verify requests that Recall sends to your application.

Configure a webhook endpoint to receive artifact status change events

Ensure that you have configured a Recall webhook endpoint in the webhooks dashboard that points to either:

a static ngrok URL for local development, or
a public server that is ready to receive and process webhook events

Also ensure that this endpoint is subscribed to the required webhook events for this feature.

🚧
If using a third-party speech-to-text transcription provider
You must add the third-party speech-to-text transcription provider's API key in the Recall.ai dashboard. A human must follow the speech-to-text provider's setup steps in order to start using transcription:

ElevenLabs

Deepgram

AssemblyAI

AWS Transcribe

Rev

Speechmatics

Implementation guide

To implement real-time transcription, configure transcription when you create the bot, then consume transcript events as they are delivered during the meeting. At a high level, the flow is:

Create a bot with a real-time transcription provider and an endpoint to receive the real-time transcription events.
Process the transcript utterances delivered while the meeting is still in progress.

Real-time transcription is configured up front in the Create Bot request and transcription utterances are delivered continuously during the call.

❗️
Requirements for a reliable integration

Schedule bots in advance whenever possible - Creating bots at the last minute increases the chance of 507 errors. See the Creating and scheduling bots guide for more details.

Retry Create Bot requests that return 507 status codes - Retry any 507 responses returned by the Create Bot request every 30 seconds, for up to 10 attempts. Otherwise, a bot will not be created.

Process webhook work asynchronously - Acknowledge Recall webhook requests quickly, then handle downstream work asynchronously. Otherwise, the request may time out and Recall may retry it.
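
The 507 retry rule above can be sketched in Python as follows. This is a minimal sketch, not an official client: `send_request` is a hypothetical callable that performs the Create Bot POST and returns the status code with the parsed body, and the sketch assumes that "up to 10 attempts" counts the first request.

```python
import time
from typing import Callable, Optional, Tuple

RETRY_STATUS = 507           # status code the guide says to retry
RETRY_INTERVAL_SECONDS = 30  # wait between attempts, from the guide
MAX_ATTEMPTS = 10            # assumption: the first request counts as attempt 1


def create_bot_with_retry(
    send_request: Callable[[], Tuple[int, dict]],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call send_request until it stops returning 507 or attempts run out.

    send_request is a hypothetical callable that performs the Create Bot
    POST and returns (status_code, parsed_json_body).
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, body = send_request()
        if status != RETRY_STATUS:
            # Only 507 is retried here; any other response (success or a
            # different error) is handed back to the caller.
            return {"status": status, "body": body, "attempts": attempt}
        if attempt < MAX_ATTEMPTS:
            sleep(RETRY_INTERVAL_SECONDS)
    return None  # every attempt returned 507, so no bot was created
```

Passing `sleep` as a parameter keeps the function easy to test without waiting 30 seconds between attempts.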

Step 1: Create a bot with real-time transcription enabled

To receive real-time transcript events, you must configure both of the following in the Create Bot request:

A transcription provider: Specifies which provider the bot should use for transcription and any required provider-specific settings. Use the recording_config.transcript.provider field to set the transcription provider and include any provider-specific options (e.g., languages or keyterms).
A real-time endpoint: Specifies which webhook or websocket endpoint should receive real-time transcript events and what events to send. Use the recording_config.realtime_endpoints field to set the destination url and the events to listen for (transcript.data is a required event for real-time transcription).

If either recording_config.transcript.provider or recording_config.realtime_endpoints is missing, your application will not receive real-time transcription utterances. You also may not be notified with a transcript.failed transcript artifact status change webhook event with details on why the transcript events aren't being sent.

Example create bot request with real-time transcription via webhooks

curl --request POST \
     --url https://RECALL_REGION.recall.ai/api/v1/bot/ \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "meeting_url": "MEETING_URL",
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_low_latency", 
          "language_code": "en"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    },
    "realtime_endpoints": [
      {
        "type": "webhook",
        "url": "https://STABLE_PUBLIC_URL/WEBHOOK_ENDPOINT",
        "events": ["transcript.data"]
      }
    ]
  }
}
'

Example create bot request with real-time transcription via websocket

curl --request POST \
     --url https://RECALL_REGION.recall.ai/api/v1/bot/ \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "meeting_url": "MEETING_URL",
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_low_latency", 
          "language_code": "en"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    },
    "realtime_endpoints": [
      {
        "type": "websocket",
        "url": "wss://STABLE_PUBLIC_URL/WEBSOCKET_ENDPOINT",
        "events": ["transcript.data"]
      }
    ]
  }
}
'

Step 2: Verify the real-time event

Once the bot has joined the meeting and real-time transcription is active, Recall will send transcript.data events to the webhook or websocket endpoint configured in recording_config.realtime_endpoints. Before processing any transcript.data event, your backend must verify the request signature using your workspace verification secret.

Recall signs webhook and WebSocket upgrade requests using your workspace verification secret. Your endpoint must verify the webhook request or WebSocket upgrade request using the verification helper function. For implementation details, follow the Verifying requests from Recall.ai guide.

Do not parse, store, enqueue, or process the webhook payload until verification succeeds. If verification fails, return a non-2xx response and stop processing.

Step 3: Process verified `transcript.data` events

Each transcript.data event contains a finalized transcript utterance generated during the meeting. Your application should consume these events and use them for your real-time use case or triggering downstream logic. You must return a 2xx response immediately and process all work asynchronously.
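
The verify-first, acknowledge-fast flow from Steps 2 and 3 can be sketched as below. This is a framework-agnostic sketch under stated assumptions: `verify` is a placeholder for the verification helper described in the Verifying requests from Recall.ai guide (its real signature is not shown here), and the in-process queue and the 401/400 status codes are illustrative choices.

```python
import json
import queue
import threading
from typing import Callable, Mapping

# In-process hand-off between the request handler and a background worker.
pending_events: "queue.Queue[dict]" = queue.Queue()


def handle_realtime_request(
    headers: Mapping[str, str],
    raw_body: bytes,
    verify: Callable[[Mapping[str, str], bytes], bool],
) -> int:
    """Return the HTTP status code to send back for one webhook request."""
    if not verify(headers, raw_body):
        return 401  # non-2xx; the payload is never parsed, stored, or enqueued
    try:
        event = json.loads(raw_body)
    except ValueError:
        return 400  # verified but not valid JSON
    pending_events.put(event)  # cheap hand-off; no downstream work happens here
    return 200


def start_worker(process: Callable[[dict], None]) -> threading.Thread:
    """Process queued events on a background thread, one at a time, in order."""

    def run() -> None:
        while True:
            event = pending_events.get()
            try:
                process(event)
            finally:
                pending_events.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

A single worker preserves utterance order. In production, a durable queue would likely replace the in-memory one so that events survive a restart.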

Finalized transcript real-time event payload

{
  "event": "transcript.data",
  "data": {
    "data": {
      "words": {
        "text": string,
        "start_timestamp": { 
          "relative": float
        },
        "end_timestamp": {
          "relative": float 
        } | null
      }[],
      "language_code": string,
      "participant": {
        "id": int,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object,
        "email": string | null
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "transcript": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
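
Reading the fields your application needs out of this payload might look like the sketch below. The `Utterance` class and function name are hypothetical. The sketch assumes each event carries at least one word, that joining word texts with spaces gives readable text (this may vary by provider), and that relative timestamps are in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    participant_id: int
    participant_name: Optional[str]
    text: str
    start: float          # relative timestamp of the first word
    end: Optional[float]  # None when the last word's end_timestamp is null


def parse_transcript_event(event: dict) -> Utterance:
    """Flatten one transcript.data event into the fields an app usually needs."""
    inner = event["data"]["data"]
    words = inner["words"]  # assumption: never empty
    participant = inner["participant"]
    last_end = words[-1]["end_timestamp"]
    return Utterance(
        participant_id=participant["id"],
        participant_name=participant["name"],
        text=" ".join(word["text"] for word in words),
        start=words[0]["start_timestamp"]["relative"],
        end=last_end["relative"] if last_end else None,
    )
```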

Step 4: Handle failures with real-time transcription

If the transcription job fails, for example because the bot could not connect to the transcription provider, Recall sends a transcript.failed transcript artifact status change webhook event to the webhook endpoint configured in your Recall dashboard.

This event is sent through the dashboard webhook configuration, not through recording_config.realtime_endpoints in the Create Bot request. These are separate Recall configurations with different, mutually exclusive event sets, even if they both point to the same application URL. As a result, you do not configure transcript.failed in recording_config.realtime_endpoints.

You can get the machine-readable failure code from the data.data.sub_code field of the transcript.failed transcript artifact status change webhook event.

Transcript artifact status change webhook schema

{
  "event": string,
  "data": {
    "data": {
      "code": string,
      "sub_code": string | null,
      "updated_at": string // ISO 8601
    },
    "transcript": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}

You must explicitly subscribe to each event you want to receive from the webhooks dashboard, as defined in the table below.

Transcript artifact webhook events and codes

Event	Code	Description
`transcript.processing`	`processing`	The media object has started capturing
`transcript.done`	`done`	The media object has successfully completed. All data for media objects on the recording is now available
`transcript.failed`	`failed`	The media object failed to be captured. The `data.data.sub_code` field will contain a machine-readable code for the failure. See below for the list of sub codes
`transcript.deleted`	`deleted`	The media object has been deleted from Recall systems.

Transcript artifact webhook sub codes

Below is a list of sub_code values that can be found on a transcript.failed webhook event.

Sub Code	Reason
`provider_connection_failed`	Recall is not able to connect to the third-party transcription provider. Common reasons include: insufficient funds in the transcription provider account for which the API key is provided; using paid features on a free account; temporary service unavailability from the transcription provider
`zoom_global_captions_disabled`	Meeting captions are disabled by the Zoom account
`zoom_host_disabled_meeting_captions`	The host of Zoom meeting has disabled meeting captions
`zoom_captions_failure`	There was an error in enabling meeting captions for the Zoom call
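
A handler for transcript.failed events could group these sub codes as in the sketch below. It reads the sub code from the location shown in the schema above (the inner data object). The category labels and the function name are hypothetical and exist only to illustrate routing.

```python
ZOOM_CAPTION_SUB_CODES = {
    "zoom_global_captions_disabled",
    "zoom_host_disabled_meeting_captions",
    "zoom_captions_failure",
}


def summarize_transcript_failure(event: dict) -> dict:
    """Pull ids and the sub code out of a transcript.failed event."""
    if event.get("event") != "transcript.failed":
        raise ValueError("not a transcript.failed event")
    data = event["data"]
    sub_code = data["data"].get("sub_code")  # may be null per the schema

    # Hypothetical categories used only to decide what to do next.
    if sub_code == "provider_connection_failed":
        category = "provider"
    elif sub_code in ZOOM_CAPTION_SUB_CODES:
        category = "zoom_captions"
    else:
        category = "unknown"

    return {
        "bot_id": data["bot"]["id"],
        "recording_id": data["recording"]["id"],
        "transcript_id": data["transcript"]["id"],
        "sub_code": sub_code,
        "category": category,
    }
```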

Using the dashboard to see why real-time transcription failed

You can also see the failure reason from the bot logs in the dashboard.

Switching to a fallback transcription provider

If real-time transcription fails, you can either:

Re-transcribe the recording after the meeting ends using Async Transcription
Start a new recording with a different real-time transcription provider

To switch to a fallback real-time transcription provider while the meeting is still in progress, listen for the transcript.failed webhook event and call the Start Recording endpoint with a new recording_config that uses a different real-time transcription provider.

❗️
Important notes about calling the Start Recording endpoint:

Calling the Start Recording endpoint creates a new recording artifact. As a result, a bot can have multiple recording artifacts associated with it, and the Retrieve Bot response may include multiple entries in the recordings array.

When you call the Start Recording endpoint, the recording_config for the new recording does not inherit missing fields from the previous recording. The configuration is replaced in full, so the request must include all required and desired settings, not only the transcription provider.

The recording_config returned on the original bot artifact (e.g. response from the Retrieve Bot request) and in the dashboard reflects the initial Create Bot request, not the updated recording configuration used when the new recording was started.
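
Since the new recording_config replaces the old one in full, one approach is to keep your own copy of the configuration you sent and swap only the provider. The sketch below builds that full configuration. The function name is hypothetical, `deepgram_streaming` is taken from the supported providers table in this guide, and the empty provider options are a placeholder, not a recommended setting.

```python
import copy
from typing import Optional


def build_fallback_recording_config(
    original_config: dict,
    fallback_provider: str,
    provider_options: Optional[dict] = None,
) -> dict:
    """Copy every setting from the original config and swap only the provider."""
    config = copy.deepcopy(original_config)  # never mutate the stored original
    transcript = config.setdefault("transcript", {})
    # The provider object holds exactly one key: the provider field name.
    transcript["provider"] = {fallback_provider: dict(provider_options or {})}
    return config
```

For example, `build_fallback_recording_config(saved_config, "deepgram_streaming")` keeps the saved diarization settings and realtime_endpoints while replacing the provider, so the Start Recording request body contains the complete configuration.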

Additional transcript configurations

Diarization

For the best diarization with speaker names, you should use Perfect Diarization by setting diarization.use_separate_streams_when_available: true in the Create Bot request.

When multiple participants are speaking from the same device, you can choose to use Machine Diarization instead of perfect diarization.

To see all diarization configurations, see the diarization guide.

📘
Perfect diarization transcription cost
Using diarization.use_separate_streams_when_available: true for real-time transcription can increase transcription cost by approximately 1.8x in cases where participants speak concurrently or when background conversation is present.

Language detection for real-time transcription

Most of the third-party transcription providers that we integrate with support language detection. For more details on code switching, see Multilingual Transcription.

Accessing provider-specific fields from the speech-to-text transcription provider

❗️
Provider data is not exposed for Recall.ai transcription or when diarization.use_separate_streams_when_available is set to true.

If you need access to provider-specific fields or features that are not exposed in Recall’s normalized real-time transcript events, you can subscribe to the raw transcription output from the underlying provider.

To receive the provider data events, add transcript.provider_data to the events list in recording_config.realtime_endpoints. When enabled, Recall will deliver the raw transcription data returned by the provider, allowing your application to read provider-specific fields that may not be included in Recall’s standard transcript events.

The payload of the transcript.provider_data event varies by provider.

Convert the transcript to a sentence-by-sentence transcript

You can convert the transcript JSON to a more human-readable transcript by persisting the transcription events in your app, then formatting the transcript parts via the following function.
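
One possible formatter is sketched below in Python. It is a minimal sketch and not Recall's own helper. It assumes you stored the inner data object of each transcript.data event (participant plus words) in arrival order, and that word texts carry sentence-ending punctuation, which depends on the provider.

```python
import re
from typing import Iterable, List

# Split after ., ! or ? when followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def to_sentences(utterances: Iterable[dict]) -> List[dict]:
    """Merge consecutive utterances per speaker, then split into sentences."""
    turns: List[dict] = []
    for utterance in utterances:
        participant = utterance["participant"]
        speaker = participant["name"] or "Participant {}".format(participant["id"])
        text = " ".join(w["text"].strip() for w in utterance["words"]).strip()
        if not text:
            continue
        if turns and turns[-1]["participant_id"] == participant["id"]:
            turns[-1]["text"] += " " + text  # same speaker kept talking
        else:
            turns.append({"participant_id": participant["id"], "speaker": speaker, "text": text})

    sentences = []
    for turn in turns:
        for sentence in SENTENCE_END.split(turn["text"]):
            if sentence:
                sentences.append({"speaker": turn["speaker"], "sentence": sentence})
    return sentences
```

Each returned entry can then be rendered as a line such as `Ada: Hello everyone.` in your UI.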

Lowest-latency partial transcript data

Partial data is useful when your application needs a more responsive real-time transcription experience.

Instead of waiting for a full utterance to be finalized (i.e. the transcription provider has decided that the utterance is complete), Recall can deliver low-latency intermediate transcript data as speech is still being processed. Your application can use this intermediate data for live captions, real-time UI updates, or other in-meeting experiences, then replace it with the finalized transcript once it arrives. This is especially helpful for longer utterances, where waiting for the final result may introduce noticeable delay.

Some common use cases for using partial transcript data are:

Live captions or subtitles that should appear before the final utterance is ready.
Real-time UI updates, where your app shows in-progress speech and then replaces it with the finalized transcript.data result.
Streaming transcript previews in dashboards or operator tools, where seeing the conversation evolve live is more useful than waiting for completed utterances.

To receive partial transcript data, add transcript.partial_data to the events list in recording_config.realtime_endpoints. When enabled, your application may receive multiple partial updates for the same utterance before the finalized version is delivered. For example:

Partial words: fur → further → furthermore
Partial sentences: hel → hello → hello how → hello how are → hello how are you

After the utterance is finalized, your application will receive the complete final transcript utterance as a transcript.data event.

Partial transcript data real-time event payload

{
  "event": "transcript.partial_data",
  "data": {
    "data": {
      "words": [{
        "text": string,
        "start_timestamp": {
          "relative": float 
        },
        "end_timestamp": {
          "relative": float 
        } | null
      }],
      "participant": {
        "id": int,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object,
        "email": string | null
      }
    },
    "realtime_endpoint": {
      "id": string,
      "metadata": object
    },
    "transcript": {
      "id": string,
      "metadata": object
    },
    "recording": {
      "id": string,
      "metadata": object
    },
    "bot": {
      "id": string,
      "metadata": object
    }
  }
}
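
The replace-on-finalize pattern described in this section could be sketched as below. The payload carries no utterance id, so the sketch assumes that each partial update for a participant supersedes that participant's previous partial, and that a transcript.data event closes the in-progress line. The class name and the rendered format are hypothetical.

```python
class LiveCaptions:
    """Keeps finalized lines plus one in-progress line per participant."""

    def __init__(self) -> None:
        self.final_lines = []  # list of (participant_id, text), in arrival order
        self.in_progress = {}  # participant_id -> latest partial text

    def apply(self, event: dict) -> None:
        inner = event["data"]["data"]
        participant_id = inner["participant"]["id"]
        text = " ".join(word["text"] for word in inner["words"])
        if event["event"] == "transcript.partial_data":
            # Assumption: a newer partial replaces the older one entirely.
            self.in_progress[participant_id] = text
        elif event["event"] == "transcript.data":
            # The finalized utterance replaces whatever partial was showing.
            self.in_progress.pop(participant_id, None)
            self.final_lines.append((participant_id, text))

    def render(self) -> list:
        lines = ["{}: {}".format(pid, text) for pid, text in self.final_lines]
        lines += ["{}: {} ...".format(pid, text) for pid, text in self.in_progress.items()]
        return lines
```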

Tracking active speakers throughout the meeting

To detect when a participant starts or stops speaking, listen to the participant_events.speech_on and participant_events.speech_off events.

These events act as speaker-turn signals. By correlating them with transcript utterance timestamps, your application can determine which participant was speaking during each portion of the meeting.

To receive these events, include participant_events.speech_on and participant_events.speech_off in the events list in recording_config.realtime_endpoints. Recall will then send an event whenever a participant begins speaking and whenever they stop speaking.

Participant events real-time event payload

{
  "event": string,
  "data": {
    "data": {
      "participant": {
        "id": int,
        "name": string | null,
        "is_host": boolean,
        "platform": string | null,
        "extra_data": object,
        "email": string | null
      },
      "timestamp": {
        "absolute": string, // ISO 8601 string
        "relative": float
      },
      "data": null // Will always return null
    },
    "realtime_endpoint": { // Real-time endpoint artifact
      "id": string,
      "metadata": object
    },
    "participant_events": { // Participant events artifact
      "id": string,
      "metadata": object
    },
    "recording": { // Recording artifact
      "id": string,
      "metadata": object
    },
    "bot": { // Bot artifact
      "id": string,
      "metadata": object
    }
  }
}
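
Turning speech_on and speech_off events into speaking segments might look like the sketch below. The class and method names are hypothetical, and the sketch assumes events for a given participant arrive in order.

```python
class ActiveSpeakers:
    """Builds speaking segments from speech_on / speech_off events."""

    def __init__(self) -> None:
        self.started_at = {}  # participant_id -> relative time of speech_on
        self.segments = []    # closed segments: (participant_id, start, end)

    def apply(self, event: dict) -> None:
        inner = event["data"]["data"]
        participant_id = inner["participant"]["id"]
        at = inner["timestamp"]["relative"]
        if event["event"] == "participant_events.speech_on":
            # Keep the earliest start if speech_on repeats without a speech_off.
            self.started_at.setdefault(participant_id, at)
        elif event["event"] == "participant_events.speech_off":
            start = self.started_at.pop(participant_id, None)
            if start is not None:
                self.segments.append((participant_id, start, at))

    def speakers_at(self, relative_time: float) -> list:
        """Participant ids speaking at the given relative timestamp."""
        closed = {pid for pid, start, end in self.segments if start <= relative_time <= end}
        still_open = {pid for pid, start in self.started_at.items() if start <= relative_time}
        return sorted(closed | still_open)
```

Calling `speakers_at` with the relative start timestamp of a transcript word is one way to correlate utterances with speaker turns.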

Accessing the full transcript after the bot has left the meeting

If a bot is configured to use real-time transcription, the transcript will also be available on the bot's recording after the meeting has ended.

When the meeting has ended and the transcript is available for you to query, Recall will send you a transcript.done transcript artifact status change webhook event. This webhook event provides you the bot id, recording id, and the transcript id. Then you can query the transcript from the Recall API via the following methods.

Querying the transcript's data with a transcript id

Once you have the transcript id, call the Retrieve Transcript endpoint like so:

curl --request GET \
     --url 'https://RECALL_REGION.recall.ai/api/v1/transcript/TRANSCRIPT_ID/' \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json"

The response schema can be found in the Retrieve Transcript API reference. The response payload will contain the data.download_url field, which you can use to download the transcript data that was created for this recording.

Querying the transcript data with a recording id

Once you have the recording id, call the List Transcript endpoint like so:

curl --request GET \
     --url 'https://RECALL_REGION.recall.ai/api/v1/transcript/?recording_id=RECORDING_ID&status_code=done' \
     --header 'Authorization: RECALLAI_API_KEY' \
     --header 'accept: application/json'

The response schema can be found in the List Transcript API reference. The response payload will contain a list of transcript artifacts, each of which contains a data.download_url field that you can use to download the transcript data that was created for this recording.

Querying the transcript data with a bot id

Once you have the bot id, call the Retrieve Bot endpoint like so:

curl --request GET \
     --url https://RECALL_REGION.recall.ai/api/v1/bot/BOT_ID/ \
     --header 'Authorization: RECALLAI_API_KEY' \
     --header 'accept: application/json'

The response schema can be found in the Retrieve Bot API reference. The response payload will contain a recordings array, where each entry exposes recordings[i].media_shortcuts.transcript.data.download_url, which you can use to download the transcript data that was created for that recording.
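
Because a bot can have more than one recording, collecting the download URLs from a Retrieve Bot response could be sketched as follows. The function name is hypothetical. Only the field path named above is assumed, and recordings without a transcript URL are skipped.

```python
def transcript_download_urls(bot: dict) -> list:
    """Collect transcript download URLs from a parsed Retrieve Bot response."""
    urls = []
    for recording in bot.get("recordings") or []:
        shortcuts = recording.get("media_shortcuts") or {}
        transcript = shortcuts.get("transcript") or {}
        url = (transcript.get("data") or {}).get("download_url")
        if url:  # skip recordings whose transcript is missing or not ready
            urls.append(url)
    return urls
```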

Transcript download url data schema

The resulting data from querying the data.download_url will be returned as follows:

[
  {
    "participant": {
      "id": number, // Id of the participant in the meeting. This id is not unique across meetings.
      "name": string | null, // Display name of the participant.
      "is_host": boolean | null, // Whether the participant is the host of the meeting.
      "platform": string | null, // Meeting platform constant
      "extra_data": json | null, // Extra data about the participant from the meeting platform.
      "email": string | null, // Email, if participant identification is turned on
    },
    "language_code": string, // The language code from the transcription provider, normalized to BCP-47.
                          // The simple code is .split('-')[0], and beware that some languages require
                          // 3-character codes (e.g. yue and haw)
    "words": [
      {
        "text": string, // The text of the word.
        "start_timestamp": {
          "absolute": string, // ISO 8601, will return null for async transcription
          "relative": number // seconds
        },
        "end_timestamp": {
          "absolute": string, // ISO 8601, will return null for async transcription
          "relative": number // seconds
        }
      }
    ]
  }
]
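
As a small worked example of reading this structure, the sketch below derives the simple language code mentioned in the schema comments and sums each participant's word durations. The function names are hypothetical, and words without an end timestamp are skipped as a defensive assumption.

```python
from collections import defaultdict


def simple_language_code(language_code: str) -> str:
    """Return the primary subtag, e.g. 'en' for 'en-US'; may be 3 characters."""
    return language_code.split("-")[0]


def speaking_seconds(transcript: list) -> dict:
    """Sum word durations (relative seconds) per participant id."""
    totals = defaultdict(float)
    for part in transcript:
        participant_id = part["participant"]["id"]
        for word in part["words"]:
            end = word.get("end_timestamp")
            if not end:
                continue  # defensive: no end timestamp to measure against
            start = word["start_timestamp"]["relative"]
            totals[participant_id] += max(0.0, end["relative"] - start)
    return dict(totals)
```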

FAQs

Why are transcription webhooks delayed?

Recall will POST data from the configured transcription provider as soon as it is received. When using partial transcript data, the frequency is typically in the hundreds of milliseconds to low seconds range but varies slightly by provider. We recommend testing each third-party provider to see which best fits your use case.

If you're seeing large delays in transcription data, such as seconds, or even minutes, especially increasing over the duration of the call, this is likely due to the serial nature of how webhooks are sent. Since transcription utterances are sequential and rely on being in a particular order, blocking a webhook request will delay any subsequent requests.

For this reason, if you're running in a single-threaded environment, you should make sure that any processing of the transcription webhook happens asynchronously to prevent delaying future webhooks.

How do I get the transcript up to a certain point while the bot is still in the meeting?

This is not currently possible. Instead, if you need transcript data during the meeting up to a certain point, subscribe to real-time transcription events and store the transcript utterances in your application as they arrive.

Supported transcription providers for real-time transcription (bots)

Speech-to-text provider	Real-time transcription (bots)	Provider field name in Create Bot
Recall.ai Transcription	✅ Yes	`recallai_streaming`
ElevenLabs	✅ Yes	`elevenlabs_streaming`
AssemblyAI	✅ Yes	`assembly_ai_v3_streaming`
Deepgram	✅ Yes	`deepgram_streaming`
AWS Transcribe	✅ Yes	`aws_transcribe_streaming`
Rev	✅ Yes	`rev_streaming`
Speechmatics	✅ Yes	`speechmatics_streaming`
Gladia	✅ Yes	`gladia_v2_streaming`
Google Cloud STT	❌ No	-

What is the expected real-time transcription latency?

We send transcript utterances as soon as the transcription provider finalizes them so latency will vary by transcription provider and the configs you use. That said, you can typically expect webhook updates every 1–3 seconds.

Note that if you're using assembly_ai_async_chunked or recallai_streaming in prioritize_accuracy mode (the default), transcript utterance latency will differ because these options use async (pre-recorded, non-real-time) transcription models under the hood.

Why do real-time transcription webhooks come as single words instead of full sentences?

Real-time transcription webhooks often come as single words rather than full sentences because real-time transcription is streaming results as soon as it can (often word-by-word) to minimize latency.
