Transcription is the process of converting the audio from a meeting recording into a written transcript that captures what was said at specific timestamps and which participant said it.

Developers use transcripts to power features like meeting summaries, action items, search, coaching, compliance review, and real-time AI experiences.

Transcription is not enabled by default. You must generate a transcript for each recording using either real-time or post-meeting transcription.

Implementing transcription

Quickstart: How to transcribe a meeting

If using Meeting Bots: Create a bot with transcription enabled

Create a meeting bot with transcription enabled by sending the following Create Bot request:

curl --request POST \
     --url https://RECALL_REGION.recall.ai/api/v1/bot/ \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "meeting_url": "MEETING_URL",
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_accuracy", 
          "language_code": "auto"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    }
  }
}
'

If using the Desktop Recording SDK: Create a Desktop SDK recording with transcription enabled

Create a Desktop SDK recording with transcription enabled by sending the following Create Desktop SDK Upload request:

curl --request POST \
     --url https://RECALL_REGION.recall.ai/api/v1/sdk_upload/ \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_accuracy", 
          "language_code": "auto"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    }
  }
}
'
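The two requests above share the same recording_config and differ only in the meeting_url field. As a minimal sketch, the JSON body could be built programmatically so that placeholder values such as the meeting URL are always serialized as valid JSON strings. The helper names build_recording_config and build_request_body, and the example.com URL, are hypothetical and not part of the Recall.ai API; only the field names are taken from the requests above.

```python
import json
from typing import Optional


def build_recording_config(mode: str = "prioritize_accuracy",
                           language_code: str = "auto") -> dict:
    """Return the recording_config shown in both requests above."""
    return {
        "transcript": {
            "provider": {
                "recallai_streaming": {
                    "mode": mode,
                    "language_code": language_code,
                }
            },
            "diarization": {"use_separate_streams_when_available": True},
        }
    }


def build_request_body(meeting_url: Optional[str] = None) -> str:
    """Serialize the request body.

    Pass meeting_url for a Create Bot request; omit it for a
    Create Desktop SDK Upload request.
    """
    body = {"recording_config": build_recording_config()}
    if meeting_url is not None:
        body = {"meeting_url": meeting_url, **body}
    # json.dumps quotes and escapes every string value.
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Hypothetical meeting URL, used only for illustration.
    print(build_request_body("https://example.com/meeting/123"))
```

The resulting string can be passed as the --data argument of either curl command, or sent with any HTTP client.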

Fetch transcript after the meeting

After the recording completes, call the Retrieve Recording endpoint and download the transcript from the URL in the media_shortcuts.transcript.data.download_url field on the recording artifact.
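The lookup of the media_shortcuts.transcript.data.download_url field can be sketched as follows. This is an illustration under stated assumptions, not a definitive client: the Retrieve Recording path (/api/v1/recording/{id}/) is assumed by analogy with the endpoints above, the downloaded file is assumed to be JSON, and the function names are hypothetical. Check the API reference before relying on any of these.

```python
import json
import urllib.request
from typing import Any, Optional


def transcript_download_url(recording: dict) -> Optional[str]:
    """Walk media_shortcuts.transcript.data.download_url.

    Returns None if any level is missing, for example when no
    transcript has been generated for the recording.
    """
    node: Any = recording
    for key in ("media_shortcuts", "transcript", "data", "download_url"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node if isinstance(node, str) else None


def fetch_transcript(region: str, api_key: str, recording_id: str) -> Any:
    """Fetch a recording, then download its transcript if one exists."""
    # ASSUMPTION: this path is inferred, not confirmed by the text above.
    url = f"https://{region}.recall.ai/api/v1/recording/{recording_id}/"
    request = urllib.request.Request(
        url,
        # Header value passed as in the curl examples above.
        headers={"Authorization": api_key, "accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        recording = json.load(response)

    download_url = transcript_download_url(recording)
    if download_url is None:
        return None
    # ASSUMPTION: the transcript file is served as JSON.
    with urllib.request.urlopen(download_url) as response:
        return json.load(response)
```

Keeping the field lookup in its own function lets it be tested without network access, and it returns None rather than raising when the transcript is not present.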

Transcription implementation guides

Recall.ai supports three transcription workflows for generating transcripts from meeting recordings. The right workflow depends on when you need transcript data and how you capture the meeting or conversation.

Transcription workflow	Use when	Implementation Guide
Post-meeting transcription	You only need a final transcript after the recording completes. Works for recordings captured with Meeting Bots or Desktop Recording SDK.	Post-meeting transcription
Real-time transcription for Meeting Bots	You need live transcript data from a Zoom, Google Meet, or Microsoft Teams meeting joined by a meeting bot.	Real-time Transcription for Meeting Bots
Real-time transcription for Desktop Recording SDK	You need live transcript data from audio captured locally with the Desktop Recording SDK, such as in-person meetings, browser-based calls, or desktop audio capture workflows.	Real-time Transcription for Desktop Recording SDK

Speech-to-text provider guides

Use the provider guides below to configure the speech-to-text provider you want to use for post-meeting or real-time transcription.

Speech-to-text provider	Setup guide	Post-meeting transcription guide	Real-time transcription guide
Recall.ai Transcription (recommended)	Not required	Jump here	Jump here
ElevenLabs	Jump here	Jump here	Jump here
Deepgram	Jump here	Jump here	Jump here
AssemblyAI	Jump here	Jump here	Jump here
AWS Transcribe	Jump here	Jump here	Jump here
Rev	Jump here	Jump here	Jump here
Speechmatics	Jump here	Jump here	Jump here

Key transcription features

Transcription behavior can vary depending on the transcription workflow, speech-to-text provider, and features you enable. Review the features below before choosing your transcription setup.

Feature	What it means	Learn more
Diarization	Identifies who said what in a transcript. Depending on the method and provider, speaker labels may be tied to known meeting participants or returned as generic speaker labels.	Diarization
Multilingual transcription	Determines how transcripts are generated when meetings include one or more spoken languages. Language support, language detection, code-switching, and multilingual behavior can vary by provider.	Multilingual Transcription
Provider data	Raw transcript data returned by the underlying speech-to-text provider. Use provider data when you need provider-specific fields that are not included in Recall.ai’s normalized transcript data.	Accessing speech-to-text provider-specific fields
Partial transcripts	Only for real-time transcription. Streams interim transcript data before an utterance is finalized. Use this when your application needs lower-latency live experiences.	Partial transcript data
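To illustrate how partial transcripts are typically consumed, the sketch below keeps a live caption view in which each interim result replaces the previous one for that speaker, and a finalized utterance replaces the interim text. The LiveCaptions class and the "partial" / "final" event kinds are hypothetical and do not reflect Recall.ai's actual real-time payload shape; map your provider's events onto them as needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LiveCaptions:
    """Live caption state built from interim and finalized results."""

    # Finalized (speaker, text) pairs in arrival order.
    finalized: List[Tuple[str, str]] = field(default_factory=list)
    # Latest interim text per speaker, overwritten on every partial.
    interim: Dict[str, str] = field(default_factory=dict)

    def handle(self, kind: str, speaker: str, text: str) -> None:
        if kind == "partial":
            # Interim results supersede each other; keep only the newest.
            self.interim[speaker] = text
        elif kind == "final":
            # The finalized utterance replaces that speaker's interim text.
            self.finalized.append((speaker, text))
            self.interim.pop(speaker, None)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    def render(self) -> str:
        lines = [f"{speaker}: {text}" for speaker, text in self.finalized]
        # Mark interim text so readers know it may still change.
        lines += [f"{speaker}: {text}..." for speaker, text in self.interim.items()]
        return "\n".join(lines)
```

For example, two partial events for the same speaker followed by a final event leave a single finalized line and no interim text.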

Important transcription concepts

Before choosing a transcription workflow, it helps to understand the core concepts used throughout Recall.ai’s transcription docs.

Concept	Description
Recording	The audio and/or video captured from a meeting by Meeting Bots or the Desktop Recording SDK. Transcripts are generated from recordings, either live while the meeting is being recorded or after the recording completes.
Transcript	The written text generated from meeting audio. It captures what was said and can include speaker labels, timestamps, and other metadata depending on the transcription workflow and provider.
Transcription workflow	When and how transcript data is generated. Recall.ai supports real-time transcription for receiving transcript data during a live meeting and post-meeting transcription for generating a final transcript after a recording completes.
Speech-to-text provider	The service that generates the transcript from the recording's audio. This can be Recall.ai Transcription or a supported third-party speech-to-text provider.
Utterance	A single piece of transcript text from one speaker. A transcript is usually made up of many utterances, each with its own text, speaker label, timestamps, and metadata.
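The relationship between a transcript and its utterances can be sketched as a small data model. The Utterance class and its field names below are hypothetical and chosen for illustration; they are not Recall.ai's transcript schema, which may structure text, speaker, and timestamps differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """One piece of transcript text from a single speaker."""
    speaker: str
    text: str
    start_seconds: float  # offset from the start of the recording
    end_seconds: float


def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as HH:MM:SS, dropping fractions."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render_transcript(utterances: List[Utterance]) -> str:
    """Render utterances as readable lines, ordered by start time."""
    ordered = sorted(utterances, key=lambda u: u.start_seconds)
    return "\n".join(
        f"[{format_timestamp(u.start_seconds)}] {u.speaker}: {u.text}"
        for u in ordered
    )


if __name__ == "__main__":
    print(render_transcript([
        Utterance("Speaker 2", "Sounds good.", 4.2, 5.0),
        Utterance("Speaker 1", "Hello.", 0.5, 1.1),
    ]))
```

Sorting by start time keeps the output readable even if utterances arrive or are stored out of order.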
