Transcription overview

Choose the right transcription workflow and speech-to-text provider so you can generate accurate meeting transcripts in real time or after the call.

Transcription is the process of converting the audio from a meeting recording into a written transcript that captures what was said at specific timestamps and which participant said it.

Developers use transcripts to power features like meeting summaries, action items, search, coaching, compliance review, and real-time AI experiences.

Transcription is not enabled by default. To get a transcript for a recording, you must generate one using either real-time or post-meeting transcription.


Implementing transcription

Quickstart: How to transcribe a meeting

If using meeting bots: create a bot with transcription enabled

You can create a meeting bot with a transcript through the following Create Bot request:

curl --request POST \
     --url https://RECALL_REGION.recall.ai/api/v1/bot/ \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "meeting_url": "MEETING_URL",
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_accuracy", 
          "language_code": "auto"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    }
  }
}
'

If using the Desktop Recording SDK: create a Desktop SDK recording with transcription enabled

You can create a Desktop SDK Recording with a transcript through the following Create Desktop SDK Upload request:

curl --request POST \
     --url https://RECALL_REGION.recall.ai/api/v1/sdk_upload/ \
     --header "Authorization: RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_accuracy", 
          "language_code": "auto"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    }
  }
}
'
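The two requests above differ only in their endpoint and in the meeting_url field. As a minimal sketch of the same calls in Python, using only the standard library: the helper names build_transcript_config and build_request are hypothetical, the region, API key, and meeting URL are the same placeholders used in the curl examples, and the requests are built but not sent.

```python
import json
import urllib.request


def build_transcript_config(mode="prioritize_accuracy", language_code="auto"):
    """Return the recording_config body shared by both curl examples (hypothetical helper)."""
    return {
        "recording_config": {
            "transcript": {
                "provider": {
                    "recallai_streaming": {
                        "mode": mode,
                        "language_code": language_code,
                    }
                },
                "diarization": {"use_separate_streams_when_available": True},
            }
        }
    }


def build_request(region, api_key, path, body):
    """Build (but do not send) a POST request with the same headers as the curl examples."""
    return urllib.request.Request(
        url=f"https://{region}.recall.ai/api/v1/{path}/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "accept": "application/json",
            "content-type": "application/json",
        },
        method="POST",
    )


# Meeting bot: the shared config plus a meeting_url (placeholder value).
bot_body = {"meeting_url": "MEETING_URL", **build_transcript_config()}
bot_request = build_request("RECALL_REGION", "RECALLAI_API_KEY", "bot", bot_body)

# Desktop Recording SDK: the shared config on its own.
sdk_request = build_request(
    "RECALL_REGION", "RECALLAI_API_KEY", "sdk_upload", build_transcript_config()
)

# To send a request, substitute real values and call urllib.request.urlopen(bot_request).
```

Keeping the transcript configuration in one helper means a change of provider or mode applies to both capture methods.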

Fetch transcript after the meeting

After the meeting ends, call the Retrieve Recording endpoint and download the transcript from the URL in the media_shortcuts.transcript.data.download_url field of the recording artifact.
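The lookup of that field can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference client: the function names are hypothetical, the URL path used for the Retrieve Recording endpoint is an assumption to verify against the API reference, and the downloaded transcript is assumed to be JSON. Only the media_shortcuts.transcript.data.download_url field path comes from this page.

```python
import json
import urllib.request


def transcript_download_url(recording):
    """Return media_shortcuts.transcript.data.download_url, or None if any level is missing."""
    node = recording
    for key in ("media_shortcuts", "transcript", "data", "download_url"):
        if not isinstance(node, dict) or node.get(key) is None:
            return None
        node = node[key]
    return node


def fetch_json(url, headers=None):
    """Send a GET request and decode the JSON response body."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_transcript(region, api_key, recording_id):
    """Retrieve a recording, then download its transcript if one exists."""
    # Assumed path for the Retrieve Recording endpoint; confirm in the API reference.
    url = f"https://{region}.recall.ai/api/v1/recording/{recording_id}/"
    recording = fetch_json(url, {"Authorization": api_key, "accept": "application/json"})
    download_url = transcript_download_url(recording)
    if download_url is None:
        return None  # No transcript is available on this recording.
    return fetch_json(download_url)  # Assumes the transcript file is JSON.
```

Checking every level of the path guards against recordings that have no transcript, which is the default when transcription was not enabled.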

Transcription implementation guides

Recall.ai supports three transcription workflows for generating transcripts from meeting recordings. The right workflow depends on when you need transcript data and how you capture the meeting or conversation.

| Transcription workflow | Use when | Implementation guide |
| --- | --- | --- |
| Post-meeting transcription | You only need a final transcript after the recording completes. Works for recordings captured with Meeting Bots or Desktop Recording SDK. | Post-meeting transcription |
| Real-time transcription for Meeting Bots | You need live transcript data from a Zoom, Google Meet, or Microsoft Teams meeting joined by a meeting bot. | Real-time Transcription for Meeting Bots |
| Real-time transcription for Desktop Recording SDK | You need live transcript data from audio captured locally with the Desktop Recording SDK, such as in-person meetings, browser-based calls, or desktop audio capture workflows. | Real-time Transcription for Desktop Recording SDK |
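The decision in the table above reduces to two questions: whether you need live data, and how the meeting is captured. A small Python sketch of that logic, where choose_workflow and its argument values are hypothetical names used only for illustration:

```python
def choose_workflow(needs_live_data, capture_method):
    """Map the two questions from the table to a workflow name.

    capture_method is "meeting_bot" or "desktop_sdk" (hypothetical labels).
    """
    if capture_method not in ("meeting_bot", "desktop_sdk"):
        raise ValueError(f"unknown capture method: {capture_method!r}")
    if not needs_live_data:
        # A final transcript after the recording completes works for both capture methods.
        return "Post-meeting transcription"
    if capture_method == "meeting_bot":
        return "Real-time transcription for Meeting Bots"
    return "Real-time transcription for Desktop Recording SDK"
```

For example, choose_workflow(True, "desktop_sdk") selects the Desktop Recording SDK real-time workflow.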

Speech-to-text provider guides

Use the provider guides below to configure the speech-to-text provider you want to use for post-meeting or real-time transcription.

| Speech-to-text provider | Setup guide | Post-meeting transcription guide | Real-time transcription guide |
| --- | --- | --- | --- |
| Recall.ai Transcription | Not required | Jump here | Jump here |
| ElevenLabs | Jump here | Jump here | Jump here |
| Deepgram | Jump here | Jump here | Jump here |
| AssemblyAI | Jump here | Jump here | Jump here |
| AWS Transcribe | Jump here | Jump here | Jump here |
| Rev | Jump here | Jump here | Jump here |
| Speechmatics | Jump here | Jump here | Jump here |

Key transcription features

Transcription behavior can vary depending on the transcription workflow, speech-to-text provider, and features you enable. Review the features below before choosing your transcription setup.

| Feature | What it means | Learn more |
| --- | --- | --- |
| Diarization | Identifies who said what in a transcript. Depending on the method and provider, speaker labels may be tied to known meeting participants or returned as generic speaker labels. | Diarization |
| Multilingual transcription | Determines how transcripts are generated when meetings include one or more spoken languages. Language support, language detection, code-switching, and multilingual behavior can vary by provider. | Multilingual Transcription |
| Provider data | Raw transcript data returned by the underlying speech-to-text provider. Use provider data when you need provider-specific fields that are not included in Recall.ai’s normalized transcript data. | Accessing speech-to-text provider-specific fields |
| Partial transcripts | Only for real-time transcription. Streams interim transcript data before an utterance is finalized. Use this when your application needs lower-latency live experiences. | Partial transcript data |

Important transcription concepts

Before choosing a transcription workflow, it helps to understand the core concepts used throughout Recall.ai’s transcription docs.

| Concept | Description |
| --- | --- |
| Recording | The captured audio and/or video from a meeting using Meeting Bots or Desktop Recording SDK. Transcripts are generated from recordings, either live during the meeting as it is being recorded or after the recording completes. |
| Transcript | The written text generated from meeting audio. It captures what was said and can include speaker labels, timestamps, and other metadata depending on the transcription workflow and provider. |
| Transcription workflow | When and how transcript data is generated. Recall.ai supports real-time transcription for receiving transcript data during a live meeting and post-meeting transcription for generating a final transcript after a recording completes. |
| Speech-to-text provider | The speech-to-text transcription service that generates the transcript from the recording. This can be Recall.ai Transcription or a supported third-party speech-to-text provider. |
| Utterance | A single piece of transcript text from one speaker. A transcript is usually made up of many utterances, each with its own text, speaker label, timestamps, and metadata. |
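To make the relationship between a transcript and its utterances concrete, here is a hypothetical in-memory model in Python. The class and field names are illustrative assumptions and do not reflect the actual transcript schema returned by Recall.ai or any provider. They only mirror the description above: each utterance carries text, a speaker label, timestamps, and metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One piece of transcript text from one speaker (hypothetical field names)."""
    speaker: str
    text: str
    start_seconds: float
    end_seconds: float
    metadata: dict = field(default_factory=dict)


def format_transcript(utterances):
    """Render utterances in time order as '[MM:SS] speaker: text' lines."""
    lines = []
    for utterance in sorted(utterances, key=lambda u: u.start_seconds):
        minutes, seconds = divmod(int(utterance.start_seconds), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {utterance.speaker}: {utterance.text}")
    return "\n".join(lines)


transcript = [
    Utterance("Bob", "Sounds good.", 65.2, 66.0),
    Utterance("Alice", "Let's start.", 3.0, 4.5),
]
print(format_transcript(transcript))
```

Sorting by start time before rendering keeps the output readable even when utterances arrive out of order.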