Transcription overview
Choose the right transcription workflow and speech-to-text provider so you can generate accurate meeting transcripts in real time or after the call.
Transcription is the process of converting the audio from a meeting recording into a written transcript that captures what was said at specific timestamps and which participant said it.
Developers use transcripts to power features like meeting summaries, action items, search, coaching, compliance review, and real-time AI experiences.
Transcription is not enabled by default. To get a transcript for a recording, you must generate one using real-time or post-meeting transcription.
Implementing transcription
Quickstart: How to transcribe a meeting
If using meeting bots: create a bot with transcription enabled
To create a meeting bot with transcription enabled, send the following Create Bot request:
curl --request POST \
  --url https://RECALL_REGION.recall.ai/api/v1/bot/ \
  --header "Authorization: RECALLAI_API_KEY" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  --data '
{
  "meeting_url": "MEETING_URL",
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_accuracy",
          "language_code": "auto"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    }
  }
}
'
If using the Desktop Recording SDK: create a Desktop SDK recording with transcription enabled
To create a Desktop SDK recording with transcription enabled, send the following Create Desktop SDK Upload request:
curl --request POST \
  --url https://RECALL_REGION.recall.ai/api/v1/sdk_upload/ \
  --header "Authorization: RECALLAI_API_KEY" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  --data '
{
  "recording_config": {
    "transcript": {
      "provider": {
        "recallai_streaming": {
          "mode": "prioritize_accuracy",
          "language_code": "auto"
        }
      },
      "diarization": {
        "use_separate_streams_when_available": true
      }
    }
  }
}
'
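Both requests above send the same recording_config.transcript object; only the Create Bot request adds meeting_url. As a minimal sketch (the helper names below are illustrative and not part of any Recall.ai SDK), the shared body could be built once and serialized with json.dumps, which also ensures the meeting URL is emitted as a quoted JSON string:

```python
import json


def build_recording_config(mode="prioritize_accuracy", language_code="auto",
                           separate_streams=True):
    """Return the request body shared by both requests shown above.

    The field names mirror the curl examples; the function itself is a
    hypothetical helper, not an official client.
    """
    return {
        "recording_config": {
            "transcript": {
                "provider": {
                    "recallai_streaming": {
                        "mode": mode,
                        "language_code": language_code,
                    }
                },
                "diarization": {
                    "use_separate_streams_when_available": separate_streams
                },
            }
        }
    }


def bot_payload(meeting_url):
    """Body for the Create Bot request: meeting_url plus the shared config."""
    payload = {"meeting_url": meeting_url}
    payload.update(build_recording_config())
    return payload


def sdk_upload_payload():
    """Body for the Create Desktop SDK Upload request: the shared config only."""
    return build_recording_config()


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(json.dumps(bot_payload("https://example.com/meeting"), indent=2))
```

The printed JSON can be passed as the --data argument of the corresponding curl command.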
Fetch transcript after the meeting
After the meeting ends, call the Retrieve Recording endpoint and read the transcript's download URL from the media_shortcuts.transcript.data.download_url field on the recording artifact.
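The lookup described above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the Retrieve Recording URL is left to the caller, and treating a missing or null field as "no transcript yet" is an assumption rather than documented behavior.

```python
import json
import urllib.request


def transcript_download_url(recording):
    """Return media_shortcuts.transcript.data.download_url, or None if any
    part of that path is missing or null in the recording object."""
    node = recording
    for key in ("media_shortcuts", "transcript", "data", "download_url"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def fetch_json(url, headers=None):
    """GET a URL and parse the response body as JSON (standard library only)."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller would pass the Retrieve Recording response to transcript_download_url and, if the result is not None, download the transcript from that URL, for example with fetch_json. Whether the download URL needs the Authorization header is not stated here, so check the Retrieve Recording reference before relying on either behavior.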
Transcription implementation guides
Recall.ai supports three transcription workflows for generating transcripts from meeting recordings. The right workflow depends on when you need transcript data and how you capture the meeting or conversation.
| Transcription workflow | Use when | Implementation Guide |
|---|---|---|
| Post-meeting transcription | You only need a final transcript after the recording completes. Works for recordings captured with Meeting Bots or the Desktop Recording SDK. | Post-meeting transcription |
| Real-time transcription for Meeting Bots | You need live transcript data from a Zoom, Google Meet, or Microsoft Teams meeting joined by a meeting bot. | Real-time Transcription for Meeting Bots |
| Real-time transcription for Desktop Recording SDK | You need live transcript data from audio captured locally with the Desktop Recording SDK, such as in-person meetings, browser-based calls, or desktop audio capture workflows. | Real-time Transcription for Desktop Recording SDK |
Speech-to-text provider guides
Use the provider guides below to configure the speech-to-text provider you want to use for post-meeting or real-time transcription.
| Speech-to-text provider | Setup guide | Post-meeting transcription guide | Real-time transcription guide |
|---|---|---|---|
| Recall.ai Transcription | Not required | Jump here | Jump here |
| ElevenLabs | Jump here | Jump here | Jump here |
| Deepgram | Jump here | Jump here | Jump here |
| AssemblyAI | Jump here | Jump here | Jump here |
| AWS Transcribe | Jump here | Jump here | Jump here |
| Rev | Jump here | Jump here | Jump here |
| Speechmatics | Jump here | Jump here | Jump here |
Key transcription features
Transcription behavior can vary depending on the transcription workflow, speech-to-text provider, and features you enable. Review the features below before choosing your transcription setup.
| Feature | What it means | Learn more |
|---|---|---|
| Diarization | Identifies who said what in a transcript. Depending on the method and provider, speaker labels may be tied to known meeting participants or returned as generic speaker labels. | Diarization |
| Multilingual transcription | Determines how transcripts are generated when meetings include one or more spoken languages. Language support, language detection, code-switching, and multilingual behavior can vary by provider. | Multilingual Transcription |
| Provider data | Raw transcript data returned by the underlying speech-to-text provider. Use provider data when you need provider-specific fields that are not included in Recall.ai’s normalized transcript data. | Accessing speech-to-text provider-specific fields |
| Partial transcripts | Only for real-time transcription. Streams interim transcript data before an utterance is finalized. Use this when your application needs lower-latency live experiences. | Partial transcript data |
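To illustrate how partial transcripts differ from finalized ones, here is a small sketch of a live caption buffer. The event shape (a text string plus an is_final flag) is a hypothetical simplification, not Recall.ai's actual payload; see the Partial transcript data guide for the real format.

```python
class LiveTranscript:
    """Keep finalized utterances and the latest interim text separately."""

    def __init__(self):
        self.finalized = []  # text of utterances that will no longer change
        self.pending = ""    # latest interim text for the current utterance

    def on_event(self, text, is_final):
        """Handle one transcript event (hypothetical shape: text + is_final)."""
        if is_final:
            # The finalized text supersedes whatever interim text was shown.
            self.finalized.append(text)
            self.pending = ""
        else:
            # Each partial replaces the previous partial for this utterance.
            self.pending = text

    def render(self):
        """Return finalized text followed by the current interim text, if any."""
        parts = self.finalized + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

The design point is that interim text is provisional: a UI can show it right away for lower latency, but should overwrite it when the finalized utterance arrives rather than appending to it.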
Important transcription concepts
Before choosing a transcription workflow, it helps to understand the core concepts used throughout Recall.ai’s transcription docs.
| Concept | Description |
|---|---|
| Recording | The audio and/or video captured from a meeting with Meeting Bots or the Desktop Recording SDK. Transcripts are generated from recordings, either live while the meeting is being recorded or after the recording completes. |
| Transcript | The written text generated from meeting audio. It captures what was said and can include speaker labels, timestamps, and other metadata depending on the transcription workflow and provider. |
| Transcription workflow | When and how transcript data is generated. Recall.ai supports real-time transcription for receiving transcript data during a live meeting and post-meeting transcription for generating a final transcript after a recording completes. |
| Speech-to-text provider | The service that generates the transcript from the recording. This can be Recall.ai Transcription or a supported third-party speech-to-text provider. |
| Utterance | A single piece of transcript text from one speaker. A transcript is usually made up of many utterances, each with its own text, speaker label, timestamps, and metadata. |
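As a rough mental model of the concepts above, a transcript can be treated as a list of utterances. The dataclass below is a hypothetical in-memory representation for illustration only; its field names are assumptions and do not reflect Recall.ai's transcript schema.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One piece of transcript text from one speaker (illustrative fields)."""
    speaker: str          # speaker label, e.g. a participant name
    text: str             # what was said
    start_seconds: float  # offset from the start of the recording
    end_seconds: float


def format_transcript(utterances):
    """Render utterances in time order as '[MM:SS] speaker: text' lines."""
    lines = []
    for u in sorted(utterances, key=lambda u: u.start_seconds):
        minutes, seconds = divmod(int(u.start_seconds), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {u.speaker}: {u.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_transcript([
        Utterance("Bob", "Hi Alice.", 65.2, 66.0),
        Utterance("Alice", "Hello.", 3.0, 4.1),
    ]))
```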