Asynchronous Transcription
In addition to real-time transcription, Recall.ai also supports transcribing asynchronously after the call has ended. The async transcription process is the same for bots and the Desktop Recording SDK.
Quickstart
Receive the recording.done webhook
You'll be notified when a given recording is ready for transcription by receiving a recording.done Status Change event:
{
"event": "recording.done",
"data": {
"data": {
"code": string,
"sub_code": string | null,
"updated_at": string
},
"recording": {
"id": string,
"metadata": object
},
"bot": {
"id": string,
"metadata": object
} | null
}
}
Upon receiving this event, you can kick off an async transcription job, assuming your recording has generated an artifact suitable for transcription (e.g. a video or audio artifact).
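The receiving side can be sketched in Python. This is a minimal sketch, not a prescribed integration: the field names follow the payload above, while the function names and the handler shape are illustrative.

```python
import json


def parse_status_event(body: str) -> dict:
    """Extract the fields we need from a Status Change webhook body."""
    event = json.loads(body)
    return {
        "event": event["event"],
        "recording_id": event["data"]["recording"]["id"],
        "status_code": event["data"]["data"]["code"],
    }


def should_start_transcription(parsed: dict) -> bool:
    # Only kick off an async transcript once the recording is done.
    return parsed["event"] == "recording.done"
```

In a real webhook endpoint you would call these on the raw request body, then issue the Create Async Transcript request described below.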
Start an async transcription job
To kick off an asynchronous transcription job, call the Create Async Transcript endpoint.
At minimum, you must specify a provider configuration that should be used to transcribe the recording.
Example:
curl --request POST \
--url https://us-west-2.recall.ai/api/v1/recording/{RECORDING_ID}/create_transcript/ \
--header "Authorization: $RECALLAI_API_KEY" \
--header "accept: application/json" \
--header "content-type: application/json" \
--data '
{
"provider": {
"recallai_async": {
"language_code": "en"
}
}
}
'
In this example, we choose Recall.ai as the provider and configure the language as English. For a full list of providers and their options, see the Create Async Transcript API reference.
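The same request can be issued from code. The sketch below mirrors the curl call above using only Python's standard library; `recording_id` and `api_key` are placeholders you would supply.

```python
import json
import urllib.request


def build_create_transcript_request(recording_id: str, api_key: str) -> urllib.request.Request:
    """Build the Create Async Transcript request (mirrors the curl example above)."""
    url = f"https://us-west-2.recall.ai/api/v1/recording/{recording_id}/create_transcript/"
    payload = {"provider": {"recallai_async": {"language_code": "en"}}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": api_key,
            "accept": "application/json",
            "content-type": "application/json",
        },
        method="POST",
    )

# To send it: urllib.request.urlopen(build_create_transcript_request(rec_id, key))
```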
Only 10 Successful Transcripts Allowed Per Recording
Because each transcript created this way triggers a transcription on the underlying provider and incurs usage costs, we've limited the maximum number of successful transcripts per recording to 10. This helps avoid cases where a bad loop on the consumer's end creates a large number of transcripts for the same recording.
If you run into this limit for a recording, delete an existing transcript on the recording and retry.
Note that the minimum recording duration required to generate a transcript varies between providers. See their docs for more info.
Diarization
By default, async transcriptions use the mixed audio: a single stream covering the entire recording. Alternatively, on supported platforms, each participant's stream can be transcribed separately, allowing perfect diarization. To use this, add the diarization object with use_separate_streams_when_available set to true:
curl --request POST \
--url https://us-west-2.recall.ai/api/v1/recording/{RECORDING_ID}/create_transcript/ \
--header "Authorization: $RECALLAI_API_KEY" \
--header "accept: application/json" \
--header "content-type: application/json" \
--data '
{
"provider": {
"recallai_async": {
"language_code": "en"
}
},
"diarization": {
"use_separate_streams_when_available": true
}
}
'
Transcription Cost
For async transcriptions with perfect diarization, we trim out any silence and send only the speaking portions of audio to the transcription provider. This means that even though we send multiple streams of audio to your transcription provider, the cost to transcribe is typically similar to the default transcription.
However, if multiple users speak concurrently, or there is background conversation, the transcription cost can be greater.
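As a rough illustration of the cost reasoning above: with separate streams, billable audio is approximately the sum of each participant's trimmed speaking time, so without overlap it stays at or below the mixed-stream duration, while crosstalk is billed once per stream it appears in. The numbers below are hypothetical.

```python
def billable_minutes(speaking_minutes_per_participant):
    """Approximate per-stream billing: sum of trimmed speaking time per stream.

    Overlapping speech appears in multiple streams, so concurrent talkers
    push the total above the mixed-stream duration.
    """
    return sum(speaking_minutes_per_participant)

# Hypothetical 60-minute call, three participants speaking 20/25/10 minutes
# with no overlap: 55 billable minutes, under the 60-minute mixed stream.
assert billable_minutes([20, 25, 10]) == 55
# Same call with extra crosstalk counted in two streams: 65 minutes, over 60.
assert billable_minutes([25, 30, 10]) == 65
```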
By default, transcripts are diarized using speaker-timeline diarization which means the transcript will include speaker names. If you enable machine-diarization, you will receive anonymous speaker labels instead (e.g. A, B, C).
Success
The transcript.done webhook
If the async transcription job completes successfully, you will receive a transcript.done Artifact Status Change event:
{
"event": "transcript.done",
"data": {
"data": {
"code": "done",
"sub_code": null,
"updated_at": "2024-12-04T23:25:56.339940Z"
},
"transcript": {
"id": "7d7387b1-874f-4950-a5b9-1ba6660e2f95",
"metadata": {}
},
"recording": {
"id": "03d06804-0cb2-42f8-a255-5b950dde7c57",
"metadata": {}
},
"bot": {
"id": "0b85d2f9-d54a-47f6-b28d-4c63229f4035",
"metadata": {}
}
}
}
Fetching the transcript
Once you receive the transcript.done webhook, you can fetch the transcript data by calling the Retrieve Transcript endpoint with the transcript's ID:
curl --request GET \
--url https://us-west-2.recall.ai/api/v1/transcript/{TRANSCRIPT_ID}/ \
--header "Authorization: $RECALLAI_API_KEY" \
--header "accept: application/json"
The response will contain details about the transcript, such as the configuration used, as well as a pre-signed URL to access the transcript data:
{
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"recording": {
"id": "03d06804-0cb2-42f8-a255-5b950dde7c57",
"source": {
"bot": {
"id": "0b85d2f9-d54a-47f6-b28d-4c63229f4035"
}
}
},
"created_at": "2024-11-27T20:10:19.719Z",
"expires_at": "2024-12-04T20:10:19.719Z",
"status": {
"code": "done",
"sub_code": null,
"updated_at": "2024-11-27T20:10:19.719Z"
},
"data": {
"download_url": "..."
},
"diarization": {
"use_separate_streams_when_available": false
},
"metadata": {
"custom_field": "some_value"
},
"provider": {
"assembly_ai_async": {
"language": "en"
}
}
}
You can see the download_url's data schema here.
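Before using the pre-signed download_url, it's worth checking the status and the response's expires_at. A small sketch, with field names taken from the response above and an illustrative helper name:

```python
from datetime import datetime, timezone


def usable_download_url(transcript: dict, now: datetime):
    """Return the pre-signed URL if the transcript is done and not expired."""
    if transcript["status"]["code"] != "done":
        return None
    expires_at = datetime.fromisoformat(transcript["expires_at"].replace("Z", "+00:00"))
    if now >= expires_at:
        return None  # past expires_at; the URL is no longer usable
    return transcript["data"]["download_url"]
```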
Error
The transcript.failed webhook
If an async transcription job fails, you will receive a transcript.failed Artifact Status Change webhook event notifying you of the failure:
{
"event": "transcript.failed",
"data": {
"data": {
"code": string,
"sub_code": string | null,
"updated_at": string
},
"transcript": {
"id": string,
"metadata": object,
},
"recording": {
"id": string,
"metadata": object
},
"bot": {
"id": string,
"metadata": object
} | null
}
}
The reason for the failure is included as a sub_code in the event payload (see Transcript Status Webhooks). You can also check the bot logs in the dashboard to see why it failed.
If async transcription fails, you can retry the transcription with a backup transcription provider.
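A simple failover policy can be sketched as: on transcript.failed, retry the Create Async Transcript call with the next provider in a preference list. The provider names below appear elsewhere on this page, but the exact configs and the helper itself are illustrative.

```python
def next_provider_config(tried, preferences):
    """Pick the first provider config not yet tried; None when exhausted."""
    for name, config in preferences:
        if name not in tried:
            return {name: config}
    return None


# Ordered preference list: primary first, backups after (configs illustrative).
preferences = [
    ("recallai_async", {"language_code": "en"}),
    ("assembly_ai_async", {"language": "en"}),  # backup provider
]
```

On each transcript.failed event, record the provider that failed, pick the next config, and issue a new Create Async Transcript request with it (keeping the 10-transcripts-per-recording limit in mind).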
Language Detection
If you don’t know ahead of time which language the conversation will be in, you can set up automatic language detection. Automatically detecting languages is broken up into two types:
- Language Detection - Detecting the primary spoken language within a recording, without needing to set it explicitly
- Code switching - Alternating between two or more languages or language varieties within a single conversation or speech
Most of the third-party transcription providers that we integrate with support language detection.
The table below covers each of these, and their corresponding parameters in the Create Async Transcript provider configuration.
| Provider | Create Bot | Supported Languages | Docs |
|---|---|---|---|
FAQs
How long does it take to transcribe a recording using async transcription?
This varies by provider and transcription configuration, but a 1-hour recording takes roughly 1 minute with basic configurations (mixed audio using Recall.ai Transcription).
How do I get a transcript summary?
To get a transcript summary, you will need to pass the transcript to a third-party LLM for analysis. Recall.ai doesn't support transcript summaries out of the box.