Real-Time Transcription
Generate meeting transcripts in real-time.
Real-time transcription lets you consume a meeting's transcript while the call is still in progress, in two ways:
- Receiving real-time transcription webhook events
- Calling the Get Bot Transcript endpoint
Fetching a transcript from Get Bot Transcript while the call is still in progress will return the transcript generated so far.
Webhooks are completely optional but can be useful for building real-time experiences.
Quickstart: Enable AI real-time transcription
To enable transcription for a bot, specify the transcription_options parameter when calling Create Bot.
At minimum, you must specify the provider and have valid credentials set in the Recall dashboard. You can also set provider-specific parameters to customize the AI model and transcription behavior.
For a comprehensive list of provider-specific parameters, see the Create Bot reference.
{
"meeting_url": "https://meet.google.com/myy-meet-ing",
"transcription_options": {
"provider": "assembly_ai" | "assembly_ai_async_chunked" | "aws_transcribe" | "deepgram" | "gladia" | "rev" | "speechmatics" | "symbl"
// ...other provider-specific params
}
}
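As a rough sketch of how the request body above might be assembled in code, the Python below builds the payload and rejects provider names that are not in the list shown. The helper name `build_create_bot_body` and the flat `provider_params` pass-through are assumptions made for illustration; the real names and nesting of provider-specific parameters are defined in the Create Bot reference.

```python
import json

# Provider names copied from the request example above.
SUPPORTED_PROVIDERS = {
    "assembly_ai", "assembly_ai_async_chunked", "aws_transcribe",
    "deepgram", "gladia", "rev", "speechmatics", "symbl",
}


def build_create_bot_body(meeting_url, provider, **provider_params):
    """Return a Create Bot request body with transcription enabled.

    `provider_params` is a hypothetical pass-through for provider-specific
    settings; check the Create Bot reference for the actual parameters.
    """
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unknown transcription provider: {provider!r}")
    return {
        "meeting_url": meeting_url,
        "transcription_options": {"provider": provider, **provider_params},
    }


body = build_create_bot_body("https://meet.google.com/myy-meet-ing", "deepgram")
print(json.dumps(body, indent=2))
```

Validating the provider name locally is only a convenience; the API remains the source of truth for which providers and credentials are accepted.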
Call Get Bot Transcript to see the full transcript for the call. If the call isn't over yet, this endpoint will return the transcript produced thus far.
To receive transcription results as they're generated, you can leverage Transcription Events.
Transcription webhooks
Setup and verification
To receive live transcript events on your server, there are two things you need to do when creating the bot.
Configure transcription for the bot
Make sure you have transcription enabled by setting the transcription_options parameter:
{
"transcription_options": {
"provider": "assembly_ai" | "assembly_ai_async_chunked" | "aws_transcribe" | "deepgram" | "gladia" | "rev" | "speechmatics" | "symbl"
},
...
}
Specifying this parameter enables transcription; the full transcript can then be retrieved using Get Bot Transcript.
This is also a prerequisite to receiving real-time transcription webhook events.
Specify the real_time_transcription.destination_url parameter
To start receiving transcription webhook events, specify the real_time_transcription.destination_url parameter in the Create Bot endpoint.
At this URL, set up an endpoint to handle incoming transcription events.
For example, you might have an endpoint at https://api.my-app.com/webhook/transcription that verifies incoming webhooks and fires off an event of your own in your application.
Webhook Verification
Verifying incoming webhooks
We recommend implementing verification by including a query parameter, such as token=your-random-token, in the URL you provide.
When we make the request to your endpoint, we will use the exact URL, including any query parameters. You can then verify the query parameter in your server's webhook handler.
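The token check described above could look roughly like the following Python sketch. The helper names `with_token` and `has_valid_token` are hypothetical, and the example assumes your web framework gives you the full request URL (or at least its query string).

```python
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def with_token(base_url, token):
    """Append token=... to the destination URL passed to Create Bot."""
    separator = "&" if urlsplit(base_url).query else "?"
    return f"{base_url}{separator}{urlencode({'token': token})}"


def has_valid_token(request_url, expected_token):
    """Return True if the request URL carries exactly the expected token."""
    values = parse_qs(urlsplit(request_url).query).get("token", [])
    if len(values) != 1:
        return False
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(values[0].encode(), expected_token.encode())


url = with_token("https://api.my-app.com/webhook/transcription", "your-random-token")
print(url)
print(has_valid_token(url, "your-random-token"))
```

In a real handler you would reject the request (for example with a 401 or 403 response) when `has_valid_token` returns False, and load the expected token from configuration rather than hard-coding it.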
Events
A bot.transcription webhook is sent when new real-time transcription is available.
{
"event": "bot.transcription",
"data": {
"bot_id": string,
"transcript": {
"speaker": string | null,
"speaker_id": string | null,
"transcription_provider_speaker": string | absent,
"language": string | null,
"original_transcript_id": number,
"words":[{
"text": string,
"start_time": number,
"end_time": number
}],
"is_final": boolean
} | absent,
"search": {
"speaker": string | null,
"original_transcript_id": number,
"hits":[{
"text": string,
"start_time": number,
"end_time": number,
"confidence": number
}]
} | absent
}
}
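To illustrate the payload shape above, here is a minimal Python sketch that turns a bot.transcription event into a single display line. The function name `transcript_line`, the "Unknown speaker" fallback, and the sample values are illustrative assumptions; joining word texts with spaces is also an assumption about how you want to render them.

```python
def transcript_line(event):
    """Return 'Speaker: text' for a bot.transcription event, or None."""
    if event.get("event") != "bot.transcription":
        return None
    transcript = event.get("data", {}).get("transcript")
    if transcript is None:
        # The schema marks `transcript` as possibly absent.
        return None
    text = " ".join(word["text"] for word in transcript["words"])
    # `speaker` may be null, so fall back to a placeholder label.
    speaker = transcript.get("speaker") or "Unknown speaker"
    return f"{speaker}: {text}"


# Illustrative payload; every value here is made up for the example.
sample = {
    "event": "bot.transcription",
    "data": {
        "bot_id": "example-bot-id",
        "transcript": {
            "speaker": "John",
            "speaker_id": None,
            "language": None,
            "original_transcript_id": 1,
            "words": [
                {"text": "Hey,", "start_time": 0.0, "end_time": 0.3},
                {"text": "my", "start_time": 0.3, "end_time": 0.5},
                {"text": "name", "start_time": 0.5, "end_time": 0.8},
                {"text": "is", "start_time": 0.8, "end_time": 0.9},
                {"text": "John.", "start_time": 0.9, "end_time": 1.3},
            ],
            "is_final": True,
        },
    },
}

print(transcript_line(sample))  # John: Hey, my name is John.
```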
Transcription Errors
A bot.output_log event is sent to your configured Bot Status Change Webhooks endpoint when a fatal error occurs while communicating with the transcription provider.
Listening for these events lets you trigger alerts when something goes wrong while generating a transcript.
For example, if you provide an invalid API key for a transcription provider, you will receive an event like this:
{
"event": "bot.output_log",
"data": {
"bot_id": "fc75fbf8-4a87-4438-bca8-83200962a1eb",
"log": {
"created_at": "2024-03-10T19:16:16+00:00",
"level": "error",
"message": "Failed to connect to transcription provider: Error connecting to transcription provider: Http(Response { status: 401, version: HTTP/1.1, headers: {\"content-type\": \"application/json\", \"dg-error\": \"Invalid credentials.\", \"content-length\": \"112\", \"access-control-allow-credentials\": \"true\", \"vary\": \"origin\", \"vary\": \"access-control-request-method\", \"vary\": \"access-control-request-headers\", \"dg-request-id\": \"590e1e63-f959-42a2-bda1-eb91e7ae9124\", \"date\": \"Fri, 10 May 2024 19:16:35 GMT\"})",
"output_id": "9aac7fc7-f4fc-4d70-8845-d9cc371cbe17"
}
}
}
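One way to act on these events is to pick out error-level log entries and hand the message to your alerting system. The Python below is a minimal sketch under that assumption; `transcription_error_message` is a hypothetical helper, and the sample event is a shortened version of the payload above.

```python
def transcription_error_message(event):
    """Return the log message of an error-level bot.output_log event, else None."""
    if event.get("event") != "bot.output_log":
        return None
    log = event.get("data", {}).get("log", {})
    if log.get("level") != "error":
        return None
    return log.get("message")


# Shortened version of the example event above.
sample_error = {
    "event": "bot.output_log",
    "data": {
        "bot_id": "fc75fbf8-4a87-4438-bca8-83200962a1eb",
        "log": {
            "created_at": "2024-03-10T19:16:16+00:00",
            "level": "error",
            "message": "Failed to connect to transcription provider",
            "output_id": "9aac7fc7-f4fc-4d70-8845-d9cc371cbe17",
        },
    },
}

message = transcription_error_message(sample_error)
if message is not None:
    # Replace this print with your own alerting (pager, chat message, ...).
    print(f"Transcription error: {message}")
```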
Recovering Transcripts From Failures
If real-time transcription fails, you'll still be able to transcribe the meeting using asynchronous transcription. After the meeting ends, you can call the Analyze Bot Media endpoint with the parameters appropriate for your transcription model. If you're using assembly_ai_async_chunked for real-time transcription, you'll be able to reuse the same config; otherwise, you'll need to create a separate config for the asynchronous version of your transcription provider. Be sure to initiate this process within the 7-day media retention window, as recordings will not be available for transcription beyond that period.
Partial results
When using real-time transcription, the time to receive a transcription webhook can vary according to how long the utterance is. In cases of longer monologues, this delay can be quite significant and may hinder the real-time experience.
To alleviate this, you can leverage partial transcription results to decrease the latency of transcription events, even with longer utterances.
These are low-latency partial or intermediate results for an utterance and can be used as intermediates for the final transcript.
Enable partial results
To enable partial results, set partial_results to true in the real_time_transcription object when calling Create Bot:
{
"real_time_transcription": {
"destination_url": "https://my-app.com/api/webhook/recall/transcription",
"partial_results": true
},
...
}
When partial_results is set to true, you will receive partial transcription results in addition to the finalized results.
Using partial results
The is_final field indicates whether the block is a finalized result (true) or a partial result (false).
For example, if I say "Hey, my name is John. It's really nice to meet you.", I might receive:
A series of partial results (is_final: false):
- Ay my name
- Hey my name is John.
- Hey my name is John, it's really nice
Then, a short time later, a more accurate final result (is_final: true):
- Hey, my name is John. It's really nice to meet you.
This is useful for cases where you want to use partial results immediately, and then update them with the more accurate, finalized result once it arrives.
One common pattern is to display partial results in your UI, and then replace them with the finalized version once received.
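The display-then-replace pattern can be sketched in Python as follows. The `LiveTranscript` class and the `block` helper are hypothetical, and the sketch assumes partial results for only one utterance are in flight at a time, so the latest partial simply overwrites the previous one until a final result arrives.

```python
class LiveTranscript:
    """Keeps finalized lines plus the most recent partial line."""

    def __init__(self):
        self.final_lines = []
        self.partial_line = None

    def apply(self, transcript):
        """Apply the `transcript` object from a bot.transcription event."""
        text = " ".join(word["text"] for word in transcript["words"])
        if transcript["is_final"]:
            # The final result replaces whatever partial was on screen.
            self.final_lines.append(text)
            self.partial_line = None
        else:
            self.partial_line = text

    def render(self):
        lines = list(self.final_lines)
        if self.partial_line is not None:
            lines.append(self.partial_line + " ...")
        return "\n".join(lines)


def block(text, is_final):
    """Build an illustrative transcript object; the timestamps are dummies."""
    words = [{"text": t, "start_time": 0.0, "end_time": 0.0} for t in text.split()]
    return {"words": words, "is_final": is_final}


live = LiveTranscript()
live.apply(block("Ay my name", False))
live.apply(block("Hey my name is John.", False))
partial_view = live.render()
live.apply(block("Hey, my name is John. It's really nice to meet you.", True))
final_view = live.render()
print(partial_view)
print(final_view)
```

If several speakers can produce partial results at once, you would likely need to key the pending partial by speaker or by another identifier from the event rather than keeping a single slot.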
Languages
Automatic language detection
Certain AI transcription providers support automatic language detection for real-time transcription.
The table below covers each of these, and their corresponding parameters in the Create Bot transcription_options configuration.
Provider | Create Bot transcription_options parameter | Supported Languages |
---|---|---|
assembly_ai_async_chunked | Set language_detection to true | Docs |
aws_transcribe | Set language_identification to true and specify a list of language_options | Docs |
gladia | Set language_behaviour to either "automatic single language" or "automatic multiple languages" (More info) | Docs |
FAQ
What is the original_transcript_id?
Recall splits and rejoins transcription utterances for various purposes when processing the transcript.
The original_transcript_id is a way to keep track of which chunks came back together from the transcription provider, in case you want to use this information yourself.
All transcript parts that have the same original_transcript_id came from the same transcript part returned by the transcription provider and can be sorted accordingly.
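If you want to regroup parts yourself, a minimal Python sketch might look like this. The `group_by_original` helper is hypothetical, and ordering parts within a group by the start time of their first word is an assumption about what "sorted accordingly" should mean for your use case.

```python
from collections import defaultdict


def first_start_time(part):
    """Sort key: start time of the first word, or infinity if there are no words."""
    words = part["words"]
    return words[0]["start_time"] if words else float("inf")


def group_by_original(parts):
    """Group transcript parts by original_transcript_id, ordered by start time."""
    groups = defaultdict(list)
    for part in parts:
        groups[part["original_transcript_id"]].append(part)
    for group in groups.values():
        group.sort(key=first_start_time)
    return dict(groups)


# Illustrative parts; the ids, texts, and timestamps are made up.
parts = [
    {"original_transcript_id": 7, "words": [{"text": "nice", "start_time": 2.0, "end_time": 2.4}]},
    {"original_transcript_id": 8, "words": [{"text": "Thanks", "start_time": 3.0, "end_time": 3.5}]},
    {"original_transcript_id": 7, "words": [{"text": "Hey", "start_time": 1.0, "end_time": 1.3}]},
]

groups = group_by_original(parts)
print([part["words"][0]["text"] for part in groups[7]])  # ['Hey', 'nice']
```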
What does is_final represent?
The is_final field indicates whether the block is a finalized result (true) or a partial result (false).
Partial results are a feature that certain transcription providers offer to give low-latency intermediate results for an utterance. They're typically used as placeholders until the final transcript arrives.
For example, if I say "Hey, my name is John", I might receive:
A partial result (is_final: false):
- Ay my name John
Then, a short time later, a more accurate final result (is_final: true):
- Hey, my name is John.
This is useful for cases where you want to display a result immediately, and then update it with the more accurate, finalized result once it arrives.
Why are transcription webhooks so delayed?
Recall will POST any results from the configured transcription provider as they're received. When using partial results, events typically arrive every few hundred milliseconds to a few seconds, varying slightly by provider.
If you're seeing large delays in results, such as many seconds, or even minutes, this is likely due to the "single-threaded" nature of the transcription feed. Since transcription utterances are sequential and rely on being in a particular order, blocking a webhook request will delay any subsequent requests.
If you're running in a single-threaded environment, you should make sure that any processing of the transcription webhook happens asynchronously to prevent delaying future webhooks.
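A common way to keep the webhook handler fast is to enqueue each event and process it on a background worker. The Python sketch below shows the idea using only the standard library; `handle_webhook` and `process` are hypothetical names, the 200 return value stands in for whatever success response your framework sends, and a single worker thread is used so that events are still processed in the order they arrived.

```python
import queue
import threading

events = queue.Queue()
processed = []


def handle_webhook(payload):
    """Called from the HTTP handler: enqueue the event and return right away."""
    events.put(payload)
    return 200


def process(payload):
    """Placeholder for slow work such as database writes or downstream calls."""
    processed.append(payload["data"]["transcript"]["original_transcript_id"])


def worker():
    # A single worker drains the FIFO queue, so arrival order is preserved.
    while True:
        payload = events.get()
        try:
            process(payload)
        finally:
            events.task_done()


threading.Thread(target=worker, daemon=True).start()

for i in range(3):
    handle_webhook({
        "event": "bot.transcription",
        "data": {"transcript": {"original_transcript_id": i}},
    })

events.join()  # wait until the worker has handled everything queued so far
print(processed)  # [0, 1, 2]
```

In production you might swap the in-process queue for a durable job queue so that events survive a restart; the in-memory version here only demonstrates the non-blocking handoff.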