Diarization
How to attribute speaker names to words in the transcript.
Diarization is how a transcript gets speaker attribution (which speaker each word belongs to).
Overview
There are three diarization behaviors you can use:
Speaker Timeline Diarization (default)
Recall takes the meeting platform’s active speaker timeline and matches transcript words into those time ranges.
Because the platform knows participant identity, this mode can attach participant names to the transcript.
Machine Diarization (via third-party transcription providers)
A third-party transcription provider (not Recall) identifies who speaks each word using voice signatures.
Recall simply requests diarization from the provider and returns the provider’s diarized transcript using generic speaker labels (e.g. A, B, C or 0, 1, 2).
Hybrid Diarization
Uses Speaker Timeline Diarization when participants are on their own devices, and falls back to Machine Diarization when multiple people share a device/microphone.
Comparing different diarization behaviors
| Mode | Attribution method | Speaker Names | Useful when | Not useful when |
|---|---|---|---|---|
| Speaker Timeline | Speaker change events | ✅ Yes | Fully remote (each person joins from their own device). Sufficient for most use-cases | Multiple people share one device/mic (conference room) |
| Machine (Provider) | Voice signature/attributes | ❌ No (anonymous speaker labels) | Conference rooms, shared mic setups, participants calling from the same device | When you need real participant names |
| Hybrid | Combination of both of the above | ⚠️ For participants calling from their own device | Mixed environments (sometimes remote, sometimes shared mic) | If provider machine diarization isn’t enabled (no fallback) |
Speaker timeline diarization (default)
Speaker Timeline Diarization uses “speaker change” events emitted by the meeting platform to determine when the active speaker changes, then assigns transcript text within each window to that participant.
This is why it works best when each person joins from their own device: the meeting platform can cleanly associate active-speaker events with a specific participant, so Recall can label transcript segments with participant names.
- Useful when: each participant is speaking from their own device (typical remote meetings)
- Not useful when: multiple people are speaking from the same device/microphone (conference room on one laptop)
Enabling it: It’s used whenever Machine Diarization is not enabled.
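To make the windowing idea concrete, here is a minimal sketch of assigning words to speaker windows. It is an illustration only, not Recall's implementation: the `SpeakerEvent` and `Word` shapes and the `attribute_words` helper are hypothetical, and real events and words carry more fields.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class SpeakerEvent:
    start: float      # seconds at which this participant became the active speaker
    participant: str  # participant name reported by the meeting platform


@dataclass
class Word:
    text: str
    start: float      # seconds at which the word starts


def attribute_words(events, words):
    """Label each word with the participant whose speaker window contains its start time."""
    events = sorted(events, key=lambda e: e.start)
    starts = [e.start for e in events]
    labeled = []
    for word in words:
        # Index of the last speaker-change event at or before the word's start.
        idx = bisect_right(starts, word.start) - 1
        speaker = events[idx].participant if idx >= 0 else None
        labeled.append((speaker, word.text))
    return labeled


events = [SpeakerEvent(0.0, "Alice"), SpeakerEvent(4.2, "Bob")]
words = [Word("Hello", 0.5), Word("everyone", 1.1), Word("Hi", 4.5)]
labeled = attribute_words(events, words)
```

The sketch also shows the limitation described above: if two people talk into the same microphone, the platform emits events for a single participant, so every word in that window gets the same name.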
Machine diarization (via a third-party transcription provider)
Want participant names on the transcript? Remove the provider diarization fields (listed below) so Recall uses Speaker Timeline Diarization.
Machine diarization is produced by your transcription provider, not Recall. Recall simply requests diarization from the provider and returns the provider’s diarized transcript.
This mode separates speakers by voice, so it works better when multiple people share a mic. It cannot know participant names, so it uses placeholder labels like A/B/C or 0/1/2. Because it identifies speakers by voice attributes (which vary by third-party transcription provider), it may also produce inaccurate results when two participants sound alike.
Enabling it: set a diarization flag on your transcription provider config:
| Provider | Real-time | Async |
|---|---|---|
| Assembly | assembly_ai_async_chunked.speaker_labels: true | assembly_ai_async.speaker_labels: true |
| Deepgram | deepgram_streaming.diarize: true | deepgram_async.diarize: true |
| Rev | rev_streaming.enable_speaker_switch: true | - |
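The dotted paths in the table describe nested keys in the provider config. As a rough sketch, the hypothetical helper below (`provider_config` is not part of any SDK) expands a dotted path into the nested object. Where that object sits in the request body is not shown here, so check the API reference for the endpoint you are calling.

```python
def provider_config(flag_path, value=True):
    """Expand a dotted path such as 'deepgram_async.diarize' into a nested dict."""
    config = value
    # Wrap from the innermost key outwards.
    for key in reversed(flag_path.split(".")):
        config = {key: config}
    return config


deepgram_async_config = provider_config("deepgram_async.diarize")
assembly_async_config = provider_config("assembly_ai_async.speaker_labels")
```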
To access the machine diarized transcript utterances:
- Real-time: listen to transcript.provider_data events to access the raw transcript provider data
- Async: query the recording.media_shortcuts.transcript.data.download_url from the recording
Hybrid diarization
Hybrid diarization is only available for async transcription: it relies on machine diarization, which is only available for async transcription.
Hybrid diarization uses a combination of the Participants List and Machine Diarization to handle the meeting's join pattern:
- If participants are joining from their own devices, the anonymous speaker label is replaced with the real participant name from the participants list.
- If Recall detects that multiple participants are effectively coming from the same device/microphone, the anonymous speaker labels are kept (so voices can still be separated).
Enabling hybrid diarization
To use it, you will need to implement the logic/algorithm by doing the following:
- Enable Perfect Diarization by setting diarization.use_separate_streams_when_available: true as seen in the Create Async Transcript
- Enable machine diarization through one of the options listed in the machine diarization section of this doc
You will then receive the transcript with a participant object whose id is null and whose name key carries a combined label instead. For example, you will receive something like:
[
{
"participant": {
"id": null,
"name": "200-0",
"is_host": false,
"platform": "mobile_app",
"extra_data": {...},
"email": null
},
"words": [...]
},
// ... other transcript utterances
]
The participant name key follows the format {participant_id}-{anonymous_label}.
With this, you can then:
- Fetch the list of participants via the recording.media_shortcuts.participant_events.data.participants_download_url field
- Build a mapping of each participant_id to its set of anonymous labels across all transcript parts that looks like this:
{
// participant_id: anonymous_labels[]
100: [0],
200: [0, 1]
// other participants
}
- Get the list of participant ids with only one anonymous label
- Iterate over the transcript and:
- For participants with exactly one anonymous label, replace the anonymous speaker with the real participant name and metadata
- For participants with multiple anonymous labels (multiple people sharing a device), leave the anonymous labels unchanged
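The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the sample app's code: the function names are hypothetical, the participants list is assumed to be a list of objects with an integer id and a name, and participant ids are assumed to contain no hyphen.

```python
from collections import defaultdict


def split_label(name):
    """Split a name like '200-0' into (200, '0') on the first hyphen."""
    participant_id, label = name.split("-", 1)
    return int(participant_id), label


def apply_hybrid_diarization(transcript, participants):
    """Swap in real participants where a device produced exactly one voice."""
    by_id = {p["id"]: p for p in participants}

    # participant_id -> set of anonymous labels seen across the transcript.
    labels = defaultdict(set)
    for utterance in transcript:
        pid, label = split_label(utterance["participant"]["name"])
        labels[pid].add(label)

    # Participants with a single label were alone on their device.
    single = {pid for pid, seen in labels.items() if len(seen) == 1}

    resolved = []
    for utterance in transcript:
        pid, _ = split_label(utterance["participant"]["name"])
        if pid in single and pid in by_id:
            # Replace the anonymous speaker with the real participant.
            utterance = {**utterance, "participant": dict(by_id[pid])}
        # Otherwise keep the anonymous label so shared-device voices stay separated.
        resolved.append(utterance)
    return resolved


participants = [{"id": 100, "name": "Alice"}, {"id": 200, "name": "Conference Room"}]
transcript = [
    {"participant": {"id": None, "name": "100-0"}, "words": []},
    {"participant": {"id": None, "name": "200-0"}, "words": []},
    {"participant": {"id": None, "name": "200-1"}, "words": []},
]
resolved = apply_hybrid_diarization(transcript, participants)
```

In this example, participant 100 has one label and is renamed to Alice, while participant 200 has two labels (two voices on one device) and keeps 200-0 and 200-1.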
You can use this sample app to see how to get the transcript using hybrid diarization.
FAQ
Why am I seeing Speaker A, Speaker B, or 0, 1, 2 instead of names?
That indicates Machine Diarization (via a third-party transcription provider) was used. Machine diarization can separate voices, but it can’t attach real participant names.
To get participant names, remove provider diarization flags such as:
- assembly_ai_async.speaker_labels
- deepgram_async.diarize
Why do multiple speakers calling from the same device appear as the same participant in the transcript?
This usually happens when multiple people are sharing one device/microphone and Speaker Timeline is being used (or machine diarization isn’t enabled/available). For conference rooms, enable Machine Diarization (async) so voices can be separated.
Microsoft Teams: why are speaker names missing, or why does diarization look wrong?
Teams has a setting that affects whether speakers can be identified in captions/transcripts: Transcription Caption Identification. If this setting is turned off, transcripts will not get diarized properly with multiple speakers.
Where to find this setting in Teams: Captions and Transcripts -> Transcription -> Automatically identify me in meeting captions and transcripts
Be aware that org-wide Teams policies can override individual user settings.
Are there any additional costs for diarization?
Diarization (identifying participants by name across all recording media) comes at no additional cost.