Diarization
How to attribute speaker names to words in the transcript.
Diarization is how a transcript gets speaker attribution (which speaker each word belongs to).
Overview
There are three diarization behaviors you can use:
Speaker Timeline Diarization (default)
Recall takes the meeting platform’s active speaker timeline and matches transcript words into those time ranges.
Because the platform knows participant identity, this mode can attach participant names to the transcript.
Machine Diarization (via third-party transcription providers)
A third-party transcription provider (not Recall) identifies who speaks each word using voice signatures.
Recall simply requests diarization from the provider and returns the provider’s diarized transcript using generic speaker labels (e.g. A, B, C or 0, 1, 2).
Hybrid Diarization
Hybrid Diarization uses Speaker Timeline Diarization when people are on their own devices and falls back to Machine Diarization when multiple people share a device/microphone.
Comparing different diarization behaviors
| Mode | Attribution method | Speaker Names | Useful when | Not useful when |
|---|---|---|---|---|
| Speaker Timeline | Speaker change events | ✅ Yes | Fully remote (each person joins from their own device). Sufficient for most use-cases | Multiple people share one device/mic (conference room) |
| Machine (Provider) | Voice signature/attributes | ❌ No (anonymous speaker labels) | Conference rooms, shared mic setups, participants calling from the same device | When you need real participant names |
| Hybrid | Combination of both of the above | ⚠️ For participants calling from their own device | Mixed environments (sometimes remote, sometimes shared mic) | If provider machine diarization isn’t enabled (no fallback) |
Speaker timeline diarization (default)
Speaker Timeline Diarization uses “speaker change” events emitted by the meeting platform to determine when the active speaker changes, then assigns transcript text within each window to that participant.
This is why it works best when each person joins from their own device: the meeting platform can cleanly associate active-speaker events with a specific participant, so Recall can label transcript segments with participant names.
- Useful when: each participant is speaking from their own device (typical remote meetings)
- Not useful when: multiple people are speaking from the same device/microphone (conference room on one laptop)
Enabling it: It’s used whenever Machine Diarization is not enabled.
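The windowing idea described above can be sketched in a few lines. This is only an illustration, not Recall's implementation: the SpeakerEvent and Word types, their fields, and the attribute_words function are hypothetical names, and the sketch assumes each word carries a start time in seconds.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class SpeakerEvent:
    start: float  # second at which this participant became the active speaker
    name: str     # participant name reported by the meeting platform


@dataclass
class Word:
    text: str
    start: float  # second at which the word begins


def attribute_words(events, words):
    """Label each word with the participant who was active when it started."""
    events = sorted(events, key=lambda e: e.start)
    starts = [e.start for e in events]
    labelled = []
    for word in words:
        # Index of the last speaker-change event at or before the word start.
        i = bisect_right(starts, word.start) - 1
        name = events[i].name if i >= 0 else None  # None: nobody active yet
        labelled.append((name, word.text))
    return labelled


events = [SpeakerEvent(0.0, "Alice"), SpeakerEvent(4.2, "Bob")]
words = [Word("hello", 0.5), Word("everyone", 1.1), Word("hi", 4.5)]
labelled = attribute_words(events, words)
```

Note that if two people share one microphone, the platform emits events for a single participant, so every word in that window gets the same name. That is the limitation described in the "Not useful when" bullet.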
Machine diarization (via a third-party transcription provider)
Want participant names on the transcript? Remove the provider diarization fields (listed below) so Recall uses Speaker Timeline Diarization.
Machine diarization is produced by your transcription provider, not Recall. Recall simply requests diarization from the provider and returns the provider’s diarized transcript.
This mode separates speakers by voice, so it works better when multiple people share a mic. It cannot know participant names, so it uses placeholder labels like A/B/C or 0/1/2. Because it relies on voice attributes (which vary by third-party transcription provider), it may also produce inaccurate results when two participants sound alike.
Enabling it: set a diarization flag on your transcription provider config (examples):
- assembly_ai_async.speaker_labels: true
- deepgram_async.diarize: true
- deepgram_streaming.diarize: true
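As a rough sketch, the flags above can be expressed as a provider config fragment. The provider keys and flag names come from the list above; the helper name machine_diarization_config is hypothetical, and where this fragment sits inside the full request body is not shown here, so check the API reference before using it.

```python
# Diarization flag for each provider key named in this guide.
DIARIZATION_FLAGS = {
    "assembly_ai_async": {"speaker_labels": True},
    "deepgram_async": {"diarize": True},
    "deepgram_streaming": {"diarize": True},
}


def machine_diarization_config(provider: str) -> dict:
    """Return a provider config fragment with its diarization flag switched on.

    Hypothetical helper: the surrounding request structure is an assumption
    and is deliberately left out.
    """
    if provider not in DIARIZATION_FLAGS:
        raise ValueError(f"no known diarization flag for provider {provider!r}")
    # Copy the flags so callers cannot mutate the shared table.
    return {provider: dict(DIARIZATION_FLAGS[provider])}


fragment = machine_diarization_config("deepgram_async")
```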
To access the machine diarized transcript utterances:
- Real-time: listen to transcript.provider_data events to access the raw transcript provider data
- Async: query the recording.media_shortcuts.transcript.data.download_url from the recording
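For the async path, a minimal sketch of reading that field from a recording object that has already been fetched and parsed as JSON. The field path is the one named above; the function name and the assumption that intermediate fields can be missing or null (for example, before the transcript is ready) are illustrative.

```python
from typing import Optional


def transcript_download_url(recording: dict) -> Optional[str]:
    """Walk recording.media_shortcuts.transcript.data.download_url safely.

    Returns None if any step of the path is missing, which this sketch
    assumes can happen while the transcript is not available yet.
    """
    node = recording
    for key in ("media_shortcuts", "transcript", "data", "download_url"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Example recording payload (shape assumed for illustration only).
recording = {
    "media_shortcuts": {
        "transcript": {"data": {"download_url": "https://example.com/transcript.json"}}
    }
}
url = transcript_download_url(recording)
```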
Hybrid diarization
Hybrid diarization is only available for async transcription, because it relies on machine diarization, which is only available for async transcription.
Hybrid diarization combines Speaker Timeline and Machine Diarization, choosing between them based on how participants join the meeting:
- If participants are joining from their own devices, Recall uses Speaker Timeline Diarization (so you get participant names).
- If Recall detects that multiple participants are effectively coming from the same device/microphone, Recall uses Machine Diarization (so voices can be separated).
Enabling it: you need to implement the logic yourself, as follows:
- Use machine diarization to identify all unique speakers
- Use the speaker timeline to identify each period where a meeting participant (calling from their own device) is reported as speaking
- Find the speaker timeline periods in which only a single anonymous speaker label appears, and attribute that label to the participant name on the speaker timeline
You can use this sample app to see how to get the transcript using hybrid diarization
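The three steps above can be sketched as follows. This is one possible interpretation, not the sample app's code: the Segment type, the function names, and the 0.5 second min_overlap threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # anonymous label (machine) or participant name (timeline)
    start: float  # seconds
    end: float    # seconds


def overlap(a: Segment, b: Segment) -> float:
    """Seconds during which both segments are active."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def map_labels_to_names(machine, timeline, min_overlap=0.5):
    """Map anonymous labels to participant names where the match is unambiguous."""
    # Which machine labels were heard during each participant's timeline periods?
    heard = defaultdict(set)
    for period in timeline:
        for utterance in machine:
            if overlap(period, utterance) >= min_overlap:
                heard[period.speaker].add(utterance.speaker)
    # A participant with exactly one label is treated as being on their own device.
    claims = defaultdict(set)
    for name, labels in heard.items():
        if len(labels) == 1:
            claims[next(iter(labels))].add(name)
    # Keep a label only if a single participant claims it.
    return {label: next(iter(names)) for label, names in claims.items() if len(names) == 1}


def relabel(machine, mapping):
    """Replace mapped labels with names; unmapped labels stay anonymous."""
    return [Segment(mapping.get(u.speaker, u.speaker), u.start, u.end) for u in machine]


machine = [Segment("A", 0, 3), Segment("B", 3, 6), Segment("C", 6, 9), Segment("A", 9, 12)]
timeline = [Segment("Alice", 0, 3), Segment("Conference Room", 3, 9), Segment("Alice", 9, 12)]
mapping = map_labels_to_names(machine, timeline)
hybrid = relabel(machine, mapping)
```

In this example, Alice's periods only ever contain label A, so A becomes "Alice". The conference room device contains both B and C, so those speakers keep their anonymous labels, which matches the table's note that names are available only for participants on their own device.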
FAQ
Why am I seeing Speaker A, Speaker B, or 0, 1, 2 instead of names?
That indicates Machine Diarization (via a third-party transcription provider) was used. Machine diarization can separate voices, but it can’t attach real participant names.
To get participant names, remove provider diarization flags such as:
- assembly_ai_async.speaker_labels
- deepgram_async.diarize
Why do multiple speakers calling from the same device appear as the same participant in the transcript?
This usually happens when multiple people are sharing one device/microphone and Speaker Timeline is being used (or machine diarization isn’t enabled/available). For conference rooms, enable Machine Diarization (async) so voices can be separated.
Microsoft Teams: why are speaker names missing, or why does diarization look wrong?
Teams has a setting that affects whether speakers can be identified in captions/transcripts: Transcription Caption Identification. If this setting is turned off, transcripts will not get diarized properly with multiple speakers.
Where to find this setting in Teams: Captions and Transcripts -> Transcription -> Automatically identify me in meeting captions and transcripts
Be aware that org-wide Teams policies can override individual user settings.