Diarization

How to attribute speaker names to words in the transcript.

Diarization is the process of attributing each word in a transcript to the speaker who said it.

Overview

There are three diarization behaviors you can use:

Speaker Timeline Diarization (default)

Recall takes the meeting platform’s active speaker timeline and assigns each transcript word to the time range it falls in.

Because the platform knows participant identity, this mode can attach participant names to the transcript.

Machine Diarization (via third-party transcription providers)

A third-party transcription provider (not Recall) identifies who speaks each word using voice signatures.

Recall simply requests diarization from the provider and returns the provider’s diarized transcript using generic speaker labels (e.g. A, B, C or 0, 1, 2).

Hybrid Diarization

Hybrid Diarization uses Speaker Timeline Diarization when people are on their own devices, and falls back to Machine Diarization when multiple people share a device or microphone.

Comparing different diarization behaviors

| Mode | Attribution method | Speaker names | Useful when | Not useful when |
| --- | --- | --- | --- | --- |
| Speaker Timeline | Speaker change events | ✅ Yes | Fully remote (each person joins from their own device). Sufficient for most use cases | Multiple people share one device/mic (conference room) |
| Machine (Provider) | Voice signature/attributes | ❌ No (anonymous speaker labels) | Conference rooms, shared mic setups, participants calling from the same device | When you need real participant names |
| Hybrid | Combination of both of the above | ⚠️ For participants calling from their own device | Mixed environments (sometimes remote, sometimes shared mic) | If provider machine diarization isn’t enabled (no fallback) |

Speaker timeline diarization (default)

Speaker Timeline Diarization uses “speaker change” events emitted by the meeting platform to determine when the active speaker changes, then assigns transcript text within each window to that participant.

This is why it works best when each person joins from their own device: the meeting platform can cleanly associate active-speaker events with a specific participant, so Recall can label transcript segments with participant names.

  • Useful when: each participant is speaking from their own device (typical remote meetings)
  • Not useful when: multiple people are speaking from the same device/microphone (conference room on one laptop)

Enabling it: no configuration is needed. Recall uses it whenever Machine Diarization is not enabled.
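
To make the matching step concrete, here is a minimal sketch of how words could be assigned to speaker windows. It is an illustration only, not Recall's implementation: the `SpeakerChange` and `Word` shapes, and the rule that a word belongs to the window containing its start time, are assumptions made for this example.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpeakerChange:
    """Hypothetical shape: the moment a participant became the active speaker."""
    start: float  # seconds from the start of the recording
    participant: str


@dataclass
class Word:
    """Hypothetical shape: one transcribed word and the time it starts."""
    text: str
    start: float


def attribute_words(
    changes: List[SpeakerChange], words: List[Word]
) -> List[Tuple[Optional[str], str]]:
    """Pair each word with the participant whose window contains its start time.

    A window runs from one speaker change to the next. Words spoken before
    the first speaker change get None as their speaker.
    """
    ordered = sorted(changes, key=lambda change: change.start)
    starts = [change.start for change in ordered]
    result = []
    for word in words:
        # Index of the last speaker change at or before this word's start time.
        idx = bisect.bisect_right(starts, word.start) - 1
        speaker = ordered[idx].participant if idx >= 0 else None
        result.append((speaker, word.text))
    return result


changes = [SpeakerChange(0.0, "Alice"), SpeakerChange(5.0, "Bob")]
words = [Word("hi", 1.0), Word("there", 4.9), Word("hello", 5.0)]
print(attribute_words(changes, words))
```

The sketch also shows why a shared microphone defeats this mode: the platform reports a single participant for the whole window, so every voice in the room is attributed to that one name.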

Machine diarization (via a third-party transcription provider)

📘

Want participant names on the transcript?

Remove provider diarization fields (listed below) so Recall uses Speaker Timeline Diarization.

Machine diarization is produced by your transcription provider, not Recall. Recall simply requests diarization from the provider and returns the provider’s diarized transcript.

This mode separates speakers by voice, so it works better when multiple people share a microphone. It cannot know participant names, however, so it uses placeholder labels such as A/B/C or 0/1/2. Because it relies on voice attributes (which vary by transcription provider) to tell speakers apart, it may also produce inaccurate results when two participants sound alike.

Enabling it: set a diarization flag on your transcription provider config (examples):

  • assembly_ai_async.speaker_labels: true
  • deepgram_async.diarize: true
  • deepgram_streaming.diarize: true
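
The dotted names above read as "provider key, then option". As a small sketch, the helper below expands one of them into the nested object form. The `provider_config` helper is hypothetical, and where this object sits inside the request body is not covered on this page, so check the API reference before relying on it.

```python
import json


def provider_config(dotted_flag: str, value: bool = True) -> dict:
    """Expand 'provider.option' into {'provider': {'option': value}}."""
    provider, option = dotted_flag.split(".", 1)
    return {provider: {option: value}}


# Prints the nested form of the Deepgram async flag listed above.
print(json.dumps(provider_config("deepgram_async.diarize"), indent=2))
```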

To access the machine diarized transcript utterances:

  • Real-time: listen to transcript.provider_data events to access the raw transcript provider data
  • Async: read recording.media_shortcuts.transcript.data.download_url from the recording and download the transcript from that URL
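
For the async path, the sketch below reads the download URL from a recording object that has already been fetched and parsed into a dict. The `dig` helper and the sample payload are hypothetical. Only the field path comes from the text above, and treating a missing field as "transcript not ready yet" is an assumption.

```python
from typing import Any, Optional


def dig(obj: Any, dotted_path: str) -> Any:
    """Walk nested dicts along 'a.b.c'; return None if any step is missing."""
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def transcript_download_url(recording: dict) -> Optional[str]:
    """Return the transcript download URL, or None if it is not present."""
    return dig(recording, "media_shortcuts.transcript.data.download_url")


# Hypothetical payload containing only the fields this example needs.
sample_recording = {
    "media_shortcuts": {
        "transcript": {"data": {"download_url": "https://example.invalid/transcript.json"}}
    }
}
print(transcript_download_url(sample_recording))
```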

Hybrid diarization

🚧

Hybrid diarization is only available for async transcription

Hybrid diarization relies on machine diarization which is only available for async transcription.

Hybrid diarization combines Speaker Timeline Diarization and Machine Diarization, choosing between them based on how participants join the meeting:

  • If participants are joining from their own devices, Recall uses Speaker Timeline Diarization (so you get participant names).
  • If Recall detects that multiple participants are effectively coming from the same device/microphone, Recall uses Machine Diarization (so voices can be separated).

Enabling it: you need to implement the matching logic yourself, as follows:

  • Use machine diarization to identify all unique speakers
  • Use the speaker timeline to identify each period in which a meeting participant (who is calling from their own device) is marked as speaking
  • Where a participant’s speaker timeline periods contain only a single machine-diarized speaker, attribute that anonymous speaker label to the participant’s name from the speaker timeline

📘

You can use this sample app to see how to get the transcript using hybrid diarization
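
The matching steps listed above can be sketched as follows. This is one possible approach under stated assumptions, not the sample app's code: the tuple shapes for timeline periods and utterances are hypothetical, and the `min_share` threshold (how much of a participant's speaking time a voice must cover to count as present) is an arbitrary choice for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Period = Tuple[str, float, float]      # (participant name, start, end)
Utterance = Tuple[str, float, float]   # (anonymous label, start, end)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Seconds shared by two time ranges (0.0 if they do not intersect)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_to_name(
    timeline: List[Period], utterances: List[Utterance], min_share: float = 0.2
) -> Dict[str, str]:
    """Map an anonymous label to a participant name when that participant's
    timeline periods contain exactly one voice."""
    # Total overlap, in seconds, between each participant and each label.
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for name, t_start, t_end in timeline:
        for label, u_start, u_end in utterances:
            seconds = overlap(t_start, t_end, u_start, u_end)
            if seconds > 0:
                totals[name][label] += seconds

    mapping: Dict[str, str] = {}
    for name, per_label in totals.items():
        total = sum(per_label.values())
        # Ignore voices that only brush the edges of this participant's periods.
        voices = [lbl for lbl, secs in per_label.items() if secs / total >= min_share]
        if len(voices) == 1:  # one voice: the participant is on their own device
            mapping[voices[0]] = name
    return mapping


def relabel(utterances: List[Utterance], mapping: Dict[str, str]) -> List[Utterance]:
    """Replace mapped labels with names; unmapped labels stay anonymous."""
    return [(mapping.get(label, label), start, end) for label, start, end in utterances]


timeline = [("Alice", 0, 10), ("Room", 10, 30), ("Alice", 30, 35)]
utterances = [("A", 0, 9.5), ("B", 10, 18), ("C", 18.5, 29), ("A", 30, 34)]
mapping = label_to_name(timeline, utterances)
print(relabel(utterances, mapping))
```

Here "Room" stands for a shared conference-room device: two voices (B and C) fall inside its periods, so both keep their anonymous labels, while A is renamed to Alice. The sketch does not handle a label that ends up mapped to more than one participant; a fuller implementation would need a rule for that case.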

FAQ

Why am I seeing Speaker A, Speaker B, or 0, 1, 2 instead of names?

That indicates Machine Diarization (via a third-party transcription provider) was used. Machine diarization can separate voices, but it can’t attach real participant names.

To get participant names, remove provider diarization flags such as:

  • assembly_ai_async.speaker_labels
  • deepgram_async.diarize
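
If your provider options are built in code, a small helper can strip these flags before the request is sent. This is a sketch only: the helper name is hypothetical, the config is assumed to be a nested dict keyed by provider, and the `language` option in the example is a placeholder rather than a documented field.

```python
import copy

# Flags named on this page, written as "provider.option".
DIARIZATION_FLAGS = (
    "assembly_ai_async.speaker_labels",
    "deepgram_async.diarize",
    "deepgram_streaming.diarize",
)


def without_diarization(provider_options: dict) -> dict:
    """Return a copy of the provider options with diarization flags removed."""
    cleaned = copy.deepcopy(provider_options)
    for dotted in DIARIZATION_FLAGS:
        provider, option = dotted.split(".", 1)
        options = cleaned.get(provider)
        if isinstance(options, dict):
            options.pop(option, None)
    return cleaned


# "language" is a placeholder option used only to show that other keys survive.
original = {"deepgram_async": {"diarize": True, "language": "en"}}
print(without_diarization(original))
```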

Why do multiple speakers calling from the same device appear as the same participant in the transcript?

This usually happens when multiple people are sharing one device/microphone and Speaker Timeline is being used (or machine diarization isn’t enabled/available). For conference rooms, enable Machine Diarization (async) so voices can be separated.

Microsoft Teams: why are speaker names missing, or why does diarization look wrong?

Teams has a setting that affects whether speakers can be identified in captions/transcripts: Transcription Caption Identification. If this setting is turned off, transcripts with multiple speakers will not be diarized properly.

Where to find this setting in Teams: Captions and Transcripts -> Transcription -> Automatically identify me in meeting captions and transcripts

Be aware that org-wide Teams policies can override individual user settings.