Diarization

How to attribute speaker names to words in the transcript.

Diarization is how a transcript gets speaker attribution (which speaker each word belongs to).

Overview

There are three diarization behaviors you can use:

Speaker Timeline Diarization (default)

Recall takes the meeting platform’s active speaker timeline and maps transcript words onto those time ranges.

Because the platform knows participant identity, this mode can attach participant names to the transcript.

Machine Diarization (via third-party transcription providers)

A third-party transcription provider (not Recall) identifies who speaks each word using voice signatures.

Recall simply requests diarization from the provider and returns the provider’s diarized transcript using generic speaker labels (e.g. A, B, C or 0, 1, 2).

Hybrid Diarization

Hybrid diarization uses Speaker Timeline Diarization when people are on their own devices, and falls back to Machine Diarization when multiple people share a device/microphone.

Comparing different diarization behaviors

| Mode | Attribution method | Speaker names | Useful when | Not useful when |
|---|---|---|---|---|
| Speaker Timeline | Speaker change events | ✅ Yes | Fully remote (each person joins from their own device). Sufficient for most use-cases | Multiple people share one device/mic (conference room) |
| Machine (Provider) | Voice signature/attributes | ❌ No (anonymous speaker labels) | Conference rooms, shared mic setups, participants calling from the same device | When you need real participant names |
| Hybrid | Combination of both of the above | ⚠️ For participants calling from their own device | Mixed environments (sometimes remote, sometimes shared mic) | If provider machine diarization isn’t enabled (no fallback) |

Speaker timeline diarization (default)

Speaker Timeline Diarization uses “speaker change” events emitted by the meeting platform to determine when the active speaker changes, then assigns transcript text within each window to that participant.

This is why it works best when each person joins from their own device: the meeting platform can cleanly associate active-speaker events with a specific participant, so Recall can label transcript segments with participant names.

  • Useful when: each participant is speaking from their own device (typical remote meetings)
  • Not useful when: multiple people are speaking from the same device/microphone (conference room on one laptop)

Enabling it: no extra configuration is needed. It is the default and is used whenever Machine Diarization is not enabled.
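The windowing idea described in this section can be illustrated with a minimal sketch. This is not Recall's implementation. The `SpeakerChange` and `Word` types, their fields, and the `attribute_words` function are hypothetical names chosen for the example, and it assumes each word is attributed to whichever participant was the active speaker when the word started.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class SpeakerChange:
    start: float  # seconds at which this participant became the active speaker
    name: str     # participant name reported by the meeting platform


@dataclass
class Word:
    text: str
    start: float  # seconds at which the word begins


def attribute_words(changes, words):
    """Group words into utterances by the speaker active when each word starts."""
    changes = sorted(changes, key=lambda c: c.start)
    starts = [c.start for c in changes]
    utterances = []
    for word in sorted(words, key=lambda w: w.start):
        # Index of the last speaker change at or before the word's start time.
        idx = bisect_right(starts, word.start) - 1
        speaker = changes[idx].name if idx >= 0 else None
        if utterances and utterances[-1]["speaker"] == speaker:
            utterances[-1]["words"].append(word.text)
        else:
            utterances.append({"speaker": speaker, "words": [word.text]})
    return utterances


changes = [SpeakerChange(0.0, "Alice"), SpeakerChange(4.2, "Bob")]
words = [Word("Hi", 0.5), Word("there", 0.9), Word("Hello", 4.5)]
print(attribute_words(changes, words))
# [{'speaker': 'Alice', 'words': ['Hi', 'there']}, {'speaker': 'Bob', 'words': ['Hello']}]
```

The sketch also shows why a shared microphone defeats this mode: the platform emits a single active speaker for the whole device, so every word spoken in the room lands in that one participant's window.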

Machine diarization (via a third-party transcription provider)

📘

Want participant names on the transcript?

Remove the provider diarization fields (listed below) so Recall uses Speaker Timeline Diarization.

Machine diarization is produced by your transcription provider, not Recall. Recall simply requests diarization from the provider and returns the provider’s diarized transcript.

This mode separates speakers by voice, so it works better when multiple people share a microphone. However, it cannot know participant names, so it uses placeholder labels such as A/B/C or 0/1/2. Because it relies on voice attributes (which vary by third-party transcription provider) to tell speakers apart, it may also produce inaccurate results when two participants sound alike.

Enabling it: set a diarization flag on your transcription provider config:

| Provider | Real-time | Async |
|---|---|---|
| Assembly | assembly_ai_async_chunked.speaker_labels: true | assembly_ai_async.speaker_labels: true |
| Deepgram | deepgram_streaming.diarize: true | deepgram_async.diarize: true |
| Rev | rev_streaming.enable_speaker_switch: true | - |
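As a rough illustration, the dotted names in the table can be read as a key nested under the provider's config object. The helper below is a sketch under that assumption: `enable_flag` is a hypothetical function, and where the resulting object belongs in the request body is not shown here, so check the API reference for your endpoint.

```python
def enable_flag(provider_config, dotted_flag, value=True):
    """Set a dotted flag such as 'deepgram_async.diarize' on a nested dict."""
    *parents, leaf = dotted_flag.split(".")
    node = provider_config
    for key in parents:
        # Create intermediate objects as needed (assumed nesting, see above).
        node = node.setdefault(key, {})
    node[leaf] = value
    return provider_config


provider_config = enable_flag({}, "deepgram_async.diarize")
print(provider_config)
# {'deepgram_async': {'diarize': True}}
```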

To access the machine diarized transcript utterances:

  • Real-time: listen to transcript.provider_data events to access the raw transcript provider data
  • Async: query the recording.media_shortcuts.transcript.data.download_url from the recording

Hybrid diarization

🚧

Hybrid diarization is only available for async transcription

Hybrid diarization relies on machine diarization which is only available for async transcription.

Hybrid diarization uses a combination of the Participants List and Machine Diarization to handle the meeting's join pattern:

  • If participants are joining from their own devices, the anonymous speaker label is replaced with the real participant name from the participants list.
  • If Recall detects that multiple participants are effectively coming from the same device/microphone, the anonymous speaker labels are kept (so voices can still be separated).

Enabling hybrid diarization

To use it, you need to implement the merging logic yourself. Start by doing the following:

  • Enable Perfect Diarization by setting diarization.use_separate_streams_when_available: true, as shown in the Create Async Transcript reference
  • Enable machine diarization through one of the options listed in the machine diarization section of this doc

The transcript you receive will then contain participant objects whose id is null and whose name holds a composite key instead of a real name. For example, you will receive something like:

[
  {
    "participant": {
      "id": null,
      "name": "200-0",
      "is_host": false,
      "platform": "mobile_app",
      "extra_data": {...},
      "email": null
    },
    "words": [...]
  },
  // ... other transcript utterances
]

The participant name key follows the format {participant_id}-{anonymous_label}.
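A minimal sketch of parsing that key follows. `parse_hybrid_name` is a hypothetical helper, and it assumes the anonymous label never contains a hyphen, so the split happens on the last hyphen in the string.

```python
def parse_hybrid_name(name):
    """Split '{participant_id}-{anonymous_label}' into (participant_id, label)."""
    # Assumption: the label has no hyphen, so the last hyphen is the separator.
    participant_id, sep, label = name.rpartition("-")
    if not sep or not participant_id or not label:
        raise ValueError(f"not a hybrid diarization name: {name!r}")
    return participant_id, label


print(parse_hybrid_name("200-0"))
# ('200', '0')
```

Both parts come back as strings, so convert the participant id before comparing it with numeric ids from the participants list.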

With this, you can then:

  • Fetch the list of participants via the recording.media_shortcuts.participant_events.data.participants_download_url field
  • Build a mapping of each participant_id to its set of anonymous labels across all transcript parts that looks like this:
{
  // participant_id: anonymous_labels[]
  100: [0],
  200: [0, 1]
  // other participants
}
  • Get the list of participant ids with only one anonymous label
  • Iterate over the transcript and:
    • For participants with exactly one anonymous label, replace the anonymous speaker with the real participant name and metadata
    • For participants with multiple anonymous labels (multiple people sharing a device), leave the anonymous labels unchanged
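The steps above can be sketched as follows. This is an illustrative outline, not a reference implementation: `resolve_hybrid` and `split_name` are hypothetical names, the participants list is assumed to be a list of objects with at least `id` and `name`, and the key is assumed to split on its last hyphen.

```python
from collections import defaultdict


def split_name(name):
    """Split '{participant_id}-{anonymous_label}' on the last hyphen."""
    participant_id, _, label = name.rpartition("-")
    return participant_id, label


def resolve_hybrid(transcript, participants):
    """Replace anonymous labels with real participants where unambiguous."""
    # Assumed shape: each participant is a dict with at least "id" and "name".
    by_id = {str(p["id"]): p for p in participants}

    # participant_id -> set of anonymous labels seen across all transcript parts
    labels = defaultdict(set)
    for part in transcript:
        pid, label = split_name(part["participant"]["name"])
        labels[pid].add(label)

    resolved = []
    for part in transcript:
        pid, _ = split_name(part["participant"]["name"])
        if len(labels[pid]) == 1 and pid in by_id:
            # One voice on this device: use the real participant's metadata.
            part = {**part, "participant": dict(by_id[pid])}
        # Otherwise several voices share the device: keep the anonymous label.
        resolved.append(part)
    return resolved


transcript = [
    {"participant": {"id": None, "name": "100-0"}, "words": []},
    {"participant": {"id": None, "name": "200-0"}, "words": []},
    {"participant": {"id": None, "name": "200-1"}, "words": []},
]
participants = [{"id": 100, "name": "Alice"}, {"id": 200, "name": "Meeting Room"}]
print([p["participant"]["name"] for p in resolve_hybrid(transcript, participants)])
# ['Alice', '200-0', '200-1']
```

The function returns new utterance objects rather than mutating the input, so the original machine-diarized transcript stays available for comparison.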
📘

You can use this sample app to see how to get the transcript using hybrid diarization

FAQ

Why am I seeing Speaker A, Speaker B, or 0, 1, 2 instead of names?

That indicates Machine Diarization (via a third-party transcription provider) was used. Machine diarization can separate voices, but it can’t attach real participant names.

To get participant names, remove provider diarization flags such as:

  • assembly_ai_async.speaker_labels
  • deepgram_async.diarize

Why do multiple speakers calling from the same device appear as the same participant in the transcript?

This usually happens when multiple people are sharing one device/microphone and Speaker Timeline is being used (or machine diarization isn’t enabled/available). For conference rooms, enable Machine Diarization (async) so voices can be separated.

Microsoft Teams: why are speaker names missing, or why does diarization look wrong?

Teams has a setting that affects whether speakers can be identified in captions/transcripts: Transcription Caption Identification. If this setting is turned off, transcripts will not get diarized properly with multiple speakers.

Where to find this setting in Teams: Captions and Transcripts -> Transcription -> Automatically identify me in meeting captions and transcripts

Be aware that org-wide Teams policies can override individual user settings.

Are there any additional costs for diarization?

Diarization (identifying participants by name across all recording media) comes at no additional cost.