Diarization

Diarization is the process of determining which speaker each word of a transcript belongs to. There are two ways to diarize a transcript through Recall.

Diarization types


Speaker Timeline Diarization

This is the default diarization mode for Recall. Recall captures the "active speaker timeline" generated by each meeting platform, and matches the transcribed words to the ranges in which each speaker is speaking.
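The matching step can be pictured with a minimal sketch. This is an illustration only, not Recall's implementation: the `SpeakerEvent` and `Word` shapes, their field names, and the rule of using each word's start time are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class SpeakerEvent:
    """Hypothetical timeline entry: a participant becomes the active speaker."""
    start: float  # seconds from the start of the recording
    name: str     # participant name as reported by the meeting platform


@dataclass
class Word:
    """Hypothetical transcribed word with timestamps in seconds."""
    text: str
    start: float
    end: float


def assign_speakers(words, timeline):
    """Label each word with whoever was the active speaker when it started."""
    events = sorted(timeline, key=lambda e: e.start)
    starts = [e.start for e in events]
    labeled = []
    for word in words:
        # Index of the last timeline event that began at or before this word.
        idx = bisect_right(starts, word.start) - 1
        speaker = events[idx].name if idx >= 0 else None
        labeled.append((speaker, word.text))
    return labeled


timeline = [SpeakerEvent(0.0, "Alice"), SpeakerEvent(5.0, "Bob")]
words = [Word("hello", 0.5, 0.9), Word("hi", 5.2, 5.5)]
labeled = assign_speakers(words, timeline)
```

Because the timeline comes from the meeting platform, each label in a scheme like this is a participant name rather than a placeholder. It also shows the limitation: the timeline has one entry per participant connection, so everyone speaking through a shared device receives the same label.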

Enabling Speaker Timeline Diarization

Speaker timeline diarization is the default, and is enabled automatically if Machine Diarization is not enabled.

Pros

  • Speaker timeline diarization can determine the speaker name for each set of words.
    • This is because the diarization is done with data directly from the meeting platform, which includes the name of the speaker.
  • Speaker timeline diarization can be more accurate in general.

Cons

  • Speaker timeline diarization does not work if multiple people have joined from the same computer.
    • For example, if multiple people are sitting in a conference room with one camera and microphone, Speaker Timeline Diarization will not be able to tell the difference between them.

Machine Diarization

Some transcription providers offer AI diarization features as part of their product. This diarization method uses AI to recognize the voices of the speakers, and determine which words are spoken by which speaker.

Enabling Machine Diarization

Requesting diarization from the transcription provider enables Machine Diarization.

For example, setting any of these to true would result in a machine-diarized transcript with generic speaker labels:

Real-time transcription:

  • rev.enable_speaker_switch
  • deepgram.diarize

Asynchronous transcription:

  • assemblyai_async_transcription.speaker_labels
  • deepgram_async_transcription.diarize
  • gladia_v2_async_transcription.diarization

Pros

  • Machine diarization can be more accurate if multiple people are speaking into the same microphone.
    • For example, if many people are together in a conference room, Speaker Timeline Diarization would recognize all of them as the same person, while Machine Diarization would be able to separate them.

Cons

  • Machine diarization can be less accurate in general.
  • Machine diarization cannot determine the names of the speakers.
    • A machine-diarized transcript will have speakers listed as "A", "B", "C", ... or "0", "1", "2", ...
    • This is because the AI cannot determine the name of each speaker, so it assigns a placeholder.
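If you already know who each placeholder corresponds to, for example after reviewing the recording, the labels can be swapped for names afterwards. The sketch below assumes the transcript has been reduced to `(label, text)` pairs and that the mapping is supplied by hand; both the data shape and the `relabel` helper are hypothetical.

```python
def relabel(segments, names):
    """Replace placeholder labels with known names; keep unknown labels as-is."""
    return [(names.get(label, label), text) for label, text in segments]


segments = [("A", "Hello"), ("B", "Hi"), ("C", "Hey")]
named = relabel(segments, {"A": "Alice", "B": "Bob"})
```

Labels missing from the mapping pass through unchanged, so a partially identified transcript stays readable.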

Platform considerations


Microsoft Teams

For Teams, one setting affects diarization: Transcription Caption Identification.

If this setting is turned off, transcripts with multiple speakers will not be diarized properly.

Where to find this:

Captions and Transcripts > Transcription > Automatically identify me in meeting captions and transcripts