Desktop Recording SDK Adhoc / In-Person Meetings

The Desktop Recording SDK's adhoc / in-person recording mode is useful when you want to record meeting audio outside of a supported meeting platform (e.g. Zoom, Teams, Google Meet). You can use it either to capture the audio of an in-person meeting, or to capture the mic and speaker streams of the desktop.

🚧

Because adhoc and in-person meetings occur outside of a meeting provider, it is not possible to produce a transcript with real speaker names or active speaker timelines in this mode. Instead, the local machine's mic stream is labelled "Host" by default, and any other audio stream is labelled "Guest". You can also use machine diarization to get anonymous speaker labels.

Enabling adhoc / in-person recordings using the DSDK

To record using the adhoc / in-person recording mode, use the DSDK's prepareDesktopAudioRecording call. This function allows you to obtain a window_id to pass to the DSDK's startRecording call, instead of obtaining the ID from the meeting-detected or meeting-updated event.

Regular recording flow:

  1. A meeting starts on a meeting provider
  2. The DSDK sends a meeting-detected event, which contains a window ID
  3. You pass the window ID to the startRecording function (along with your SDK upload token) to start recording that meeting provider

Adhoc / in-person flow:

  1. You call prepareDesktopAudioRecording to obtain a window ID
  2. You pass the window ID to the startRecording function (along with your SDK upload token) to start recording the system's mic and speaker audio streams.
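The two steps above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: it assumes the SDK object exposes prepareDesktopAudioRecording() returning a window ID and startRecording() accepting an object with windowId and uploadToken, so verify the exact signatures against the DSDK reference.

```javascript
// Minimal sketch of the adhoc / in-person flow. The SDK object is passed in
// as a parameter; the method signatures below are assumptions, so check them
// against the DSDK reference before relying on this.
async function startAdhocRecording(sdk, uploadToken) {
  // 1. Obtain a window ID representing the system's mic + speaker streams
  const windowId = await sdk.prepareDesktopAudioRecording();

  // 2. Start recording that "window" with your SDK upload token
  await sdk.startRecording({ windowId, uploadToken });

  return windowId;
}
```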

📘

From a technical perspective, there is no difference between an adhoc recording and an in-person recording. Using the prepareDesktopAudioRecording call will mix and record the audio streams of your system's mic and speaker.

Machine diarization with adhoc / in-person meetings

In adhoc/in-person mode, speaker labels default to "Host" and "Guest" since there's no meeting platform to identify participants. To separate speakers by voice, you can enable machine diarization through your transcription provider. Machine diarization uses voice signatures to distinguish speakers, producing anonymous labels (e.g. 0, 1, 2) rather than participant names. This is handled by the transcription provider, not Recall. Follow the steps below to get this set up:

Step 1: Enable diarization in your SDK upload

When creating your SDK upload, set the provider's diarization flag and subscribe to both transcript.data and transcript.provider_data events. transcript.data delivers the transcribed text with Recall's default speaker labels ("Host"/"Guest"), while transcript.provider_data delivers the raw response from your transcription provider, which includes the machine-diarized speaker IDs. You can combine these two events to build a diarized transcript.

The example below uses Deepgram, but this works with any transcription provider that supports machine diarization.

const options = {
  method: 'POST',
  headers: {
    accept: 'application/json',
    'content-type': 'application/json',
    Authorization: 'YOUR_API_KEY'
  },
  body: JSON.stringify({
    recording_config: {
      transcript: {
        provider: {
          deepgram_streaming: {
            diarize: true // enable machine diarization through Deepgram
          }
        }
      },
      realtime_endpoints: [
        {
          type: 'desktop_sdk_callback',
          events: ['transcript.data', 'transcript.provider_data']
        }
      ]
    }
  })
};

fetch('https://YOUR_RECALL_REGION.recall.ai/api/v1/sdk_upload/', options)
  .then(res => res.json())
  .then(json => console.log(json))
  .catch(err => console.error(err));

Step 2: Extract speaker labels from the provider data

The transcript.data and transcript.provider_data events fire together for each utterance. transcript.provider_data contains the raw response from your transcription provider. For Deepgram, the speaker label is available on each word object:

RecallAiSdk.addEventListener('realtime-event', (event) => {
  if (event.event === 'transcript.provider_data') {
    const words = event.data?.data?.data?.payload?.channel?.alternatives?.[0]?.words;
    if (words && words.length > 0) {
      const speakerId = words[0].speaker; // e.g. 0, 1, 2
    }
  }
});

The path event.data.data.data.payload reflects three layers of wrapping: the SDK event envelope, the Recall event payload, and the raw provider response. Everything inside .payload is the unmodified response from your transcription provider. The nesting above is specific to Deepgram; other providers will have a different structure. See your provider's documentation for their response format.
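One way to stitch the two events together is a helper that pairs a transcript.data event with its matching transcript.provider_data event. The payload paths below are assumptions: the provider_data path follows the Deepgram example above, and the transcript.data shape (words with a text field under data.data) is illustrative, so verify both against the event references for your setup.

```javascript
// Hypothetical helper producing one diarized utterance from a pair of events.
// The payload paths are assumptions (Deepgram-style provider_data, plus an
// assumed data.data.words shape for transcript.data); adjust for your provider.
function buildDiarizedUtterance(transcriptEvent, providerEvent) {
  // Deepgram puts a numeric speaker ID on each word of the raw response
  const providerWords =
    providerEvent?.data?.data?.data?.payload?.channel?.alternatives?.[0]?.words;
  const speakerId =
    providerWords && providerWords.length > 0 ? providerWords[0].speaker : null;

  // Assumed location of the transcribed words in transcript.data
  const text = (transcriptEvent?.data?.data?.words ?? [])
    .map((w) => w.text)
    .join(' ');

  // Fall back to an unknown label when no provider speaker ID is present
  const speaker =
    speakerId !== null && speakerId !== undefined
      ? `Speaker ${speakerId}`
      : 'Unknown';

  return { speaker, text };
}
```

Collecting these utterances in order, keyed by speaker, gives you an anonymously diarized transcript without relying on Recall's default "Host"/"Guest" labels.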

📘

Want to see a working implementation of machine diarization?

Check out our sample repository to see machine diarization implemented in a real Electron application.


FAQs

Why are speaker labels returned as "Host"/"Guest"?

The adhoc / in-person meeting feature has no meeting platform to identify participants, so it cannot return real speaker names. Instead, the local machine's mic stream is labelled "Host", and any other audio stream is labelled "Guest".

For example, if you're using this feature to capture audio from an unsupported platform, the audio picked up by the local machine's mic will be labelled "Host", and the audio from the other participants in the meeting will be labelled "Guest".

If you need speakers distinguished by voice, consider using machine diarization instead.


Why does audio stop if I change my microphone or speaker during a whole-desktop recording?

Whole-desktop recordings bind to the system’s default microphone and speaker when recording starts. Changing audio devices mid-recording is not currently supported and may cause audio capture to stop.