Transcriptions & Captions

Stream supports transcriptions and live closed captions for audio and video calls. Both can be configured to run automatically or can be started and stopped with API calls. Closed captions are delivered to clients as WebSocket events, while transcriptions are uploaded after the call has ended or the transcription process has been stopped. If transcription is set to start automatically, the process starts when the first user joins the call and stops when all participants have left.

Quick Start

// start transcription with language
await call.startTranscription({ language: "en" });

// start closed captions with language
await call.startClosedCaptions({ language: "en" });

// stop transcription
await call.stopTranscription();

// stop closed captions
await call.stopClosedCaptions();

// you can also start or stop transcription and closed captions together with a single API call
await call.startTranscription({ enable_closed_captions: true });
await call.stopTranscription({ stop_closed_captions: true });
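
When transcription is only needed while some piece of work runs, the combined start and stop calls above can be wrapped so that the stop call is attempted even if the work fails. This is a minimal sketch, not part of the SDK: withTranscription is a hypothetical helper, call is assumed to be any object exposing the startTranscription and stopTranscription methods shown above, and passing language together with enable_closed_captions in one request is an assumption.

```javascript
// Hypothetical helper (not an SDK function): runs `work` while transcription
// and closed captions are active, and always attempts to stop them afterwards.
async function withTranscription(call, language, work) {
  // Assumption: `language` and `enable_closed_captions` can be combined.
  await call.startTranscription({ language, enable_closed_captions: true });
  try {
    return await work();
  } finally {
    // Stop both processes, even when `work` throws.
    await call.stopTranscription({ stop_closed_captions: true });
  }
}
```

A caller would use it as withTranscription(call, "en", async () => { /* work that needs transcription */ }).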

By default, transcriptions are stored in Stream’s S3 bucket and retained for two weeks. You can also configure your application to store transcriptions in your own external storage; see the Storage section for details.

Note: Transcription runs continuously during the call and chunks of the conversation are saved as it progresses, but the complete transcription file is uploaded only once, at the end of the call. This avoids requiring additional permissions (such as delete permissions) when using external storage.

Transcription language

For best speech-to-text performance, specify the language spoken in the call. By default, the language is set to English (en) for all call types.

Alternatively, you can use automatic language detection, which is easier to set up but has some drawbacks:

  • Speech-to-text accuracy is lower
  • Closed caption events have additional latency

There are three ways to set the transcription language:

  1. call type level: this is the default language for all calls of the same type
  2. call level: when provided, it overrides the language set for its call type
  3. when starting closed captions or transcriptions using the API

Note: If you change the language for an active call, we will propagate the new language to the already running transcription/closed-caption process.

// 1. set the language for all calls of the default type to "fr"
await client.video.updateCallType("default", {
  settings: {
    transcription: {
      language: "fr",
    },
  },
});

// 2. create a call and set its language to "fr"
await call.getOrCreate({
  settings_override: {
    transcription: {
      language: "fr",
    },
  },
});

// 3. update an existing call and set its language to "fr"
await call.update({
  settings_override: {
    transcription: {
      language: "fr",
    },
  },
});

// 4. start transcription and set language to "fr"
await call.startTranscription({ language: "fr" });
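
The precedence between these levels can be sketched as a small lookup. The helper below is hypothetical and only illustrates the order of overrides: the source states that a call-level language overrides the call type and that "en" is the default; treating a language passed when starting transcription as the highest priority is an assumption.

```javascript
// Hypothetical helper: mirrors the override order described above.
// Assumption: a language passed at start time wins over the call-level setting.
function resolveTranscriptionLanguage({ callTypeLanguage, callLanguage, startLanguage } = {}) {
  // "en" is the documented default for all call types.
  return startLanguage ?? callLanguage ?? callTypeLanguage ?? "en";
}
```

For example, a call type set to "fr" with a call-level override of "de" resolves to "de" unless a language is passed when starting.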

List call transcriptions

Note: Transcriptions stored in Stream’s S3 bucket (the default) are returned with a signed URL.

await call.listTranscriptions();
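
Once the list is available, a common next step is picking the most recent transcription. The sketch below assumes a response shape that this page does not confirm: an object with a transcriptions array whose items carry filename, url, and end_time fields. latestTranscription is a hypothetical helper, so check the field names against the actual response before relying on it.

```javascript
// Assumed response shape (not confirmed by this page):
// { transcriptions: [{ filename, url, start_time, end_time }, ...] }
function latestTranscription(response) {
  const items = response?.transcriptions ?? [];
  if (items.length === 0) return undefined;
  // Sort a copy by end_time (ISO 8601 timestamps), newest first.
  return [...items].sort(
    (a, b) => new Date(b.end_time).getTime() - new Date(a.end_time).getTime()
  )[0];
}
```

Under those assumptions, latestTranscription(await call.listTranscriptions()) returns the newest entry, or undefined when the call has no transcriptions.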

Delete call transcription

This endpoint deletes a call transcription. Note that a transcription is deleted only if it is stored on Stream's side (the default).

An error will be returned if the transcription doesn’t exist.

await call.deleteTranscription({ session: "<session ID>", filename: "<filename>" });

Events

These events are sent to users connected to the call and your webhook/SQS:

Transcription JSONL file format

The transcription file uses the JSONL format: each line is a JSON object describing one speech fragment, including its start and stop times, the speaker ID, and the recognized text. Because each fragment carries its own timing, the file can be converted to simpler formats such as SRT with little effort.

{"type":"speech", "start_time": "2024-02-28T08:18:18.061031795Z", "stop_time":"2024-02-28T08:18:22.401031795Z", "speaker_id": "Sacha_Arbonel", "text": "hello"}
{"type":"speech", "start_time": "2024-02-28T08:18:22.401031795Z", "stop_time":"2024-02-28T08:18:26.741031795Z", "speaker_id": "Sacha_Arbonel", "text": "how are you"}
{"type":"speech", "start_time": "2024-02-28T08:18:26.741031795Z", "stop_time":"2024-02-28T08:18:31.081031795Z", "speaker_id": "Tommaso_Barbugli", "text": "I'm good"}
{"type":"speech", "start_time": "2024-02-28T08:18:31.081031795Z", "stop_time":"2024-02-28T08:18:35.421031795Z", "speaker_id": "Tommaso_Barbugli", "text": "how about you"}
{"type":"speech", "start_time": "2024-02-28T08:18:35.421031795Z", "stop_time":"2024-02-28T08:18:39.761031795Z", "speaker_id": "Sacha_Arbonel", "text": "I'm good too"}
{"type":"speech", "start_time": "2024-02-28T08:18:39.761031795Z", "stop_time":"2024-02-28T08:18:44.101031795Z", "speaker_id": "Tommaso_Barbugli", "text": "that's great"}
{"type":"speech", "start_time": "2024-02-28T08:18:44.101031795Z", "stop_time":"2024-02-28T08:18:48.441031795Z", "speaker_id": "Tommaso_Barbugli", "text": "I'm glad to hear that"}
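
As an illustration of the SRT conversion, here is a minimal sketch that relies only on the fields shown in the sample (type, start_time, stop_time, speaker_id, text). The function names are hypothetical. Cue times are measured from the first fragment's start_time, which is an assumption about where the media begins, and the speaker prefix is a presentation choice.

```javascript
// Parse an RFC 3339 timestamp to epoch milliseconds. Fractional seconds are
// truncated to three digits first, since the sample uses nanosecond precision.
function toMillis(timestamp) {
  return Date.parse(timestamp.replace(/(\.\d{3})\d+/, "$1"));
}

// Format a millisecond offset as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(ms) {
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Convert transcription JSONL text to SRT. Offsets are relative to the first
// fragment's start_time (an assumption about where the media begins).
function jsonlToSrt(jsonl) {
  const fragments = jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .filter((entry) => entry.type === "speech");
  if (fragments.length === 0) return "";
  const origin = toMillis(fragments[0].start_time);
  return fragments
    .map((f, i) => {
      const start = formatSrtTime(toMillis(f.start_time) - origin);
      const stop = formatSrtTime(toMillis(f.stop_time) - origin);
      return `${i + 1}\n${start} --> ${stop}\n${f.speaker_id}: ${f.text}\n`;
    })
    .join("\n");
}
```

If the SRT file must line up with a recording, replace the origin with the recording's actual start time.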

User Permissions

The following permissions are available to grant/restrict access to this functionality when used client-side.

  • StartTranscription: required to start the transcription
  • StopTranscription: required to stop the transcription
  • ListTranscriptions: required to retrieve the list of transcriptions
  • StartClosedCaptions: required to start closed captions
  • StopClosedCaptions: required to stop closed captions

Enabling, disabling, and starting automatically

Transcriptions and closed captions can be configured from the Dashboard (see the call type settings) or directly via the API. It is also possible to change the transcription settings for a call and override the default settings that come from its call type.

// Disable on call level
await call.update({
  settings_override: {
    transcription: {
      mode: "disabled",
      closed_caption_mode: "disabled",
    },
  },
});

// Disable on call type level
await client.video.updateCallType({
  name: "<call type name>",
  settings: {
    transcription: {
      language: "en",
      mode: "disabled",
      closed_caption_mode: "disabled",
    },
  },
});

// Enable
await call.update({
  settings_override: {
    transcription: {
      language: "en",
      mode: "available",
      closed_caption_mode: "available",
    },
  },
});

// Start automatically when the first user joins the call
await call.update({
  settings_override: {
    transcription: {
      language: "en",
      mode: "auto-on",
      closed_caption_mode: "auto-on",
    },
  },
});
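
Because the mode values are plain strings, a typo in one of them is easy to miss. The sketch below builds the settings_override payload from the values used in the examples above ("available", "disabled", "auto-on") and rejects anything else. buildTranscriptionOverride is a hypothetical helper, and treating the same three values as valid for both mode and closed_caption_mode is an assumption.

```javascript
// Values used in the examples above. Assumption: the same set applies to
// both `mode` and `closed_caption_mode`.
const TRANSCRIPTION_MODES = ["available", "disabled", "auto-on"];

// Hypothetical helper: builds a settings_override payload for call.update()
// and rejects unknown mode values before any request is sent.
function buildTranscriptionOverride({ language = "en", mode, closedCaptionMode }) {
  for (const value of [mode, closedCaptionMode]) {
    if (!TRANSCRIPTION_MODES.includes(value)) {
      throw new RangeError(`Unsupported mode: ${value}`);
    }
  }
  return {
    settings_override: {
      transcription: { language, mode, closed_caption_mode: closedCaptionMode },
    },
  };
}
```

The result can be passed directly to the update call, for example call.update(buildTranscriptionOverride({ mode: "auto-on", closedCaptionMode: "auto-on" })).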

Supported languages

  • English (en) - default
  • French (fr)
  • Spanish (es)
  • German (de)
  • Italian (it)
  • Dutch (nl)
  • Portuguese (pt)
  • Polish (pl)
  • Catalan (ca)
  • Czech (cs)
  • Danish (da)
  • Greek (el)
  • Finnish (fi)
  • Indonesian (id)
  • Japanese (ja)
  • Russian (ru)
  • Swedish (sv)
  • Tamil (ta)
  • Thai (th)
  • Turkish (tr)
  • Hungarian (hu)
  • Romanian (ro)
  • Chinese (zh)
  • Arabic (ar)
  • Tagalog (tl)
  • Hebrew (he)
  • Hindi (hi)
  • Croatian (hr)
  • Korean (ko)
  • Malay (ms)
  • Norwegian (no)
  • Ukrainian (uk)
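
To catch unsupported codes before calling the API, the list above can be mirrored in a small lookup. This is a sketch with hypothetical names. It uses "ro", the ISO 639-1 code for Romanian, and it covers only the codes listed on this page, so any separate value used for automatic language detection is not included.

```javascript
// Language codes copied from the list above ("ro" is the ISO 639-1 code
// for Romanian).
const SUPPORTED_TRANSCRIPTION_LANGUAGES = new Set([
  "en", "fr", "es", "de", "it", "nl", "pt", "pl",
  "ca", "cs", "da", "el", "fi", "id", "ja", "ru",
  "sv", "ta", "th", "tr", "hu", "ro", "zh", "ar",
  "tl", "he", "hi", "hr", "ko", "ms", "no", "uk",
]);

// Hypothetical helper: returns true when `code` is one of the listed codes.
function isSupportedLanguage(code) {
  return SUPPORTED_TRANSCRIPTION_LANGUAGES.has(String(code).toLowerCase());
}
```

A check such as isSupportedLanguage("fr") can run before passing the code to startTranscription.
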