Transcriptions & Captions

Stream supports transcriptions and live closed captions for audio and video calls. Both can be configured to start automatically, or can be started and stopped with API calls. Closed captions are delivered to clients through WebSocket events, while transcriptions are uploaded after the call has ended or the transcription process has been stopped. When transcription is set to start automatically, the process starts when the first user joins the call and stops when all participants have left.

Quick Start

// start transcription with language
await call.startTranscription({ language: "en" });

// start closed captions with language
await call.startClosedCaptions({ language: "en" });

// stop transcription
await call.stopTranscription();

// stop closed captions
await call.stopClosedCaptions();

// you can also start or stop with a single API call
await call.startTranscription({ enable_closed_captions: true });
await call.stopTranscription({ stop_closed_captions: true });

# starts transcription
call.start_transcription(language="en")

# starts closed captions
call.start_closed_captions(language="en")

# stops the transcription for the call
call.stop_transcription()

# stops closed captions for the call
call.stop_closed_captions()

# you can also start or stop with a single API call
call.start_transcription(enable_closed_captions=True)
call.stop_transcription(stop_closed_captions=True)

// start transcription with language
call.StartTranscription(ctx, &getstream.StartTranscriptionRequest{
    Language: getstream.PtrTo("en"),
})

// start closed captions with language
call.StartClosedCaptions(ctx, &getstream.StartClosedCaptionsRequest{
    Language: getstream.PtrTo("en"),
})

// stop transcription
call.StopTranscription(ctx, &getstream.StopTranscriptionRequest{})

// stop closed captions
call.StopClosedCaptions(ctx, &getstream.StopClosedCaptionsRequest{})

// you can also start or stop with a single API call
call.StartTranscription(ctx, &getstream.StartTranscriptionRequest{
    EnableClosedCaptions: getstream.PtrTo(true),
})
call.StopTranscription(ctx, &getstream.StopTranscriptionRequest{
    StopClosedCaptions: getstream.PtrTo(true),
})

curl -X POST "https://video.stream-io-api.com/api/v2/video/call/default/${CALL_ID}/start_transcription?api_key=${API_KEY}" \
    -H "Authorization: ${TOKEN}" \
    -H "stream-auth-type: jwt"

curl -X POST "https://video.stream-io-api.com/api/v2/video/call/default/${CALL_ID}/stop_transcription?api_key=${API_KEY}" \
    -H "Authorization: ${TOKEN}" \
    -H "stream-auth-type: jwt"

By default, transcriptions are stored in Stream’s S3 bucket and retained for two weeks. You can also configure your application to store transcriptions in your own external storage; see the Storage section for more detail.

Note: While transcription occurs continuously during the call, and chunks of conversations are saved continuously, the complete transcription file is uploaded only once at the end of the call. This approach is used to avoid requiring additional permissions (such as delete permissions) when using external storage.

Transcription language

For best speech-to-text performance, it is recommended that you specify the language you are using. By default, the language is set to English (en) for all call types.

Alternatively, you can use automatic language detection, which is easier to set up but has some drawbacks:

Speech-to-text accuracy is lower
Closed caption events have additional latency

There are three ways to set the transcription language:

call type level: this is the default language for all calls of the same type
call level: when provided, it overrides the language set for its call type
API request level: the language passed when starting transcription or closed captions using the API

Note: If you change the language for an active call, we will propagate the new language to the already running transcription/closed-caption process.
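The three levels above can be pictured as a simple fallback chain. The following is only an illustrative sketch, not Stream's implementation: `resolve_language` and its parameter names are hypothetical, and it assumes that a language passed when starting transcription or closed captions takes priority over the call-level setting, which the text above does not state explicitly.

```python
from typing import Optional

# Documented default language for all call types.
DEFAULT_LANGUAGE = "en"


def resolve_language(
    call_type_language: Optional[str] = None,
    call_language: Optional[str] = None,
    start_request_language: Optional[str] = None,
) -> str:
    """Hypothetical helper: pick the language a transcription process would use."""
    # Assumption: the most specific setting wins. The start request is checked
    # first, then the call-level override, then the call type default.
    for candidate in (start_request_language, call_language, call_type_language):
        if candidate:
            return candidate
    return DEFAULT_LANGUAGE
```

For example, `resolve_language("fr", "it")` returns `"it"` because the call-level value overrides the call type default, as described above.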

// 1. set the language for all calls of the default type to "fr"
await client.video.updateCallType("default", {
  settings: {
    transcription: {
      language: "fr",
    },
  },
});

// 2. create a call and set its language to "fr"
await call.getOrCreate({
  settings_override: {
    transcription: {
      language: "fr",
    },
  },
});

// 3. update an existing call and set its language to "fr"
await call.update({
  settings_override: {
    transcription: {
      language: "fr",
    },
  },
});

// 4. start transcription and set language to "fr"
await call.startTranscription({ language: "fr" });

# 1. set the language for all calls of the default type to "fr"
client.video.update_call_type(call.call_type, settings=CallSettingsRequest(
    transcription=TranscriptionSettingsRequest(
        mode="auto-on",
        closed_caption_mode="auto-on",
        language="fr"
    ),
))

# 2. create a call and set its language to "fr"
call.get_or_create(
    data=CallRequest(
        created_by_id="user-id",
        settings_override=CallSettingsRequest(
            transcription=TranscriptionSettingsRequest(
                mode="auto-on",
                closed_caption_mode="auto-on",
                language="fr"
            ),
        ),
    )
)

# 3. update an existing call and set its language to "fr"
call.update(
    settings_override=CallSettingsRequest(
        transcription=TranscriptionSettingsRequest(
            mode="auto-on",
            closed_caption_mode="auto-on",
            language="fr"
        ),
    )
)

# 4. start transcription and set language to "fr"
call.start_transcription(language="fr")

// set the language for all calls of the default type to "fr"
client.Video().UpdateCallType(ctx, "default", &getstream.UpdateCallTypeRequest{
    Settings: &getstream.CallSettingsRequest{
        Transcription: &getstream.TranscriptionSettingsRequest{
            Language: getstream.PtrTo("fr"),
        },
    },
})

// create a call and set its language to "fr"
call.GetOrCreate(ctx, &getstream.CallRequest{
    SettingsOverride: &getstream.CallSettingsRequest{
        Transcription: &getstream.TranscriptionSettingsRequest{
            Language: getstream.PtrTo("fr"),
        },
    },
})

// update an existing call and set its language to "fr"
call.Update(ctx, &getstream.UpdateCallRequest{
    SettingsOverride: &getstream.CallSettingsRequest{
        Transcription: &getstream.TranscriptionSettingsRequest{
            Language: getstream.PtrTo("fr"),
        },
    },
})

// start transcription and set language to "fr"
call.StartTranscription(ctx, &getstream.StartTranscriptionRequest{
    Language: getstream.PtrTo("fr"),
})

List call transcriptions

Note: transcriptions stored on Stream’s S3 bucket (the default) will be returned with a signed URL.

call.listTranscriptions();

call.list_transcriptions()

call.ListTranscriptions(ctx, &getstream.ListTranscriptionsRequest{})

curl "https://video.stream-io-api.com/api/v2/video/call/default/${CALL_ID}/transcriptions?api_key=${API_KEY}" \
    -H "Authorization: ${TOKEN}" \
    -H "stream-auth-type: jwt"

Delete call transcription

This endpoint deletes a call transcription. Note that a transcription is deleted only if it is stored on Stream's side (the default).

An error will be returned if the transcription doesn’t exist.

call.deleteTranscription({ session: "<session ID>", filename: "<filename>" });

call.delete_transcription(sessionID, filename)

call.DeleteTranscription(ctx, sessionID, filename, &getstream.DeleteTranscriptionRequest{})

curl -X DELETE "https://video.stream-io-api.com/api/v2/video/call/${CALL_TYPE}/${CALL_ID}/${SESSION_ID}/transcriptions/${FILENAME}?api_key=${API_KEY}" \
     -H "Authorization: ${JWT_TOKEN}" \
     -H "stream-auth-type: jwt"

Events

These events are sent to users connected to the call and your webhook/SQS:

call.transcription_started: sent when the transcription of the call has started
call.transcription_stopped: sent only when the transcription is explicitly stopped through an API call, not when the transcription process encounters an error
call.transcription_ready: sent when the transcription is completed and available for download. An example payload of this event is detailed below.
call.transcription_failed: sent if the transcription process encounters any issue
call.closed_captions_started: sent when captioning has started
call.closed_caption: an event containing transcribed speech from a participant
call.closed_captions_stopped: sent when captioning is stopped
call.closed_captions_failed: sent when the captioning process encounters any issue
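On the webhook side, these events can be routed to handlers by their name. The sketch below is one possible way to do that and makes an assumption: the webhook body is taken to be a JSON object whose `type` field holds the event name. `make_dispatcher` is a hypothetical helper, not part of any Stream SDK.

```python
import json
from typing import Callable, Dict

# Event names taken from the list above.
TRANSCRIPTION_EVENTS = (
    "call.transcription_started",
    "call.transcription_stopped",
    "call.transcription_ready",
    "call.transcription_failed",
    "call.closed_captions_started",
    "call.closed_caption",
    "call.closed_captions_stopped",
    "call.closed_captions_failed",
)


def make_dispatcher(
    handlers: Dict[str, Callable[[dict], None]],
) -> Callable[[str], bool]:
    """Build a function that routes a raw webhook body to a handler (hypothetical)."""

    def dispatch(raw_body: str) -> bool:
        # Assumption: the body is a JSON object with the event name in "type".
        event = json.loads(raw_body)
        handler = handlers.get(event.get("type"))
        if handler is None:
            return False  # no handler registered for this event type
        handler(event)
        return True

    return dispatch
```

A handler registered for `call.transcription_ready` could, for instance, trigger the download of the finished transcription, while `call.transcription_failed` could feed an alerting system.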

Transcription JSONL file format

The transcription file is a JSONL file, where each line is a JSON object containing a speech fragment, and each speech fragment contains timing and user information. This format is straightforward to convert to other formats such as SRT.

{"type":"speech", "start_time": "2024-02-28T08:18:18.061031795Z", "stop_time":"2024-02-28T08:18:22.401031795Z", "speaker_id": "Sacha_Arbonel", "text": "hello"}
{"type":"speech", "start_time": "2024-02-28T08:18:22.401031795Z", "stop_time":"2024-02-28T08:18:26.741031795Z", "speaker_id": "Sacha_Arbonel", "text": "how are you"}
{"type":"speech", "start_time": "2024-02-28T08:18:26.741031795Z", "stop_time":"2024-02-28T08:18:31.081031795Z", "speaker_id": "Tommaso_Barbugli", "text": "I'm good"}
{"type":"speech", "start_time": "2024-02-28T08:18:31.081031795Z", "stop_time":"2024-02-28T08:18:35.421031795Z", "speaker_id": "Tommaso_Barbugli", "text": "how about you"}
{"type":"speech", "start_time": "2024-02-28T08:18:35.421031795Z", "stop_time":"2024-02-28T08:18:39.761031795Z", "speaker_id": "Sacha_Arbonel", "text": "I'm good too"}
{"type":"speech", "start_time": "2024-02-28T08:18:39.761031795Z", "stop_time":"2024-02-28T08:18:44.101031795Z", "speaker_id": "Tommaso_Barbugli", "text": "that's great"}
{"type":"speech", "start_time": "2024-02-28T08:18:44.101031795Z", "stop_time":"2024-02-28T08:18:48.441031795Z", "speaker_id": "Tommaso_Barbugli", "text": "I'm glad to hear that"}
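As an illustration of the conversion mentioned above, here is a minimal sketch that turns a transcription in this format into SRT text. It relies only on the fields shown in the sample (`type`, `start_time`, `stop_time`, `speaker_id`, `text`). The function names are hypothetical, timestamps are assumed to be UTC with a `Z` suffix as in the sample, and cue times are computed relative to the first fragment because the file itself does not carry the start time of the recording.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2024-02-28T08:18:18.061031795Z."""
    body = value.rstrip("Z")
    seconds, _, fraction = body.partition(".")
    # datetime only stores microseconds, so keep at most 6 fractional digits.
    micro = int(fraction[:6].ljust(6, "0")) if fraction else 0
    base = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S")
    return base.replace(microsecond=micro, tzinfo=timezone.utc)


def format_offset(delta: timedelta) -> str:
    """Format a timedelta as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = delta // timedelta(milliseconds=1)  # integer milliseconds
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def jsonl_to_srt(jsonl_text: str) -> str:
    """Convert transcription JSONL text into SRT cues, one per speech fragment."""
    fragments = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    fragments = [f for f in fragments if f.get("type") == "speech"]
    if not fragments:
        return ""
    # Assumption: the first fragment marks time zero for the subtitles.
    origin = parse_timestamp(fragments[0]["start_time"])
    blocks = []
    for index, fragment in enumerate(fragments, start=1):
        start = parse_timestamp(fragment["start_time"]) - origin
        stop = parse_timestamp(fragment["stop_time"]) - origin
        blocks.append(
            f"{index}\n{format_offset(start)} --> {format_offset(stop)}\n"
            f"{fragment['speaker_id']}: {fragment['text']}\n"
        )
    return "\n".join(blocks)
```

If the subtitles need to line up with a recording, the offset of the first fragment relative to the start of that recording would have to be added to every cue.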

User Permissions

The following permissions are available to grant/restrict access to this functionality when used client-side.

StartTranscription: required to start the transcription
StopTranscription: required to stop the transcription
ListTranscriptions: required to retrieve the list of transcriptions
StartClosedCaptions: required to start closed captions
StopClosedCaptions: required to stop closed captions

Enabling, disabling, and automatically starting

Transcriptions and closed captions can be configured from the Dashboard (see the call type settings) or directly via the API. It is also possible to change the transcription settings for a call and override the default settings that come from its call type.
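When calling the REST API directly, the call-level override is a small JSON body. The sketch below builds that body and rejects unexpected mode values before a request is sent. `build_transcription_override` is a hypothetical helper, and the accepted modes are limited to the three values that appear in the examples in this section ("disabled", "available", "auto-on"); the API may accept others.

```python
from typing import Optional

# Mode values that appear in the examples in this section.
KNOWN_MODES = ("disabled", "available", "auto-on")


def build_transcription_override(
    mode: str,
    closed_caption_mode: Optional[str] = None,
    language: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build a settings_override body for a call update."""
    for name, value in (("mode", mode), ("closed_caption_mode", closed_caption_mode)):
        if value is not None and value not in KNOWN_MODES:
            raise ValueError(f"{name} must be one of {KNOWN_MODES}, got {value!r}")

    transcription = {"mode": mode}
    # Optional fields are only included when explicitly provided.
    if closed_caption_mode is not None:
        transcription["closed_caption_mode"] = closed_caption_mode
    if language is not None:
        transcription["language"] = language
    return {"settings_override": {"transcription": transcription}}
```

Serializing the result with `json.dumps` gives a body with the same shape as the one passed with `-d` in the cURL examples in this section.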

// Disable on call level
call.update({
  settings_override: {
    transcription: {
      mode: "disabled",
      closed_caption_mode: "disabled",
    },
  },
});

// Disable on call type level
client.video.updateCallType({
  name: "<call type name>",
  settings: {
    transcription: {
      language: "en",
      mode: "disabled",
      closed_caption_mode: "disabled",
    },
  },
});

// Enable
call.update({
  settings_override: {
    transcription: {
      language: "en",
      mode: "available",
      closed_caption_mode: "available",
    },
  },
});

// Start transcription and closed captions automatically
call.update({
  settings_override: {
    transcription: {
      language: "en",
      mode: "auto-on",
      closed_caption_mode: "auto-on",
    },
  },
});

# Disable on call level
call.update(
    settings_override=CallSettingsRequest(
        transcription=TranscriptionSettingsRequest(
            mode="disabled",
            closed_caption_mode="disabled",
            language="en",
        ),
    ),
)

# Disable on call type level
call_type_name = "default"
client.video.update_call_type(call_type_name,
    settings=CallSettingsRequest(
        transcription=TranscriptionSettingsRequest(
            mode="disabled",
            closed_caption_mode="disabled",
            language="en",
        ),
    ),
)

# Automatically transcribe calls
client.video.update_call_type(call_type_name,
    settings=CallSettingsRequest(
        transcription=TranscriptionSettingsRequest(
            mode="auto-on",
            closed_caption_mode="auto-on",
            language="en",
        ),
    ),
)

# Enable on call level
call.update(
    settings_override=CallSettingsRequest(
        transcription=TranscriptionSettingsRequest(
            mode="available",
            closed_caption_mode="available",
            language="en",
        ),
    ),
)

// Disable on call level
call.Update(ctx, &getstream.UpdateCallRequest{
  SettingsOverride: &getstream.CallSettingsRequest{
    Transcription: &getstream.TranscriptionSettingsRequest{
      Mode: "disabled",
    },
  },
})

// Disable on call type level
callTypeName := "default"
_, err := client.Video().UpdateCallType(ctx, callTypeName, &getstream.UpdateCallTypeRequest{
  Settings: &getstream.CallSettingsRequest{
    Transcription: &getstream.TranscriptionSettingsRequest{
      Mode: "disabled",
    },
  },
})

// Automatically transcribe calls
_, err = client.Video().UpdateCallType(ctx, callTypeName, &getstream.UpdateCallTypeRequest{
  Settings: &getstream.CallSettingsRequest{
    Transcription: &getstream.TranscriptionSettingsRequest{
      Mode: "auto-on",
    },
  },
})

// Enable transcription (available)
call := client.Video().Call("call_type", "call_id")
_, err = call.Update(ctx, &getstream.UpdateCallRequest{
  SettingsOverride: &getstream.CallSettingsRequest{
    Transcription: &getstream.TranscriptionSettingsRequest{
      Mode: "available",
    },
  },
})

# Disable on call level
curl -X PATCH "https://video.stream-io-api.com/api/v2/video/call/${CALL_TYPE_NAME}/${CALL_ID}?api_key=${API_KEY}" \
  -H "Authorization: ${TOKEN}" \
  -H "stream-auth-type: jwt" \
  -H "Content-Type: application/json" \
  -d '{
    "settings_override": {
      "transcription": {
        "mode": "disabled"
      }
    }
  }'

# Disable on call type level
curl -X PUT "https://video.stream-io-api.com/api/v2/video/calltypes/${CALL_TYPE_NAME}?api_key=${API_KEY}" \
  -H "Authorization: ${TOKEN}" \
  -H "stream-auth-type: jwt" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "transcription": {
        "mode": "disabled"
      }
    }
  }'

# Enable on call level
curl -X PATCH "https://video.stream-io-api.com/api/v2/video/call/${CALL_TYPE_NAME}/${CALL_ID}?api_key=${API_KEY}" \
  -H "Authorization: ${TOKEN}" \
  -H "stream-auth-type: jwt" \
  -H "Content-Type: application/json" \
  -d '{
    "settings_override": {
      "transcription": {
        "mode": "available"
      }
    }
  }'

# Start transcription and closed captions automatically
curl -X PATCH "https://video.stream-io-api.com/api/v2/video/call/${CALL_TYPE_NAME}/${CALL_ID}?api_key=${API_KEY}" \
  -H "Authorization: ${TOKEN}" \
  -H "stream-auth-type: jwt" \
  -H "Content-Type: application/json" \
  -d '{
    "settings_override": {
      "transcription": {
        "language": "en",
        "mode": "auto-on",
        "closed_caption_mode": "auto-on"
      }
    }
  }'

By default, transcriptions are stored in Stream’s S3 bucket and retained for two weeks. You can also configure your application to store transcriptions in your own external storage; see the Storage section of this document for more detail.

Supported languages

English (en) - default
French (fr)
Spanish (es)
German (de)
Italian (it)
Dutch (nl)
Portuguese (pt)
Polish (pl)
Catalan (ca)
Czech (cs)
Danish (da)
Greek (el)
Finnish (fi)
Indonesian (id)
Japanese (ja)
Russian (ru)
Swedish (sv)
Tamil (ta)
Thai (th)
Turkish (tr)
Hungarian (hu)
Romanian (ro)
Chinese (zh)
Arabic (ar)
Tagalog (tl)
Hebrew (he)
Hindi (hi)
Croatian (hr)
Korean (ko)
Malay (ms)
Norwegian (no)
Ukrainian (uk)

Storage