Call Transcriptions
You can transcribe calls to text using API calls or configure your call types to be automatically transcribed. When transcription is enabled automatically, the transcription process will start when the first user joins the call, and then stop when all participant have left the call.
Transcriptions are structured as plain-text JSONL files and automatically uploaded to Stream managed storage or to your own configurable storage. Websocket and webhook events are also sent when transcription starts, stops and completes.
Stream supports transcribing calls in multiple languages as well as transcriptions for closed captions. You can find more information about both later in this document.
Note we transcribe 1 dominant speaker and 2 other participants at a time
Quick Start
- JavaScript
- Python
- cURL
// starts transcription
call.startTranscription();
// stops the transcription for the call
call.stopTranscription();
# starts transcribing
call.start_transcription()
# stops the transcription for the call
call.stop_transcription()
curl -X POST "https://video.stream-io-api.com/api/v2/video/call/default/${CALL_ID}/start_transcription?api_key=${API_KEY}"\
-H "Authorization: ${TOKEN}" \
-H "stream-auth-type: jwt"
curl -X POST "https://video.stream-io-api.com/api/v2/video/call/default/${CALL_ID}/stop_transcription?api_key=${API_KEY}"\
-H "Authorization: ${TOKEN}" \
-H "stream-auth-type: jwt"
By default the transcriptions are stored on Stream’s S3 bucket and retained for 2-weeks. You can also configure your application to have transcriptions stored on your own external storage, see the storage section of tis document for more detail.
Note: While transcription occurs continuously during the call, and chunks of conversations are continuously saved, the complete transcription file is only uploaded once at the end of the call. This approach is used to avoid requiring additional permissions (such as delete permissions) when using external storage.
List call transcriptions
Note: transcriptions stored on Stream’s S3 bucket (the default) will be returned with a signed URL.
- JavaScript
- Python
- cURL
call.listTranscriptions();
call.list_transcriptions()
curl "https://video.stream-io-api.com/api/v2/video/call/default/${CALL_ID}/transcriptions?api_key=${API_KEY}" \
-H "Authorization: ${TOKEN}" \
-H "stream-auth-type: jwt"
Delete call transcription
This endpoint allows to delete call transcription. Please note that transcriptions will be deleted only if they are stored on Stream side (default).
An error will be returned if the transcription doesn't exist.
- JavaScript
- Python
- cURL
call.deleteTranscription({ session: '<session ID>', filename: '<filename>' });
call.delete_transcription(sessionID, filename)
curl -X DELETE "https://video.stream-io-api.com/video/call/${CALL_TYPE}/${CALL_ID}/${SESSION_ID}/transcriptions/${FILENAME}?api_key=${API_KEY}" \
-H "Authorization: ${JWT_TOKEN}" \
-H "stream-auth-type: jwt"
Events
These events are sent to users connected to the call and your webhook/SQS:
call.transcription_started
sent when the transcription of the call has startedcall.transcription_stopped
this event is sent only when the transcription is explicitly stopped through an API call, not in cases where the transcription process encounters an error.call.transcription_ready
dispatched when the transcription is completed and available for download. An example payload of this event is detailed below.call.transcription_failed
sent if the transcription process encounters any issues.
transcription.ready
event example
{
"type": "call.transcription_ready",
"created_at": "2024-03-18T08:24:14.769328551Z",
"call_cid": "default:mkzN17EUrgvn",
"call_transcription": {
"filename": "transcript_default_mkzN17EUrgvn_1710750207642.jsonl",
"url": "https://frankfurt.stream-io-cdn.com/1129528/video/transcriptions/default_mkzN17EUrgvn/transcript_default_mkzN17EUrgvn_1710750207642.jsonl?Expires=1710751154&Signature=OhdoTClQm5MT8ITPLAEJcKNflsJ7B2G3j7kx~kQyPrAETftrM2rzZy4IIT1XIC~8MrbPduWcj1tILXoSg3ldfZEHWRPqeMFr0caljPAVAL~mybUb4Kct2JoPjfsYfmj4FzSQbT7Iib38qPr7uiP0axTFm0VKRenkNwwCoS0F858u9Mdr8r6fTzILhiOZ1hOjw3V-TT1YbR20Yn4abKi6i50GAs5fqUDtSlo9DmEJgcS79Y0wUD1g18cGZvg3NiH3ogHQnmvoNrf28Cxc0JhBCe4wFErCMJ3pinewEOwDEEOMdHcRtcfWy72w6MTEwi0yomHYIU5flaYgUXCkkOJODw__&Key-Pair-Id=APKAIHG36VEWPDULE23Q",
"start_time": "2024-03-18T08:23:27.642688204Z",
"end_time": "2024-03-18T08:24:14.754731786Z"
},
"received_at": "2024-03-18T08:24:14.790Z"
}
Transcription JSONL file format
{"type":"speech", "start_time": "2024-02-28T08:18:18.061031795Z", "stop_time":"2024-02-28T08:18:22.401031795Z", "speaker_id": "Sacha_Arbonel", "text": "hello"}
{"type":"speech", "start_time": "2024-02-28T08:18:22.401031795Z", "stop_time":"2024-02-28T08:18:26.741031795Z", "speaker_id": "Sacha_Arbonel", "text": "how are you"}
{"type":"speech", "start_time": "2024-02-28T08:18:26.741031795Z", "stop_time":"2024-02-28T08:18:31.081031795Z", "speaker_id": "Tommaso_Barbugli", "text": "I'm good"}
{"type":"speech", "start_time": "2024-02-28T08:18:31.081031795Z", "stop_time":"2024-02-28T08:18:35.421031795Z", "speaker_id": "Tommaso_Barbugli", "text": "how about you"}
{"type":"speech", "start_time": "2024-02-28T08:18:35.421031795Z", "stop_time":"2024-02-28T08:18:39.761031795Z", "speaker_id": "Sacha_Arbonel", "text": "I'm good too"}
{"type":"speech", "start_time": "2024-02-28T08:18:39.761031795Z", "stop_time":"2024-02-28T08:18:44.101031795Z", "speaker_id": "Tommaso_Barbugli", "text": "that's great"}
{"type":"speech", "start_time": "2024-02-28T08:18:44.101031795Z", "stop_time":"2024-02-28T08:18:48.441031795Z", "speaker_id": "Tommaso_Barbugli", "text": "I'm glad to hear that"}
User Permissions
The following permissions are available to grant/restrict access to this functionality when used client-side.
StartTranscription
required to start the transcriptionStopTranscription
required to stop the transcriptionListTranscriptions
required to retrieve the list of transcriptionss
Enabling / Disabling call transcription
Transcription can be configured from the Dashboard (see call type screen) or directly via the API. It is also possible to change the transcription settings for a call and override the default settings coming from the its call type.
- JavaScript
- Python
- cURL
- JavaScript@0.3 (deprecated)
// Disable on call level
call.update({
settings_override: {
transcription: {
mode: 'disabled',
},
},
});
// Disable on call type level
client.video.updateCallType({
name: '<call type name>',
settings: {
transcription: {
mode: 'disabled',
},
},
});
// Enable
call.update({
settings_override: {
transcription: {
mode: 'available',
},
},
});
// Other settings
call.update({
settings_override: {
transcription: {
audio_only: false,
quality: 'auto-on',
},
},
});
# Disable on call level
call.update(
settings_override=CallSettingsRequest(
transcription=TranscriptionSettingsRequest(
mode='disabled',
),
),
)
# Disable on call type level
call_type_name = 'default'
client.video.update_call_type(call_type_name,
settings=CallSettingsRequest(
transcription=TranscriptionSettingsRequest(
mode='disabled',
),
),
)
# Automatically transcribe calls
client.video.update_call_type(
settings=CallSettingsRequest(
transcription=TranscriptionSettingsRequest(
mode='auto-on',
),
),
)
# Enable
client.update(
settings_override=CallSettingsRequest(
transcription=TranscriptionSettingsRequest(
mode='available',
),
),
)
# Disable on call level
curl -X PATCH "https://video.stream-io-api.com/api/v2/video/call/${CALL_TYPE_NAME}/${CALL_ID}?api_key=${API_KEY}" \
-H "Authorization: ${TOKEN}" \
-H "stream-auth-type: jwt" \
-H "Content-Type: application/json" \
-d '{
"settings_override": {
"transcription": {
"mode": "disabled"
}
}
}'
# Disable on call type level
curl -X PUT "https://video.stream-io-api.com/api/v2/video/calltypes/${CALL_TYPE_NAME}?api_key=${API_KEY}" \
-H "Authorization: ${TOKEN}" \
-H "stream-auth-type: jwt" \
-H "Content-Type: application/json" \
-d '{
"settings": {
"transcription": {
"mode": "disabled"
}
}
}'
# Enable on call level
curl -X PATCH "https://video.stream-io-api.com/api/v2/video/call/${CALL_TYPE_NAME}/${CALL_ID}?api_key=${API_KEY}" \
-H "Authorization: ${TOKEN}" \
-H "stream-auth-type: jwt" \
-H "Content-Type: application/json" \
-d '{
"settings_override": {
"transcription": {
"mode": "available"
}
}
}'
# Other settings
curl -X PATCH "https://video.stream-io-api.com/api/v2/video/call/${CALL_TYPE_NAME}/${CALL_ID}?api_key=${API_KEY}" \
-H "Authorization: ${TOKEN}" \
-H "stream-auth-type: jwt" \
-H "Content-Type: application/json" \
-d '{
"settings_override": {
"transcription": {
"mode": "available",
"audio_only": false,
"quality": "auto_on"
}
}
}'
By default the transcriptions are stored on Stream’s S3 bucket and retained for 2-weeks. You can also configure your application to have transcriptions stored on your own external storage, see the storage section of tis document for more detail.
// Disable on call level
call.update({
settings_override: {
transcription: {
mode: VideoTranscriptionSettingsRequestModeEnum.DISABLED,
},
},
});
// Disable on call type level
client.video.updateCallType('<call type name>', {
settings: {
transcription: {
mode: VideoTranscriptionSettingsModeEnum.DISABLED,
},
},
});
// Enable
call.update({
settings_override: {
transcription: {
mode: VideoTranscriptionSettingsRequestModeEnum.AVAILABLE,
},
},
});
// Other settings
call.update({
settings_override: {
transcription: {
audio_only: false,
quality: VideoTranscriptionSettingsRequestQualityEnum.AUTO_ON,
},
},
});
Multi language support
When using out of the box, transcriptions are optimized for calls with english speakers. You can configure call transcription to optimize for a different language than english. You can also specify as secondary language as well if you expect to have two languages used simultaneously in the same call.
Please note: the call transcription feature does not perform any language translation. When you select a different language, the trascription process will simply improve the speech-to-text detection for that language.
You can set the transcription languages in two ways: either as a call setting or you can provide them to the StartTranscription
API call. Languages are specified using their international language code (ISO639)
Please note: we currently don’t support changing language settings during the call.
Supported languages
- English (en) - default
- French (fr)
- Spanish (es)
- German (de)
- Italian (it)
- Dutch (nl)
- Portuguese (pt)
- Polish (pl)
- Catalan (ca)
- Czech (cs)
- Danish (da)
- Greek (el)
- Finnish (fi)
- Indonesian (id)
- Japanese (ja)
- Russian (ru)
- Swedish (sv)
- Tamil (ta)
- Thai (th)
- Turkish (tr)
- Hungarian (hu)
- Romanian (to)
- Chinese (zh)
- Arabic (ar)
- Tagalog (tl)
- Hebrew (he)
- Hindi (hi)
- Croatian (hr)
- Korean (ko)
- Malay (ms)
- Norwegian (no)
- Ukrainian (uk)