Deepgram

Deepgram is a powerful Speech-to-Text (STT) platform that provides fast, accurate, and customizable transcription services. It’s designed for real-time and batch audio processing, with support for features like word-level timestamps, speaker diarization, and multilingual transcription.

The Deepgram plugin in the Stream Python AI SDK enables real-time transcription of voice input, making it ideal for voice agents, call analysis, meeting transcriptions, and more.

Installation

Install the Stream Deepgram plugin with uv:

uv add getstream-plugins-deepgram

Example

Check out our Deepgram example to see a practical implementation of the plugin and get inspiration for your own projects, or read on for some key details.

Initialisation

The Deepgram plugin for Stream is provided by the DeepgramSTT class:

from getstream.plugins.deepgram import DeepgramSTT

stt = DeepgramSTT()

To initialise without passing an API key, make sure `DEEPGRAM_API_KEY` is set as an environment variable, either by defining it in a `.env` file or by exporting it in your terminal. Alternatively, pass the key directly with the `api_key` parameter.
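Unless your application (or a library such as `python-dotenv`, via its `load_dotenv()` function) already loads the `.env` file, its values are not in the process environment when `DeepgramSTT()` runs. The stdlib-only sketch below shows what that loading step involves. `load_env_file` is a hypothetical helper, not part of the Stream SDK, and it assumes plain `KEY=VALUE` lines: it skips comments and blank lines and strips optional quotes, but does not handle `export` prefixes or variable expansion.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Hypothetical helper: copy KEY=VALUE lines from a .env file into os.environ.

    Blank lines and '#' comments are skipped. Variables already present in the
    environment are kept unless override=True. Returns the parsed pairs.
    """
    parsed: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed  # no file: nothing to load
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='
        key, value = line.split("=", 1)
        key = key.strip()
        # Strip optional surrounding quotes, e.g. KEY="value"
        value = value.strip().strip('"').strip("'")
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Call `load_env_file()` once at startup, before constructing `DeepgramSTT()`, so the plugin can pick up `DEEPGRAM_API_KEY`. Existing environment variables win by default, which lets a key exported in the terminal take precedence over the file.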

Parameters

The DeepgramSTT class accepts the following parameters:

Name	Type	Default	Description
`api_key`	`str` or `None`	`None`	Your Deepgram API key. If not provided, the plugin will use the `DEEPGRAM_API_KEY` environment variable.
`options`	`LiveOptions` or `None`	`None`	Optional Deepgram configuration options, such as tier, model, or features like punctuation or diarization.
`sample_rate`	`int`	`48000`	The sample rate (in Hz) of the audio stream being transcribed.
`language`	`str`	`"en-US"`	Language code for transcription.
`keep_alive_interval`	`float`	`3.0`	Interval (in seconds) for sending keep-alive messages to maintain the WebSocket connection.
`interim_results`	`bool`	`True`	Whether to emit `partial_transcript` events while the user is speaking.
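As an illustration of what `keep_alive_interval` controls: Deepgram's live-streaming API closes a WebSocket that goes quiet for too long, and a client can send a `{"type": "KeepAlive"}` text message to hold the connection open during silence. The `asyncio` sketch below sends that message on a fixed interval until told to stop. It is not the plugin's internal implementation; `keep_alive_loop` is a hypothetical name and `send` stands in for whatever coroutine writes to the WebSocket.

```python
import asyncio
import json
from typing import Awaitable, Callable

# Deepgram's documented keep-alive control message, serialized as JSON text
KEEP_ALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})


async def keep_alive_loop(
    send: Callable[[str], Awaitable[None]],
    interval: float,
    stop: asyncio.Event,
) -> None:
    """Hypothetical sketch: send a KeepAlive every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            # Wait up to `interval` seconds; return early if stop is set meanwhile
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            # The interval elapsed without a stop request: ping the server
            await send(KEEP_ALIVE_MESSAGE)
```

This simplified loop sends on a fixed schedule even while audio is flowing; with the plugin's default of `3.0`, that would be one message every three seconds.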

Functionality

Process Audio

After joining the call, listen on the connection for `audio` events and pass each audio chunk to the STT instance for processing:

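from getstream.video import rtc
from getstream.video.rtc.track_util import PcmData

# `call` is an existing Stream call and `bot_user_id` the ID the bot joins as;
# `async with` must run inside an async function
async with rtc.join(call, bot_user_id) as connection:

    @connection.on("audio")
    async def on_audio(pcm: PcmData, user):
        # Forward each incoming audio chunk to Deepgram for transcription
        await stt.process_audio(pcm, user)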
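Both `@connection.on("audio")` above and the `@stt.on(...)` handlers in the next section follow the same pattern: calling `on(event_name)` returns a decorator that registers your coroutine, and the object later awaits it with the event's arguments. The sketch below is a hypothetical minimal emitter that shows that mechanism; it is not the SDK's own event class, whose details (for example, whether handlers run concurrently) may differ.

```python
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[None]]


class MiniEmitter:
    """Hypothetical event emitter illustrating the `@obj.on("event")` pattern."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        # Returns a decorator that registers the coroutine function under `event`
        def register(handler: Handler) -> Handler:
            self._handlers[event].append(handler)
            return handler  # return unchanged so the name stays usable

        return register

    async def emit(self, event: str, *args: Any) -> None:
        # Await every handler registered for `event`, in registration order
        for handler in self._handlers[event]:
            await handler(*args)
```

In the SDK, the connection and `stt` objects do the emitting; your code only registers handlers.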

Events

Transcript Event

The transcript event is triggered when a final transcript is available from Deepgram:

from typing import Any

@stt.on("transcript")
async def on_transcript(text: str, user: Any, metadata: dict):
    # Handle the final transcript here, e.g. log or store it
    print(f"[final] {user}: {text}")
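A common use of the transcript event is building a running log of what each participant said. The sketch below is a hypothetical `TranscriptLog` class whose method matches the handler signature above. It keys speakers by `str(user)` because the exact shape of the `user` object is not described here; swap in a user ID attribute if your SDK version provides one. It ignores `metadata`, whose contents are likewise not documented on this page.

```python
from collections import defaultdict
from typing import Any


class TranscriptLog:
    """Hypothetical helper that accumulates final transcripts per speaker."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self.by_speaker: dict[str, list[str]] = defaultdict(list)

    async def on_transcript(self, text: str, user: Any, metadata: dict) -> None:
        text = text.strip()
        if not text:
            return  # skip empty results defensively
        speaker = str(user)  # str() keeps the key hashable whatever `user` is
        self.by_speaker[speaker].append(text)
        self.lines.append(f"{speaker}: {text}")
```

To use it, register the bound method instead of a decorated function: `stt.on("transcript")(log.on_transcript)`, which is equivalent to applying the decorator.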

Partial Transcript Event

The partial transcript event fires in real time as Deepgram generates intermediate (partial) transcriptions. It is only emitted when `interim_results` is `True`, which is the default:

from typing import Any

@stt.on("partial_transcript")
async def on_partial_transcript(text: str, user: Any, metadata: dict):
    # Partial text may change until the final transcript arrives
    print(f"[partial] {user}: {text}")
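Partial and final events are typically combined for live captions: each partial replaces the previous in-progress text, and the final transcript commits it. The hypothetical `LiveCaptions` class below sketches that, assuming each `partial_transcript` carries the full text of the utterance so far (rather than only new words) and that a `transcript` event ends the utterance. Check these assumptions against the events your SDK version actually emits.

```python
from typing import Any


class LiveCaptions:
    """Hypothetical caption buffer: partials are replaced, finals are committed.

    Assumes each partial carries the whole utterance so far, and that a final
    transcript ends that utterance for the same speaker.
    """

    def __init__(self) -> None:
        self.committed: list[str] = []
        self.pending: dict[str, str] = {}

    async def on_partial(self, text: str, user: Any, metadata: dict) -> None:
        # Overwrite, don't append: the newest partial supersedes the previous one
        self.pending[str(user)] = text

    async def on_final(self, text: str, user: Any, metadata: dict) -> None:
        # The final transcript replaces any pending partial for this speaker
        self.pending.pop(str(user), None)
        self.committed.append(text)

    def render(self) -> str:
        # Committed lines first, then whatever is still in progress
        return "\n".join(self.committed + list(self.pending.values()))
```

Register both methods, for example `stt.on("partial_transcript")(captions.on_partial)` and `stt.on("transcript")(captions.on_final)`.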

Error Event

If an error occurs, an error event is fired:

@stt.on("error")
async def on_stt_error(error):
    # Handle the error here, e.g. log it
    print(f"Deepgram STT error: {error}")
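In production you will usually want errors in your application logs rather than on stdout. The sketch below logs through Python's standard `logging` module; `log_stt_error` and the logger name are hypothetical. The type of the `error` argument is an assumption: the handler attaches a traceback when it receives an exception instance and logs anything else as plain text.

```python
import logging
from typing import Any

logger = logging.getLogger("deepgram_stt")


async def log_stt_error(error: Any) -> None:
    """Hypothetical error handler that writes STT errors to the log."""
    if isinstance(error, BaseException):
        # Passing the exception as exc_info attaches its traceback to the record
        logger.error("Deepgram STT error: %s", error, exc_info=error)
    else:
        logger.error("Deepgram STT error: %s", error)
```

Register it with `stt.on("error")(log_stt_error)`.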

Close

You can close the STT connection with the close() method:

stt.close()
