Introduction to Integrations
Stream’s Python AI SDK ships with a growing catalogue of plugins that connect third-party AI services to your live video calls. Each plugin wraps a specific AI provider and exposes a unified API so you can swap vendors without rewriting business logic.
Why Use a Plugin
- Quality & speed: Leverage specialised models (e.g. Deepgram’s high-accuracy STT, ElevenLabs’ realistic TTS) without hosting anything yourself.
- Drop-in architecture: Plugins of the same type share the same life-cycle (e.g. for STT: `process_audio()`, `on("transcript")`, `close()`, …). You can chain them or replace them in minutes; see the sketch after this list.
- Runs inside your call: The SDK streams PCM frames directly to the provider in real time, then emits SDK events that you can listen for and act upon.
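Here is a minimal sketch of that shared life-cycle, assuming a Deepgram STT plugin. The import path and class name are illustrative assumptions, not the SDK's confirmed API; `process_audio()`, `on("transcript")`, and `close()` are the shared methods named above.

```python
import asyncio

# Hypothetical import path and class name, for illustration only.
from getstream.plugins.deepgram import DeepgramSTT


async def transcribe(pcm_frames):
    stt = DeepgramSTT()

    # Register a listener before feeding audio so no transcript is missed.
    stt.on("transcript", lambda event: print(event.text))

    # Stream PCM frames from the live call into the provider in real time.
    for frame in pcm_frames:
        await stt.process_audio(frame)

    # Release the provider connection when the call ends.
    await stt.close()


# In a real call, the frames would come from the call's audio track.
asyncio.run(transcribe(pcm_frames=[]))
```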
What can you build?
| Capability | Example providers |
|---|---|
| Speech-to-Text (STT) | Deepgram, Moonshine (local) |
| Text-to-Speech (TTS) | ElevenLabs, Kokoro |
| Voice Activity Detection (VAD) | Silero |
Combine them to create richer pipelines; e.g. VAD → STT → Moderation → TTS for a real-time, policy-aware voice agent. For an example of this, see our “building an LLM conversation pipeline” tutorial.
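As a rough sketch, wiring the first two stages of such a pipeline might look like the following. The import paths, class names, event names, and payload attributes (`event.audio`, `event.text`) are assumptions for illustration; only the shared life-cycle methods come from the description above.

```python
# Hypothetical import paths and class names, for illustration only.
from getstream.plugins.silero import SileroVAD
from getstream.plugins.deepgram import DeepgramSTT


async def run_pipeline(pcm_frames):
    vad = SileroVAD()
    stt = DeepgramSTT()

    async def on_speech(event):
        # Forward only voiced audio segments to the transcriber.
        await stt.process_audio(event.audio)

    def on_transcript(event):
        # A real agent would run moderation and an LLM here,
        # then hand the reply to a TTS plugin.
        print(event.text)

    vad.on("speech", on_speech)
    stt.on("transcript", on_transcript)

    # Every frame passes through VAD first; speech flows onward to STT.
    for frame in pcm_frames:
        await vad.process_audio(frame)

    await stt.close()
    await vad.close()
```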
See the other pages in this section for our individual third-party integrations. We’ll add more over time, and you can even write your own!
Creating Your Own Plugins
You can absolutely write your own plugins to connect other AI providers to the Python AI SDK! Follow this guide to learn how.
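As a rough illustration of the shape a custom plugin might take, assuming the SDK exposes a common STT base class: the base class name, import path, and `emit()` helper below are all assumptions, and the guide above documents the real interface.

```python
# Hypothetical base class and import path, for illustration only.
from getstream.plugins.common import STT


class MyProviderSTT(STT):
    """Connects a custom speech-to-text provider to the shared STT API."""

    def __init__(self, api_key: str):
        super().__init__()
        self.api_key = api_key

    async def process_audio(self, pcm_frame: bytes) -> None:
        # Forward raw PCM to your provider, then surface results through
        # the shared event API so downstream code stays unchanged.
        text = await self._send_to_provider(pcm_frame)  # hypothetical helper
        if text:
            self.emit("transcript", text)

    async def close(self) -> None:
        # Tear down any sockets or sessions held with the provider.
        ...
```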