Introduction to Integrations
Stream’s Python AI SDK ships with a growing catalogue of plugins that connect third-party AI services to your live video calls. Each plugin wraps a specific AI provider and exposes a unified API so you can swap vendors without rewriting business logic.
Why Use a Plugin
- Quality & speed: Leverage specialised models (e.g. Deepgram’s high-accuracy STT, ElevenLabs’ realistic TTS) without hosting anything yourself.
- Drop-in architecture: Plugins of the same type share the same life-cycle (e.g. for STT: `process_audio()`, `on("transcript")`, `close()`, …). You can chain them or replace them in minutes; see the sketch after this list.
- Runs inside your call: The SDK streams PCM frames directly to the provider in real time, then emits SDK events that you can listen for and act upon.
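Here is a minimal sketch of that shared life-cycle, assuming a Deepgram STT plugin. The import path and class name are illustrative assumptions, not the SDK's confirmed API; `process_audio()`, `on("transcript")`, and `close()` are the shared methods named above.

```python
import asyncio

# Hypothetical import path and class name, for illustration only.
from getstream.plugins.deepgram import DeepgramSTT


async def transcribe(pcm_frames):
    stt = DeepgramSTT()

    # Register a listener before feeding audio so no transcript is missed.
    stt.on("transcript", lambda event: print(event.text))

    # Stream PCM frames from the live call into the provider in real time.
    for frame in pcm_frames:
        await stt.process_audio(frame)

    # Release the provider connection when the call ends.
    await stt.close()


# In a real call, the frames would come from the call's audio track.
asyncio.run(transcribe(pcm_frames=[]))
```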
What can you build?
| Capability | Example providers |
|---|---|
| Speech-to-Text (STT) | Deepgram, Moonshine (local) |
| Text-to-Speech (TTS) | ElevenLabs, Kokoro |
| Voice Activity Detection (VAD) | Silero |
Combine them to create richer pipelines; e.g. VAD → STT → Moderation → TTS for a real-time, policy-aware voice agent. For an example of this, see our “building an LLM conversation pipeline” tutorial.
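As a rough sketch, wiring the first two stages of such a pipeline might look like the following. The import paths, class names, event names, and payload attributes (`event.audio`, `event.text`) are assumptions for illustration; only the shared life-cycle methods come from the description above.

```python
# Hypothetical import paths and class names, for illustration only.
from getstream.plugins.silero import SileroVAD
from getstream.plugins.deepgram import DeepgramSTT


async def run_pipeline(pcm_frames):
    vad = SileroVAD()
    stt = DeepgramSTT()

    async def on_speech(event):
        # Forward only voiced audio segments to the transcriber.
        await stt.process_audio(event.audio)

    def on_transcript(event):
        # A real agent would run moderation and an LLM here,
        # then hand the reply to a TTS plugin.
        print(event.text)

    vad.on("speech", on_speech)
    stt.on("transcript", on_transcript)

    # Every frame passes through VAD first; speech flows onward to STT.
    for frame in pcm_frames:
        await vad.process_audio(frame)

    await stt.close()
    await vad.close()
```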
See the other pages in this section for our individual third-party integrations. We’ll add more over time, and you can even write your own!
Creating Your Own Plugins
You can absolutely write your own plugins to connect other AI providers to the Python AI SDK! Follow this guide to learn how.
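As a rough illustration of the shape a custom plugin might take, assuming the SDK exposes a common STT base class: the base class name, import path, and `emit()` helper below are all assumptions, and the guide above documents the real interface.

```python
# Hypothetical base class and import path, for illustration only.
from getstream.plugins.common import STT


class MyProviderSTT(STT):
    """Connects a custom speech-to-text provider to the shared STT API."""

    def __init__(self, api_key: str):
        super().__init__()
        self.api_key = api_key

    async def process_audio(self, pcm_frame: bytes) -> None:
        # Forward raw PCM to your provider, then surface results through
        # the shared event API so downstream code stays unchanged.
        text = await self._send_to_provider(pcm_frame)  # hypothetical helper
        if text:
            self.emit("transcript", text)

    async def close(self) -> None:
        # Tear down any sockets or sessions held with the provider.
        ...
```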