Introduction

The Stream Python AI SDK helps you build real-time, voice-enabled applications without having to piece together low-level audio tools or complex AI services on your own. It acts as a bridge between your Stream video calls and the AI services whose features you want to integrate into your video or voice application.

It handles things like transcription, speech synthesis and voice activity detection, so you can focus on building great user experiences!

The Stream Python AI SDK uses WebRTC to stream the video and audio from your calls to and from an AI backend. While the core Stream Python SDK provides functionality for video call management, chat session handling, and user authentication, the AI SDK adds advanced capabilities tailored for video AI applications.

The SDK uses a plugin-based architecture, which means you can mix and match the functionality you want from the providers you want. For example, you can use Deepgram for transcription, ElevenLabs for realistic voice output, Silero for detecting speech, and more. It’s designed to be flexible, so you can swap providers or customize behavior based on your use case.
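To make the plugin idea concrete, here is a minimal sketch of how swappable providers can sit behind a shared interface. It does not use the SDK's actual classes: `SpeechToText`, `TextToSpeech`, `VoicePipeline`, and the fake providers are all hypothetical names invented for illustration, and the real plugin interfaces may look different.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    """Hypothetical interface: turns raw audio bytes into text."""

    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical interface: turns text into raw audio bytes."""

    def synthesize(self, text: str) -> bytes: ...


class FakeSTT:
    """Stand-in for a transcription provider; decodes bytes as UTF-8."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeTTS:
    """Stand-in for a voice provider; encodes text as UTF-8 bytes."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class ShoutingTTS:
    """A second stand-in voice provider, used to show swapping."""

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")


@dataclass
class VoicePipeline:
    """Depends only on the interfaces, never on a concrete provider."""

    stt: SpeechToText
    tts: TextToSpeech

    def respond(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        reply = f"You said: {text}"
        return self.tts.synthesize(reply)


pipeline = VoicePipeline(stt=FakeSTT(), tts=FakeTTS())
first = pipeline.respond(b"hello")

# Swap the voice provider without touching the rest of the pipeline.
pipeline.tts = ShoutingTTS()
second = pipeline.respond(b"hello")
```

Because `VoicePipeline` only relies on the two method signatures, replacing one provider with another is a one-line change, which is the property a plugin-based design aims for.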

In short, the Stream Python AI SDK is a subset of the overall Python SDK. It uses WebRTC and communicates with external AI providers via plugins.

Features

  1. Text-To-Speech (TTS): Convert text into natural-sounding speech using models from providers like ElevenLabs, Cartesia, and Kokoro for realistic voice synthesis.
  2. Speech-To-Text (STT): Transcribe real-time audio into text with high accuracy using models from providers like Deepgram and Moonshine for live transcription during calls.
  3. Speech-To-Speech (STS): Build conversational AI agents that can listen, process, and respond with synthesized speech in real time using providers like OpenAI.
  4. Voice Activity Detection (VAD): Detect when someone is speaking using models from providers like Silero to identify speech segments and trigger AI processing.
  5. MCP Support: Connect your MCP server to your Python AI SDK logic so that your AI models can call any tools you like!
  6. Video Call Recording: Record audio and video from your Stream video calls, either composed into a single audio/video stream or split into a separate stream for each participant.
  7. Events System: Listen for key SDK events to run your custom code.
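As an illustration of how voice activity detection and an events system can work together, the sketch below flags speech with a simple energy threshold and notifies listeners when speech starts and stops. This is an assumption-heavy toy: `EventEmitter`, `detect_speech`, the event names, and the threshold value are hypothetical and are not the SDK's API, and a provider like Silero uses a trained model rather than a plain energy threshold.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Sequence


class EventEmitter:
    """Hypothetical minimal event system: register handlers, then emit."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, *args: object) -> None:
        for handler in list(self._handlers[event]):
            handler(*args)


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def detect_speech(
    frames: Sequence[Sequence[float]],
    emitter: EventEmitter,
    threshold: float = 0.01,  # assumed value, chosen only for this example
) -> None:
    """Emit 'speech_start' / 'speech_end' with the frame index at each transition."""
    speaking = False
    for index, frame in enumerate(frames):
        loud = frame_energy(frame) >= threshold
        if loud and not speaking:
            emitter.emit("speech_start", index)
        elif not loud and speaking:
            emitter.emit("speech_end", index)
        speaking = loud
    if speaking:
        # Close a segment that is still open when the audio ends.
        emitter.emit("speech_end", len(frames))


events = []
emitter = EventEmitter()
emitter.on("speech_start", lambda i: events.append(("start", i)))
emitter.on("speech_end", lambda i: events.append(("end", i)))

silence = [0.0] * 4
speech = [0.5, -0.5, 0.5, -0.5]
detect_speech([silence, speech, speech, silence], emitter)
```

In a real application, the `speech_start` and `speech_end` handlers are where you would trigger further processing, such as sending the detected segment to a transcription provider.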

Next Steps

To get started immediately, check out the quickstart tutorial where you will create a “Hello World” project using the Python AI SDK.

You can find the GitHub repository for the Python AI SDK here.

If you feel like anything is missing or could be improved, please don’t hesitate to contact us. We ❤️ to hear your feedback!

© Getstream.io, Inc. All Rights Reserved.