Introduction

The Stream Python AI SDK helps you build real-time, voice-enabled applications without piecing together low-level audio tools or complex AI services yourself. Think of it as a bridge between your Stream calls and the AI services whose features you want in your video or voice application. Whether you’re building a voice assistant, a meeting bot, or just experimenting with speech interfaces, the SDK handles transcription, speech synthesis, and voice activity detection, so you can focus on building the user experience.

At its core, the SDK uses a plugin-based architecture, which means you can mix and match services like Deepgram for transcription, ElevenLabs for realistic voice output, Silero for detecting speech, and more. It’s designed to be flexible, so you can swap providers or customize behavior based on your use case.
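To make the plugin idea concrete, here is a minimal sketch of how swappable providers can sit behind a shared interface. It does not use the SDK's actual API: every name in it (STT, TTS, EchoSTT, UppercaseSTT, BytesTTS, VoicePipeline) is hypothetical, and the "providers" are toy stand-ins that treat audio bytes as UTF-8 text rather than calling a real service.

```python
from typing import Protocol


class STT(Protocol):
    """Hypothetical interface every speech-to-text plugin would satisfy."""

    def transcribe(self, audio: bytes) -> str: ...


class TTS(Protocol):
    """Hypothetical interface every text-to-speech plugin would satisfy."""

    def synthesize(self, text: str) -> bytes: ...


class EchoSTT:
    """Toy provider: pretends the audio bytes are UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UppercaseSTT:
    """A second toy provider, used to show that plugins are interchangeable."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").upper()


class BytesTTS:
    """Toy provider: 'synthesizes' speech by encoding the text as bytes."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class VoicePipeline:
    """Depends only on the interfaces, never on a concrete provider."""

    def __init__(self, stt: STT, tts: TTS) -> None:
        self.stt = stt
        self.tts = tts

    def respond(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        return self.tts.synthesize(f"You said: {text}")


reply = VoicePipeline(EchoSTT(), BytesTTS()).respond(b"hello")
# Swapping the STT provider requires no change to VoicePipeline itself.
swapped = VoicePipeline(UppercaseSTT(), BytesTTS()).respond(b"hello")
```

Because the pipeline depends only on the two interfaces, replacing one provider with another is a one-line change at construction time, which is the kind of flexibility a plugin-based design aims for.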

Features

  1. Text-To-Speech (TTS): Convert text into natural-sounding speech using providers like ElevenLabs, Cartesia, and Kokoro.
  2. Speech-To-Text (STT): Transcribe live call audio into text with high accuracy using providers like Deepgram and Moonshine.
  3. Speech-To-Speech (STS): Build conversational AI agents that listen, process, and respond with synthesized speech in real time using providers like OpenAI.
  4. Voice Activity Detection (VAD): Detect when someone is speaking using providers like Silero, so you can identify speech segments and trigger AI processing only when needed.
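As a rough illustration of what voice activity detection produces, the sketch below marks speech segments with a simple energy threshold. This is a deliberately simplified technique and not how Silero works (Silero uses a trained neural model), nor is it part of the SDK; the function names, the frame size, and the threshold of 0.1 are assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(
    frames: Sequence[Sequence[float]], threshold: float = 0.1
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of runs whose energy meets the threshold.

    `end` is exclusive, so frames[start:end] is the detected speech.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, frame in enumerate(frames):
        active = rms(frame) >= threshold
        if active and start is None:
            start = i  # speech begins
        elif not active and start is not None:
            segments.append((start, i))  # speech ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(frames)))  # audio ended mid-speech
    return segments


# Synthetic input: silent frames and "voiced" frames of constant amplitude.
silence = [0.0] * 160
voice = [0.5, -0.5] * 80
frames = [silence, voice, voice, silence, voice]
segments = speech_segments(frames)
```

Each returned segment is a span of audio that downstream steps such as transcription could process, while silent frames are skipped.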

To get started right away, check out the quickstart guide, which walks you through creating a project with basic call setup and plugin use.

The source code for the Python AI SDK is also available in its GitHub repository.

If you feel like anything is missing or could be improved, please don’t hesitate to contact us. We’re happy to help.

© Getstream.io, Inc. All Rights Reserved.