Quickstart

Get up and running with the Stream Python AI SDK in just a few minutes.
This guide walks you through building a simple, real-time voice assistant.

You’ll learn how to:

Install the SDK
Initialise the Stream client and user
Set up a Stream Call
Create a basic speech-to-speech pipeline with the OpenAI plugin

By the end, you’ll have a working bot that listens and talks, as well as a foundation to build more advanced agents.

Installing the SDK

You can use pip or uv to install the SDK:

pip install --pre "getstream[plugins]"

# or using uv
uv add "getstream[plugins]" --prerelease=allow

Next, add a .env file at the root of your project containing the following properties:

STREAM_API_KEY=your-stream-api-key
STREAM_API_SECRET=your-stream-api-secret
STREAM_BASE_URL=https://pronto.getstream.io/
OPENAI_API_KEY=sk-your-openai-api-key
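A missing or blank variable is a common cause of start-up failures, so it can help to check the configuration before creating any clients. The following is a minimal sketch using only the standard library; the helper name missing_env_vars is hypothetical and not part of the SDK. It assumes the values have already been loaded into the process environment (for example with load_dotenv()).

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "STREAM_BASE_URL",
    "OPENAI_API_KEY",
)


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; not part of the Stream SDK.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary instead of os.environ makes the check easy to exercise in tests.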

Initialising the Stream Client and User

First, set up the Stream client. The client reads the API key and other settings from environment variables, so load them before creating the client.

from dotenv import load_dotenv
from getstream import Stream

# Load environment variables
load_dotenv()

# Initialize Stream client from ENV
client = Stream.from_env()

Set up a Stream Call

Before we create a call, we need to create two users: one for us and one for the OpenAI bot. We can generate random user IDs using uuid4() and then create the users using client.upsert_users().

from uuid import uuid4

from getstream.models import UserRequest

# Generates a new user ID and creates a new user
user_id = f"user-{uuid4()}"
client.upsert_users(UserRequest(id=user_id, name="My User"))

# We can use this later to join the call
user_token = client.create_token(user_id, expiration=3600)

# Generate a user ID for the OpenAI bot that is added later
bot_user_id = f"openai-realtime-speech-to-speech-bot-{uuid4()}"
client.upsert_users(UserRequest(id=bot_user_id, name="OpenAI Realtime Speech to Speech Bot"))

To create a call, we can generate an ID and then use client.video.call() to create the call data. We can then use call.get_or_create() to signal the backend to create the call.

from uuid import uuid4

# Create a call with a new generated ID
call_id = str(uuid4())
call = client.video.call("default", call_id)
call.get_or_create(data={"created_by_id": bot_user_id})

Creating a Speech-To-Speech pipeline using OpenAI

To initialise the OpenAI plugin, use the OpenAIRealtime class. You can provide the API key, model, instructions, and the default voice to use. By default, the API key is read from the OPENAI_API_KEY value in the .env file described earlier.

sts_bot = OpenAIRealtime(
    model="gpt-4o-realtime-preview",
    instructions="You are a friendly assistant; reply in a concise manner.",
    voice="alloy",
)

You can connect to the call by calling sts_bot.connect() with the call and the bot user ID. You can then send a message from the human side of the conversation using the sts_bot.send_user_message() method. Because the snippet uses async with and await, it must run inside an async function.

try:
    # Connect OpenAI bot
    async with await sts_bot.connect(call, agent_user_id=bot_user_id) as connection:

        # Sends a message to OpenAI from the user side
        await sts_bot.send_user_message("Give a very short greeting to the user.")

except Exception as e:
    # Handle the exception; here we only report it
    print(f"Error while running the bot: {e}")
finally:
    # Delete users when done
    client.delete_users([user_id, bot_user_id])

After adding this, you will have a video call with an OpenAI bot connected to it.
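The connection snippet relies on two async steps: connect() is awaited first, and the object it returns is then entered as an async context manager. The sketch below illustrates only that control flow so it can run without network access. FakeBot and FakeConnection are hypothetical stand-ins, not SDK classes, and the log list exists purely to make the ordering visible.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for the object returned by connect()."""

    def __init__(self, log):
        self.log = log

    async def __aenter__(self):
        self.log.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.log.append("exit")
        return False  # do not swallow exceptions


class FakeBot:
    """Hypothetical stand-in for the speech-to-speech bot."""

    def __init__(self):
        self.log = []

    async def connect(self, call, agent_user_id):
        self.log.append(f"connect:{agent_user_id}")
        return FakeConnection(self.log)

    async def send_user_message(self, text):
        self.log.append(f"message:{text}")


async def run_bot(bot, call, bot_user_id):
    try:
        # Await connect() first, then enter the returned context manager.
        async with await bot.connect(call, agent_user_id=bot_user_id):
            await bot.send_user_message("Give a very short greeting to the user.")
    finally:
        # Cleanup, such as deleting users, belongs here.
        bot.log.append("cleanup")
    return bot.log


if __name__ == "__main__":
    print(asyncio.run(run_bot(FakeBot(), call=None, bot_user_id="bot-1")))
```

With the real SDK objects, the same shape applies: place the try block inside an async function and start it with asyncio.run().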

Testing it out

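If needed, you can open the call in a browser using this snippet. It reads the base URL of the web app from the EXAMPLE_BASE_URL environment variable, which is not in the .env file shown earlier, so set it first:

import os
import webbrowser
from urllib.parse import urlencode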

base_url = f"{os.getenv('EXAMPLE_BASE_URL')}/join/"

# The token is the user token we generated from the client before.
params = {"api_key": client.api_key, "token": user_token, "skip_lobby": "true"}

url = f"{base_url}{call_id}?{urlencode(params)}"

try:
    webbrowser.open(url)
except Exception as e:
    print(f"Failed to open browser: {e}")
    print(f"Please manually open this URL: {url}")
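The URL construction above can be wrapped in a small helper, which makes two edge cases explicit: an unset base URL (os.getenv() returns None, which would otherwise produce a URL starting with "None/join/") and a base URL that already ends in a slash. This is a sketch only; build_join_url is a hypothetical name, and the query parameters simply mirror the ones used in the snippet.

```python
from urllib.parse import urlencode


def build_join_url(base_url, call_id, api_key, token, skip_lobby=True):
    """Build the browser join URL for a call.

    Hypothetical helper for illustration; not part of the Stream SDK.
    """
    if not base_url:
        raise ValueError("base_url is required (for example, from EXAMPLE_BASE_URL)")
    params = {
        "api_key": api_key,
        "token": token,
        "skip_lobby": "true" if skip_lobby else "false",
    }
    # Strip a trailing slash so the path never contains "//join/".
    return f"{base_url.rstrip('/')}/join/{call_id}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_join_url("https://example.com/", "my-call", "key", "token"))
```

urlencode() percent-encodes the token, so characters such as "+" or "/" in it do not corrupt the query string.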

Wrapping Up

Congratulations, you’ve now built a basic speech-to-speech pipeline with the Stream Python AI SDK and the OpenAI Plugin. Your voice assistant can join a call, process live audio, and respond using OpenAI in real time.

For the full set of features, see the detailed documentation for the OpenAI plugin.

Troubleshooting & Feedback

If you run into any issues:

Double-check that your environment variables in the .env file are correct.
Ensure you’ve installed all the required dependencies.
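The dependency check can be automated with the standard library. The sketch below assumes the import names getstream and dotenv (the latter being the import name of the python-dotenv package); missing_modules is a hypothetical helper, not part of the SDK, and it only handles top-level module names.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found.

    Hypothetical helper for illustration; not part of the Stream SDK.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["getstream", "dotenv"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules can be imported.")
```

find_spec() only locates a module without importing it, so the check has no side effects.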

Technical Overview

Text To Speech (TTS)