Speech To Speech (STS)

Speech-to-Speech (STS) is a conversational AI pattern in which an AI listens to what you say, understands it, generates a response, and then speaks that response back to you in a natural voice.

STS enables hands-free, voice-driven interactions between users and AI systems.

How does speech to speech work?

  1. Listen - Capture your speech and convert it to text
  2. Understand - Process the meaning
  3. Think - Generate a response
  4. Speak - Convert the response text back to speech
  5. Respond - Play the synthesized speech back to you

STS Basics
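
The five stages above can be sketched as a chain of swappable functions. This is a minimal illustration, not the Stream SDK's API: `SpeechToSpeechPipeline`, `build_demo_pipeline`, and the stand-in stages (which treat the audio bytes as UTF-8 text) are all hypothetical names chosen for this example. A real system would plug in a speech recognizer, a language model, and a speech synthesizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SpeechToSpeechPipeline:
    """Hypothetical container for the five stages; each stage is a plain callable."""

    transcribe: Callable[[bytes], str]              # 1. Listen: audio -> text
    understand: Callable[[str], Dict[str, object]]  # 2. Understand: text -> meaning
    generate: Callable[[Dict[str, object]], str]    # 3. Think: meaning -> reply text
    synthesize: Callable[[str], bytes]              # 4. Speak: reply text -> audio
    play: Callable[[bytes], None]                   # 5. Respond: deliver the audio

    def handle_turn(self, audio_in: bytes) -> str:
        """Run one conversational turn through all five stages, in order."""
        text = self.transcribe(audio_in)
        meaning = self.understand(text)
        reply = self.generate(meaning)
        audio_out = self.synthesize(reply)
        self.play(audio_out)
        return reply


def build_demo_pipeline(played: List[bytes]) -> SpeechToSpeechPipeline:
    """Wire up stand-in stages; 'played' collects the audio that would be spoken."""
    return SpeechToSpeechPipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        understand=lambda text: {"text": text, "is_question": text.endswith("?")},
        generate=lambda meaning: f"You said: {meaning['text']}",
        synthesize=lambda reply: reply.encode("utf-8"),
        play=played.append,
    )


if __name__ == "__main__":
    played: List[bytes] = []
    pipeline = build_demo_pipeline(played)
    print(pipeline.handle_turn(b"What is on the agenda?"))
```

Keeping each stage behind a plain callable means any one of them can be replaced without touching the others, which is the kind of wiring an integrated SDK is meant to take off your hands.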

How does it work with Stream?

The Stream Python AI SDK handles this conversation flow for you inside your calls. Instead of building a pipeline that connects a separate service for each stage, you get everything you need in one integrated solution.

Here’s how it works in your Stream calls:

  1. Choose Your AI: Pick an AI model for intelligent, context-aware conversations.

  2. Configure Personality: Set up how your AI should behave.

  3. Start Conversations: Users can simply start talking, and your AI will listen, process, and respond naturally through the call.

  4. Real-time Interaction: The entire conversation happens in real-time, with minimal delay between what users say and how the AI responds.

  5. Seamless Integration: Everything works within your existing Stream call—no separate audio channels or complex routing needed.

STS with Stream
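
The first two steps above, choosing a model and configuring its behavior, come down to a small amount of configuration. The sketch below shows one way an application might hold and check those settings before joining a call. `AssistantConfig`, its fields, and the example values are hypothetical and are not part of the Stream SDK; consult the SDK documentation for the actual setup calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistantConfig:
    """Hypothetical application-side settings; not a Stream SDK class."""

    model: str         # step 1: which AI model to use (placeholder name below)
    instructions: str  # step 2: how the AI should behave
    call_id: str       # the existing call the AI should join

    def validate(self) -> None:
        """Raise ValueError if any setting is blank."""
        if not self.model.strip():
            raise ValueError("model must be set")
        if not self.instructions.strip():
            raise ValueError("instructions must be set")
        if not self.call_id.strip():
            raise ValueError("call_id must be set")


if __name__ == "__main__":
    config = AssistantConfig(
        model="example-voice-model",  # placeholder, not a real model name
        instructions="You are a concise, friendly meeting assistant.",
        call_id="team-standup",
    )
    config.validate()
    print(config)
```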

Worked example

Let’s walk through a real-world scenario to see how STS supports a natural conversational experience.

Imagine you’re building a virtual meeting assistant that helps teams stay organized and productive. Here’s how STS makes this possible:

The Scenario: A team meeting where the AI assistant helps manage the agenda and take notes.

What Happens:

  1. The meeting starts, and someone says “Hey assistant, can you help us stay on track today?”

  2. The AI responds naturally: “Of course! I’m here to help. I can take notes, track action items, and keep us on schedule. What’s on the agenda today?”

  3. A team member says “We need to discuss the Q3 budget and plan the product launch.”

  4. The AI processes this and responds: “Great! I’ll create agenda items for budget discussion and product launch planning. I’ll also track any decisions and action items we make. Should we start with the budget?”

  5. Throughout the meeting, the AI can interject with helpful reminders: “We have 10 minutes left for the budget discussion. Should we move to the product launch planning?”
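
The time reminder in step 5 implies that the assistant tracks how long each agenda item has run. A minimal sketch of that bookkeeping follows, assuming each item has a fixed time allotment. `AgendaItem`, `reminder_for`, and the 10-minute threshold are illustrative choices, not behavior defined by the Stream SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgendaItem:
    title: str
    allotted_minutes: int


def reminder_for(
    item: AgendaItem,
    elapsed_minutes: int,
    next_item: Optional[AgendaItem] = None,
    warn_at_minutes: int = 10,
) -> Optional[str]:
    """Return a spoken reminder once the time left drops to the threshold, else None."""
    remaining = item.allotted_minutes - elapsed_minutes
    if remaining <= 0 or remaining > warn_at_minutes:
        return None  # either out of time already or too early to interrupt
    unit = "minute" if remaining == 1 else "minutes"
    message = f"We have {remaining} {unit} left for the {item.title}."
    if next_item is not None:
        message += f" Should we move to the {next_item.title}?"
    return message


if __name__ == "__main__":
    budget = AgendaItem("budget discussion", allotted_minutes=30)
    launch = AgendaItem("product launch planning", allotted_minutes=20)
    print(reminder_for(budget, elapsed_minutes=20, next_item=launch))
```

In a live call, the returned text would be passed to the speech synthesis stage so the assistant can say it aloud.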

The Result: Instead of a passive note-taker, you have an intelligent meeting participant that actively helps the team stay organized, on track, and productive—all through natural conversation.

© Getstream.io, Inc. All Rights Reserved.