Build multi-modal AI applications using our new open-source Vision AI SDK.
Skill, prompt, agent.
Production-grade voice agents on a global edge network. Pick any STT, LLM and TTS. Connect your data over MCP. Ship to phone, web, mobile and video.
npx skills add GetStream/agent-skills -s stream
"I'd like a voice agent attached to a new phone number in the US to handle my restaurant bookings. Set up an agent that can handle calls and manage reservations."
A simulated reservation call that mirrors a deployed Stream agent: watch it greet the caller, capture the booking, and confirm by SMS in real time.
Scaffold your agent with the Stream skills: use hosted STT, LLM and TTS models or bring your own, then deploy to Stream's infrastructure.
async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=stream.Edge(),  # low-latency edge: React, iOS, Android, RN, Flutter
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        realtime=stream.Realtime(
            model="models/gemini-3.5-flash",
            stt="models/inworld-stt",
            tts="models/inworld-tts",
            number="+1-800-my-number",
        ),
    )
Same edge, same SDKs, same dashboard you already trust for chat and video.
Every call records latency, transcripts, model choice and cost. Replay any session, jump to the exact turn, and ship a fix without a debugger.
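To make the per-call record concrete, here is a minimal sketch of what such a record could look like and how you might find the turn to jump to. It is plain Python and does not use the Stream SDK. The `TurnRecord` and `CallRecord` classes, their field names, and the sample values are all hypothetical illustrations, not Stream's actual schema or real measurements.

```python
from dataclasses import dataclass, field


@dataclass
class TurnRecord:
    """One caller/agent exchange within a call (hypothetical schema)."""
    index: int
    transcript: str
    latency_ms: float
    model: str
    cost_usd: float


@dataclass
class CallRecord:
    """All turns recorded for a single call (hypothetical schema)."""
    call_id: str
    turns: list[TurnRecord] = field(default_factory=list)

    def total_cost_usd(self) -> float:
        # Sum the per-turn cost to get the cost of the whole call.
        return sum(t.cost_usd for t in self.turns)

    def slowest_turn(self) -> TurnRecord:
        # The turn with the highest latency: a natural place to start a replay.
        return max(self.turns, key=lambda t: t.latency_ms)


# Made-up sample data, for illustration only.
call = CallRecord(
    call_id="demo-call",
    turns=[
        TurnRecord(0, "Hi, I'd like a table for two.", 420.0,
                   "models/gemini-3.5-flash", 0.0004),
        TurnRecord(1, "Friday at 7pm, please.", 1310.0,
                   "models/gemini-3.5-flash", 0.0006),
    ],
)

worst = call.slowest_turn()
print(f"Replay from turn {worst.index} ({worst.latency_ms} ms)")
```

Keeping latency, model and cost on each turn rather than only on the call is what makes "jump to the exact turn" possible in a sketch like this: the slow or expensive step can be located without re-running the session.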
Scenario tests, guardrails and replays, the same way you ship the rest of your backend.
Free for the first 10,000 minutes a month. No credit card. Bring your own model keys, or use Stream's.
stream agents deploy