Build multi-modal AI applications using our new open-source Vision AI SDK.

Voice AI | Now in Early Access

Voice AI
Accelerated.

Skill, prompt, agent.
Production-grade voice agents on global edge. Pick any STT, LLM and TTS. Connect your data over MCP. Ship to phone, web, mobile and video.

$ npx skills add GetStream/agent-skills -s stream
~240 ms P50 voice round-trip
60+ edge locations
99.999% uptime SLA
Build voice agents for
Restaurant bookings
Hotel concierges
Telehealth intake
Customer support
Sales outreach
Banking & insurance
Field service dispatch
Recruiting screens
Insurance claims

01 Install the skill

Drop the Voice AI skill into your project.

$ npx skills add GetStream/agent-skills -s stream
Claude Code | Cursor | Codex

02 Prompt

Tell your agent what to do, in plain English.

"I'd like a voice agent attached to a new phone number in the US to handle my restaurant bookings. Set up an agent that can handle calls and manage reservations."

Agent scaffolded | agent.py generated | phone number provisioned

R1 Result | Live demo

See the agent take a call.

A simulated reservation call that mirrors a deployed Stream agent: watch it greet the caller, capture the booking, and confirm by SMS in real time.

Demo: agent-bella-01
Incoming call: +1 (415) 555-0142 | us-west-1 edge
Real-time metrics: round-trip, STT first byte, LLM first token, TTS first byte
Pipeline: STT -> LLM -> TTS | co-located
Reservation captured: name, party, date, notes, status

R2 Result | Generated agent

One file. One function. Production-ready.

Scaffold your agent with the Stream skills: hosted STT, LLM and TTS models, or bring your own, and deploy to Stream's infra.

agent.py | Python | ~12 lines
async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=stream.Edge(),  # low-latency edge: React, iOS, Android, RN, Flutter
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        realtime=stream.Realtime(
            model="models/gemini-3.5-flash",
            stt="models/inworld-stt",
            tts="models/inworld-tts",
            number="+1-800-my-number",
        ),
    )

R3 Why it works well

Built on the same infrastructure as the rest of Stream.

Same edge, same SDKs, same dashboard you already trust for chat and video.

Global edge
Sessions route to the closest data center automatically. 60+ locations, sub-300 ms anywhere.
Any STT / LLM / TTS
Inworld, Deepgram, AssemblyAI, OpenAI, Gemini, Claude, ElevenLabs, Cartesia. Swap providers in a line.
MCP & RAG built-in
Plug your CRM, calendar, knowledge base or Postgres straight into the agent over MCP.
Voice anywhere
Phone numbers, browser, iOS, Android, video rooms: one agent, every channel, one transcript.
Smart co-location
STT, LLM and TTS run in the same region when the provider supports it. Fewer hops, lower P50.
HIPAA, SOC 2, GDPR
Telehealth-ready compliance posture. Per-region data residency on by default.

R4 Observability

API & CLI driven. Beautiful monitoring out of the box.

Every call records latency, transcripts, model choice and cost. Replay any session, jump to the exact turn, and ship a fix without a debugger.

agent-bella-01 | us-west-1
Calls | Latency | Quality | Cost (last 24h)
Calls today: 1,284 (+12.4%)
Avg duration: 1m 47s (-4.1%)
P50 round-trip: 238 ms (-18 ms)
Resolution rate: 92.3% (+1.1%)
End-to-end latency: 238 ms P50
[Chart: latency over the last 24 hours, split by STT, LLM, TTS and Edge]
Recent calls
Call ID | Caller | Duration | Round-trip
c_8aF2 | +1 415 555 0142 | 02:14 | 218 ms
c_8aE9 | +1 212 555 0188 | 01:08 | 244 ms
c_8aDc | +44 20 7946 0721 | 03:47 | 312 ms
c_8aCa | +1 510 555 0103 | 00:42 | 201 ms
c_8aB6 | +1 415 555 0142 | 04:31 | 229 ms
c_8aA1 | +1 646 555 0192 | 02:02 | 246 ms
REST + gRPC: Every metric available as API
Slack / PagerDuty: Alert on P95 latency drift
Replays: Step through any turn in a call
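
For readers who want to see what a P50 or P95 figure means in practice, here is a minimal Python sketch that computes both from the six round-trip values shown under "Recent calls". It is not Stream's implementation: the function names, the nearest-rank percentile method and the drift tolerance are all assumptions made for illustration. Six samples will not reproduce the dashboard's 24-hour P50.

```python
import math
from statistics import median

# Round-trip latencies in ms, copied from the "Recent calls" list.
round_trip_ms = [218, 244, 312, 201, 229, 246]


def p50(samples):
    """Median of the samples (averages the two middle values if even)."""
    return median(samples)


def p95(samples):
    """95th percentile using the nearest-rank method (an assumed choice)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def has_drifted(current_ms, baseline_ms, tolerance_ms):
    """Hypothetical drift rule: alert when current exceeds baseline by more than the tolerance."""
    return current_ms - baseline_ms > tolerance_ms


print("P50:", p50(round_trip_ms), "ms")
print("P95:", p95(round_trip_ms), "ms")
```

With only six calls, the nearest-rank P95 is simply the slowest call, which is why percentile alerts are normally evaluated over a much larger window.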
Verifying it works

Ship voice agents you can trust in production.

Scenario tests, guardrails and replays, the same way you ship the rest of your backend.

QA Testing scripts
Reservations | 12 scenarios
11 / 12 passing
Books a table for 2 on a weekday | 1.8s
Handles 'do you have anything earlier?' | 2.1s
Asks for dietary requirements | 1.4s
Confirms via SMS to caller's number | 0.9s
Politely declines off-menu requests | 1.2s
Escalates when overbooked | 1.6s
Handles caller switching to Spanish (flaky) | 2.4s
Run on every PR | GitHub Actions | 11 / 12 | 91.7%

GR Guardrails
Live policy enforcement
2 triggered today
Never quote prices
Stops the agent committing to a number; defers to the manager.
PII redaction in logs
Phone numbers and emails masked in transcripts.
Off-topic deflection (1x today)
Politely steers caller back to reservations.
Profanity filter (1x today)
Single-strike warning, then hand-off.
Max 2-minute hold
Auto-callback if escalation queue is busy.
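
To make the "PII redaction in logs" guardrail concrete, the sketch below masks phone numbers and email addresses in a transcript line with regular expressions. This is an illustrative assumption, not how Stream implements the guardrail: the patterns, the `redact` function and the `[phone]` / `[email]` placeholders are hypothetical, and real-world redaction usually needs broader patterns than these.

```python
import re

# Hypothetical patterns: simple enough to read, not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A digit, then at least 7 digits/separators, then a final digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Mask emails first, then phone numbers, in a transcript line."""
    text = EMAIL_RE.sub("[email]", text)
    return PHONE_RE.sub("[phone]", text)


line = "Confirm to +1 (415) 555-0142 or bella@example.com, table for 2."
print(redact(line))
```

Emails are masked before phone numbers so that digits inside an address are not partially replaced by the phone pattern. Short numbers such as a party size are left alone because the phone pattern requires at least nine characters.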

Get started

One CLI command from idea to phone call.

Free for the first 10,000 minutes a month. No credit card. Bring your own model keys, or use Stream's.

$ stream agents deploy