Build low-latency Vision AI applications using our new open-source Vision AI SDK. ⭐️ on GitHub
Multi-modal AI agents that see, hear, & remember.
Open-source. Edge-agnostic. Low-latency.
An open-source framework and developer platform for building, testing, deploying, scaling, and observing agents in production.
Customer support bots, phone assistants, and voice interfaces using OpenAI Realtime, Gemini, or STT + LLM + TTS pipelines.
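A cascaded STT + LLM + TTS pipeline like the one above can be sketched in a few lines. This is a minimal illustration with stub components, not the Vision Agents SDK API; in production each stub would be replaced by a real provider client (e.g. a speech-to-text service, OpenAI or Gemini, and a TTS engine).

```python
from dataclasses import dataclass

# Hypothetical stand-ins for real STT / LLM / TTS providers.
class StubSTT:
    def transcribe(self, audio: bytes) -> str:
        # Pretend the audio decodes to a fixed caller utterance.
        return "what are your opening hours"

class StubLLM:
    def complete(self, prompt: str) -> str:
        return f"You asked: '{prompt}'. We're open 9am-5pm."

class StubTTS:
    def synthesize(self, text: str) -> bytes:
        # A real TTS engine would return PCM/Opus audio frames.
        return text.encode("utf-8")

@dataclass
class VoicePipeline:
    stt: StubSTT
    llm: StubLLM
    tts: StubTTS

    def handle_turn(self, audio_in: bytes) -> bytes:
        # One conversational turn: speech in -> text -> reply -> speech out.
        transcript = self.stt.transcribe(audio_in)
        reply = self.llm.complete(transcript)
        return self.tts.synthesize(reply)

pipeline = VoicePipeline(StubSTT(), StubLLM(), StubTTS())
audio_out = pipeline.handle_turn(b"\x00\x01")
print(audio_out.decode("utf-8"))
```

Realtime APIs such as OpenAI Realtime collapse these three stages into a single speech-to-speech model, trading the flexibility of swapping components for lower latency.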
Sports coaching, surveillance, & manufacturing workflows. Combine YOLO, Roboflow, or Moondream with Gemini or OpenAI vision.
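A common pattern behind these workflows is feeding structured detections from a fast local model (YOLO-style) into the prompt of a vision-capable LLM for higher-level reasoning. The sketch below is illustrative only: the detector is faked with canned output, and real code would call Ultralytics YOLO and the Gemini or OpenAI APIs instead.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float

def fake_detector(frame_id: int) -> list[Detection]:
    # Stand-in for model.predict(frame); returns canned detections.
    return [Detection("person", 0.92), Detection("forklift", 0.81)]

def build_vision_prompt(detections: list[Detection]) -> str:
    # Summarize structured detections into text the LLM can reason over.
    summary = ", ".join(f"{d.label} ({d.confidence:.0%})" for d in detections)
    return (
        "You are a safety monitor for a warehouse camera. "
        f"Current detections: {summary}. "
        "Flag any unsafe proximity between people and vehicles."
    )

prompt = build_vision_prompt(fake_detector(0))
print(prompt)
```

Running the detector locally keeps per-frame latency low, while the LLM is only invoked for judgment calls on the summarized detections.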
Inbound and outbound calling via Twilio. Build phone bots with RAG-powered knowledge bases.
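The retrieval step behind a RAG-powered phone bot can be shown in miniature. This sketch scores knowledge-base entries by simple word overlap with the caller's question; a real system would use embeddings and a vector store, and the knowledge-base contents here are invented for illustration.

```python
# Toy knowledge base a phone bot might answer from.
KNOWLEDGE_BASE = {
    "hours": "We are open Monday to Friday, 9am to 5pm.",
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def retrieve(question: str) -> str:
    # Rank entries by word overlap with the question (an embedding
    # similarity search would replace this in production).
    q = set(question.lower().replace("?", "").split())
    def score(key: str) -> int:
        doc = set(key.split()) | set(KNOWLEDGE_BASE[key].lower().split())
        return len(q & doc)
    best = max(KNOWLEDGE_BASE, key=score)
    return KNOWLEDGE_BASE[best]

print(retrieve("what are your opening hours?"))
```

The retrieved passage is then injected into the LLM prompt as grounding context before the bot speaks its answer back over the call.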
Real-time interactive avatars with HeyGen or video style transfer with Decart.
Building models, tools, or platforms that work with real-time voice or video AI?
We’re actively adding first-party integrations and partnering on co-building and co-marketing.



Track pose and give feedback in real time. Uses Gemini Live + Ultralytics YOLO.

Build an agent to greet users, perform basic MCP functions, and observe a camera feed.
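The shape of that quickstart agent can be sketched as three capabilities on one object: a greeting, a callable tool (standing in for an MCP function), and a camera observer. All class and method names below are illustrative, not the Vision Agents SDK API, and the camera feed is faked with strings.

```python
class ToyAgent:
    """Illustrative agent: greet, call a tool, observe a camera feed."""

    def __init__(self):
        # Stand-in for tools exposed over MCP.
        self.tools = {"get_time": lambda: "12:00"}

    def greet(self) -> str:
        return "Hi! I can tell the time and watch the camera."

    def call_tool(self, name: str) -> str:
        # A real agent would dispatch this through an MCP client.
        return self.tools[name]()

    def observe(self, frames: list) -> str:
        # A real agent would send frames to a vision LLM; we just count them.
        return f"Observed {len(frames)} frame(s)."

agent = ToyAgent()
print(agent.greet())
print(agent.call_tool("get_time"))
print(agent.observe(["frame0", "frame1", "frame2"]))
```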
Community & Open Source
Follow Stream on X, star the Vision Agents GitHub repo, and join the discussion on Discord to try demos, share feedback, and contribute.
